Class: BufferedTokenizer
Relationships & Source Files
Inherits: Object
Defined in: lib/em/buftok.rb
Overview
BufferedTokenizer takes a delimiter upon instantiation, or acts line-based
by default. It allows input to be spoon-fed from an outside source that
receives arbitrary-length datagrams, which may or may not contain the token
by which entities are delimited. In this respect it is ideally paired with
something like ::EventMachine (http://rubyeventmachine.com/).
Class Method Summary
- .new(delimiter = $/) ⇒ BufferedTokenizer (constructor)
  A new BufferedTokenizer splits its input on a delimiter, which defaults to the global input record separator $/ ("\n").
Instance Method Summary
- #extract(data)
  Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract.
- #flush
  Flush the contents of the input buffer, i.e., return the input buffer even though a token has not yet been encountered.
Constructor Details
.new(delimiter = $/) ⇒ BufferedTokenizer
A new BufferedTokenizer splits its input on a delimiter, which defaults to the global input record separator $/ ("\n").
The input buffer is stored as an array. This is by far the most efficient approach given language constraints (in C a linked list would be a more appropriate data structure). Segments of input data are stored in a list which is only joined when a token is reached, substantially reducing the number of objects required for the operation.
# File 'lib/em/buftok.rb', line 15
def initialize(delimiter = $/)
  @delimiter = delimiter
  @input = []
  @tail = ''
  @trim = @delimiter.length - 1
end
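The source does not say why @trim is set to the delimiter length minus one. A plausible reading is that it lets extract re-examine the last few characters of the buffered tail, so that a multi-character delimiter split across two datagrams is still found. The sketch below illustrates that reading with plain String calls; the sample strings are made up for illustration.

```ruby
delimiter = "\r\n"
trim = delimiter.length - 1            # 1 for a two-character delimiter

# The previous chunk ended halfway through the delimiter.
tail = String.new("GET /\r")
tail_end = tail.slice!(-trim, trim)    # removes and returns the last `trim` characters

# Prepending the removed characters lets split see the whole delimiter.
next_chunk = "\nHost: example"         # hypothetical follow-up datagram
data = tail_end + next_chunk
pieces = data.split(delimiter, -1)

# A tail shorter than `trim` yields nil, which is why extract checks
# tail_end before concatenating.
too_short = String.new.slice!(-trim, trim)

p tail       # "GET /"
p pieces     # ["", "Host: example"]
p too_short  # nil
```

With the default "\n" delimiter, @trim is 0 and this step is skipped entirely.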
Instance Method Details
#extract(data)
Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract. This makes for easy processing of datagrams using a pattern like:
tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...
Passing -1 as the limit makes split return a trailing "" when the token is at the end of the string, so the last element is always the start of the next chunk.
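The effect of the -1 limit can be checked with String#split on its own. The sample strings below are illustrative only.

```ruby
chunk = "alpha\nbeta\n"

# Without a limit, trailing empty strings are dropped, so there is no way
# to tell whether "beta" was terminated by a delimiter.
without_limit = chunk.split("\n")

# With -1, the trailing "" is kept: "beta" is complete and the next
# chunk starts from an empty remainder.
with_limit = chunk.split("\n", -1)

# When the chunk stops mid-entity, the last element is the partial entity.
partial = "alpha\nbe".split("\n", -1)

p without_limit  # ["alpha", "beta"]
p with_limit     # ["alpha", "beta", ""]
p partial        # ["alpha", "be"]
```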
# File 'lib/em/buftok.rb', line 30
def extract(data)
  if @trim > 0
    tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
    data = tail_end + data if tail_end
  end

  @input << @tail
  entities = data.split(@delimiter, -1)
  @tail = entities.shift

  unless entities.empty?
    @input << @tail
    entities.unshift @input.join
    @input.clear
    @tail = entities.pop
  end

  entities
end
#flush
Flush the contents of the input buffer, i.e., return the input buffer even though a token has not yet been encountered.
# File 'lib/em/buftok.rb', line 52
def flush
  @input << @tail
  buffer = @input.join
  @input.clear
  @tail = "" # @tail.clear is slightly faster, but not supported on 1.8.7
  buffer
end
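The following sketch walks through extract and flush on a few hand-made chunks. So that it runs without the eventmachine gem, the class is restated from the listings on this page with one small adaptation (String.new instead of string literals, an assumption made here to keep the buffers mutable on newer Rubies). With the gem installed, requiring em/buftok should make the restated class unnecessary.

```ruby
# Restated from the listings on this page, lightly adapted (String.new
# replaces the '' and "" literals so the tail is always mutable).
class BufferedTokenizer
  def initialize(delimiter = $/)
    @delimiter = delimiter
    @input = []
    @tail = String.new
    @trim = @delimiter.length - 1
  end

  def extract(data)
    if @trim > 0
      tail_end = @tail.slice!(-@trim, @trim)
      data = tail_end + data if tail_end
    end
    @input << @tail
    entities = data.split(@delimiter, -1)
    @tail = entities.shift
    unless entities.empty?
      @input << @tail
      entities.unshift @input.join
      @input.clear
      @tail = entities.pop
    end
    entities
  end

  def flush
    @input << @tail
    buffer = @input.join
    @input.clear
    @tail = String.new
    buffer
  end
end

# Line-based (default delimiter): entities are returned only once complete.
lines = BufferedTokenizer.new("\n")
first  = lines.extract("foo\nba")   # "ba" stays buffered
second = lines.extract("r\nbaz")    # "ba" + "r" completes "bar"
rest   = lines.flush                # hands back the unterminated "baz"

# Two-character delimiter arriving split across two chunks.
crlf = BufferedTokenizer.new("\r\n")
none = crlf.extract("one\r")        # delimiter not complete yet
both = crlf.extract("\ntwo\r\n")    # "\r" + "\n" is recognised as a delimiter

p first, second, rest  # ["foo"], ["bar"], "baz"
p none, both           # [], ["one", "two"]
```

Calling flush is typically useful when the data source closes, since whatever remains in the buffer will never be terminated by a delimiter.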