Class: BufferedTokenizer

Relationships & Source Files
Inherits: Object
Defined in: lib/em/buftok.rb

Overview

BufferedTokenizer takes a delimiter upon instantiation and is line-based by default. Input can be fed to it incrementally from an outside source that receives arbitrary-length datagrams, which may or may not contain the token that delimits entities. This makes it a natural fit for something like ::EventMachine (http://rubyeventmachine.com/).

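Class Method Summary

  • .new(delimiter = $/) ⇒ BufferedTokenizer

    Creates a tokenizer that splits input on the given delimiter, by default the global input delimiter $/.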

Instance Method Summary

  • #extract(data)

    Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract.

  • #flush

    Flush the contents of the input buffer, i.e. return the input buffer even though a token has not yet been encountered.

Constructor Details

.new(delimiter = $/) ⇒ BufferedTokenizer

A new BufferedTokenizer splits its input on the given delimiter, which defaults to the global input delimiter $/ ("\n").

The input buffer is stored as an array. This is by far the most efficient approach given language constraints (in C a linked list would be a more appropriate data structure). Segments of input data are appended to the array and joined only when a token is reached, which substantially reduces the number of intermediate objects the operation creates.
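
The buffering idea can be sketched in isolation: fragments are appended to an array and joined once, when a delimiter finally shows up. The fragment values below are made up for illustration, and this is a simplified picture rather than the class's actual code path.

```ruby
segments = []
line = nil

# Hypothetical fragments of one line, as a network read might deliver them.
fragments = ["GET /ind", "ex.html HT", "TP/1.1\n"]

fragments.each do |fragment|
  segments << fragment                 # cheap append, no intermediate string built
  next unless fragment.include?("\n")  # keep buffering until the token arrives

  line = segments.join                 # a single concatenation per completed line
  segments.clear
end

p line  # "GET /index.html HTTP/1.1\n"
```

Concatenating with += on every fragment would instead allocate a new, longer string each time.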

# File 'lib/em/buftok.rb', line 15

def initialize(delimiter = $/)
  @delimiter = delimiter
  @input = []
  @tail = ''
  @trim = @delimiter.length - 1
end

Instance Method Details

#extract(data)

Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract. This makes for easy processing of datagrams using a pattern like:

tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...

Passing -1 as the limit makes split return a trailing "" when the token is at the end of the string, so the last element is always the start of the next chunk.
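
A quick way to see the effect of the negative limit is to compare String#split with and without it. The sample strings are arbitrary.

```ruby
# String#split drops trailing empty strings unless a negative limit is given.
terminated   = "foo\nbar\n".split("\n", -1)  # delimiter at the very end
unterminated = "foo\nba".split("\n", -1)     # chunk ends in the middle of an entity
no_limit     = "foo\nbar\n".split("\n")      # default: the trailing "" is dropped

p terminated    # ["foo", "bar", ""]
p unterminated  # ["foo", "ba"]
p no_limit      # ["foo", "bar"]
```

In both -1 cases the last element ("" or "ba") is the unfinished remainder, so it can be kept as the new tail unconditionally.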

# File 'lib/em/buftok.rb', line 30

def extract(data)
  if @trim > 0
    tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
    data = tail_end + data if tail_end
  end

  @input << @tail
  entities = data.split(@delimiter, -1)
  @tail = entities.shift

  unless entities.empty?
    @input << @tail
    entities.unshift @input.join
    @input.clear
    @tail = entities.pop
  end

  entities
end
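
The @trim handling at the top of extract appears to exist for multi-character delimiters that arrive split across two chunks. The following walk-through replays just those lines outside the class, using local variables in place of the instance variables and hypothetical input data.

```ruby
delimiter = "\r\n"
trim = delimiter.length - 1          # 1, same formula as in initialize

tail = "hello\r".dup                 # leftover from the previous chunk (made-up data)
data = "\nworld"                     # next chunk starts with the rest of the delimiter

# Move the last `trim` characters of the tail to the front of the new data,
# so a delimiter straddling the chunk boundary becomes visible to split.
tail_end = tail.slice!(-trim, trim)
data = tail_end + data if tail_end
entities = data.split(delimiter, -1)

p tail      # "hello"
p data      # "\r\nworld"
p entities  # ["", "world"]

# A tail shorter than `trim` yields nil, so the data is left untouched.
short = "".dup.slice!(-trim, trim)
p short     # nil
```

The leading "" in entities is what extract later joins with the buffered "hello" to complete the first entity.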

#flush

Flush the contents of the input buffer, i.e. return the input buffer even though a token has not yet been encountered.

# File 'lib/em/buftok.rb', line 52

def flush
  @input << @tail
  buffer = @input.join
  @input.clear
  @tail = "" # @tail.clear is slightly faster, but not supported on 1.8.7
  buffer
end
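
As a rough end-to-end sketch, the snippet below feeds a tokenizer three fragments and then flushes it. The class body is copied from the listings on this page so that the example runs without the eventmachine gem; with the gem installed, requiring em/buftok should presumably work instead. The fragment contents are hypothetical.

```ruby
# Copy of the class as listed on this page, included only to keep the
# sketch self-contained.
class BufferedTokenizer
  def initialize(delimiter = $/)
    @delimiter = delimiter
    @input = []
    @tail = ''
    @trim = @delimiter.length - 1
  end

  def extract(data)
    if @trim > 0
      tail_end = @tail.slice!(-@trim, @trim)
      data = tail_end + data if tail_end
    end

    @input << @tail
    entities = data.split(@delimiter, -1)
    @tail = entities.shift

    unless entities.empty?
      @input << @tail
      entities.unshift @input.join
      @input.clear
      @tail = entities.pop
    end

    entities
  end

  def flush
    @input << @tail
    buffer = @input.join
    @input.clear
    @tail = ""
    buffer
  end
end

tokenizer = BufferedTokenizer.new("\n")

# Made-up fragments, cut at arbitrary points as a socket might deliver them.
first  = tokenizer.extract("alpha\nbr")   # one complete line; "br" is buffered
second = tokenizer.extract("avo\ncharl")  # completes "bravo"; "charl" is buffered
third  = tokenizer.extract("ie")          # no delimiter yet, nothing to return
rest   = tokenizer.flush                  # hand back the unterminated remainder

p first   # ["alpha"]
p second  # ["bravo"]
p third   # []
p rest    # "charlie"
```

Calling flush is typically useful when the source closes, since the final entity may never be followed by a delimiter.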