Class: Nokogiri::XML::SAX::Parser

Relationships & Source Files
Inherits: Object
Defined in: lib/nokogiri/xml/sax/parser.rb,
ext/nokogiri/xml_sax_parser.c

Overview

This parser is a SAX-style parser that reads its input as it deems necessary. The parser takes a Document and an optional encoding; then, given an XML input, it sends messages to the Document.

Here is an example of using this parser:

# Create a subclass of Nokogiri::XML::SAX::Document and implement
# the events we care about:
class MyDoc < Nokogiri::XML::SAX::Document
  def start_element name, attrs = []
    puts "starting: #{name}"
  end

  def end_element name
    puts "ending: #{name}"
  end
end

# Create our parser
parser = Nokogiri::XML::SAX::Parser.new(MyDoc.new)

# Send some XML to the parser; the block form closes the file afterwards
File.open(ARGV[0]) { |file| parser.parse(file) }

For more information about SAX parsers, see ::Nokogiri::XML::SAX. Also see Document for the available events.

Constant Summary

  • ENCODINGS =

    Encodings this parser supports

    # File 'lib/nokogiri/xml/sax/parser.rb', line 39
    {
      "NONE" => 0, # No char encoding detected
      "UTF-8" => 1, # UTF-8
      "UTF16LE" => 2, # UTF-16 little endian
      "UTF16BE" => 3, # UTF-16 big endian
      "UCS4LE" => 4, # UCS-4 little endian
      "UCS4BE" => 5, # UCS-4 big endian
      "EBCDIC" => 6, # EBCDIC uh!
      "UCS4-2143" => 7, # UCS-4 unusual ordering
      "UCS4-3412" => 8, # UCS-4 unusual ordering
      "UCS2" => 9, # UCS-2
      "ISO-8859-1" => 10, # ISO-8859-1 ISO Latin 1
      "ISO-8859-2" => 11, # ISO-8859-2 ISO Latin 2
      "ISO-8859-3" => 12, # ISO-8859-3
      "ISO-8859-4" => 13, # ISO-8859-4
      "ISO-8859-5" => 14, # ISO-8859-5
      "ISO-8859-6" => 15, # ISO-8859-6
      "ISO-8859-7" => 16, # ISO-8859-7
      "ISO-8859-8" => 17, # ISO-8859-8
      "ISO-8859-9" => 18, # ISO-8859-9
      "ISO-2022-JP" => 19, # ISO-2022-JP
      "SHIFT-JIS" => 20, # Shift_JIS
      "EUC-JP" => 21, # EUC-JP
      "ASCII" => 22, # pure ASCII
    }

Constructor Details

.new(doc = Nokogiri::XML::SAX::Document.new, encoding = "UTF-8") ⇒ Parser

Create a new Parser with the given doc and encoding.

# File 'lib/nokogiri/xml/sax/parser.rb', line 72

def initialize(doc = Nokogiri::XML::SAX::Document.new, encoding = "UTF-8")
  @encoding = check_encoding(encoding)
  @document = doc
  @warned   = false
end

Instance Attribute Details

#document (rw)

The Document where events will be sent.

# File 'lib/nokogiri/xml/sax/parser.rb', line 66

attr_accessor :document

#encoding (rw)

The encoding being used for this document.

# File 'lib/nokogiri/xml/sax/parser.rb', line 69

attr_accessor :encoding

Instance Method Details

#check_encoding(encoding) (private)

# File 'lib/nokogiri/xml/sax/parser.rb', line 117

def check_encoding(encoding)
  encoding.upcase.tap do |enc|
    raise ArgumentError, "'#{enc}' is not a valid encoding" unless ENCODINGS[enc]
  end
end
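
The validation above can be tried without loading the library. The following is a minimal sketch, assuming a three-entry copy of the ENCODINGS table; ENCODINGS_SUBSET, check_encoding_sketch and valid_encoding? are hypothetical names used only for illustration. It suggests that names are matched case-insensitively but must otherwise be spelled exactly as the table's keys.

```ruby
# Stand-in for a few entries of Parser::ENCODINGS (values copied from the
# Constant Summary); the real table is larger.
ENCODINGS_SUBSET = {
  "UTF-8" => 1,
  "UTF16LE" => 2,
  "ISO-8859-1" => 10,
}.freeze

# Mirrors the private #check_encoding: upcase the name, then require that it
# is a key of the table.
def check_encoding_sketch(encoding)
  encoding.upcase.tap do |enc|
    raise ArgumentError, "'#{enc}' is not a valid encoding" unless ENCODINGS_SUBSET[enc]
  end
end

# Hypothetical convenience wrapper that turns the exception into a boolean.
def valid_encoding?(name)
  check_encoding_sketch(name)
  true
rescue ArgumentError
  false
end

puts check_encoding_sketch("utf-8")   # normalized to the upper-case key
puts valid_encoding?("iso-8859-1")    # accepted after upcasing
puts valid_encoding?("UTF-16")        # rejected: the table spells it "UTF16LE"
```

Because the lookup is a plain hash access after upcasing, a common alias such as "UTF-16" is rejected unless it matches one of the keys listed in ENCODINGS.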

#parse(thing, &block)

Parse the given thing, which may be a string containing XML or an IO object.

# File 'lib/nokogiri/xml/sax/parser.rb', line 81

def parse(thing, &block)
  if thing.respond_to?(:read) && thing.respond_to?(:close)
    parse_io(thing, &block)
  else
    parse_memory(thing, &block)
  end
end
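
The dispatch above is plain duck typing, which can be illustrated in isolation. This is a minimal sketch; dispatch_target is a hypothetical helper that repeats only the condition from #parse and reports which method would be chosen.

```ruby
require "stringio"

# Hypothetical helper: repeats the condition used by #parse and returns the
# name of the method that would handle the argument.
def dispatch_target(thing)
  if thing.respond_to?(:read) && thing.respond_to?(:close)
    :parse_io
  else
    :parse_memory
  end
end

puts dispatch_target(StringIO.new("<root/>"))  # an IO-like object
puts dispatch_target("<root/>")                # a string of XML
puts dispatch_target("doc.xml")                # still a string, so not treated as a path
```

One consequence is that a filename passed to #parse is handed to #parse_memory as if it were XML text; #parse_file is the method that accepts a path.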

#parse_file(filename) {|ctx| ... }

Parse the file at filename.

Yields:

  • (ctx)

Raises:

  • (ArgumentError)
  
# File 'lib/nokogiri/xml/sax/parser.rb', line 99

def parse_file(filename)
  raise ArgumentError unless filename
  raise Errno::ENOENT unless File.exist?(filename)
  raise Errno::EISDIR if File.directory?(filename)

  ctx = ParserContext.file(filename)
  yield ctx if block_given?
  ctx.parse_with(self)
end
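
The three guard clauses run in a fixed order, so each kind of bad argument maps to one exception class. The sketch below repeats only those guards; check_parse_file_argument and guard_result are hypothetical helpers, and the temporary directory exists only to give the checks something to inspect.

```ruby
require "tmpdir"

# Hypothetical stand-in that repeats only the guard clauses of #parse_file.
def check_parse_file_argument(filename)
  raise ArgumentError unless filename
  raise Errno::ENOENT unless File.exist?(filename)
  raise Errno::EISDIR if File.directory?(filename)
  :ok
end

# Returns :ok, or the class of the exception raised by the guards.
def guard_result(filename)
  check_parse_file_argument(filename)
rescue ArgumentError, SystemCallError => e
  e.class
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "doc.xml")
  File.write(path, "<root/>")

  p guard_result(nil)                            # nil filename
  p guard_result(File.join(dir, "missing.xml"))  # path that does not exist
  p guard_result(dir)                            # path that is a directory
  p guard_result(path)                           # regular file passes the guards
end
```

Note that the Raises list above mentions only ArgumentError, while the source also raises Errno::ENOENT and Errno::EISDIR.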

#parse_io(io, encoding = @encoding) {|ctx| ... }

Parse the given io.

Yields:

  • (ctx)
  
# File 'lib/nokogiri/xml/sax/parser.rb', line 91

def parse_io(io, encoding = @encoding)
  ctx = ParserContext.io(io, ENCODINGS[check_encoding(encoding)])
  yield ctx if block_given?
  ctx.parse_with(self)
end

#parse_memory(data) {|ctx| ... }

Yields:

  • (ctx)
  
# File 'lib/nokogiri/xml/sax/parser.rb', line 109

def parse_memory(data)
  ctx = ParserContext.memory(data)
  yield ctx if block_given?
  ctx.parse_with(self)
end
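
#parse_file, #parse_io and #parse_memory share one shape: build a context, yield it if a block was given, then call parse_with. The following sketch illustrates that ordering only. SketchContext and its strict attribute are hypothetical and do not correspond to the real ParserContext API.

```ruby
# Hypothetical stand-in for ParserContext; only the call order matters here.
class SketchContext
  attr_accessor :strict
  attr_reader :log

  def initialize
    @strict = false
    @log = []
  end

  # Records the settings in effect at the moment parsing would start.
  def parse_with(_parser)
    @log << "parsing (strict=#{strict})"
  end
end

# Same shape as #parse_memory: yield the context, then parse.
def parse_sketch(ctx)
  yield ctx if block_given?
  ctx.parse_with(nil)
  ctx
end

plain = parse_sketch(SketchContext.new)
tuned = parse_sketch(SketchContext.new) { |ctx| ctx.strict = true }

puts plain.log.inspect
puts tuned.log.inspect
```

Because the block runs before parse_with, anything it sets on the context is already in place when parsing begins.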