Class: Nokogiri::XML::SAX::Document
| Relationships & Source Files | |
| Inherits: | Object |
| Defined in: | lib/nokogiri/xml/sax/document.rb |
Overview
The Document class is used for registering types of events you are interested in
handling. All of the methods on this class are available as possible events while parsing an
XML document. To register for any particular event, subclass this class and implement the
methods you are interested in knowing about.
To only be notified about start and end element events, write a class like this:
class MyHandler < Nokogiri::XML::SAX::Document
  def start_element name, attrs = []
    puts "#{name} started!"
  end

  def end_element name
    puts "#{name} ended"
  end
end
You can use this event handler for any SAX-style parser included with ::Nokogiri.
Entity Handling
⚠ Entity handling is complicated in a ::Nokogiri::XML::SAX parser! Please read this section carefully if
you're not getting the behavior you expect.
Entities will be reported to the user via callbacks to #characters, to #reference, or
possibly to both. The behavior is determined by a combination of entity type and the value
of ParserContext#replace_entities. (Recall that the default value of
ParserContext#replace_entities is false.)
⚠ It is UNSAFE to set ParserContext#replace_entities to true when parsing untrusted
documents.
💡 For more information on entity types, see Wikipedia's page on DTDs.
| Entity type | #characters | #reference |
|---|---|---|
| Char ref (e.g., &#146;) | always | never |
| Predefined (e.g., &amp;) | always | never |
| Undeclared † | never | #replace_entities == false |
| Internal | always | #replace_entities == false |
| External † | #replace_entities == true | #replace_entities == false |
† In the case where the replacement text for the entity is unknown (e.g., an undeclared entity
or an external entity that could not be resolved because of network issues), then the
replacement text will not be reported. If ParserContext#replace_entities is true, this
means the #characters callback will not be invoked. If ParserContext#replace_entities is
false, then the #reference callback will be invoked, but with nil for the content
argument.
Instance Method Summary
- #cdata_block(string)
  Called when CDATA blocks are found. string contains the CDATA content.
- #characters(string)
  Called when character data is parsed, and for parsed entities when ParserContext#replace_entities is true.
- #comment(string)
  Called when comments are encountered. string contains the comment data.
- #end_document
  Called when document ends parsing.
- #end_element(name)
  Called at the end of an element.
- #end_element_namespace(name, prefix = nil, uri = nil)
  Called at the end of an element.
- #error(string)
  Called on document errors. string contains the error.
- #processing_instruction(name, content)
  Called when processing instructions are found. name is the target of the instruction; content is the value of the instruction.
- #reference(name, content)
  Called when a parsed entity is referenced and not replaced.
- #start_document
  Called when document starts parsing.
- #start_element(name, attrs = [])
  Called at the beginning of an element.
- #start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = [])
  Called at the beginning of an element.
- #warning(string)
  Called on document warnings. string contains the warning.
- #xmldecl(version, encoding, standalone)
  Called when an XML declaration is parsed.
Instance Method Details
#cdata_block(string)
Called when CDATA blocks are found.
- Parameters
  - string contains the CDATA content
# File 'lib/nokogiri/xml/sax/document.rb', line 245
def cdata_block(string)
end
#characters(string)
Called when character data is parsed, and for parsed entities when ParserContext#replace_entities is true.
- Parameters
  - string contains the character data or entity replacement text
⚠ Please see the Entity Handling section of the overview for important information about how entities are handled.
⚠ This method might be called multiple times for a contiguous string of characters.
# File 'lib/nokogiri/xml/sax/document.rb', line 201
def characters(string)
end
#comment(string)
Called when comments are encountered
- Parameters
  - string contains the comment data
# File 'lib/nokogiri/xml/sax/document.rb', line 224
def comment(string)
end
#end_document
Called when document ends parsing.
# File 'lib/nokogiri/xml/sax/document.rb', line 83
def end_document
end
#end_element(name)
Called at the end of an element.
- Parameters
  - name (String) the name of the element being closed
# File 'lib/nokogiri/xml/sax/document.rb', line 122
def end_element(name)
end
#end_element_namespace(name, prefix = nil, uri = nil)
Called at the end of an element.
- Parameters
  - name (String) is the name of the element
  - prefix (String, nil) is the namespace prefix for the element
  - uri (String, nil) is the associated URI for the element’s namespace
# File 'lib/nokogiri/xml/sax/document.rb', line 185
def end_element_namespace(name, prefix = nil, uri = nil)
  # Deal with SAX v1 interface
  end_element([prefix, name].compact.join(":"))
end
#error(string)
Called on document errors
- Parameters
  - string contains the error
# File 'lib/nokogiri/xml/sax/document.rb', line 238
def error(string)
end
#processing_instruction(name, content)
Called when processing instructions are found
- Parameters
  - name is the target of the instruction
  - content is the value of the instruction
# File 'lib/nokogiri/xml/sax/document.rb', line 253
def processing_instruction(name, content)
end
#reference(name, content)
Called when a parsed entity is referenced and not replaced.
- Parameters
  - name (String) is the name of the entity
  - content (String, nil) is the replacement text for the entity, if known
⚠ Please see the Entity Handling section of the overview for important information about how entities are handled.
⚠ An internal entity may result in a call to both #characters and #reference.
Since v1.17.0
# File 'lib/nokogiri/xml/sax/document.rb', line 217
def reference(name, content)
end
#start_document
Called when document starts parsing.
# File 'lib/nokogiri/xml/sax/document.rb', line 78
def start_document
end
#start_element(name, attrs = [])
Called at the beginning of an element.
- Parameters
  - name (String) the name of the element
  - attrs (Array<Array<String>>) an assoc list of namespace declarations and attributes, e.g.:
    [ ["xmlns:foo", "http://sample.net"], ["size", "large"] ]
💡 If you’re dealing with ::Nokogiri::XML and need to handle namespaces, use the #start_element_namespace method instead.
Note that the element namespace and any attribute namespaces are not provided, and so any namespaced elements or attributes will be returned as strings including the prefix:
parser.parse(<<~XML)
  <root xmlns:foo='http://foo.example.com/' xmlns='http://example.com/'>
    <foo:bar foo:quux="xxx">hello world</foo:bar>
  </root>
XML

assert_pattern do
  parser.document.start_elements => [
    ["root", [["xmlns:foo", "http://foo.example.com/"], ["xmlns", "http://example.com/"]]],
    ["foo:bar", [["foo:quux", "xxx"]]],
  ]
end
# File 'lib/nokogiri/xml/sax/document.rb', line 113
def start_element(name, attrs = [])
end
#start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = [])
Called at the beginning of an element.
- Parameters
  - name (String) is the name of the element
  - attrs (Array<Attribute>) is an array of structs with the following properties:
    - localname (String) the local name of the attribute
    - value (String) the value of the attribute
    - prefix (String, nil) the namespace prefix of the attribute
    - uri (String, nil) the namespace URI of the attribute
  - prefix (String, nil) is the namespace prefix for the element
  - uri (String, nil) is the associated URI for the element’s namespace
  - ns (Array<Array<String, String>>) is an assoc list of namespace declarations on the element
💡 If you’re dealing with ::Nokogiri::HTML or don’t care about namespaces, try #start_element instead.
- Example
it "start_elements_namespace is called with namespaced attributes" do
  parser.parse(<<~XML)
    <root xmlns:foo='http://foo.example.com/'>
      <foo:a foo:bar='hello' />
    </root>
  XML

  assert_pattern do
    parser.document.start_elements_namespace => [
      [
        "root",
        [],
        nil, nil,
        [["foo", "http://foo.example.com/"]], # namespace declarations
      ], [
        "a",
        [Nokogiri::XML::SAX::Parser::Attribute(localname: "bar", prefix: "foo", uri: "http://foo.example.com/", value: "hello")], # prefixed attribute
        "foo", "http://foo.example.com/", # prefix and uri for the "a" element
        [],
      ]
    ]
  end
end
# File 'lib/nokogiri/xml/sax/document.rb', line 166
def start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = []) # rubocop:disable Metrics/ParameterLists
  # Deal with SAX v1 interface
  name = [prefix, name].compact.join(":")
  attributes = ns.map do |ns_prefix, ns_uri|
    [["xmlns", ns_prefix].compact.join(":"), ns_uri]
  end + attrs.map do |attr|
    [[attr.prefix, attr.localname].compact.join(":"), attr.value]
  end
  start_element(name, attributes)
end
#warning(string)
Called on document warnings
- Parameters
  - string contains the warning
# File 'lib/nokogiri/xml/sax/document.rb', line 231
def warning(string)
end
#xmldecl(version, encoding, standalone)
Called when an XML declaration is parsed.
- Parameters
  - version (String) the version attribute
  - encoding (String, nil) the encoding of the document if present, else nil
  - standalone ("yes", "no", nil) the standalone attribute if present, else nil
# File 'lib/nokogiri/xml/sax/document.rb', line 73
def xmldecl(version, encoding, standalone)
end