Class: Nokogiri::XML::ParseOptions
Relationships & Source Files
Inherits: Object
Defined in: lib/nokogiri/xml/parse_options.rb
Overview
Options that control the parsing behavior for Document, DocumentFragment, ::Nokogiri::HTML4::Document, ::Nokogiri::HTML4::DocumentFragment, ::Nokogiri::XSLT::Stylesheet, and Schema.
These options directly expose libxml2’s parse options, which are all boolean in the sense that an option is “on” or “off”.
💡 Note that ::Nokogiri::HTML5 parsing has a separate, orthogonal set of options due to the nature of the HTML5 specification. See ::Nokogiri::HTML5.
⚠ Not all parse options are supported on JRuby. Nokogiri will attempt to invoke the equivalent behavior in Xerces/NekoHTML on JRuby when it’s possible.
Setting and unsetting parse options
You can build your own combinations of parse options by using any of the following methods:
- ParseOptions method chaining
Every option has an equivalent method in lowercase. You can chain these methods together to set various combinations.
    # Set the HUGE & PEDANTIC options
    po = Nokogiri::XML::ParseOptions.new.huge.pedantic
    doc = Nokogiri::XML::Document.parse(xml, nil, nil, po)
Every option has an equivalent no{option} method in lowercase. You can call these methods on an instance of ParseOptions to unset the option.
    # Set the HUGE & PEDANTIC options
    po = Nokogiri::XML::ParseOptions.new.huge.pedantic
    # later we want to modify the options
    po.nohuge # Unset the HUGE option
    po.nopedantic # Unset the PEDANTIC option
💡 Note that some options begin with “no” leading to the logical but perhaps unintuitive double negative:
    po.nocdata # Set the NOCDATA parse option
    po.nonocdata # Unset the NOCDATA parse option
💡 Note that negation is not available for STRICT, which is itself a negation of all other features.
- Using Ruby Blocks
Most parsing methods will accept a block for configuration of parse options, and we recommend chaining the setter methods:
    doc = Nokogiri::XML::Document.parse(xml) { |config| config.huge.pedantic }
- ParseOptions constants
You can also use the constants declared under Nokogiri::XML::ParseOptions to set various combinations. They are bits in a bitmask, and so can be combined with bitwise operators:
    po = Nokogiri::XML::ParseOptions.new(Nokogiri::XML::ParseOptions::HUGE | Nokogiri::XML::ParseOptions::PEDANTIC)
    doc = Nokogiri::XML::Document.parse(xml, nil, nil, po)
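Because the options are bits in a single integer, setting, testing, and unsetting an option each reduce to one bitwise operation. The following is a minimal standalone sketch of that arithmetic. It does not load Nokogiri: the module name `BitmaskSketch` is hypothetical, and the two values are copied from the Constant Summary on this page.

```ruby
# Standalone sketch; BitmaskSketch is a hypothetical name, not Nokogiri API.
# The bit values are copied from the Constant Summary on this page.
module BitmaskSketch
  HUGE     = 1 << 19
  PEDANTIC = 1 << 7
end

# Set both options by OR-ing their bits together.
mask = BitmaskSketch::HUGE | BitmaskSketch::PEDANTIC

# Test a single option by AND-ing with its bit.
huge_was_set = (mask & BitmaskSketch::HUGE) != 0

# Unset HUGE by AND-ing with the complement of its bit; PEDANTIC is untouched.
mask &= ~BitmaskSketch::HUGE
pedantic_still_set = (mask & BitmaskSketch::PEDANTIC) != 0

puts huge_was_set
puts pedantic_still_set
puts mask == BitmaskSketch::PEDANTIC
```

The chained setter and `no{option}` methods described above presumably perform the same kind of OR and AND-with-complement updates on the underlying integer, but that is an inference from the bitmask description rather than something this page shows for every method.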
Constant Summary
-
BIG_LINES =
Support line numbers up to long int (default is a short int). On by default for Document, DocumentFragment, ::Nokogiri::HTML4::Document, ::Nokogiri::HTML4::DocumentFragment, ::Nokogiri::XSLT::Stylesheet, and Schema.
1 << 22
-
COMPACT =
Compact small text nodes. Off by default.
⚠ No modification of the DOM tree is allowed after parsing. libxml2 may crash if you try to modify the tree.
1 << 16
-
DEFAULT_HTML =
The options mask used by default for parsing ::Nokogiri::HTML4::Document and ::Nokogiri::HTML4::DocumentFragment.
RECOVER | NOERROR | NOWARNING | NONET | BIG_LINES
-
DEFAULT_SCHEMA =
The options mask used by default for parsing Schema.
NONET | BIG_LINES
-
DEFAULT_XML =
The options mask used by default for parsing Document and DocumentFragment.
RECOVER | NONET | BIG_LINES
-
DEFAULT_XSLT =
The options mask used by default for parsing ::Nokogiri::XSLT::Stylesheet.
RECOVER | NONET | NOENT | DTDLOAD | DTDATTR | NOCDATA | BIG_LINES
-
DTDATTR =
Default DTD attributes. On by default for ::Nokogiri::XSLT::Stylesheet.
1 << 3
-
DTDLOAD =
Load external subsets. On by default for ::Nokogiri::XSLT::Stylesheet.
⚠ It is UNSAFE to set this option when parsing untrusted documents.
1 << 2
-
DTDVALID =
Validate with the DTD. Off by default.
1 << 4
-
HUGE =
Relax any hardcoded limit from the parser. Off by default.
⚠ It is UNSAFE to set this option when parsing untrusted documents.
1 << 19
-
NOBASEFIX =
Do not fix up XInclude xml:base URIs. Off by default.
1 << 18
-
NOBLANKS =
Remove blank nodes. Off by default.
1 << 8
-
NOCDATA =
Merge CDATA as text nodes. On by default for ::Nokogiri::XSLT::Stylesheet.
1 << 14
-
NODICT =
Do not reuse the context dictionary. Off by default.
1 << 12
-
NOENT =
Substitute entities. On by default for ::Nokogiri::XSLT::Stylesheet (it is part of DEFAULT_XSLT); off by default otherwise.
⚠ This option enables entity substitution, contrary to what the name implies.
⚠ It is UNSAFE to set this option when parsing untrusted documents.
1 << 1
-
NOERROR =
Suppress error reports. On by default for ::Nokogiri::HTML4::Document and ::Nokogiri::HTML4::DocumentFragment.
1 << 5
-
NONET =
Forbid network access. On by default for Document, DocumentFragment, ::Nokogiri::HTML4::Document, ::Nokogiri::HTML4::DocumentFragment, ::Nokogiri::XSLT::Stylesheet, and Schema.
⚠ It is UNSAFE to unset this option when parsing untrusted documents.
1 << 11
-
NOWARNING =
Suppress warning reports. On by default for ::Nokogiri::HTML4::Document and ::Nokogiri::HTML4::DocumentFragment.
1 << 6
-
NOXINCNODE =
Do not generate XInclude START/END nodes. Off by default.
1 << 15
-
NSCLEAN =
Remove redundant namespace declarations. Off by default.
1 << 13
-
OLD10 =
Parse using XML-1.0 before update 5. Off by default.
1 << 17
-
PEDANTIC =
Enable pedantic error reporting. Off by default.
1 << 7
-
RECOVER =
Recover from errors. On by default for Document, DocumentFragment, ::Nokogiri::HTML4::Document, ::Nokogiri::HTML4::DocumentFragment, and ::Nokogiri::XSLT::Stylesheet. Not part of DEFAULT_SCHEMA.
1 << 0
-
SAX1 =
Use the SAX1 interface internally. Off by default.
1 << 9
-
STRICT =
Strict parsing
0
-
XINCLUDE =
Implement XInclude substitution. Off by default.
1 << 10
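The default masks listed above are plain integers built by OR-ing individual bits, so a mask can be decoded back into option names by testing each bit in turn. The sketch below illustrates this for DEFAULT_XML and DEFAULT_HTML. It is standalone and does not load Nokogiri: `DefaultMaskSketch` and `set_flags` are hypothetical names, and the bit values are copied from the entries above.

```ruby
# Standalone sketch; DefaultMaskSketch and set_flags are hypothetical names,
# not Nokogiri API. Bit values are copied from the Constant Summary above.
module DefaultMaskSketch
  FLAGS = {
    recover:   1 << 0,
    noerror:   1 << 5,
    nowarning: 1 << 6,
    nonet:     1 << 11,
    big_lines: 1 << 22,
  }.freeze

  # RECOVER | NONET | BIG_LINES
  DEFAULT_XML = FLAGS[:recover] | FLAGS[:nonet] | FLAGS[:big_lines]

  # RECOVER | NOERROR | NOWARNING | NONET | BIG_LINES
  DEFAULT_HTML = DEFAULT_XML | FLAGS[:noerror] | FLAGS[:nowarning]

  # Return the names of the flags whose bit is set in the given mask.
  def self.set_flags(mask)
    FLAGS.select { |_name, bit| (mask & bit) != 0 }.keys
  end
end

p DefaultMaskSketch.set_flags(DefaultMaskSketch::DEFAULT_XML)
p DefaultMaskSketch.set_flags(DefaultMaskSketch::DEFAULT_HTML)
```

Only the five flags that appear in these two masks are modeled here; extending the table to the remaining constants follows the same pattern.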
Class Method Summary
- .new(options = STRICT) ⇒ ParseOptions constructor
Instance Attribute Summary
- #options (also: #to_i) rw
- #strict readonly
- #strict? ⇒ Boolean readonly
- #to_i readonly
Alias for #options.
Instance Method Summary
Constructor Details
.new(options = STRICT) ⇒ ParseOptions
# File 'lib/nokogiri/xml/parse_options.rb', line 165
    def initialize(options = STRICT)
      @options = options
    end
Instance Attribute Details
#options (rw) Also known as: #to_i
# File 'lib/nokogiri/xml/parse_options.rb', line 163
    attr_accessor :options
#strict (readonly)
# File 'lib/nokogiri/xml/parse_options.rb', line 189
    def strict
      @options &= ~RECOVER
      self
    end
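The `strict` method clears the RECOVER bit and returns `self`, which is what allows it to appear in a method chain. The sketch below reproduces only that pattern in a hypothetical stand-in class, `MiniOptions`, so the behavior can be run without Nokogiri. The constant values are copied from the Constant Summary; nothing else about the real class is modeled.

```ruby
# Hypothetical stand-in, not Nokogiri's class: it reproduces only the
# initialize/strict pattern so the bit arithmetic can be run on its own.
class MiniOptions
  STRICT  = 0        # values copied from the Constant Summary
  RECOVER = 1 << 0
  NONET   = 1 << 11

  attr_accessor :options

  def initialize(options = STRICT)
    @options = options
  end

  # Clear the RECOVER bit and return self so calls can be chained.
  def strict
    @options &= ~RECOVER
    self
  end
end

po = MiniOptions.new(MiniOptions::RECOVER | MiniOptions::NONET)
returned = po.strict

puts returned.equal?(po)                # strict returns the receiver
puts po.options == MiniOptions::NONET   # RECOVER cleared, NONET untouched
```

Calling `strict` on an object whose RECOVER bit is already clear leaves the mask unchanged, since AND-ing with the complement only ever removes that one bit.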
#strict? ⇒ Boolean (readonly)
#to_i (readonly)
Alias for #options.
# File 'lib/nokogiri/xml/parse_options.rb', line 202
    alias_method :to_i, :options
Instance Method Details
#==(other)
#inspect
# File 'lib/nokogiri/xml/parse_options.rb', line 204
    def inspect
      options = []
      self.class.constants.each do |k|
        options << k.downcase if send(:"#{k.downcase}?")
      end
      super.sub(/>$/, " " + options.join(", ") + ">")
    end