Class: Nokogiri::XML::ParseOptions
| Relationships & Source Files | |
| Inherits: | Object |
| Defined in: | lib/nokogiri/xml/parse_options.rb |
Overview
Class to contain options for parsing XML or HTML4 (but not HTML5).
💡 Note that HTML5 parsing has a separate, orthogonal set of options due to the API of the
HTML5 library used. See ::Nokogiri::HTML5.
About the Examples
Examples on this page assume that the following code has been executed:
require 'nokogiri' # Make Nokogiri available.
include Nokogiri # Allow omitting leading 'Nokogiri::'.
xml_s = "<root />\n" # String containing XML.
File.write('t.xml', xml_s) # File containing XML.
html_s = "<html />\n" # String containing HTML.
File.write('t.html', html_s) # File containing HTML.
Examples executed via IRB (interactive Ruby) display ParseOptions instances
using method #inspect.
Parsing Methods
Each of the parsing methods performs parsing for an XML or HTML4 source.
Each requires a leading argument that specifies the source of the text to be parsed; except as noted, the argument's value may be either:
- A string.
- An open IO stream (must respond to methods read and close).
Examples:
XML::parse(xml_s)
HTML4.parse(html_s)
XML::parse(File.open('t.xml'))
HTML4.parse(File.open('t.html'))
Each accepts a trailing optional argument options (or keyword argument options) that specifies parsing options; the argument's value may be either:
- An integer: see Bitmap Constants.
- An instance of ParseOptions: see ParseOptions.new.
Examples:
XML::parse(xml_s, options: XML::ParseOptions::STRICT)
HTML4::parse(html_s, options: XML::ParseOptions::BIG_LINES)
XML::parse(xml_s, options: XML::ParseOptions.new.strict)
HTML4::parse(html_s, options: XML::ParseOptions.new.big_lines)
Each (except as noted) accepts a block that allows parsing options to be specified; see Options-Setting Blocks.
Certain other parsing methods use different options; see HTML5.
⚠ Not all parse options are supported on JRuby. Nokogiri attempts to invoke the equivalent behavior in Xerces/NekoHTML on JRuby when it's possible.
Bitmap Constants
Each of the parsing methods discussed here accepts an integer argument options that specifies parsing options.
That integer value may be constructed using the bitmap constants defined in ParseOptions.
Except for STRICT (see note below), each of the bitmap constants has a non-zero value that represents a bit in an integer value; to illustrate, here are a few of the constants, displayed in binary format (base 2):
ParseOptions::RECOVER.to_s(2) # => "1"
ParseOptions::NOENT.to_s(2) # => "10"
ParseOptions::DTDLOAD.to_s(2) # => "100"
ParseOptions::DTDATTR.to_s(2) # => "1000"
ParseOptions::DTDVALID.to_s(2) # => "10000"
Any of these constants may be used alone to specify a single option:
ParseOptions.new(ParseOptions::DTDLOAD)
# => #<Nokogiri::XML::ParseOptions: ... strict, dtdload>
ParseOptions.new(ParseOptions::DTDATTR)
# => #<Nokogiri::XML::ParseOptions: ... strict, dtdattr>
Multiple constants may be ORed together to specify multiple options:
options = ParseOptions::BIG_LINES | ParseOptions::COMPACT | ParseOptions::NOCDATA
ParseOptions.new(options)
# => #<Nokogiri::XML::ParseOptions: ... strict, nocdata, compact, big_lines>
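The way ORed constants combine can be checked with plain integers. The following is a minimal sketch that does not load Nokogiri: the lowercase variables are local stand-ins that copy the values listed in the Constant Summary, and the names `mask`, `nocdata_on`, and `noblanks_on` are chosen here for illustration only.

```ruby
# Local stand-ins copying the values listed in the Constant Summary;
# these are plain integers, not Nokogiri's own constants.
noblanks  = 1 << 8
nocdata   = 1 << 14
compact   = 1 << 16
big_lines = 1 << 22

# ORing sets each constant's bit in the combined integer.
mask = big_lines | compact | nocdata

# ANDing with a single constant tests whether its bit is set.
nocdata_on  = (mask & nocdata) != 0
noblanks_on = (mask & noblanks) != 0 # NOBLANKS was not ORed in.

puts mask        # prints 4276224
puts nocdata_on  # prints true
puts noblanks_on # prints false
```

Because each constant occupies its own bit, the order in which the constants are ORed does not matter.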
Note: The value of constant STRICT is zero; it may be used alone to turn all options off:
XML.parse('<root />') {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: recover, nonet, big_lines, default_schema, default_xml>
XML.parse('<root />', nil, nil, ParseOptions::STRICT) {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: strict>
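Since STRICT is zero, ORing it with another constant adds no bits, and "strict" amounts to the RECOVER bit being absent. That would be consistent with `ParseOptions.new(ParseOptions::DTDLOAD)` reporting `strict, dtdload` earlier on this page. A minimal sketch with plain integers, assuming only the documented values (the variables and the `is_strict` lambda are illustrative stand-ins, not part of Nokogiri):

```ruby
strict  = 0      # Documented value of STRICT.
recover = 1 << 0 # Documented value of RECOVER.
dtdload = 1 << 2 # Documented value of DTDLOAD.

# ORing with zero adds no bits.
same = (strict | dtdload) == dtdload

# Illustrative check: a mask counts as strict when the RECOVER bit is not set.
is_strict = ->(mask) { (mask & recover).zero? }

puts same                              # prints true
puts is_strict.call(dtdload)           # prints true
puts is_strict.call(dtdload | recover) # prints false
```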
The single-option bitmask constants are: BIG_LINES, COMPACT, DTDATTR, DTDLOAD, DTDVALID, HUGE, NOBASEFIX, NOBLANKS, NOCDATA, NOENT, NOERROR, NONET, NOWARNING, NOXINCNODE, NSCLEAN, OLD10, PEDANTIC, RECOVER, STRICT, XINCLUDE.
There are also several "shorthand" constants that can set multiple options: DEFAULT_HTML, DEFAULT_SCHEMA, DEFAULT_XML, DEFAULT_XSLT.
Examples:
ParseOptions.new(ParseOptions::DEFAULT_HTML)
# => #<Nokogiri::XML::ParseOptions: ... recover, nowarning, nonet, big_lines, default_schema, noerror, default_html, default_xml>
ParseOptions.new(ParseOptions::DEFAULT_SCHEMA)
# => #<Nokogiri::XML::ParseOptions: ... strict, nonet, big_lines, default_schema>
ParseOptions.new(ParseOptions::DEFAULT_XML)
# => #<Nokogiri::XML::ParseOptions: ... recover, nonet, big_lines, default_schema, default_xml>
ParseOptions.new(ParseOptions::DEFAULT_XSLT)
# => #<Nokogiri::XML::ParseOptions: ... recover, noent, dtdload, dtdattr, nonet, nocdata, big_lines, default_xslt, default_schema, default_xml>
Nokogiri itself uses these shorthand constants for its parsing, and they are generally most suitable for Nokogiri users' code.
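The shorthand masks can be recomputed from the single-bit values in the Constant Summary. The sketch below uses plain integers rather than Nokogiri's constants. The `includes_all` lambda is a hypothetical helper, and the assumption that a shorthand name is reported whenever all of its bits are set is an inference from the outputs above (for example, DEFAULT_HTML also listing default_xml and default_schema), not something this page states.

```ruby
# Local stand-ins copying the single-bit values from the Constant Summary.
recover   = 1 << 0
noent     = 1 << 1
dtdload   = 1 << 2
dtdattr   = 1 << 3
noerror   = 1 << 5
nowarning = 1 << 6
nonet     = 1 << 11
nocdata   = 1 << 14
big_lines = 1 << 22

# Shorthand masks, composed as listed in the Constant Summary.
default_xml    = recover | nonet | big_lines
default_schema = nonet | big_lines
default_html   = recover | noerror | nowarning | nonet | big_lines
default_xslt   = recover | nonet | noent | dtdload | dtdattr | nocdata | big_lines

# Hypothetical helper: true when every bit of subset is also set in mask.
includes_all = ->(mask, subset) { (mask & subset) == subset }

puts default_xml                                    # prints 4196353
puts default_html                                   # prints 4196449
puts includes_all.call(default_html, default_xml)   # prints true
puts includes_all.call(default_schema, default_xml) # prints false
```

The two printed integers match the `@options` values shown for `XML.parse` and `HTML4.parse` in Options-Setting Blocks.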
Options-Setting Blocks
Many of the parsing methods discussed here accept an options-setting block.
The block is called with a new instance of ParseOptions created with the defaults for the specific method:
XML.parse(xml_s) {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: @options=4196353 recover, nonet, big_lines, default_xml, default_schema>
HTML4.parse(html_s) {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: @options=4196449 recover, nowarning, nonet, big_lines, default_html, default_xml, noerror, default_schema>
When the block returns, the parsing is performed using those options.
The block may modify those options, which affects parsing:
bad_xml = '<root>' # End tag missing.
XML.parse(bad_xml) # No error because option RECOVER is on.
XML.parse(bad_xml) {|options| options.strict } # Raises SyntaxError because option STRICT is on.
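The block protocol described above (create the defaults, yield them, then use whatever the block left behind) can be sketched without Nokogiri. Everything named here is hypothetical: `SketchOptions` and `sketch_parse` are illustrative stand-ins, and the method returns the resulting mask instead of parsing anything.

```ruby
# Hypothetical stand-in for ParseOptions: holds an integer mask and
# offers one method that mirrors the documented #strict behavior.
class SketchOptions
  RECOVER = 1 << 0 # Documented value of RECOVER.

  attr_reader :mask

  def initialize(mask)
    @mask = mask
  end

  # Clear the RECOVER bit and return self.
  def strict
    @mask &= ~RECOVER
    self
  end
end

# Hypothetical parse-like method: builds the defaults, lets the caller's
# block adjust them, then returns the mask that parsing would use.
def sketch_parse(default_mask)
  options = SketchOptions.new(default_mask)
  yield options if block_given?
  options.mask
end

default_xml = 4196353 # @options value shown above for XML.parse.

untouched = sketch_parse(default_xml)
adjusted  = sketch_parse(default_xml) { |options| options.strict }

puts untouched # prints 4196353
puts adjusted  # prints 4196352
```

In this sketch the block's return value is ignored; only the changes made to the yielded object matter.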
Convenience Methods
A ParseOptions object has three sets of convenience methods, each based on the name of one of the constants:
Setters: each is the downcase of an option name, and turns on an option:
options = ParseOptions.new
# => #<Nokogiri::XML::ParseOptions: ... strict>
options.big_lines
# => #<Nokogiri::XML::ParseOptions: ... strict, big_lines>
options.compact
# => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
Unsetters: each begins with no, and turns off an option.
Note that there is no unsetter nostrict, but the setter recover serves the same purpose:
options.nobig_lines
# => #<Nokogiri::XML::ParseOptions: ... strict, compact>
options.nocompact
# => #<Nokogiri::XML::ParseOptions: ... strict>
options.recover # Functionally equivalent to nostrict.
# => #<Nokogiri::XML::ParseOptions: ... recover>
options.noent # Set NOENT.
# => #<Nokogiri::XML::ParseOptions: ... recover, noent>
options.nonoent # Unset NOENT.
# => #<Nokogiri::XML::ParseOptions: ... recover>
💡 Note that some options begin with no, leading to the logical but perhaps unintuitive double negative:
po.nocdata # Set the NOCDATA parse option
po.nonocdata # Unset the NOCDATA parse option
Queries: each ends with ?, and returns whether an option is on or off:
options.recover? # => true
options.strict? # => false
Each setter and unsetter method returns self,
so the methods may be chained:
ParseOptions.new.compact.big_lines
# => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
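The three method families and the chaining behavior can be imitated in a few lines. This is a hypothetical miniature, not Nokogiri's implementation: `MiniOptions`, `FLAGS`, and `mask` are names invented for the sketch, and only two flags are included, using their documented values.

```ruby
# Hypothetical miniature of the setter / unsetter / query pattern.
class MiniOptions
  # Two flags with the values listed in the Constant Summary.
  FLAGS = { compact: 1 << 16, big_lines: 1 << 22 }.freeze

  attr_reader :mask

  def initialize(mask = 0)
    @mask = mask
  end

  FLAGS.each do |name, bit|
    # Setter: turn the bit on and return self so calls can be chained.
    define_method(name) { @mask |= bit; self }
    # Unsetter: the same name prefixed with "no"; turn the bit off.
    define_method(:"no#{name}") { @mask &= ~bit; self }
    # Query: the same name suffixed with "?".
    define_method(:"#{name}?") { (@mask & bit) == bit }
  end
end

opts = MiniOptions.new.compact.big_lines
puts opts.mask               # prints 4259840
puts opts.nocompact.compact? # prints false
puts opts.big_lines?         # prints true
```

Returning `self` from every setter and unsetter is what makes a chain such as `compact.big_lines` possible; the query methods return a boolean and therefore end a chain.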
Constant Summary
-
BIG_LINES =
# File 'lib/nokogiri/xml/parse_options.rb', line 324
Support line numbers up to long int (default is a short int). On by default for Document, DocumentFragment, ::Nokogiri::HTML4::Document, ::Nokogiri::HTML4::DocumentFragment, ::Nokogiri::XSLT::Stylesheet, and Schema.
1 << 22
-
COMPACT =
# File 'lib/nokogiri/xml/parse_options.rb', line 309
Compact small text nodes. Off by default.
⚠ No modification of the DOM tree is allowed after parsing.
1 << 16
-
DEFAULT_HTML =
# File 'lib/nokogiri/xml/parse_options.rb', line 336
RECOVER | NOERROR | NOWARNING | NONET | BIG_LINES
-
DEFAULT_SCHEMA =
# File 'lib/nokogiri/xml/parse_options.rb', line 340
NONET | BIG_LINES
-
DEFAULT_XML =
# File 'lib/nokogiri/xml/parse_options.rb', line 328
Shorthand options mask useful for parsing ::Nokogiri::XML: sets RECOVER, NONET, BIG_LINES.
RECOVER | NONET | BIG_LINES
-
DEFAULT_XSLT =
# File 'lib/nokogiri/xml/parse_options.rb', line 332
RECOVER | NONET | NOENT | DTDLOAD | DTDATTR | NOCDATA | BIG_LINES
-
DTDATTR =
# File 'lib/nokogiri/xml/parse_options.rb', line 266
Default DTD attributes. On by default for ::Nokogiri::XSLT::Stylesheet.
1 << 3
-
DTDLOAD =
# File 'lib/nokogiri/xml/parse_options.rb', line 263
Load external subsets. On by default for ::Nokogiri::XSLT::Stylesheet.
⚠ It is UNSAFE to set this option when parsing untrusted documents.
1 << 2
-
DTDVALID =
# File 'lib/nokogiri/xml/parse_options.rb', line 269
Validate with the DTD. Off by default.
1 << 4
-
HUGE =
# File 'lib/nokogiri/xml/parse_options.rb', line 319
Relax any hardcoded limit from the parser. Off by default.
⚠ It is UNSAFE to set this option when parsing untrusted documents.
1 << 19
-
NOBASEFIX =
# File 'lib/nokogiri/xml/parse_options.rb', line 315
Do not fixup XInclude xml:base URIs. Off by default.
1 << 18
-
NOBLANKS =
# File 'lib/nokogiri/xml/parse_options.rb', line 281
Remove blank nodes. Off by default.
1 << 8
-
NOCDATA =
# File 'lib/nokogiri/xml/parse_options.rb', line 302
Merge CDATA as text nodes. On by default for ::Nokogiri::XSLT::Stylesheet.
1 << 14
-
NODICT =
Internal use only
# File 'lib/nokogiri/xml/parse_options.rb', line 296
Do not reuse the context dictionary. Off by default.
1 << 12
-
NOENT =
# File 'lib/nokogiri/xml/parse_options.rb', line 259
Substitute entities. Off by default.
⚠ This option enables entity substitution, contrary to what the name implies.
⚠ It is UNSAFE to set this option when parsing untrusted documents.
1 << 1
-
NOERROR =
# File 'lib/nokogiri/xml/parse_options.rb', line 272
Suppress error reports. On by default for ::Nokogiri::HTML4::Document and ::Nokogiri::HTML4::DocumentFragment.
1 << 5
-
NONET =
# File 'lib/nokogiri/xml/parse_options.rb', line 293
Forbid network access. On by default for Document, DocumentFragment, ::Nokogiri::HTML4::Document, ::Nokogiri::HTML4::DocumentFragment, ::Nokogiri::XSLT::Stylesheet, and Schema.
⚠ It is UNSAFE to unset this option when parsing untrusted documents.
1 << 11
-
NOWARNING =
# File 'lib/nokogiri/xml/parse_options.rb', line 275
Suppress warning reports. On by default for ::Nokogiri::HTML4::Document and ::Nokogiri::HTML4::DocumentFragment.
1 << 6
-
NOXINCNODE =
# File 'lib/nokogiri/xml/parse_options.rb', line 305
Do not generate XInclude START/END nodes. Off by default.
1 << 15
-
NSCLEAN =
# File 'lib/nokogiri/xml/parse_options.rb', line 299
Remove redundant namespace declarations. Off by default.
1 << 13
-
OLD10 =
# File 'lib/nokogiri/xml/parse_options.rb', line 312
Parse using XML-1.0 before update 5. Off by default.
1 << 17
-
PEDANTIC =
# File 'lib/nokogiri/xml/parse_options.rb', line 278
Enable pedantic error reporting. Off by default.
1 << 7
-
RECOVER =
# File 'lib/nokogiri/xml/parse_options.rb', line 254
Recover from errors in input; no strict parsing.
1 << 0
-
SAX1 =
Internal use only
# File 'lib/nokogiri/xml/parse_options.rb', line 284
Use the SAX1 interface internally. Off by default.
1 << 9
-
STRICT =
# File 'lib/nokogiri/xml/parse_options.rb', line 251
Strict parsing; do not recover from errors in input.
0
-
XINCLUDE =
# File 'lib/nokogiri/xml/parse_options.rb', line 287
Implement XInclude substitution. Off by default.
1 << 10
Class Method Summary
-
.new(options = ParseOptions::STRICT) ⇒ ParseOptions
constructor
Instance Attribute Summary
Instance Method Summary
-
#==(object)
Returns true if the same options are set in self and object.
-
#inspect
Returns a string representation of self that includes the numeric value of @options.
Constructor Details
.new(options = ParseOptions::STRICT) ⇒ ParseOptions
Returns a new ParseOptions object with options as specified by integer argument options. The value of options may be constructed using Bitmap Constants.
With the simple constant STRICT (the default), all options are off
(strict means norecover):
ParseOptions.new
# => #<Nokogiri::XML::ParseOptions: ... strict>
With a different simple constant, one option may be set:
ParseOptions.new(ParseOptions::RECOVER)
# => #<Nokogiri::XML::ParseOptions: ... recover>
ParseOptions.new(ParseOptions::COMPACT)
# => #<Nokogiri::XML::ParseOptions: ... strict, compact>
With multiple ORed constants, multiple options may be set:
options = ParseOptions::COMPACT | ParseOptions::RECOVER | ParseOptions::BIG_LINES
ParseOptions.new(options)
# => #<Nokogiri::XML::ParseOptions: ... recover, compact, big_lines>
# File 'lib/nokogiri/xml/parse_options.rb', line 387
def initialize(options = STRICT)
  @options = options
end
Instance Attribute Details
#options (rw) Also known as: #to_i
Returns, or sets and returns, the integer value of self:
options = ParseOptions.new(ParseOptions::DEFAULT_HTML)
# => #<Nokogiri::XML::ParseOptions: ... recover, nowarning, nonet, big_...
options.options # => 4196449
options.options = ParseOptions::STRICT
options.options # => 0
# File 'lib/nokogiri/xml/parse_options.rb', line 352
attr_accessor :options
#strict (readonly)
Turns off option recover:
options = ParseOptions.new.recover.compact.big_lines
# => #<Nokogiri::XML::ParseOptions: ... recover, compact, big_lines>
options.strict
# => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
# File 'lib/nokogiri/xml/parse_options.rb', line 422
def strict
  @options &= ~RECOVER
  self
end
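The `&= ~RECOVER` step in the source above clears a single bit and leaves the rest of the mask untouched. A small worked example with plain integers, assuming only the documented constant values (the lowercase variables are local stand-ins, not Nokogiri's constants):

```ruby
recover   = 1 << 0  # Documented value of RECOVER.
compact   = 1 << 16 # Documented value of COMPACT.
big_lines = 1 << 22 # Documented value of BIG_LINES.

mask = recover | compact | big_lines
puts mask # prints 4259841

# ~recover has every bit set except bit 0, so ANDing clears only RECOVER.
mask &= ~recover
puts mask # prints 4259840

# Clearing an already-clear bit changes nothing.
mask &= ~recover
puts mask # prints 4259840
```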
#strict? ⇒ Boolean (readonly)
#to_i (readonly)
Alias for #options.
# File 'lib/nokogiri/xml/parse_options.rb', line 460
alias_method :to_i, :options
Instance Method Details
#==(object)
Returns true if the same options are set in self and object.
options = ParseOptions.new
# => #<Nokogiri::XML::ParseOptions: ... strict>
options == options.dup # => true
options == options.dup.recover # => false
#inspect
Returns a string representation of self that includes
the numeric value of @options:
options = ParseOptions.new
options.inspect
# => "#<Nokogiri::XML::ParseOptions: @options=0 strict>"
In general, the returned string also includes the (downcased) names of the options that are on (but omits the names of those that are off):
options.recover.big_lines
options.inspect
# => "#<Nokogiri::XML::ParseOptions: @options=4194305 recover, big_lines>"
The exception is that either recover (i.e., not strict)
or the pseudo-option strict is always reported:
options.norecover
options.inspect
# => "#<Nokogiri::XML::ParseOptions: @options=4194304 strict, big_lines>"
# File 'lib/nokogiri/xml/parse_options.rb', line 493
def inspect
  options = []
  self.class.constants.each do |k|
    options << k.downcase if send(:"#{k.downcase}?")
  end
  super.sub(/>$/, " " + options.join(", ") + ">")
end