Class: Nokogiri::XML::ParseOptions

Relationships & Source Files
Inherits: Object
Defined in: lib/nokogiri/xml/parse_options.rb

Overview

Class to contain options for parsing XML or HTML4 (but not HTML5).

💡 Note that HTML5 parsing has a separate, orthogonal set of options due to the API of the HTML5 library used. See ::Nokogiri::HTML5.

About the Examples

Examples on this page assume that the following code has been executed:

require 'nokogiri'           # Make Nokogiri available.
include Nokogiri             # Allow omitting leading 'Nokogiri::'.
xml_s = "<root />\n"         # String containing XML.
File.write('t.xml', xml_s)   # File containing XML.
html_s = "<html />\n"        # String containing HTML.
File.write('t.html', html_s) # File containing HTML.

Examples executed via IRB (interactive Ruby) display ParseOptions instances using method #inspect.

Parsing Methods

Each of the parsing methods parses an XML or HTML4 source:

  • Each requires a leading argument that specifies the source of the text to be parsed; except as noted, the argument's value may be either:

    • A string.
    • An open IO stream (must respond to methods read and close).

    Examples:

    XML::parse(xml_s)
    HTML4.parse(html_s)
    XML::parse(File.open('t.xml'))
    HTML4.parse(File.open('t.html'))
  • Each accepts a trailing optional argument options (or keyword argument options) that specifies parsing options; the argument's value may be either:

    • An integer: see Bitmap Constants.
    • An instance of ParseOptions: see ParseOptions.new.

    Examples:

    XML::parse(xml_s, options: XML::ParseOptions::STRICT)
    HTML4::parse(html_s, options: XML::ParseOptions::BIG_LINES)
    XML::parse(xml_s, options: XML::ParseOptions.new.strict)
    HTML4::parse(html_s, options: XML::ParseOptions.new.big_lines)
  • Each (except as noted) accepts a block that allows parsing options to be specified; see Options-Setting Blocks.

Certain other parsing methods use different options; see HTML5.

⚠ Not all parse options are supported on JRuby. Nokogiri attempts to invoke the equivalent behavior in Xerces/NekoHTML on JRuby when possible.

Bitmap Constants

Each of the parsing methods discussed here accepts an integer argument options that specifies parsing options.

That integer value may be constructed using the bitmap constants defined in ParseOptions.

Except for STRICT (see note below), each of the bitmap constants has a non-zero value that represents a bit in an integer value; to illustrate, here are a few of the constants, displayed in binary format (base 2):

ParseOptions::RECOVER.to_s(2)  # => "1"
ParseOptions::NOENT.to_s(2)    # => "10"
ParseOptions::DTDLOAD.to_s(2)  # => "100"
ParseOptions::DTDATTR.to_s(2)  # => "1000"
ParseOptions::DTDVALID.to_s(2) # => "10000"

Any of these constants may be used alone to specify a single option:

ParseOptions.new(ParseOptions::DTDLOAD)
# => #<Nokogiri::XML::ParseOptions: ... strict, dtdload>
ParseOptions.new(ParseOptions::DTDATTR)
# => #<Nokogiri::XML::ParseOptions: ... strict, dtdattr>

Multiple constants may be ORed together to specify multiple options:

options = ParseOptions::BIG_LINES | ParseOptions::COMPACT | ParseOptions::NOCDATA
ParseOptions.new(options)
# => #<Nokogiri::XML::ParseOptions: ... strict, nocdata, compact, big_lines>
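
As a rough illustration of how ORed constants combine, the following plain-Ruby sketch runs without Nokogiri. The module BitmapSketch and its helpers combine and on? are hypothetical; the constant values are copied from the binary strings shown above.

```ruby
# Hypothetical stand-ins for a few ParseOptions constants. The values are
# copied from the binary strings shown above, not read from Nokogiri.
module BitmapSketch
  RECOVER  = 0b1
  NOENT    = 0b10
  DTDLOAD  = 0b100
  DTDATTR  = 0b1000
  DTDVALID = 0b10000

  # Returns true when every bit of flag is set in bitmap.
  def self.on?(bitmap, flag)
    (bitmap & flag) == flag
  end

  # ORs any number of flags into a single integer bitmap.
  def self.combine(*flags)
    flags.reduce(0) { |bitmap, flag| bitmap | flag }
  end
end

combined = BitmapSketch.combine(BitmapSketch::NOENT, BitmapSketch::DTDLOAD)
puts combined.to_s(2)                                  # prints 110
puts BitmapSketch.on?(combined, BitmapSketch::DTDLOAD) # prints true
puts BitmapSketch.on?(combined, BitmapSketch::RECOVER) # prints false
```

Because each constant occupies its own bit, the order in which constants are ORed does not matter.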

Note: The value of constant STRICT is zero; it may be used alone to turn all options off:

XML.parse('<root />') {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: recover, nonet, big_lines, default_schema, default_xml>
XML.parse('<root />', nil, nil, ParseOptions::STRICT) {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: strict>

The single-option bitmask constants are: BIG_LINES, COMPACT, DTDATTR, DTDLOAD, DTDVALID, HUGE, NOBASEFIX, NOBLANKS, NOCDATA, NOENT, NOERROR, NONET, NOWARNING, NOXINCNODE, NSCLEAN, OLD10, PEDANTIC, RECOVER, STRICT, XINCLUDE.

There are also several "shorthand" constants that can set multiple options: DEFAULT_HTML, DEFAULT_SCHEMA, DEFAULT_XML, DEFAULT_XSLT.

Examples:

ParseOptions.new(ParseOptions::DEFAULT_HTML)
# => #<Nokogiri::XML::ParseOptions: ... recover, nowarning, nonet, big_lines, default_schema, noerror, default_html, default_xml>
ParseOptions.new(ParseOptions::DEFAULT_SCHEMA)
# => #<Nokogiri::XML::ParseOptions: ... strict, nonet, big_lines, default_schema>
ParseOptions.new(ParseOptions::DEFAULT_XML)
# => #<Nokogiri::XML::ParseOptions: ... recover, nonet, big_lines, default_schema, default_xml>
ParseOptions.new(ParseOptions::DEFAULT_XSLT)
# => #<Nokogiri::XML::ParseOptions: ... recover, noent, dtdload, dtdattr, nonet, nocdata, big_lines, default_xslt, default_schema, default_xml>

Nokogiri itself uses these shorthand constants for its parsing, and they are generally most suitable for Nokogiri users' code.
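
Each shorthand constant is itself an OR of single-option bits. The sketch below reproduces that arithmetic for DEFAULT_XML and DEFAULT_HTML in plain Ruby, using the option names listed in the inspect output above. The module name ShorthandSketch is hypothetical, and the individual values for NOERROR, NOWARNING, and NONET are assumptions taken from libxml2's parser option flags rather than from this page.

```ruby
# Hypothetical stand-in module; values are not read from Nokogiri.
module ShorthandSketch
  RECOVER   = 1 << 0   # 1, shown above in binary
  NOERROR   = 1 << 5   # 32      (assumed, from libxml2)
  NOWARNING = 1 << 6   # 64      (assumed, from libxml2)
  NONET     = 1 << 11  # 2048    (assumed, from libxml2)
  BIG_LINES = 1 << 22  # 4194304

  DEFAULT_XML  = RECOVER | NONET | BIG_LINES
  DEFAULT_HTML = RECOVER | NOERROR | NOWARNING | NONET | BIG_LINES
end

puts ShorthandSketch::DEFAULT_XML   # prints 4196353
puts ShorthandSketch::DEFAULT_HTML  # prints 4196449
```

These totals agree with the @options values that the examples on this page report for the XML and HTML4 defaults.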

Options-Setting Blocks

Many of the parsing methods discussed here accept an options-setting block.

The block is called with a new instance of ParseOptions created with the defaults for the specific method:

XML.parse(xml_s) {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: @options=4196353 recover, nonet, big_lines, default_xml, default_schema>
HTML4.parse(html_s) {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: @options=4196449 recover, nowarning, nonet, big_lines, default_html, default_xml, noerror, default_schema>

When the block returns, the parsing is performed using those options.

The block may modify those options, which affects parsing:

bad_xml = '<root>'                              # End tag missing.
XML.parse(bad_xml)                              # No error because option RECOVER is on.
XML.parse(bad_xml) {|options| options.strict }  # Raises XML::SyntaxError because strict turns option RECOVER off.
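
The sequence described above (create the default options, yield them to the block, then parse with whatever the block left behind) can be sketched in plain Ruby. Everything here is hypothetical and stands in for the real parser: the class BlockSketchOptions, the method sketch_parse, and the idea of returning the bitmap instead of parsing.

```ruby
# Hypothetical miniature of an options-setting block; not Nokogiri's code.
class BlockSketchOptions
  RECOVER = 1

  attr_reader :bitmap

  def initialize(bitmap)
    @bitmap = bitmap
  end

  # Clears the RECOVER bit and returns self.
  def strict
    @bitmap &= ~RECOVER
    self
  end
end

# Builds a fresh options object from the defaults, lets the caller's block
# adjust it, and only then uses the resulting bitmap.
def sketch_parse(default_bitmap)
  options = BlockSketchOptions.new(default_bitmap)
  yield options if block_given?
  options.bitmap # A real parser would parse here using this value.
end

puts sketch_parse(4196353)                              # prints 4196353
puts sketch_parse(4196353) { |options| options.strict } # prints 4196352
```

The block's return value is ignored in this sketch; only the changes made to the yielded object matter.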

Convenience Methods

A ParseOptions object has three sets of convenience methods, each based on the name of one of the constants:

  • Setters: each is the downcased name of an option, and turns that option on:

    options = ParseOptions.new
    # => #<Nokogiri::XML::ParseOptions: ... strict>
    options.big_lines
    # => #<Nokogiri::XML::ParseOptions: ... strict, big_lines>
    options.compact
    # => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
  • Unsetters: each begins with no, and turns off an option.

    Note that there is no unsetter nostrict, but the setter recover serves the same purpose:

    options.nobig_lines
    # => #<Nokogiri::XML::ParseOptions: ... strict, compact>
    options.nocompact
    # => #<Nokogiri::XML::ParseOptions: ... strict>
    options.recover # Functionally equivalent to nostrict.
    # => #<Nokogiri::XML::ParseOptions: ... recover>
    options.noent   # Set NOENT.
    # => #<Nokogiri::XML::ParseOptions: ... recover, noent>
    options.nonoent # Unset NOENT.
    # => #<Nokogiri::XML::ParseOptions: ... recover>

    💡 Note that some options begin with no, leading to the logical but perhaps unintuitive double negative:

    options.nocdata   # Set the NOCDATA parse option.
    options.nonocdata # Unset the NOCDATA parse option.
  • Queries: each ends with ?, and returns whether an option is on or off:

    options.recover? # => true
    options.strict?  # => false

Each setter and unsetter method returns self, so the methods may be chained:

ParseOptions.new.compact.big_lines
# => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
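
To make the naming scheme concrete, here is a minimal plain-Ruby sketch of the setter, unsetter, and query pattern. It is not Nokogiri's implementation: the class MiniOptions, its FLAGS table, and the bitmap reader are hypothetical, and the value used for compact is an assumption taken from libxml2.

```ruby
# Hypothetical miniature of the setter / unsetter / query pattern.
class MiniOptions
  # compact's value is assumed from libxml2; the others appear on this page.
  FLAGS = { recover: 1, compact: 65_536, big_lines: 4_194_304 }.freeze

  attr_reader :bitmap

  def initialize(bitmap = 0)
    @bitmap = bitmap
  end

  FLAGS.each do |name, bit|
    define_method(name)        { @bitmap |= bit;  self }  # setter
    define_method("no#{name}") { @bitmap &= ~bit; self }  # unsetter
    define_method("#{name}?")  { (@bitmap & bit) == bit } # query
  end
end

options = MiniOptions.new.compact.big_lines
puts options.compact?           # prints true
puts options.nocompact.compact? # prints false
puts options.bitmap             # prints 4194304
```

Returning self from every setter and unsetter is what allows the calls to be chained.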

Instance Method Summary

  • #==(object)

    Returns true if the same options are set in self and object.

  • #inspect

    Returns a string representation of self that includes the numeric value of @options.

Constructor Details

.new(options = ParseOptions::STRICT) ⇒ ParseOptions

Returns a new ParseOptions object with options as specified by integer argument options. The value of options may be constructed using Bitmap Constants.

With the simple constant STRICT (the default), all options are off (#strict means norecover):

ParseOptions.new
# => #<Nokogiri::XML::ParseOptions: ... strict>

With a different simple constant, one option may be set:

ParseOptions.new(ParseOptions::RECOVER)
# => #<Nokogiri::XML::ParseOptions: ... recover>
ParseOptions.new(ParseOptions::COMPACT)
# => #<Nokogiri::XML::ParseOptions: ... strict, compact>

With multiple ORed constants, multiple options may be set:

options = ParseOptions::COMPACT | ParseOptions::RECOVER | ParseOptions::BIG_LINES
ParseOptions.new(options)
# => #<Nokogiri::XML::ParseOptions: ... recover, compact, big_lines>
  
# File 'lib/nokogiri/xml/parse_options.rb', line 387

def initialize(options = STRICT)
  @options = options
end

Instance Attribute Details

#options (rw) Also known as: #to_i

Returns the integer value of self, or sets it and returns the new value:

options = ParseOptions.new(ParseOptions::DEFAULT_HTML)
# => #<Nokogiri::XML::ParseOptions: ... recover, nowarning, nonet, big_...
options.options # => 4196449
options.options = ParseOptions::STRICT
options.options # => 0
  
# File 'lib/nokogiri/xml/parse_options.rb', line 352

attr_accessor :options

#strict (readonly)

Turns off option recover:

options = ParseOptions.new.recover.compact.big_lines
# => #<Nokogiri::XML::ParseOptions: ... recover, compact, big_lines>
options.strict
# => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
  
# File 'lib/nokogiri/xml/parse_options.rb', line 422

def strict
  @options &= ~RECOVER
  self
end

#strict? ⇒ Boolean (readonly)

Returns whether option #strict is on:

options = ParseOptions.new.recover.compact.big_lines
# => #<Nokogiri::XML::ParseOptions: ... recover, compact, big_lines>
options.strict? # => false
options.strict
# => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
options.strict? # => true
  
# File 'lib/nokogiri/xml/parse_options.rb', line 440

def strict?
  @options & RECOVER == STRICT
end
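
The listings for strict and strict? both work on the RECOVER bit alone. The following plain-Ruby sketch applies the same two expressions to bare integers; the names RECOVER_BIT, STRICT_ZERO, clear_recover, and strict_bitmap? are hypothetical and exist only for this illustration.

```ruby
RECOVER_BIT = 1 # Hypothetical name; same value as RECOVER.
STRICT_ZERO = 0 # Hypothetical name; same value as STRICT.

# Clears only the RECOVER bit, leaving every other bit untouched.
def clear_recover(bitmap)
  bitmap & ~RECOVER_BIT
end

# Ruby gives & higher precedence than ==, so this reads as
# (bitmap & RECOVER_BIT) == STRICT_ZERO.
def strict_bitmap?(bitmap)
  bitmap & RECOVER_BIT == STRICT_ZERO
end

bitmap = 4194305                           # recover, big_lines
puts strict_bitmap?(bitmap)                # prints false
puts clear_recover(bitmap)                 # prints 4194304
puts strict_bitmap?(clear_recover(bitmap)) # prints true
```

This also shows why strict is reported whenever recover is off: it is the absence of one bit, not a bit of its own.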

#to_i (readonly)

Alias for #options.

  
# File 'lib/nokogiri/xml/parse_options.rb', line 460

alias_method :to_i, :options

Instance Method Details

#==(object)

Returns true if the same options are set in self and object.

options = ParseOptions.new
# => #<Nokogiri::XML::ParseOptions: ... strict>
options == options.dup         # => true
options == options.dup.recover # => false
  
# File 'lib/nokogiri/xml/parse_options.rb', line 456

def ==(other)
  other.to_i == to_i
end
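
Because the listing above compares only the results of to_i, equality depends on the integer value rather than on object identity. The sketch below copies that one definition into a hypothetical class, EqualitySketch, to show the consequence; it makes no claim about Nokogiri beyond the listed source.

```ruby
# Hypothetical miniature that copies only the == definition listed above.
class EqualitySketch
  def initialize(bitmap)
    @bitmap = bitmap
  end

  def to_i
    @bitmap
  end

  def ==(other)
    other.to_i == to_i
  end
end

a = EqualitySketch.new(1)
puts a == EqualitySketch.new(1) # prints true  (distinct objects, same bits)
puts a == 1                     # prints true  (Integer#to_i returns itself)
puts a == EqualitySketch.new(0) # prints false
```

In this sketch, any right-hand operand that responds to to_i takes part in the comparison, so a plain integer bitmap can be compared directly.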

#inspect

Returns a string representation of self that includes the numeric value of @options:

options = ParseOptions.new
options.inspect
# => "#<Nokogiri::XML::ParseOptions: @options=0 strict>"

In general, the returned string also includes the (downcased) names of the options that are on (but omits the names of those that are off):

options.recover.big_lines
options.inspect
# => "#<Nokogiri::XML::ParseOptions: @options=4194305 recover, big_lines>"

The exception is that exactly one of recover or the pseudo-option #strict (i.e., not recover) is always reported:

options.norecover
options.inspect
# => "#<Nokogiri::XML::ParseOptions: @options=4194304 strict, big_lines>"
  
# File 'lib/nokogiri/xml/parse_options.rb', line 493

def inspect
  options = []
  self.class.constants.each do |k|
    options << k.downcase if send(:"#{k.downcase}?")
  end
  super.sub(/>$/, " " + options.join(", ") + ">")
end
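
The name-collecting step in the listing above can be tried in isolation. The following plain-Ruby sketch uses a hypothetical class, InspectSketch, with only three constants and a hypothetical method option_names. It sorts the names so that the output does not depend on the order in which Ruby enumerates constants, which the listing above does not do.

```ruby
# Hypothetical miniature of the inspect technique listed above.
class InspectSketch
  STRICT    = 0
  RECOVER   = 1
  BIG_LINES = 4_194_304

  def initialize(bitmap = STRICT)
    @bitmap = bitmap
  end

  def strict?
    (@bitmap & RECOVER) == STRICT
  end

  def recover?
    (@bitmap & RECOVER) == RECOVER
  end

  def big_lines?
    (@bitmap & BIG_LINES) == BIG_LINES
  end

  # Collects the downcased name of every constant whose query returns true.
  def option_names
    self.class.constants
        .select { |k| send(:"#{k.downcase}?") }
        .map { |k| k.downcase.to_s }
        .sort
  end
end

puts InspectSketch.new(4_194_305).option_names.join(", ") # prints big_lines, recover
puts InspectSketch.new(4_194_304).option_names.join(", ") # prints big_lines, strict
```

The technique relies on every constant having a matching query method; a constant without one would make send raise NoMethodError.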