
Class: Nokogiri::XML::ParseOptions

Relationships & Source Files
Inherits: Object
Defined in: lib/nokogiri/xml/parse_options.rb

Overview

Class to contain options for parsing XML or HTML4 (but not HTML5).

💡 Note that HTML5 parsing has a separate, orthogonal set of options due to the API of the HTML5 library used. See ::Nokogiri::HTML5.

About the Examples

Examples on this page assume that the following code has been executed:

```
require 'nokogiri' # Make Nokogiri available.
include Nokogiri # Allow omitting leading 'Nokogiri::'.
xml_s = "<root />\n" # String containing XML.
File.write('t.xml', xml_s) # File containing XML.
html_s = "<html />\n" # String containing HTML.
File.write('t.html', html_s) # File containing HTML.
```

Examples executed via IRB (interactive Ruby) display ParseOptions instances using method #inspect.

Parsing Methods

Each of the parsing methods performs parsing for an XML or HTML4 source:

  • Each requires a leading argument that specifies the source of the text to be parsed; except as noted, the argument’s value may be either:

    - A string.
    - An open IO stream (must respond to methods `read` and `close`).
    
    Examples:
    
    ```
    XML::parse(xml_s)
    HTML4.parse(html_s)
    XML::parse(File.open('t.xml'))
    HTML4.parse(File.open('t.html'))
    ```
  • Each accepts a trailing optional argument #options (or keyword argument #options) that specifies parsing options; the argument’s value may be either:

    - An integer: see [Bitmap Constants](rdoc-ref:ParseOptions@Bitmap+Constants).
    - An instance of ParseOptions: see ParseOptions.new.
    
    Examples:
    
    ```
    XML::parse(xml_s, options: XML::ParseOptions::STRICT)
    HTML4::parse(html_s, options: XML::ParseOptions::BIG_LINES)
    XML::parse(xml_s, options: XML::ParseOptions.new.strict)
    HTML4::parse(html_s, options: XML::ParseOptions.new.big_lines)
    ```
  • Each (except as noted) accepts a block that allows parsing options to be specified; see [Options-Setting Blocks](ParseOptions@Options-Setting+Blocks).

Certain other parsing methods use different options; see HTML5.

⚠ Not all parse options are supported on JRuby. Nokogiri attempts to invoke the equivalent behavior in Xerces/NekoHTML on JRuby when it’s possible.

Bitmap Constants

Each of the [parsing methods](ParseOptions@Parsing+Methods) discussed here accepts an integer argument #options that specifies parsing options.

That integer value may be constructed using the bitmap constants defined in ParseOptions.

Except for STRICT (see note below), each of the bitmap constants has a non-zero value that represents a bit in an integer value; to illustrate, here are a few of the constants, displayed in binary format (base 2):

```
ParseOptions::RECOVER.to_s(2)  # => "1"
ParseOptions::NOENT.to_s(2)    # => "10"
ParseOptions::DTDLOAD.to_s(2)  # => "100"
ParseOptions::DTDATTR.to_s(2)  # => "1000"
ParseOptions::DTDVALID.to_s(2) # => "10000"
```

Any of these constants may be used alone to specify a single option:

```
ParseOptions.new(ParseOptions::DTDLOAD)
# => #<Nokogiri::XML::ParseOptions: ... strict, dtdload>
ParseOptions.new(ParseOptions::DTDATTR)
# => #<Nokogiri::XML::ParseOptions: ... strict, dtdattr>
```

Multiple constants may be ORed together to specify multiple options:

```
options = ParseOptions::BIG_LINES | ParseOptions::COMPACT | ParseOptions::NOCDATA
ParseOptions.new(options)
# => #<Nokogiri::XML::ParseOptions: ... strict, nocdata, compact, big_lines>
```
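The bit arithmetic behind ORed constants can be tried without Nokogiri loaded. The following is a minimal sketch: the constants are local stand-ins whose values mirror the binary strings listed above, and `flag_on?` is a hypothetical helper, not part of the ParseOptions API.

```ruby
# Local stand-ins whose values mirror the binary strings listed above.
# They are assumptions for illustration, not Nokogiri's own constants.
RECOVER  = 0b00001
NOENT    = 0b00010
DTDLOAD  = 0b00100
DTDATTR  = 0b01000
DTDVALID = 0b10000

# Hypothetical helper: true when every bit of `flag` is set in `value`.
def flag_on?(value, flag)
  value & flag == flag
end

combined = DTDLOAD | DTDATTR              # OR merges the two bits
puts combined.to_s(2)                     # prints 1100
puts flag_on?(combined, DTDLOAD)          # prints true
puts flag_on?(combined, DTDVALID)         # prints false
puts((combined | DTDLOAD) == combined)    # prints true: ORing a bit twice changes nothing
```

Because each constant occupies its own bit, the order in which constants are ORed does not matter.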

Note: The value of constant STRICT is zero; it may be used alone to turn all options off:

```
XML::parse('<root />') {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: recover, nonet, big_lines, default_schema, default_xml>
XML::parse('<root />', nil, nil, ParseOptions::STRICT) {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: strict>
```

The single-option bitmask constants are: BIG_LINES, COMPACT, DTDATTR, DTDLOAD, DTDVALID, HUGE, NOBASEFIX, NOBLANKS, NOCDATA, NODICT, NOENT, NOERROR, NONET, NOWARNING, NOXINCNODE, NSCLEAN, OLD10, PEDANTIC, RECOVER, SAX1, STRICT, XINCLUDE.

There are also several “shorthand” constants that can set multiple options: DEFAULT_HTML, DEFAULT_SCHEMA, DEFAULT_XML, DEFAULT_XSLT.

Examples:

```
ParseOptions.new(ParseOptions::DEFAULT_HTML)
# => #<Nokogiri::XML::ParseOptions: ... recover, nowarning, nonet, big_lines, default_schema, noerror, default_html, default_xml>
ParseOptions.new(ParseOptions::DEFAULT_SCHEMA)
# => #<Nokogiri::XML::ParseOptions: ... strict, nonet, big_lines, default_schema>
ParseOptions.new(ParseOptions::DEFAULT_XML)
# => #<Nokogiri::XML::ParseOptions: ... recover, nonet, big_lines, default_schema, default_xml>
ParseOptions.new(ParseOptions::DEFAULT_XSLT)
# => #<Nokogiri::XML::ParseOptions: ... recover, noent, dtdload, dtdattr, nonet, nocdata, big_lines, default_xslt, default_schema, default_xml>
```

Nokogiri itself uses these shorthand constants for its parsing, and they are generally most suitable for Nokogiri users’ code.
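A shorthand value is simply the sum of the single bits it turns on, so it can be taken apart again. The sketch below uses plain integers rather than Nokogiri: the numbers 4196353 and 4196449 are the values this page reports via #inspect for the XML and HTML4 defaults, and `set_bits` is a hypothetical helper introduced only for illustration.

```ruby
# Hypothetical helper: list the individual bit values that are set in `value`.
def set_bits(value)
  (0...value.bit_length).map { |i| 1 << i }.select { |bit| value & bit != 0 }
end

default_xml = 4_196_353   # value this page reports for the XML defaults
bits = set_bits(default_xml)
p bits                    # prints [1, 2048, 4194304]
p bits.sum == default_xml # prints true

default_html = 4_196_449  # value this page reports for the HTML4 defaults
p set_bits(default_html)  # prints [1, 32, 64, 2048, 4194304]
```

The three bits found in the XML value line up with the three single options that #inspect names for it on this page (recover, nonet, big_lines); the HTML4 value carries two more bits, matching the additional noerror and nowarning entries.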

Options-Setting Blocks

Many of the [parsing methods](ParseOptions@Parsing+Methods) discussed here accept an options-setting block.

The block is called with a new instance of ParseOptions created with the defaults for the specific method:

```
XML::parse(xml_s) {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: @options=4196353 recover, nonet, big_lines, default_xml, default_schema>
HTML4.parse(html_s) {|options| puts options.inspect }
#<Nokogiri::XML::ParseOptions: @options=4196449 recover, nowarning, nonet, big_lines, default_html, default_xml, noerror, default_schema>
```

When the block returns, the parsing is performed using those #options.

The block may modify those options, which affects parsing:

```
bad_xml = '<root>' # End tag missing.
XML::parse(bad_xml) # No error because option RECOVER is on.
XML::parse(bad_xml) {|options| options.strict } # Raises SyntaxError because option STRICT is on.
```
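The create-defaults, yield, then-use sequence described in this section can be sketched in plain Ruby. Everything here is a hypothetical stand-in: `BlockSketchOptions` imitates only the small part of ParseOptions the sketch needs, and `sketch_parse` is an invented method that reports which mode it would use instead of parsing anything.

```ruby
# Hypothetical stand-in for ParseOptions; only what the sketch needs.
class BlockSketchOptions
  RECOVER = 1 # same value as the RECOVER bit shown on this page

  attr_accessor :options

  def initialize(options = 0)
    @options = options
  end

  # Clear the RECOVER bit and return self so calls can be chained.
  def strict
    @options &= ~RECOVER
    self
  end

  def recover?
    @options & RECOVER == RECOVER
  end
end

# Hypothetical parse-like method: build the defaults, let the caller's
# block adjust them, then act on whatever state the block left behind.
def sketch_parse(source, defaults = BlockSketchOptions::RECOVER)
  options = BlockSketchOptions.new(defaults)
  yield options if block_given?
  options.recover? ? "lenient parse of #{source}" : "strict parse of #{source}"
end

puts sketch_parse('<root>')                    # prints lenient parse of <root>
puts sketch_parse('<root>') { |o| o.strict }   # prints strict parse of <root>
```

In this sketch the block's return value is ignored; what matters is the state of the object the block received once the block has finished.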

Convenience Methods

A ParseOptions object has three sets of convenience methods, each based on the name of one of the constants:

  • Setters: each is the downcase of an option name, and turns on an option:

    ```
    options = ParseOptions.new
    # => #<Nokogiri::XML::ParseOptions: ... strict>
    options.big_lines
    # => #<Nokogiri::XML::ParseOptions: ... strict, big_lines>
    options.compact
    # => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
    ```
  • Unsetters: each begins with no, and turns off an option.

    Note that there is no unsetter `nostrict`,
    but the setter `recover` serves the same purpose:
    
    ```
    options.nobig_lines
    # => #<Nokogiri::XML::ParseOptions: ... strict, compact>
    options.nocompact
    # => #<Nokogiri::XML::ParseOptions: ... strict>
    options.recover # Functionally equivalent to nostrict.
    # => #<Nokogiri::XML::ParseOptions: ... recover>
    options.noent   # Set NOENT.
    # => #<Nokogiri::XML::ParseOptions: ... recover, noent>
    options.nonoent # Unset NOENT.
    # => #<Nokogiri::XML::ParseOptions: ... recover>
    ```
    
    💡 Note that some options begin with `no`, leading to the logical but perhaps unintuitive
    double negative:
    
    ```
    po.nocdata # Set the NOCDATA parse option
    po.nonocdata # Unset the NOCDATA parse option
    ```
  • Queries: each ends with ?, and returns whether an option is on or off:

    ```
    options.recover? # => true
    options.strict?  # => false
    ```

Each setter and unsetter method returns self, so the methods may be chained:

```
options.compact.big_lines
# => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
```
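One way such a family of setters, unsetters, and queries could be generated from constant names is sketched below. This is an assumption-laden illustration, not a claim about how Nokogiri defines its methods: `ConvenienceSketchOptions` is a hypothetical class, and only three option bits (with the values shown earlier on this page) are included.

```ruby
# Hypothetical class that derives three methods from each constant name.
class ConvenienceSketchOptions
  STRICT  = 0
  RECOVER = 1 << 0
  NOENT   = 1 << 1
  DTDLOAD = 1 << 2

  attr_reader :options

  def initialize(options = STRICT)
    @options = options
  end

  constants(false).each do |name|
    next if name == :STRICT # zero has no bit of its own to set or clear
    bit = const_get(name)
    method_name = name.to_s.downcase

    define_method(method_name) { @options |= bit; self }        # setter
    define_method("no#{method_name}") { @options &= ~bit; self } # unsetter
    define_method("#{method_name}?") { @options & bit == bit }   # query
  end
end

opts = ConvenienceSketchOptions.new.noent.dtdload # chained setters
p opts.options # prints 6
p opts.noent?  # prints true
opts.nonoent   # the double negative: unset NOENT
p opts.noent?  # prints false
p opts.options # prints 4
```

Returning `self` from each setter and unsetter is what makes the chained call on the first usage line possible.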

Instance Method Summary

  • #==(object)

    Returns true if the same options are set in self and object.

  • #inspect

    Returns a string representation of self that includes the numeric value of @options.

Constructor Details

.new(options = ParseOptions::STRICT) ⇒ ParseOptions


Returns a new ParseOptions object with options as specified by integer argument #options. The value of #options may be constructed using [Bitmap Constants](ParseOptions@Bitmap+Constants).

With the simple constant STRICT (the default), all options are off (#strict means norecover):

```
ParseOptions.new
# => #<Nokogiri::XML::ParseOptions: ... strict>
```

With a different simple constant, one option may be set:

```
ParseOptions.new(ParseOptions::RECOVER)
# => #<Nokogiri::XML::ParseOptions: ... recover>
ParseOptions.new(ParseOptions::COMPACT)
# => #<Nokogiri::XML::ParseOptions: ... strict, compact>
```

With multiple ORed constants, multiple options may be set:

```
options = ParseOptions::COMPACT | ParseOptions::RECOVER | ParseOptions::BIG_LINES
ParseOptions.new(options)
# => #<Nokogiri::XML::ParseOptions: ... recover, compact, big_lines>
```

# File 'lib/nokogiri/xml/parse_options.rb', line 389

def initialize(options = STRICT)
  @options = options
end

Instance Attribute Details

#options (rw) Also known as: #to_i

Returns or sets and returns the integer value of self:

```
options = ParseOptions.new(ParseOptions::DEFAULT_HTML)
# => #<Nokogiri::XML::ParseOptions: ... recover, nowarning, nonet, big_...
options.options # => 4196449
options.options = ParseOptions::STRICT
options.options # => 0
```

# File 'lib/nokogiri/xml/parse_options.rb', line 354

attr_accessor :options

#strict (readonly)

Turns off option recover:

```
options = ParseOptions.new.recover.compact.big_lines
# => #<Nokogiri::XML::ParseOptions: ... recover, compact, big_lines>
options.strict
# => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
```

# File 'lib/nokogiri/xml/parse_options.rb', line 424

def strict
  @options &= ~RECOVER
  self
end

#strict? ⇒ Boolean (readonly)

Returns whether option #strict is on:

```
options = ParseOptions.new.recover.compact.big_lines
# => #<Nokogiri::XML::ParseOptions: ... recover, compact, big_lines>
options.strict? # => false
options.strict
# => #<Nokogiri::XML::ParseOptions: ... strict, compact, big_lines>
options.strict? # => true
```

# File 'lib/nokogiri/xml/parse_options.rb', line 442

def strict?
  @options & RECOVER == STRICT
end
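The two listings for #strict and #strict? come down to clearing and testing a single bit. The sketch below replays that arithmetic on plain integers; the constant values are taken from the examples on this page (RECOVER is bit 0, and 4194304 is the value #inspect reports when only big_lines is on), while `clear_recover` and `strict_value?` are hypothetical helper names.

```ruby
RECOVER   = 1         # bit 0, per the binary listing on this page
BIG_LINES = 4_194_304 # value reported by #inspect when only big_lines is on

# Mirrors `@options &= ~RECOVER`: clears the RECOVER bit, leaves the rest alone.
def clear_recover(options)
  options & ~RECOVER
end

# Mirrors `@options & RECOVER == STRICT`, with STRICT being zero.
def strict_value?(options)
  options & RECOVER == 0
end

before = RECOVER | BIG_LINES
after  = clear_recover(before)
p before                 # prints 4194305
p after                  # prints 4194304
p strict_value?(before)  # prints false
p strict_value?(after)   # prints true
```

In Ruby, `&` binds more tightly than `==`, so `options & RECOVER == 0` is evaluated as `(options & RECOVER) == 0` without extra parentheses.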

#to_i (readonly)

Alias for #options.

# File 'lib/nokogiri/xml/parse_options.rb', line 462

alias_method :to_i, :options

Instance Method Details

#==(object)

Returns true if the same options are set in self and object.

```
options = ParseOptions.new
# => #<Nokogiri::XML::ParseOptions: ... strict>
options == options.dup # => true
options == options.dup.recover # => false
```

# File 'lib/nokogiri/xml/parse_options.rb', line 458

def ==(other)
  other.to_i == to_i
end
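Because the listing compares `to_i` values, equality depends only on the integer each side converts to. The sketch below reproduces that comparison in a hypothetical stand-in class, `EqSketchOptions`, to show the consequence for plain integers; it is an illustration of the listing's logic, not a statement of supported Nokogiri usage.

```ruby
# Hypothetical stand-in that copies the comparison logic from the listing.
class EqSketchOptions
  attr_accessor :options
  alias_method :to_i, :options

  def initialize(options = 0)
    @options = options
  end

  def ==(other)
    other.to_i == to_i
  end
end

a = EqSketchOptions.new(5)
p a == EqSketchOptions.new(5) # prints true
p a == a.dup                  # prints true
p a == 5                      # prints true: Integer#to_i returns the integer itself
p a == 4                      # prints false
```

In this sketch, an object on the right-hand side that does not respond to `to_i` would raise NoMethodError rather than compare as unequal.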

#inspect

Returns a string representation of self that includes the numeric value of @options:

```
options = ParseOptions.new
options.inspect
# => "#<Nokogiri::XML::ParseOptions: @options=0 strict>"
```

In general, the returned string also includes the (downcased) names of the options that are on (but omits the names of those that are off):

```
options.recover.big_lines
options.inspect
# => "#<Nokogiri::XML::ParseOptions: @options=4194305 recover, big_lines>"
```

The exception is that either recover (i.e., *not strict*) or the pseudo-option #strict is always reported:

```
options.norecover
options.inspect
# => "#<Nokogiri::XML::ParseOptions: @options=4194304 strict, big_lines>"
```

# File 'lib/nokogiri/xml/parse_options.rb', line 495

def inspect
  options = []
  self.class.constants.each do |k|
    options << k.downcase if send(:"#{k.downcase}?")
  end
  super.sub(/>$/, " " + options.join(", ") + ">")
end
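The listing builds its result by taking the default `Object#inspect` string and splicing the option names in before the closing `>`. The sketch below applies the same approach to a hypothetical stand-in class, `InspectSketchOptions`, with only three constants and hand-written query methods; the class name and the reduced constant set are assumptions for illustration.

```ruby
# Hypothetical stand-in with a reduced set of constants and queries.
class InspectSketchOptions
  STRICT  = 0
  RECOVER = 1
  NOENT   = 2

  def initialize(options = STRICT)
    @options = options
  end

  def strict?
    @options & RECOVER == STRICT
  end

  def recover?
    @options & RECOVER == RECOVER
  end

  def noent?
    @options & NOENT == NOENT
  end

  # Same shape as the listing: ask each constant's query method,
  # then splice the matching names into the default inspect string.
  def inspect
    names = []
    self.class.constants(false).each do |k|
      names << k.downcase if send(:"#{k.downcase}?")
    end
    super.sub(/>$/, " " + names.join(", ") + ">")
  end
end

puts InspectSketchOptions.new.inspect
puts InspectSketchOptions.new(InspectSketchOptions::RECOVER | InspectSketchOptions::NOENT).inspect
```

The first line of output ends in `@options=0 strict>`; the second names recover and noent instead. The sketch does not rely on the order in which names appear.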