Class: Nokogiri::HTML5::Document

Overview

Since v1.12.0

💡 ::Nokogiri::HTML5 functionality is not available when running JRuby.

Constant Summary

::Nokogiri::XML::PP::Node - Included

COLLECTIONS

::Nokogiri::XML::Searchable - Included

LOOKS_LIKE_XPATH

::Nokogiri::ClassResolver - Included

VALID_NAMESPACES

::Nokogiri::XML::Node - Inherited

ATTRIBUTE_DECL, ATTRIBUTE_NODE, CDATA_SECTION_NODE, COMMENT_NODE, DECONSTRUCT_KEYS, DECONSTRUCT_METHODS, DOCB_DOCUMENT_NODE, DOCUMENT_FRAG_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DTD_NODE, ELEMENT_DECL, ELEMENT_NODE, ENTITY_DECL, ENTITY_NODE, ENTITY_REF_NODE, HTML_DOCUMENT_NODE, IMPLIED_XPATH_CONTEXTS, NAMESPACE_DECL, NOTATION_NODE, PI_NODE, TEXT_NODE, XINCLUDE_END, XINCLUDE_START

::Nokogiri::XML::Document - Inherited

IMPLIED_XPATH_CONTEXTS, NCNAME_CHAR, NCNAME_RE, NCNAME_START_CHAR, OBJECT_CLONE_METHOD, OBJECT_DUP_METHOD

Class Method Summary

::Nokogiri::HTML4::Document - Inherited

.new

Create a new empty document with base URI uri and external ID external_id.

.parse

Parse HTML4 input from a String or IO object, and return a new ::Nokogiri::HTML4::Document.

.read_io

Read the HTML document from io with the given url, encoding, and options.

.read_memory

Read the HTML document contained in string with the given url, encoding, and options.

::Nokogiri::XML::Document - Inherited

.new

Alias for XML::Comment.new.

.parse

Parse XML input from a String or IO object, and return a new ::Nokogiri::XML::Document.

.read_io

Create a new document from an IO object.

.read_memory

Create a new document from a String.

.wrap

⚠ This method is only available when running JRuby.

.empty_doc?

::Nokogiri::XML::Node - Inherited

.new

documented in lib/nokogiri/xml/node.rb.

Instance Attribute Summary

::Nokogiri::HTML4::Document - Inherited

#meta_encoding

Get the meta tag encoding for this document.

#meta_encoding=

Set the meta tag encoding for this document.

#title

Get the title string of this document.

#title=

Set the title string of this document.

::Nokogiri::XML::Document - Inherited

#encoding

Get the encoding for this Document.

#encoding=

Set the encoding string for this Document.

#errors

The errors found while parsing a document.

#namespace_inheritance

When true, reparented elements without a namespace will inherit their new parent’s namespace (if one exists).

#root

Get the root node for this document.

#root=

Set the root element on this document.

::Nokogiri::XML::Node - Inherited

#blank?
Returns

true if the node is an empty or whitespace-only text or cdata node, else false.

#cdata?

Returns true if this is a CDATA.

#children

:category: Traversing Document Structure.

#children=

Set the content for this Node to node_or_tags.

#comment?

Returns true if this is a Comment.

#content

[Returns].

#content=

Set the content of this node to input.

#default_namespace=

Adds a default namespace, supplied as a string url (href), to self.

#document

:category: Traversing Document Structure.

#document?

Returns true if this is a Document.

#elem?
#element?

Returns true if this is an Element node.

#fragment?

Returns true if this is a DocumentFragment.

#html?

Returns true if this is an ::Nokogiri::HTML4::Document or Document node.

#inner_html

Get the inner_html for this node’s Node#children

#inner_html=

Set the content for this Node to node_or_tags.

#inner_text
#lang

Searches the language of a node, i.e.

#lang=

Set the language of a node, i.e.

#line
Returns

The line number of this Node.

#line=

Sets the line for this Node.

#name
#namespace
Returns

The Namespace of the element or attribute node, or nil if there is no namespace.

#namespace=

Set the default namespace on this node (as would be defined with an “xmlns=” attribute in XML source), as a Namespace object ns.

#native_content=

Set the content of this node to input.

#next
#next=
#node_name

Returns the name for this Node.

#node_name=

Set the name for this Node.

#parent

Get the parent Node for this Node.

#parent=

Set the parent Node for this Node.

#previous
#previous=
#processing_instruction?

Returns true if this is a ProcessingInstruction node.

#read_only?

Is this a read only node?

#text
#text?

Returns true if this is a Text node.

#to_str
#xml?

Returns true if this is an ::Nokogiri::XML::Document node.

#prepend_newline?, #data_ptr?

Instance Method Summary

::Nokogiri::HTML4::Document - Inherited

#fragment
#serialize

Serialize Node using options.

#type

The type for this document.

#xpath_doctype
Returns

The document type which determines CSS-to-XPath translation.

#meta_content_type, #set_metadata_element

::Nokogiri::XML::Document - Inherited

#<<
#add_child,
#canonicalize

Canonicalize a document and return the results.

#clone

Clone this node.

#collect_namespaces

Recursively get all namespaces from this node and its subtree and return them as a hash.

#create_cdata

Create a CDATA Node containing string

#create_comment

Create a Comment Node containing string

#create_element

Create a new Element with name belonging to this document, optionally setting contents or attributes.

#create_entity

Create a new entity named name.

#create_text_node

Create a Text Node with string

#deconstruct_keys

Returns a hash describing the Document, to use in pattern matching.

#decorate

Apply any decorators to node

#decorators

Get the list of decorators given key

#document

A reference to self

#dup

Duplicate this node.

#fragment

Create a ::Nokogiri::XML::DocumentFragment from tags. Returns an empty fragment if tags is nil.

#name

The name of this document.

#namespaces

Get the hash of namespaces on the root ::Nokogiri::XML::Node

#remove_namespaces!

Remove all namespaces from all nodes in the document.

#slop!

Explore a document with shortcut methods.

#to_java

⚠ This method is only available when running JRuby.

#to_xml
#url

Get the url name for this document.

#validate

Validate this Document against its DTD.

#version

Get the XML version for this Document.

#xpath_doctype
Returns

The document type which determines CSS-to-XPath translation.

#inspect_attributes,
#initialize

::Nokogiri::XML::Node - Inherited

#<<

Add node_or_tags as a child of this Node.

#<=>

Compare two Node objects with respect to their Document.

#==

Test to see if this Node is equal to other.

#[]

Fetch an attribute from this node.

#[]=

Update the attribute name to value, or create the attribute if it does not exist.

#accept

Accept a visitor.

#add_child

Add node_or_tags as a child of this Node.

#add_class

Ensure HTML CSS classes are present on self.

#add_namespace
#add_namespace_definition

:category: Manipulating Document Structure.

#add_next_sibling

Insert node_or_tags after this Node (as a sibling).

#add_previous_sibling

Insert node_or_tags before this Node (as a sibling).

#after

Insert node_or_tags after this node (as a sibling).

#ancestors

Get a list of ancestor Node for this Node.

#append_class

Add HTML CSS classes to self, regardless of duplication.

#attr

Alias for XML::Node#[].

#attribute

:category: Working With Node Attributes.

#attribute_nodes

:category: Working With Node Attributes.

#attribute_with_ns

:category: Working With Node Attributes.

#attributes

Fetch this node’s attributes.

#before

Insert node_or_tags before this node (as a sibling).

#canonicalize,
#child

:category: Traversing Document Structure.

#classes

Fetch CSS class names of a Node.

#clone

Clone this node.

#create_external_subset

Create an external subset.

#create_internal_subset

Create the internal subset of a document.

#css_path

Get the path to this node as a CSS expression.

#deconstruct_keys

Returns a hash describing the Node, to use in pattern matching.

#decorate!

Decorate this node with the decorators set up in this node’s Document.

#delete
#description

Fetch the ::Nokogiri::HTML4::ElementDescription for this node.

#do_xinclude

Do xinclude substitution on the subtree below node.

#dup

Duplicate this node.

#each

Iterate over each attribute name and value pair for this Node.

#element_children

[Returns].

#elements
#encode_special_chars

Encode any special characters in string

#external_subset

Get the external subset.

#first_element_child
Returns

The first child Node that is an element.

#fragment

Create a DocumentFragment containing tags that is relative to this context node.

#get_attribute

Alias for XML::Node#[].

#has_attribute?

Alias for XML::Node#key?.

#initialize

Create a new node with name that belongs to document.

#internal_subset

Get the internal subset.

#key?

Returns true if attribute is set.

#keys

Get the attribute names for this Node.

#kwattr_add

Ensure that values are present in a keyword attribute.

#kwattr_append

Add keywords to a Node’s keyword attribute, regardless of duplication.

#kwattr_remove

Remove keywords from a keyword attribute.

#kwattr_values

Fetch values from a keyword attribute of a Node.

#last_element_child
Returns

The last child Node that is an element.

#matches?

Returns true if this Node matches selector

#namespace_definitions

[Returns].

#namespace_scopes
Returns

Array of all the Namespaces on this node and its ancestors.

#namespaced_key?

Returns true if attribute is set with namespace

#namespaces

Fetch all the namespaces on this node and its ancestors.

#next_element

Returns the next ::Nokogiri::XML::Element type sibling node.

#next_sibling

Returns the next sibling node.

#node_type

Get the type for this Node.

#parse

Parse string_or_io as a document fragment within the context of this node.

#path

Returns the path associated with this Node.

#pointer_id

[Returns].

#prepend_child

Add node_or_tags as the first child of this Node.

#previous_element

Returns the previous ::Nokogiri::XML::Element type sibling node.

#previous_sibling

Returns the previous sibling node.

#remove

Alias for XML::Node#unlink.

#remove_attribute

Remove the attribute named name

#remove_class

Remove HTML CSS classes from this node.

#replace

Replace this Node with node_or_tags.

#serialize

Serialize Node using options.

#set_attribute

Alias for XML::Node#[]=.

#swap

Swap this Node for node_or_tags

#to_html

Serialize this Node to HTML.

#to_s

Turn this node into a string.

#to_xhtml

Serialize this Node to XHTML using options

#to_xml

Serialize this Node to XML using options.

#traverse

Yields self and all children to block recursively.

#type
#unlink

Unlink this node from its current context.

#value?

Do this Node’s attributes include value?

#values

Get the attribute values for this Node.

#wrap

Wrap this Node with the node parsed from markup or a dup of the node.

#write_html_to

Write Node as HTML to io with options.

#write_to

Serialize this node or document to io.

#write_xhtml_to

Write Node as XHTML to io with options

#write_xml_to

Write Node as XML to io with options.

#add_child_node_and_reparent_attrs, #add_sibling,
#compare

Compare this Node to other with respect to their Document.

#dump_html

Returns the Node as html.

#get

Get the value for attribute

#html_standard_serialize,
#in_context

TODO: DOCUMENT ME.

#inspect_attributes, #keywordify,
#native_write_to

Write this Node to io with encoding and options

#process_xincludes

Loads and substitutes all xinclude elements below the node.

#set

Set the property to value

#set_namespace

Set the namespace to namespace

#to_format, #write_format_to, #add_child_node, #add_next_sibling_node, #add_previous_sibling_node, #replace_node

::Nokogiri::ClassResolver - Included

#related_class

Find a class constant within the.

::Nokogiri::XML::Searchable - Included

#%
#/
#>

Search this node’s immediate children using the CSS selector selector.

#at

Search this object for paths, and return only the first result.

#at_css

Search this object for CSS rules, and return only the first match.

#at_xpath

Search this node for XPath paths, and return only the first match.

#css

Search this object for CSS rules.

#search

Search this object for paths.

#xpath

Search this node for XPath paths.

#css_internal, #css_rules_to_xpath, #xpath_impl, #xpath_internal, #xpath_query_from_css_rule, #extract_params

::Nokogiri::XML::PP::Node - Included

Constructor Details

.new(*args) ⇒ Document

This method is for internal use only.
  
# File 'lib/nokogiri/html5/document.rb', line 159

def initialize(*args) # :nodoc:
  super
  @url = nil
  @quirks_mode = nil
end

Class Method Details

.do_parse(string_or_io, url, encoding, **options) (private)

  
# File 'lib/nokogiri/html5/document.rb', line 146

def do_parse(string_or_io, url, encoding, **options)
  string = HTML5.read_and_encode(string_or_io, encoding)

  options[:max_attributes] ||= Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES
  options[:max_errors] ||= options.delete(:max_parse_errors) || Nokogiri::Gumbo::DEFAULT_MAX_ERRORS
  options[:max_tree_depth] ||= Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH

  doc = Nokogiri::Gumbo.parse(string, url, self, **options)
  doc.encoding = "UTF-8"
  doc
end
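
The option defaulting in do_parse can be tried in isolation. What follows is a minimal sketch, not the library's code: normalize_parse_options is a hypothetical helper, and the numeric defaults are placeholders standing in for the Nokogiri::Gumbo constants. It shows that an explicit :max_errors wins, that the legacy :max_parse_errors key is consumed when :max_errors is absent, and that the default applies only when neither is given.

```ruby
# Hypothetical helper mirroring the defaulting steps in do_parse.
def normalize_parse_options(**options)
  # Placeholder values for illustration; the real defaults are defined
  # in Nokogiri::Gumbo and may differ.
  defaults = { max_attributes: 400, max_errors: 0, max_tree_depth: 400 }

  options[:max_attributes] ||= defaults[:max_attributes]
  # The right-hand side only runs when :max_errors is missing, so the
  # legacy key is deleted (and its value reused) only in that case.
  options[:max_errors] ||= options.delete(:max_parse_errors) || defaults[:max_errors]
  options[:max_tree_depth] ||= defaults[:max_tree_depth]
  options
end

legacy   = normalize_parse_options(max_parse_errors: 5)
explicit = normalize_parse_options(max_errors: 2, max_tree_depth: 50)
```

Because `||=` short-circuits, a caller who passes both :max_errors and :max_parse_errors keeps the legacy key in the hash untouched.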

.parse(input) { |options| ... } → HTML5::Document
.parse(input, url:, encoding:) { |options| ... } → HTML5::Document
.parse(input, **options) → HTML5::Document

Parse HTML input with a parser compliant with the HTML5 spec. This method uses the encoding of input if it can be determined, or else falls back to the encoding: parameter.

Required Parameters
  • input (String | IO) the HTML content to be parsed.

Optional Parameters
  • url: (String) the base URI of the document.

Optional Keyword Arguments
  • encoding: (Encoding) The name of the encoding that should be used when processing the document. When not provided, the encoding will be determined based on the document content.

  • max_errors: (Integer) The maximum number of parse errors to record. (default Gumbo::DEFAULT_MAX_ERRORS which is currently 0)

  • max_tree_depth: (Integer) The maximum depth of the parse tree. (default Gumbo::DEFAULT_MAX_TREE_DEPTH)

  • max_attributes: (Integer) The maximum number of attributes allowed on an element. (default Gumbo::DEFAULT_MAX_ATTRIBUTES)

  • parse_noscript_content_as_text: (Boolean) Whether to parse the content of noscript elements as text. (default false)

See HTML5@Parsing+options for a complete description of these parsing options.

Yields

If present, the block will be passed a Hash object to modify with parse options before the input is parsed. See HTML5@Parsing+options for a list of available options.

⚠ Note that url: and encoding: cannot be set by the configuration block.

Returns

Document

Example: Parse a string with a specific encoding and custom max errors limit.

Nokogiri::HTML5::Document.parse(socket, encoding: "ISO-8859-1", max_errors: 10)

Example: Parse a string setting the :parse_noscript_content_as_text option using the configuration block parameter.

Nokogiri::HTML5::Document.parse(input) { |c| c[:parse_noscript_content_as_text] = true }

Yields:

  • (options)
  
# File 'lib/nokogiri/html5/document.rb', line 103

def parse(
  string_or_io,
  url_ = nil, encoding_ = nil,
  url: url_, encoding: encoding_,
  **options, &block
)
  yield options if block
  string_or_io = "" unless string_or_io

  if string_or_io.respond_to?(:encoding) && string_or_io.encoding != Encoding::ASCII_8BIT
    encoding ||= string_or_io.encoding.name
  end

  if string_or_io.respond_to?(:read) && string_or_io.respond_to?(:path)
    url ||= string_or_io.path
  end
  unless string_or_io.respond_to?(:read) || string_or_io.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end

  do_parse(string_or_io, url, encoding, **options)
end
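
The signature above accepts url and encoding either positionally or as keywords by letting each keyword default to its positional counterpart. The sketch below reproduces only that argument handling in plain Ruby; resolve_args is a hypothetical stand-in, not part of Nokogiri.

```ruby
# Hypothetical stand-in for the argument handling of .parse.
def resolve_args(input, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  # A String with a real (non-binary) encoding supplies the encoding
  # when the caller did not pass one.
  if input.respond_to?(:encoding) && input.encoding != Encoding::ASCII_8BIT
    encoding ||= input.encoding.name
  end
  { url: url, encoding: encoding, options: options }
end

utf8_input = "<p>".encode("UTF-8")

positional = resolve_args(utf8_input, "http://example.com", "ISO-8859-1")
keyword    = resolve_args(utf8_input, url: "http://example.com", max_errors: 3)
binary     = resolve_args("<p>".b) # ASCII-8BIT input gives no encoding hint
```

Any keyword other than url: and encoding: lands in options, which is how parser settings such as max_errors reach do_parse.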

.read_io(io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)

Create a new document from an IO object.

💡 Most users should prefer .parse to this method.

Raises:

  • (ArgumentError)
  
# File 'lib/nokogiri/html5/document.rb', line 129

def read_io(io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  raise ArgumentError, "io object doesn't respond to :read" unless io.respond_to?(:read)

  do_parse(io, url, encoding, **options)
end

.read_memory(string, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)

Create a new document from a String.

💡 Most users should prefer .parse to this method.

Raises:

  • (ArgumentError)
  
# File 'lib/nokogiri/html5/document.rb', line 138

def read_memory(string, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  raise ArgumentError, "string object doesn't respond to :to_str" unless string.respond_to?(:to_str)

  do_parse(string, url, encoding, **options)
end

Instance Attribute Details

#quirks_mode (readonly)

Get the parser’s quirks mode value. See QuirksMode.

This method returns nil if the parser was not invoked (e.g., .new).

Since v1.14.0

  
# File 'lib/nokogiri/html5/document.rb', line 49

attr_reader :quirks_mode

#url (readonly)

Get the url name for this document, as passed into .parse, .read_io, or .read_memory.

  
# File 'lib/nokogiri/html5/document.rb', line 42

attr_reader :url

Instance Method Details

#fragment() → Nokogiri::HTML5::DocumentFragment
#fragment(markup) → Nokogiri::HTML5::DocumentFragment

Parse an HTML5 document fragment from markup, returning a DocumentFragment.

Parameters
  • markup (String) The HTML5 markup fragment to be parsed

Returns

Nokogiri::HTML5::DocumentFragment. This object’s children will be empty if markup is not passed, is empty, or is nil.

  
# File 'lib/nokogiri/html5/document.rb', line 178

def fragment(markup = nil)
  DocumentFragment.new(self, markup)
end

#to_xml(options = {}, &block)

This method is for internal use only.
  
# File 'lib/nokogiri/html5/document.rb', line 182

def to_xml(options = {}, &block) # :nodoc:
  # Bypass XML::Document#to_xml which doesn't add
  # XML::Node::SaveOptions::AS_XML like XML::Node#to_xml does.
  XML::Node.instance_method(:to_xml).bind_call(self, options, &block)
end
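
The bypass used in #to_xml relies on Module#instance_method and UnboundMethod#bind_call (Ruby 2.7 or later) to invoke an ancestor's implementation while skipping an intermediate override. A plain-Ruby sketch of the same technique, using hypothetical classes that are not part of Nokogiri:

```ruby
# Hypothetical three-level hierarchy standing in for
# XML::Node -> XML::Document -> HTML5::Document.
class SketchNode
  def render
    "node"
  end
end

class SketchDocument < SketchNode
  def render
    "document"
  end
end

class SketchHtml5Document < SketchDocument
  def render
    # `super` would reach SketchDocument#render; binding the unbound
    # method from SketchNode calls the grandparent version instead.
    SketchNode.instance_method(:render).bind_call(self)
  end
end

result = SketchHtml5Document.new.render
```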

#xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig

Returns

The document type which determines CSS-to-XPath translation.

See ::Nokogiri::CSS::XPathVisitor for more information.

  
# File 'lib/nokogiri/html5/document.rb', line 194

def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML5
end