Class: Nokogiri::HTML5::Document
Inherits: Nokogiri::HTML4::Document
Defined in: lib/nokogiri/html5/document.rb
Overview
Since v1.12.0
💡 ::Nokogiri::HTML5 functionality is not available when running JRuby.
Constant Summary
::Nokogiri::XML::PP::Node - Included
::Nokogiri::XML::Searchable - Included
::Nokogiri::ClassResolver - Included
::Nokogiri::XML::Node - Inherited
ATTRIBUTE_DECL, ATTRIBUTE_NODE, CDATA_SECTION_NODE, COMMENT_NODE, DECONSTRUCT_KEYS, DECONSTRUCT_METHODS, DOCB_DOCUMENT_NODE, DOCUMENT_FRAG_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DTD_NODE, ELEMENT_DECL, ELEMENT_NODE, ENTITY_DECL, ENTITY_NODE, ENTITY_REF_NODE, HTML_DOCUMENT_NODE, IMPLIED_XPATH_CONTEXTS, NAMESPACE_DECL, NOTATION_NODE, PI_NODE, TEXT_NODE, XINCLUDE_END, XINCLUDE_START
::Nokogiri::XML::Document - Inherited
IMPLIED_XPATH_CONTEXTS, NCNAME_CHAR, NCNAME_RE, NCNAME_START_CHAR, OBJECT_CLONE_METHOD, OBJECT_DUP_METHOD
Class Method Summary
- .parse(input) { |options| ... } → HTML5::Document
  Parse HTML input with a parser compliant with the HTML5 spec.
- .read_io(io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  Create a new document from an IO object.
- .read_memory(string, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  Create a new document from a String.
- .do_parse(string_or_io, url, encoding, **options) (private)
- .new(*args) ⇒ Document (constructor; internal use only)
::Nokogiri::HTML4::Document - Inherited
| .new | Create a new empty document with base URI |
| .parse | Parse HTML4 input from a String or IO object, and return a new |
| .read_io | Read the |
| .read_memory | Read the |
::Nokogiri::XML::Document - Inherited
| .new | Alias for XML::Comment.new. |
| .parse | Parse XML input from a String or IO object, and return a new |
| .read_io | Create a new document from an IO object. |
| .read_memory | Create a new document from a String. |
| .wrap | ⚠ This method is only available when running JRuby. |
| .empty_doc? | |
::Nokogiri::XML::Node - Inherited
| .new | documented in lib/nokogiri/xml/node.rb. |
Instance Attribute Summary
- #quirks_mode (readonly)
  Get the parser’s quirks mode value.
- #url (readonly)
  Get the url name for this document, as passed into .parse, .read_io, or .read_memory.
::Nokogiri::HTML4::Document - Inherited
| #meta_encoding | Get the meta tag encoding for this document. |
| #meta_encoding= | Set the meta tag encoding for this document. |
| #title | Get the title string of this document. |
| #title= | Set the title string of this document. |
::Nokogiri::XML::Document - Inherited
| #encoding | Get the encoding for this |
| #encoding= | Set the encoding string for this |
| #errors | The errors found while parsing a document. |
| #namespace_inheritance | When |
| #root | Get the root node for this document. |
| #root= | Set the root element on this document. |
::Nokogiri::XML::Node - Inherited
| #blank? | |
| #cdata? | Returns true if this is a CDATA. |
| #children | :category: Traversing Document Structure. |
| #children= | Set the content for this |
| #comment? | Returns true if this is a Comment. |
| #content | [Returns]. |
| #content= | Set the content of this node to |
| #default_namespace= | Adds a default namespace, supplied as a string url href, to self. |
| #document | :category: Traversing Document Structure. |
| #document? | Returns true if this is a |
| #elem? | Alias for XML::Node#element?. |
| #element? | Returns true if this is an Element node. |
| #fragment? | Returns true if this is a |
| #html? | Returns true if this is an |
| #inner_html | Get the inner_html for this node’s |
| #inner_html= | Set the content for this |
| #inner_text | Alias for XML::Node#content. |
| #lang | Searches the language of a node, i.e. |
| #lang= | Set the language of a node, i.e. |
| #line | |
| #line= | Sets the line for this |
| #name | Alias for XML::Node#node_name. |
| #namespace | |
| #namespace= | Set the default namespace on this node (as would be defined with an “xmlns=” attribute in |
| #native_content= | Set the content of this node to |
| #next | Alias for XML::Node#next_sibling. |
| #next= | Alias for XML::Node#add_next_sibling. |
| #node_name | Returns the name for this |
| #node_name= | Set the name for this |
| #parent | |
| #parent= | |
| #previous | Alias for XML::Node#previous_sibling. |
| #previous= | Alias for XML::Node#add_previous_sibling. |
| #processing_instruction? | Returns true if this is a ProcessingInstruction node. |
| #read_only? | Is this a read only node? |
| #text | Alias for XML::Node#content. |
| #text? | Returns true if this is a Text node. |
| #to_str | Alias for XML::Node#content. |
| #xml? | Returns true if this is an |
| #prepend_newline?, #data_ptr? | |
Instance Method Summary
- #fragment() → Nokogiri::HTML5::DocumentFragment
  Parse an HTML5 document fragment from markup, returning a DocumentFragment.
- #xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig
  Returns the document type which determines CSS-to-XPath translation.
- #to_xml(options = {}, &block) (internal use only)
::Nokogiri::HTML4::Document - Inherited
| #fragment | Create a |
| #serialize | Serialize Node using |
| #type | The type for this document. |
| #xpath_doctype | |
| #meta_content_type, #set_metadata_element | |
::Nokogiri::XML::Document - Inherited
| #<< | Alias for XML::Document#add_child. |
| #add_child, | |
| #canonicalize | Canonicalize a document and return the results. |
| #clone | Clone this node. |
| #collect_namespaces | Recursively get all namespaces from this node and its subtree and return them as a hash. |
| #create_cdata | Create a CDATA Node containing |
| #create_comment | Create a Comment Node containing |
| #create_element | Create a new Element with |
| #create_entity | Create a new entity named |
| #create_text_node | Create a Text Node with |
| #deconstruct_keys | Returns a hash describing the |
| #decorate | Apply any decorators to |
| #decorators | Get the list of decorators given |
| #document | A reference to |
| #dup | Duplicate this node. |
| #fragment | Create a |
| #name | The name of this document. |
| #namespaces | Get the hash of namespaces on the root |
| #remove_namespaces! | Remove all namespaces from all nodes in the document. |
| #slop! | Explore a document with shortcut methods. |
| #to_java | ⚠ This method is only available when running JRuby. |
| #to_xml | Alias for XML::Node#serialize. |
| #url | Get the url name for this document. |
| #validate | Validate this |
| #version | Get the |
| #xpath_doctype | |
| #inspect_attributes, | |
| #initialize | |
::Nokogiri::XML::Node - Inherited
| #<< | Add |
| #<=> | Compare two |
| #== | |
| #[] | Fetch an attribute from this node. |
| #[]= | Update the attribute |
| #accept | Accept a visitor. |
| #add_child | Appends specified Nodes to the children of |
| #add_class | Ensure HTML |
| #add_namespace | Alias for XML::Node#add_namespace_definition. |
| #add_namespace_definition | :category: Manipulating Document Structure. |
| #add_next_sibling | Insert |
| #add_previous_sibling | Insert |
| #after | Insert |
| #ancestors | |
| #append_class | Add HTML |
| #attr | Alias for XML::Node#[]. |
| #attribute | :category: Working With |
| #attribute_nodes | :category: Working With |
| #attribute_with_ns | :category: Working With |
| #attributes | Fetch this node’s attributes. |
| #before | Insert |
| #canonicalize, | |
| #child | :category: Traversing Document Structure. |
| #classes | Fetch CSS class names of a |
| #clone | Clone this node. |
| #create_external_subset | Create an external subset. |
| #create_internal_subset | Create the internal subset of a document. |
| #css_path | Get the path to this node as a |
| #deconstruct_keys | Returns a hash describing the |
| #decorate! | Decorate this node with the decorators set up in this node’s |
| #delete | Alias for XML::Node#remove_attribute. |
| #description | Fetch the |
| #do_xinclude | Do xinclude substitution on the subtree below node. |
| #dup | Duplicate this node. |
| #each | Iterate over each attribute name and value pair for this |
| #element_children | [Returns]. |
| #elements | Alias for XML::Node#element_children. |
| #encode_special_chars | Encode any special characters in |
| #external_subset | Get the external subset. |
| #first_element_child | |
| #fragment | Create a |
| #get_attribute | Alias for XML::Node#[]. |
| #has_attribute? | Alias for XML::Node#key?. |
| #initialize | Create a new node with |
| #internal_subset | Get the internal subset. |
| #key? | Returns true if |
| #keys | Get the attribute names for this |
| #kwattr_add | Ensure that values are present in a keyword attribute. |
| #kwattr_append | Add keywords to a Node’s keyword attribute, regardless of duplication. |
| #kwattr_remove | Remove keywords from a keyword attribute. |
| #kwattr_values | Fetch values from a keyword attribute of a |
| #last_element_child | |
| #matches? | Returns true if this |
| #namespace_definitions | [Returns]. |
| #namespace_scopes | |
| #namespaced_key? | Returns true if |
| #namespaces | Fetch all the namespaces on this node and its ancestors. |
| #next_element | Returns the next |
| #next_sibling | Returns the next sibling node. |
| #node_type | Get the type for this |
| #parse | Parse |
| #path | Returns the path associated with this |
| #pointer_id | [Returns]. |
| #prepend_child | Add |
| #previous_element | Returns the previous |
| #previous_sibling | Returns the previous sibling node. |
| #remove | Alias for XML::Node#unlink. |
| #remove_attribute | Remove the attribute named |
| #remove_class | Remove HTML |
| #replace | Replace this |
| #serialize | Serialize Node using |
| #set_attribute | Alias for XML::Node#[]=. |
| #swap | Swap this |
| #to_html | Serialize this |
| #to_s | Turn this node in to a string. |
| #to_xhtml | Serialize this |
| #to_xml | Serialize this |
| #traverse | Yields all children to |
| #type | Alias for XML::Node#node_type. |
| #unlink | Unlink this node from its current context. |
| #value? | Does this Node’s attributes include <value>. |
| #values | Get the attribute values for this |
| #wrap | Wrap this |
| #write_html_to | Write Node as |
| #write_to | Serialize this node or document to |
| #write_xhtml_to | Write Node as XHTML to |
| #write_xml_to | Write Node as |
| #add_child_node_and_reparent_attrs, #add_sibling, | |
| #compare | Compare this |
| #dump_html | Returns the |
| #get | Get the value for |
| #html_standard_serialize, | |
| #in_context | TODO: DOCUMENT ME. |
| #inspect_attributes, #keywordify, | |
| #native_write_to | Write this |
| #process_xincludes | Loads and substitutes all xinclude elements below the node. |
| #set | Set the |
| #set_namespace | Set the namespace to |
| #to_format, #write_format_to, #add_child_node, #add_next_sibling_node, #add_previous_sibling_node, #replace_node | |
::Nokogiri::ClassResolver - Included
| #related_class | Find a class constant within the. |
::Nokogiri::XML::Searchable - Included
| #% | Alias for XML::Searchable#at. |
| #/ | Alias for XML::Searchable#search. |
| #> | Search this node’s immediate children using |
| #at | Search this object for |
| #at_css | Search this object for |
| #at_xpath | Search this node for XPath |
| #css | Search this object for |
| #search | Search this object for |
| #xpath | Search this node for XPath |
| #css_internal, #css_rules_to_xpath, #xpath_impl, #xpath_internal, #xpath_query_from_css_rule, #extract_params | |
::Nokogiri::XML::PP::Node - Included
Constructor Details
.new(*args) ⇒ Document
# File 'lib/nokogiri/html5/document.rb', line 159
def initialize(*args) # :nodoc:
  super
  @url = nil
  @quirks_mode = nil
end
Class Method Details
.do_parse(string_or_io, url, encoding, **options) (private)
# File 'lib/nokogiri/html5/document.rb', line 146
def do_parse(string_or_io, url, encoding, **options)
  string = HTML5.read_and_encode(string_or_io, encoding)
  options[:max_attributes] ||= Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES
  options[:max_errors] ||= options.delete(:max_parse_errors) || Nokogiri::Gumbo::DEFAULT_MAX_ERRORS
  options[:max_tree_depth] ||= Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH
  doc = Nokogiri::Gumbo.parse(string, url, self, **options)
  doc.encoding = "UTF-8"
  doc
end
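To see what the three defaulting lines in do_parse do on their own, here is a minimal plain-Ruby sketch that does no parsing at all. The helper apply_parse_defaults and the DEFAULT_* constants are hypothetical stand-ins: the constant values are placeholders (0 for max errors matches the default documented for .parse), not values read from Nokogiri::Gumbo.

```ruby
# Hypothetical placeholders for Nokogiri::Gumbo::DEFAULT_* (check your Nokogiri version).
DEFAULT_MAX_ATTRIBUTES = 400
DEFAULT_MAX_ERRORS = 0
DEFAULT_MAX_TREE_DEPTH = 400

# Hypothetical helper: applies the same ||= defaults as do_parse and returns the Hash.
def apply_parse_defaults(options)
  opts = options.dup # **options gives do_parse its own Hash; dup mimics that here
  opts[:max_attributes] ||= DEFAULT_MAX_ATTRIBUTES
  # The legacy :max_parse_errors key is only read (and removed) when :max_errors is unset.
  opts[:max_errors] ||= opts.delete(:max_parse_errors) || DEFAULT_MAX_ERRORS
  opts[:max_tree_depth] ||= DEFAULT_MAX_TREE_DEPTH
  opts
end

p apply_parse_defaults({})
p apply_parse_defaults(max_parse_errors: 5)
p apply_parse_defaults(max_errors: 3, max_parse_errors: 5)
```

Because ||= short-circuits, when both keys are supplied :max_errors wins and :max_parse_errors is left in the Hash untouched.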
.parse(input) { |options| ... } → HTML5::Document
.parse(input, url:, encoding:) { |options| ... } → HTML5::Document
.parse(input, **options) → HTML5::Document
Parse HTML input with a parser compliant with the HTML5 spec. This method uses the encoding of input if it can be determined, or else falls back to the encoding: parameter.
Required Parameters
- input (String | IO) The HTML content to be parsed.
Optional Parameters
- url: (String) The base URI of the document.
Optional Keyword Arguments
- encoding: (Encoding) The name of the encoding that should be used when processing the document. When not provided, the encoding will be determined based on the document content.
- max_errors: (Integer) The maximum number of parse errors to record. (default Gumbo::DEFAULT_MAX_ERRORS, which is currently 0)
- max_tree_depth: (Integer) The maximum depth of the parse tree. (default Gumbo::DEFAULT_MAX_TREE_DEPTH)
- max_attributes: (Integer) The maximum number of attributes allowed on an element. (default Gumbo::DEFAULT_MAX_ATTRIBUTES)
- parse_noscript_content_as_text: (Boolean) Whether to parse the content of noscript elements as text. (default false)
See the "Parsing options" section of Nokogiri::HTML5 for a complete description of these parsing options.
Yields
- If present, the block is passed a Hash of parse options to modify before the input is parsed. See the "Parsing options" section of Nokogiri::HTML5 for the list of available options.
- ⚠ Note that url: and encoding: cannot be set by the configuration block.
Returns
- Document
Example: Parse input from an IO (here, a socket) with a specific encoding and a custom max errors limit.
Nokogiri::HTML5::Document.parse(socket, encoding: "ISO-8859-1", max_errors: 10)
Example: Parse a string setting the :parse_noscript_content_as_text option using the configuration block parameter.
Nokogiri::HTML5::Document.parse(input) { |c| c[:parse_noscript_content_as_text] = true }
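Why can't the configuration block set url: or encoding:? In the parse signature those two are bound as keyword arguments before the block runs, and the block only receives the **options Hash. The hypothetical sketch_parse below copies that argument handling (and nothing else) so the effect is visible without Nokogiri.

```ruby
# Hypothetical stand-in: copies only the argument handling of HTML5::Document.parse
# and returns [url, encoding, options] instead of parsing anything.
def sketch_parse(input, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  yield options if block_given? # the block sees only the **options Hash
  [url, encoding, options]
end

result = sketch_parse("<p>hi</p>", encoding: "UTF-8") do |opts|
  opts[:max_errors] = 10              # lands in options, as a parse option should
  opts[:url] = "https://example.com/" # also lands in options; the url keyword is already bound
end
url, encoding, options = result

p url      # => nil
p encoding # => "UTF-8"
p options
```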
# File 'lib/nokogiri/html5/document.rb', line 103
def parse(
  string_or_io,
  url_ = nil,
  encoding_ = nil,
  url: url_,
  encoding: encoding_,
  **options,
  &block
)
  yield options if block
  string_or_io = "" unless string_or_io
  if string_or_io.respond_to?(:encoding) && string_or_io.encoding != Encoding::ASCII_8BIT
    encoding ||= string_or_io.encoding.name
  end
  if string_or_io.respond_to?(:read) && string_or_io.respond_to?(:path)
    url ||= string_or_io.path
  end
  unless string_or_io.respond_to?(:read) || string_or_io.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end
  do_parse(string_or_io, url, encoding, **options)
end
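The input checks in parse decide which url and encoding reach do_parse. The sketch below lifts just those checks into a hypothetical helper, infer_url_and_encoding, so they can be tried on plain Strings and files. Note that, because of ||=, an explicit encoding: or url: argument is kept and only a missing one is inferred.

```ruby
require "tempfile"

# Hypothetical helper reproducing the input checks at the top of HTML5::Document.parse.
# Returns the [url, encoding] pair that parse would pass on to do_parse.
def infer_url_and_encoding(string_or_io, url: nil, encoding: nil)
  string_or_io = "" unless string_or_io # nil is treated as empty input

  # Objects with #encoding (e.g. Strings) supply it unless they are binary.
  if string_or_io.respond_to?(:encoding) && string_or_io.encoding != Encoding::ASCII_8BIT
    encoding ||= string_or_io.encoding.name
  end

  # File-like objects (responding to #read and #path) supply a default url.
  if string_or_io.respond_to?(:read) && string_or_io.respond_to?(:path)
    url ||= string_or_io.path
  end

  unless string_or_io.respond_to?(:read) || string_or_io.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end

  [url, encoding]
end

p infer_url_and_encoding("<p>caf\u00e9</p>".encode("ISO-8859-1")) # => [nil, "ISO-8859-1"]
p infer_url_and_encoding("<p>hi</p>".b)                            # => [nil, nil]

Tempfile.create(["page", ".html"]) do |file|
  p infer_url_and_encoding(file) # url is file.path, encoding is nil
end
```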
.read_io(io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
Create a new document from an IO object.
💡 Most users should prefer .parse to this method.
.read_memory(string, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
Create a new document from a String.
💡 Most users should prefer .parse to this method.
Instance Attribute Details
#quirks_mode (readonly)
Get the parser’s quirks mode value. See QuirksMode.
This method returns nil if the parser was not invoked (e.g., .new).
Since v1.14.0
# File 'lib/nokogiri/html5/document.rb', line 49
attr_reader :quirks_mode
#url (readonly)
Get the url name for this document, as passed into .parse, .read_io, or .read_memory
# File 'lib/nokogiri/html5/document.rb', line 42
attr_reader :url
Instance Method Details
#fragment() → Nokogiri::HTML5::DocumentFragment
#fragment(markup) → Nokogiri::HTML5::DocumentFragment
Parse an HTML5 document fragment from markup, returning a DocumentFragment.
Parameters
- markup (String) The HTML5 markup fragment to be parsed.
Returns
- Nokogiri::HTML5::DocumentFragment. This object’s children will be empty if markup is not passed, is empty, or is nil.
# File 'lib/nokogiri/html5/document.rb', line 178
def fragment(markup = nil)
  DocumentFragment.new(self, markup)
end
#to_xml(options = {}, &block)
#xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig
Returns
- The document type which determines CSS-to-XPath translation.
See ::Nokogiri::CSS::XPathVisitor for more information.
# File 'lib/nokogiri/html5/document.rb', line 194
def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML5
end