Class: Nokogiri::HTML5::Document
Relationships & Source Files
Inherits: | Nokogiri::HTML4::Document |
Defined in: | lib/nokogiri/html5/document.rb |
Overview
Since v1.12.0
💡 ::Nokogiri::HTML5 functionality is not available when running JRuby.
Constant Summary
::Nokogiri::XML::PP::Node
- Included
::Nokogiri::XML::Searchable
- Included
::Nokogiri::ClassResolver
- Included
::Nokogiri::XML::Node
- Inherited
ATTRIBUTE_DECL, ATTRIBUTE_NODE, CDATA_SECTION_NODE, COMMENT_NODE, DECONSTRUCT_KEYS, DECONSTRUCT_METHODS, DOCB_DOCUMENT_NODE, DOCUMENT_FRAG_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DTD_NODE, ELEMENT_DECL, ELEMENT_NODE, ENTITY_DECL, ENTITY_NODE, ENTITY_REF_NODE, HTML_DOCUMENT_NODE, IMPLIED_XPATH_CONTEXTS, NAMESPACE_DECL, NOTATION_NODE, PI_NODE, TEXT_NODE, XINCLUDE_END, XINCLUDE_START
::Nokogiri::XML::Document
- Inherited
IMPLIED_XPATH_CONTEXTS, NCNAME_CHAR, NCNAME_RE, NCNAME_START_CHAR, OBJECT_CLONE_METHOD, OBJECT_DUP_METHOD
Class Method Summary
- .parse(input) { |options| ... } → HTML5::Document
  Parse HTML input with a parser compliant with the HTML5 spec.
- .read_io(io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  Create a new document from an IO object.
- .read_memory(string, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  Create a new document from a String.
- .do_parse(string_or_io, url, encoding, **options) private
- .new(*args) ⇒ Document constructor Internal use only
::Nokogiri::HTML4::Document
- Inherited
.new | Create a new empty document with base URI |
.parse | Parse HTML4 input from a String or IO object, and return a new |
.read_io | Read the |
.read_memory | Read the |
::Nokogiri::XML::Document
- Inherited
.new | Alias for XML::Comment.new. |
.parse | Parse XML input from a String or IO object, and return a new |
.read_io | Create a new document from an IO object. |
.read_memory | Create a new document from a String. |
.wrap | ⚠ This method is only available when running JRuby. |
.empty_doc? |
::Nokogiri::XML::Node
- Inherited
.new | documented in lib/nokogiri/xml/node.rb. |
Instance Attribute Summary
- #quirks_mode readonly
  Get the parser’s quirks mode value.
- #url readonly
  Get the url name for this document, as passed into .parse, .read_io, or .read_memory.
::Nokogiri::HTML4::Document
- Inherited
#meta_encoding | Get the meta tag encoding for this document. |
#meta_encoding= | Set the meta tag encoding for this document. |
#title | Get the title string of this document. |
#title= | Set the title string of this document. |
::Nokogiri::XML::Document
- Inherited
#encoding | Get the encoding for this |
#encoding= | Set the encoding string for this |
#errors | The errors found while parsing a document. |
#namespace_inheritance | When |
#root | Get the root node for this document. |
#root= | Set the root element on this document. |
::Nokogiri::XML::Node
- Inherited
#blank? |
|
#cdata? | Returns true if this is a CDATA. |
#children | :category: Traversing Document Structure. |
#children= | Set the content for this |
#comment? | Returns true if this is a Comment. |
#content | [Returns]. |
#content= | Set the content of this node to |
#default_namespace= | Adds a default namespace supplied as a string #url href, to self. |
#document | :category: Traversing Document Structure. |
#document? | Returns true if this is a |
#elem? | Alias for XML::Node#element?. |
#element? | Returns true if this is an Element node. |
#fragment? | Returns true if this is a |
#html? | Returns true if this is an |
#inner_html | Get the inner_html for this node’s |
#inner_html= | Set the content for this |
#inner_text | Alias for XML::Node#content. |
#lang | Searches the language of a node, i.e. |
#lang= | Set the language of a node, i.e. |
#line |
|
#line= | Sets the line for this |
#name | Alias for XML::Node#node_name. |
#namespace |
|
#namespace= | Set the default namespace on this node (as would be defined with an “xmlns=” attribute in |
#native_content= | Set the content of this node to |
#next | Alias for XML::Node#next_sibling. |
#next= | Alias for XML::Node#add_next_sibling. |
#node_name | Returns the name for this |
#node_name= | Set the name for this |
#parent | |
#parent= | |
#previous | Alias for XML::Node#previous_sibling. |
#previous= | Alias for XML::Node#add_previous_sibling. |
#processing_instruction? | Returns true if this is a ProcessingInstruction node. |
#read_only? | Is this a read only node? |
#text | Alias for XML::Node#content. |
#text? | Returns true if this is a Text node. |
#to_str | Alias for XML::Node#content. |
#xml? | Returns true if this is an |
#prepend_newline?, #data_ptr? |
Instance Method Summary
- #fragment() → Nokogiri::HTML5::DocumentFragment
  Parse an HTML5 document fragment from markup, returning a DocumentFragment.
- #xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig
  Returns the document type which determines CSS-to-XPath translation.
- #to_xml(options = {}, &block) Internal use only
::Nokogiri::HTML4::Document
- Inherited
#fragment | Create a |
#serialize | Serialize Node using |
#type | The type for this document. |
#xpath_doctype |
|
#meta_content_type, #set_metadata_element |
::Nokogiri::XML::Document
- Inherited
#<< | Alias for XML::Document#add_child. |
#add_child, | |
#canonicalize | Canonicalize a document and return the results. |
#clone | Clone this node. |
#collect_namespaces | Recursively get all namespaces from this node and its subtree and return them as a hash. |
#create_cdata | Create a CDATA Node containing |
#create_comment | Create a Comment Node containing |
#create_element | Create a new Element with |
#create_entity | Create a new entity named |
#create_text_node | Create a Text Node with |
#deconstruct_keys | Returns a hash describing the |
#decorate | Apply any decorators to |
#decorators | Get the list of decorators given |
#document | A reference to |
#dup | Duplicate this node. |
#fragment | Create a |
#name | The name of this document. |
#namespaces | Get the hash of namespaces on the root |
#remove_namespaces! | Remove all namespaces from all nodes in the document. |
#slop! | Explore a document with shortcut methods. |
#to_java | ⚠ This method is only available when running JRuby. |
#to_xml | Alias for XML::Node#serialize. |
#url | Get the url name for this document. |
#validate | Validate this |
#version | Get the |
#xpath_doctype |
|
#inspect_attributes, | |
#initialize | rubocop:disable Lint/MissingSuper. |
::Nokogiri::XML::Node
- Inherited
#<< | Add |
#<=> | Compare two |
#== |
|
#[] | Fetch an attribute from this node. |
#[]= | Update the attribute |
#accept | Accept a visitor. |
#add_child | Add |
#add_class | Ensure HTML |
#add_namespace | Alias for XML::Node#add_namespace_definition. |
#add_namespace_definition | :category: Manipulating Document Structure. |
#add_next_sibling | Insert |
#add_previous_sibling | Insert |
#after | Insert |
#ancestors | |
#append_class | Add HTML |
#attr | Alias for XML::Node#[]. |
#attribute | :category: Working With |
#attribute_nodes | :category: Working With |
#attribute_with_ns | :category: Working With |
#attributes | Fetch this node’s attributes. |
#before | Insert |
#canonicalize, | |
#child | :category: Traversing Document Structure. |
#classes | Fetch CSS class names of a |
#clone | Clone this node. |
#create_external_subset | Create an external subset. |
#create_internal_subset | Create the internal subset of a document. |
#css_path | Get the path to this node as a |
#deconstruct_keys | Returns a hash describing the |
#decorate! | Decorate this node with the decorators set up in this node’s |
#delete | Alias for XML::Node#remove_attribute. |
#description | Fetch the |
#do_xinclude | Do xinclude substitution on the subtree below node. |
#dup | Duplicate this node. |
#each | Iterate over each attribute name and value pair for this |
#element_children | [Returns]. |
#elements | Alias for XML::Node#element_children. |
#encode_special_chars | Encode any special characters in |
#external_subset | Get the external subset. |
#first_element_child |
|
#fragment | Create a |
#get_attribute | Alias for XML::Node#[]. |
#has_attribute? | Alias for XML::Node#key?. |
#initialize | Create a new node with |
#internal_subset | Get the internal subset. |
#key? | Returns true if |
#keys | Get the attribute names for this |
#kwattr_add | Ensure that values are present in a keyword attribute. |
#kwattr_append | Add keywords to a Node’s keyword attribute, regardless of duplication. |
#kwattr_remove | Remove keywords from a keyword attribute. |
#kwattr_values | Fetch values from a keyword attribute of a |
#last_element_child |
|
#matches? | Returns true if this |
#namespace_definitions | [Returns]. |
#namespace_scopes |
|
#namespaced_key? | Returns true if |
#namespaces | Fetch all the namespaces on this node and its ancestors. |
#next_element | Returns the next |
#next_sibling | Returns the next sibling node. |
#node_type | Get the type for this |
#parse | Parse |
#path | Returns the path associated with this |
#pointer_id | [Returns]. |
#prepend_child | Add |
#previous_element | Returns the previous |
#previous_sibling | Returns the previous sibling node. |
#remove | Alias for XML::Node#unlink. |
#remove_attribute | Remove the attribute named |
#remove_class | Remove HTML |
#replace | Replace this |
#serialize | Serialize Node using |
#set_attribute | Alias for XML::Node#[]=. |
#swap | Swap this |
#to_html | Serialize this |
#to_s | Turn this node in to a string. |
#to_xhtml | Serialize this |
#to_xml | Serialize this |
#traverse | Yields self and all children to |
#type | Alias for XML::Node#node_type. |
#unlink | Unlink this node from its current context. |
#value? | Does this Node’s attributes include <value>. |
#values | Get the attribute values for this |
#wrap | Wrap this |
#write_html_to | Write Node as |
#write_to | Serialize this node or document to |
#write_xhtml_to | Write Node as XHTML to |
#write_xml_to | Write Node as |
#add_child_node_and_reparent_attrs, #add_sibling, | |
#compare | Compare this |
#dump_html | Returns the |
#get | Get the value for |
#html_standard_serialize, | |
#in_context | TODO: DOCUMENT ME. |
#inspect_attributes, #keywordify, | |
#native_write_to | Write this |
#process_xincludes | Loads and substitutes all xinclude elements below the node. |
#set | Set the |
#set_namespace | Set the namespace to |
#to_format, #write_format_to, #add_child_node, #add_next_sibling_node, #add_previous_sibling_node, #replace_node |
::Nokogiri::ClassResolver
- Included
#related_class | Find a class constant within the. |
::Nokogiri::XML::Searchable
- Included
#% | Alias for XML::Searchable#at. |
#/ | Alias for XML::Searchable#search. |
#> | Search this node’s immediate children using |
#at | Search this object for |
#at_css | Search this object for |
#at_xpath | Search this node for XPath |
#css | Search this object for |
#search | Search this object for |
#xpath | Search this node for XPath |
#css_internal, #css_rules_to_xpath, #xpath_impl, #xpath_internal, #xpath_query_from_css_rule, #extract_params |
::Nokogiri::XML::PP::Node
- Included
Constructor Details
.new(*args) ⇒ Document
# File 'lib/nokogiri/html5/document.rb', line 159
def initialize(*args) # :nodoc:
  super
  @url = nil
  @quirks_mode = nil
end
Class Method Details
.do_parse(string_or_io, url, encoding, **options) (private)
# File 'lib/nokogiri/html5/document.rb', line 146
def do_parse(string_or_io, url, encoding, **options)
  string = HTML5.read_and_encode(string_or_io, encoding)

  options[:max_attributes] ||= Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES
  options[:max_errors] ||= options.delete(:max_parse_errors) || Nokogiri::Gumbo::DEFAULT_MAX_ERRORS
  options[:max_tree_depth] ||= Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH

  doc = Nokogiri::Gumbo.parse(string, url, self, **options)
  doc.encoding = "UTF-8"
  doc
end
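The option defaulting in do_parse can be tried in isolation. The following is a minimal sketch, not Nokogiri code: the module name `GumboDefaultsSketch` and its `normalize` method are hypothetical, and the two values of 400 are placeholders (only the max_errors default of 0 is stated in this document).

```ruby
# Hypothetical stand-in for Nokogiri::Gumbo. Only the max_errors default of 0
# is stated in this document; the other two values are placeholders.
module GumboDefaultsSketch
  DEFAULT_MAX_ATTRIBUTES = 400
  DEFAULT_MAX_ERRORS = 0
  DEFAULT_MAX_TREE_DEPTH = 400

  # Fill in any limit the caller did not supply, using the same ||= chain
  # as do_parse. :max_parse_errors is read (and removed) only when
  # :max_errors is missing.
  def self.normalize(**options)
    options[:max_attributes] ||= DEFAULT_MAX_ATTRIBUTES
    options[:max_errors] ||= options.delete(:max_parse_errors) || DEFAULT_MAX_ERRORS
    options[:max_tree_depth] ||= DEFAULT_MAX_TREE_DEPTH
    options
  end
end

p GumboDefaultsSketch.normalize
p GumboDefaultsSketch.normalize(max_parse_errors: 10)
p GumboDefaultsSketch.normalize(max_errors: 5, max_parse_errors: 10)
```

Because `||=` short-circuits, an explicit :max_errors value wins and leaves any :max_parse_errors key untouched in the Hash.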
.parse(input) { |options| ... } → HTML5::Document
.parse(input, url:, encoding:) { |options| ... } → HTML5::Document
.parse(input, **options) → HTML5::Document
Parse HTML input with a parser compliant with the HTML5 spec. This method uses the encoding of input if it can be determined, or else falls back to the encoding: parameter.
- Required Parameters
  - input (String | IO) the HTML content to be parsed.
- Optional Parameters
  - url: (String) the base URI of the document.
- Optional Keyword Arguments
  - encoding: (Encoding) The name of the encoding that should be used when processing the document. When not provided, the encoding will be determined based on the document content.
  - max_errors: (Integer) The maximum number of parse errors to record. (default Gumbo::DEFAULT_MAX_ERRORS which is currently 0)
  - max_tree_depth: (Integer) The maximum depth of the parse tree. (default Gumbo::DEFAULT_MAX_TREE_DEPTH)
  - max_attributes: (Integer) The maximum number of attributes allowed on an element. (default Gumbo::DEFAULT_MAX_ATTRIBUTES)
  - parse_noscript_content_as_text: (Boolean) Whether to parse the content of noscript elements as text. (default false)
  See HTML5@Parsing+options for a complete description of these parsing options.
- Yields
  - If present, the block will be passed a Hash object to modify with parse options before the input is parsed. See HTML5@Parsing+options for a list of available options.
  ⚠ Note that url: and encoding: cannot be set by the configuration block.
- Returns
  - Document
Example: Parse input from an IO object with a specific encoding and custom max errors limit.
Nokogiri::HTML5::Document.parse(socket, encoding: "ISO-8859-1", max_errors: 10)
Example: Parse a string setting the :parse_noscript_content_as_text option using the configuration block parameter.
Nokogiri::HTML5::Document.parse(input) { |c| c[:parse_noscript_content_as_text] = true }
# File 'lib/nokogiri/html5/document.rb', line 103
def parse(
  string_or_io,
  url_ = nil, encoding_ = nil,
  url: url_, encoding: encoding_,
  **options, &block
)
  yield options if block
  string_or_io = "" unless string_or_io

  if string_or_io.respond_to?(:encoding) && string_or_io.encoding != Encoding::ASCII_8BIT
    encoding ||= string_or_io.encoding.name
  end

  if string_or_io.respond_to?(:read) && string_or_io.respond_to?(:path)
    url ||= string_or_io.path
  end
  unless string_or_io.respond_to?(:read) || string_or_io.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end

  do_parse(string_or_io, url, encoding, **options)
end
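To see how the duck-typing checks in this listing treat different inputs, they can be copied into a standalone helper. This is only a sketch of those checks: `resolve_encoding_and_url` and `FileLike` are hypothetical names, and nothing here calls Nokogiri.

```ruby
# Hypothetical file-like object: responds to both #read and #path.
FileLike = Struct.new(:path) do
  def read
    ""
  end
end

# Mirrors the input checks in .parse and returns [encoding, url] as they
# would be handed to do_parse.
def resolve_encoding_and_url(input, url: nil, encoding: nil)
  input = "" unless input

  # Binary (ASCII-8BIT) strings carry no usable encoding information.
  if input.respond_to?(:encoding) && input.encoding != Encoding::ASCII_8BIT
    encoding ||= input.encoding.name
  end

  # File-like objects supply their path as the document URL.
  url ||= input.path if input.respond_to?(:read) && input.respond_to?(:path)

  unless input.respond_to?(:read) || input.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end

  [encoding, url]
end

p resolve_encoding_and_url("<p>hi</p>".encode("UTF-8"))
p resolve_encoding_and_url("<p>hi</p>".b)
p resolve_encoding_and_url(FileLike.new("/tmp/page.html"))
```

In this listing an explicit encoding: or url: argument is kept as given, and the input's own encoding or path is used only when the argument is absent.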
.read_io(io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
Create a new document from an IO object.
💡 Most users should prefer .parse to this method.
.read_memory(string, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
Create a new document from a String.
💡 Most users should prefer .parse to this method.
Instance Attribute Details
#quirks_mode (readonly)
Get the parser’s quirks mode value. See QuirksMode.
This method returns nil if the parser was not invoked (e.g., .new).
Since v1.14.0
# File 'lib/nokogiri/html5/document.rb', line 49
attr_reader :quirks_mode
#url (readonly)
Get the url name for this document, as passed into .parse, .read_io, or .read_memory
# File 'lib/nokogiri/html5/document.rb', line 42
attr_reader :url
Instance Method Details
#fragment() → Nokogiri::HTML5::DocumentFragment
#fragment(markup) → Nokogiri::HTML5::DocumentFragment
Parse an HTML5 document fragment from markup, returning a DocumentFragment.
- Properties
  - markup (String) The HTML5 markup fragment to be parsed
- Returns
  - Nokogiri::HTML5::DocumentFragment. This object’s children will be empty if markup is not passed, is empty, or is nil.
# File 'lib/nokogiri/html5/document.rb', line 178
def fragment(markup = nil)
  DocumentFragment.new(self, markup)
end
#to_xml(options = {}, &block)
#xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig
- Returns
  - The document type which determines CSS-to-XPath translation.
See ::Nokogiri::CSS::XPathVisitor for more information.
# File 'lib/nokogiri/html5/document.rb', line 194
def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML5
end