Class: Nokogiri::HTML4::Document
Inherits: Nokogiri::XML::Document
Defined in: ext/nokogiri/html4_document.c, lib/nokogiri/html4/document.rb
Constant Summary
::Nokogiri::XML::PP::Node
- Included
::Nokogiri::XML::Searchable
- Included
::Nokogiri::ClassResolver
- Included
::Nokogiri::XML::Node
- Inherited
ATTRIBUTE_DECL, ATTRIBUTE_NODE, CDATA_SECTION_NODE, COMMENT_NODE, DECONSTRUCT_KEYS, DECONSTRUCT_METHODS, DOCB_DOCUMENT_NODE, DOCUMENT_FRAG_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DTD_NODE, ELEMENT_DECL, ELEMENT_NODE, ENTITY_DECL, ENTITY_NODE, ENTITY_REF_NODE, HTML_DOCUMENT_NODE, IMPLIED_XPATH_CONTEXTS, NAMESPACE_DECL, NOTATION_NODE, PI_NODE, TEXT_NODE, XINCLUDE_END, XINCLUDE_START
::Nokogiri::XML::Document
- Inherited
IMPLIED_XPATH_CONTEXTS, NCNAME_CHAR, NCNAME_RE, NCNAME_START_CHAR, OBJECT_CLONE_METHOD, OBJECT_DUP_METHOD
Class Method Summary
-
.new
constructor
Create a new document.
-
.parse(string_or_io, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML) {|options| ... }
Parse HTML.
-
.read_io(io, url, encoding, options)
Read the ::Nokogiri::HTML document from io with the given url, encoding, and options.
-
.read_memory(string, url, encoding, options)
Read the ::Nokogiri::HTML document contained in string with the given url, encoding, and options.
::Nokogiri::XML::Document
- Inherited
.new | Alias for XML::Comment.new. |
.parse | Parse XML input from a String or IO object, and return a new |
.read_io | Create a new document from an IO object. |
.read_memory | Create a new document from a String. |
.wrap | ⚠ This method is only available when running JRuby. |
.empty_doc? |
::Nokogiri::XML::Node
- Inherited
.new | documented in lib/nokogiri/xml/node.rb. |
Instance Attribute Summary
-
#meta_encoding
rw
Get the meta tag encoding for this document.
-
#meta_encoding=(encoding)
rw
Set the meta tag encoding for this document.
-
#title
rw
Get the title string of this document.
-
#title=(text)
rw
Set the title string of this document.
::Nokogiri::XML::Document
- Inherited
#encoding | Get the encoding for this |
#encoding= | Set the encoding string for this |
#errors | The errors found while parsing a document. |
#namespace_inheritance | When |
#root | Get the root node for this document. |
#root= | Set the root element on this document. |
::Nokogiri::XML::Node
- Inherited
#blank? |
|
#cdata? | Returns true if this is a CDATA. |
#children | :category: Traversing Document Structure. |
#children= | Set the content for this Node |
#comment? | Returns true if this is a Comment. |
#content | [Returns]. |
#content= | Set the content of this node to |
#default_namespace= | Adds a default namespace supplied as a string |
#document | :category: Traversing Document Structure. |
#document? | Returns true if this is a |
#elem? | Alias for XML::Node#element?. |
#element? | Returns true if this is an Element node. |
#fragment? | Returns true if this is a |
#html? | Returns true if this is an |
#inner_html | Get the inner_html for this node’s |
#inner_html= | Set the content for this Node to |
#inner_text | Alias for XML::Node#content. |
#lang | Searches the language of a node, i.e. |
#lang= | Set the language of a node, i.e. |
#line |
|
#line= | Sets the line for this Node. |
#name | Alias for XML::Node#node_name. |
#namespace |
|
#namespace= | Set the default namespace on this node (as would be defined with an “xmlns=” attribute in |
#native_content= | Set the content of this node to |
#next | Alias for XML::Node#next_sibling. |
#next= | Alias for XML::Node#add_next_sibling. |
#node_name | Returns the name for this Node. |
#node_name= | Set the name for this Node. |
#parent | Get the parent Node for this Node. |
#parent= | Set the parent Node for this Node. |
#previous | Alias for XML::Node#previous_sibling. |
#previous= | Alias for XML::Node#add_previous_sibling. |
#processing_instruction? | Returns true if this is a ProcessingInstruction node. |
#read_only? | Is this a read only node? |
#text | Alias for XML::Node#content. |
#text? | Returns true if this is a Text node. |
#to_str | Alias for XML::Node#content. |
#xml? | Returns true if this is an |
#prepend_newline?, #data_ptr? |
Instance Method Summary
-
#fragment(tags = nil)
Create a ::Nokogiri::XML::DocumentFragment from tags.
-
#serialize(options = {})
Serialize Node using options.
-
#type
The type for this document.
-
#xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig
Returns the document type which determines CSS-to-XPath translation.
- #meta_content_type private
- #set_metadata_element(element) private
::Nokogiri::XML::Document
- Inherited
#<< | Alias for XML::Document#add_child. |
#add_child, | |
#canonicalize | Canonicalize a document and return the results. |
#clone | Clone this node. |
#collect_namespaces | Recursively get all namespaces from this node and its subtree and return them as a hash. |
#create_cdata | Create a CDATA Node containing |
#create_comment | Create a Comment Node containing |
#create_element | Create a new Element with |
#create_entity | Create a new entity named |
#create_text_node | Create a Text Node with |
#deconstruct_keys | Returns a hash describing the |
#decorate | Apply any decorators to |
#decorators | Get the list of decorators given |
#document | A reference to |
#dup | Duplicate this node. |
#fragment | Create a |
#name | The name of this document. |
#namespaces | Get the hash of namespaces on the root |
#remove_namespaces! | Remove all namespaces from all nodes in the document. |
#slop! | Explore a document with shortcut methods. |
#to_java | ⚠ This method is only available when running JRuby. |
#to_xml | Alias for XML::Node#serialize. |
#url | Get the url name for this document. |
#validate | Validate this |
#version | Get the |
#xpath_doctype |
|
#inspect_attributes, | |
#initialize | rubocop:disable Lint/MissingSuper. |
::Nokogiri::XML::Node
- Inherited
#<< | Add |
#<=> | Compare two Node objects with respect to their |
#== |
|
#[] | Fetch an attribute from this node. |
#[]= | Update the attribute |
#accept | Accept a visitor. |
#add_child | Add |
#add_class | Ensure HTML |
#add_namespace | Alias for XML::Node#add_namespace_definition. |
#add_namespace_definition | :category: Manipulating Document Structure. |
#add_next_sibling | Insert |
#add_previous_sibling | Insert |
#after | Insert |
#ancestors | Get a list of ancestor Node for this Node. |
#append_class | Add HTML |
#attr | Alias for XML::Node#[]. |
#attribute | :category: Working With Node Attributes. |
#attribute_nodes | :category: Working With Node Attributes. |
#attribute_with_ns | :category: Working With Node Attributes. |
#attributes | Fetch this node’s attributes. |
#before | Insert |
#canonicalize, | |
#child | :category: Traversing Document Structure. |
#classes | Fetch CSS class names of a Node. |
#clone | Clone this node. |
#create_external_subset | Create an external subset. |
#create_internal_subset | Create the internal subset of a document. |
#css_path | Get the path to this node as a |
#deconstruct_keys | Returns a hash describing the Node, to use in pattern matching. |
#decorate! | Decorate this node with the decorators set up in this node’s |
#delete | Alias for XML::Node#remove_attribute. |
#description | Fetch the |
#do_xinclude | Do xinclude substitution on the subtree below node. |
#dup | Duplicate this node. |
#each | Iterate over each attribute name and value pair for this Node. |
#element_children | [Returns]. |
#elements | Alias for XML::Node#element_children. |
#encode_special_chars | Encode any special characters in |
#external_subset | Get the external subset. |
#first_element_child |
|
#fragment | Create a |
#get_attribute | Alias for XML::Node#[]. |
#has_attribute? | Alias for XML::Node#key?. |
#initialize | Create a new node with |
#internal_subset | Get the internal subset. |
#key? | Returns true if |
#keys | Get the attribute names for this Node. |
#kwattr_add | Ensure that values are present in a keyword attribute. |
#kwattr_append | Add keywords to a Node’s keyword attribute, regardless of duplication. |
#kwattr_remove | Remove keywords from a keyword attribute. |
#kwattr_values | Fetch values from a keyword attribute of a Node. |
#last_element_child |
|
#matches? | Returns true if this Node matches |
#namespace_definitions | [Returns]. |
#namespace_scopes |
|
#namespaced_key? | Returns true if |
#namespaces | Fetch all the namespaces on this node and its ancestors. |
#next_element | Returns the next |
#next_sibling | Returns the next sibling node. |
#node_type | Get the type for this Node. |
#parse | Parse |
#path | Returns the path associated with this Node. |
#pointer_id | [Returns]. |
#prepend_child | Add |
#previous_element | Returns the previous |
#previous_sibling | Returns the previous sibling node. |
#remove | Alias for XML::Node#unlink. |
#remove_attribute | Remove the attribute named |
#remove_class | Remove HTML |
#replace | Replace this Node with |
#serialize | Serialize Node using |
#set_attribute | Alias for XML::Node#[]=. |
#swap | Swap this Node for |
#to_html | Serialize this Node to |
#to_s | Turn this node in to a string. |
#to_xhtml | Serialize this Node to XHTML using |
#to_xml | Serialize this Node to |
#traverse | Yields self and all children to |
#type | Alias for XML::Node#node_type. |
#unlink | Unlink this node from its current context. |
#value? | Does this Node’s attributes include <value>. |
#values | Get the attribute values for this Node. |
#wrap | Wrap this Node with the node parsed from |
#write_html_to | Write Node as |
#write_to | Serialize this node or document to |
#write_xhtml_to | Write Node as XHTML to |
#write_xml_to | Write Node as |
#add_child_node_and_reparent_attrs, #add_sibling, | |
#compare | Compare this Node to |
#dump_html | Returns the Node as html. |
#get | Get the value for |
#html_standard_serialize, | |
#in_context | TODO: DOCUMENT ME. |
#inspect_attributes, #keywordify, | |
#native_write_to | Write this Node to |
#process_xincludes | Loads and substitutes all xinclude elements below the node. |
#set | Set the |
#set_namespace | Set the namespace to |
#to_format, #write_format_to, #add_child_node, #add_next_sibling_node, #add_previous_sibling_node, #replace_node |
::Nokogiri::ClassResolver
- Included
#related_class | Find a class constant within the. |
::Nokogiri::XML::Searchable
- Included
#% | Alias for XML::Searchable#at. |
#/ | Alias for XML::Searchable#search. |
#> | Search this node’s immediate children using |
#at | Search this object for |
#at_css | Search this object for |
#at_xpath | Search this node for XPath |
#css | Search this object for |
#search | Search this object for |
#xpath | Search this node for XPath |
#css_internal, #css_rules_to_xpath, #xpath_impl, #xpath_internal, #xpath_query_from_css_rule, #extract_params |
::Nokogiri::XML::PP::Node
- Included
Constructor Details
.new
Create a new document
# File 'ext/nokogiri/html4_document.c', line 14
static VALUE
rb_html_document_s_new(int argc, VALUE *argv, VALUE klass)
{
  VALUE uri, external_id, rest, rb_doc;
  htmlDocPtr doc;

  rb_scan_args(argc, argv, "0*", &rest);
  uri = rb_ary_entry(rest, (long)0);
  external_id = rb_ary_entry(rest, (long)1);

  doc = htmlNewDoc(
          RTEST(uri) ? (const xmlChar *)StringValueCStr(uri) : NULL,
          RTEST(external_id) ? (const xmlChar *)StringValueCStr(external_id) : NULL
        );
  rb_doc = noko_xml_document_wrap_with_init_args(klass, doc, argc, argv);
  return rb_doc ;
}
Class Method Details
.parse(string_or_io, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML) {|options| ... }
Parse HTML. string_or_io may be a String, or any object that responds to read and close, such as an IO or StringIO. url is the resource where this document is located. encoding is the encoding that should be used when processing the document. options is a number that sets options in the parser, such as XML::ParseOptions::RECOVER. See the constants in ::Nokogiri::XML::ParseOptions.
# File 'lib/nokogiri/html4/document.rb', line 172
def parse(string_or_io, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML)
  options = Nokogiri::XML::ParseOptions.new(options) if Integer === options
  yield options if block_given?

  url ||= string_or_io.respond_to?(:path) ? string_or_io.path : nil

  if string_or_io.respond_to?(:encoding)
    unless string_or_io.encoding == Encoding::ASCII_8BIT
      encoding ||= string_or_io.encoding.name
    end
  end

  if string_or_io.respond_to?(:read)
    if string_or_io.is_a?(Pathname)
      # resolve the Pathname to the file and open it as an IO object, see #2110
      string_or_io = string_or_io.expand_path.open
      url ||= string_or_io.path
    end

    unless encoding
      string_or_io = EncodingReader.new(string_or_io)
      begin
        return read_io(string_or_io, url, encoding, options.to_i)
      rescue EncodingReader::EncodingFound => e
        encoding = e.found_encoding
      end
    end
    return read_io(string_or_io, url, encoding, options.to_i)
  end

  # read_memory pukes on empty docs
  if string_or_io.nil? || string_or_io.empty?
    return encoding ? new.tap { |i| i.encoding = encoding } : new
  end

  encoding ||= EncodingReader.detect_encoding(string_or_io)

  read_memory(string_or_io, url, encoding, options.to_i)
end
.read_io(io, url, encoding, options)
Read the ::Nokogiri::HTML document from io with the given url, encoding, and options. See Nokogiri::HTML4.parse
# File 'ext/nokogiri/html4_document.c', line 39
static VALUE
rb_html_document_s_read_io(VALUE klass, VALUE rb_io, VALUE rb_url, VALUE rb_encoding, VALUE rb_options)
{
  VALUE rb_doc;
  VALUE rb_error_list = rb_ary_new();
  htmlDocPtr c_doc;
  const char *c_url = NIL_P(rb_url) ? NULL : StringValueCStr(rb_url);
  const char *c_encoding = NIL_P(rb_encoding) ? NULL : StringValueCStr(rb_encoding);
  int options = NUM2INT(rb_options);

  xmlSetStructuredErrorFunc((void *)rb_error_list, noko__error_array_pusher);

  c_doc = htmlReadIO(noko_io_read, noko_io_close, (void *)rb_io, c_url, c_encoding, options);

  xmlSetStructuredErrorFunc(NULL, NULL);

  /*
   * If EncodingFound has occurred in EncodingReader, make sure to do
   * a cleanup and propagate the error.
   */
  if (rb_respond_to(rb_io, id_encoding_found)) {
    VALUE encoding_found = rb_funcall(rb_io, id_encoding_found, 0);
    if (!NIL_P(encoding_found)) {
      xmlFreeDoc(c_doc);
      rb_exc_raise(encoding_found);
    }
  }

  if ((c_doc == NULL) || (!(options & XML_PARSE_RECOVER) && (RARRAY_LEN(rb_error_list) > 0))) {
    VALUE rb_error ;

    xmlFreeDoc(c_doc);

    rb_error = rb_ary_entry(rb_error_list, 0);
    if (rb_error == Qnil) {
      rb_raise(rb_eRuntimeError, "Could not parse document");
    } else {
      VALUE exception_message = rb_funcall(rb_error, id_to_s, 0);
      exception_message = rb_str_concat(rb_str_new2("Parser without recover option encountered error or warning: "),
                                        exception_message);
      rb_exc_raise(rb_class_new_instance(1, &exception_message, cNokogiriXmlSyntaxError));
    }

    return Qnil;
  }

  rb_doc = noko_xml_document_wrap(klass, c_doc);
  rb_iv_set(rb_doc, "@errors", rb_error_list);
  return rb_doc;
}
.read_memory(string, url, encoding, options)
Read the ::Nokogiri::HTML document contained in string with the given url, encoding, and options. See Nokogiri::HTML4.parse
# File 'ext/nokogiri/html4_document.c', line 97
static VALUE
rb_html_document_s_read_memory(VALUE klass, VALUE rb_html, VALUE rb_url, VALUE rb_encoding, VALUE rb_options)
{
  VALUE rb_doc;
  VALUE rb_error_list = rb_ary_new();
  htmlDocPtr c_doc;
  const char *c_buffer = StringValuePtr(rb_html);
  const char *c_url = NIL_P(rb_url) ? NULL : StringValueCStr(rb_url);
  const char *c_encoding = NIL_P(rb_encoding) ? NULL : StringValueCStr(rb_encoding);
  int html_len = (int)RSTRING_LEN(rb_html);
  int options = NUM2INT(rb_options);

  xmlSetStructuredErrorFunc((void *)rb_error_list, noko__error_array_pusher);

  c_doc = htmlReadMemory(c_buffer, html_len, c_url, c_encoding, options);

  xmlSetStructuredErrorFunc(NULL, NULL);

  if ((c_doc == NULL) || (!(options & XML_PARSE_RECOVER) && (RARRAY_LEN(rb_error_list) > 0))) {
    VALUE rb_error ;

    xmlFreeDoc(c_doc);

    rb_error = rb_ary_entry(rb_error_list, 0);
    if (rb_error == Qnil) {
      rb_raise(rb_eRuntimeError, "Could not parse document");
    } else {
      VALUE exception_message = rb_funcall(rb_error, id_to_s, 0);
      exception_message = rb_str_concat(rb_str_new2("Parser without recover option encountered error or warning: "),
                                        exception_message);
      rb_exc_raise(rb_class_new_instance(1, &exception_message, cNokogiriXmlSyntaxError));
    }

    return Qnil;
  }

  rb_doc = noko_xml_document_wrap(klass, c_doc);
  rb_iv_set(rb_doc, "@errors", rb_error_list);
  return rb_doc;
}
Instance Attribute Details
#meta_encoding (rw)
Get the meta tag encoding for this document. If there is no meta tag, then nil is returned.
# File 'lib/nokogiri/html4/document.rb', line 12
def meta_encoding
  if (meta = at_xpath("//meta[@charset]"))
    meta[:charset]
  elsif (meta = meta_content_type)
    meta["content"][/charset\s*=\s*([\w-]+)/i, 1]
  end
end
#meta_encoding=(encoding) (rw)
Set the meta tag encoding for this document.
If a meta encoding tag is already present, its content is replaced with the given text.
Otherwise, this method tries to create one at an appropriate place, supplying head and/or html elements as necessary: inside a head element if any, and before any text node or content element (typically <body>) if any.
The result of setting an encoding that differs from the document encoding is undefined.
Beware that in CRuby, libxml2 automatically inserts a meta tag into a head element.
# File 'lib/nokogiri/html4/document.rb', line 36
def meta_encoding=(encoding)
  if (meta = meta_content_type)
    meta["content"] = format("text/html; charset=%s", encoding)
    encoding
  elsif (meta = at_xpath("//meta[@charset]"))
    meta["charset"] = encoding
  else
    meta = XML::Node.new("meta", self)
    if (dtd = internal_subset) && dtd.html5_dtd?
      meta["charset"] = encoding
    else
      meta["http-equiv"] = "Content-Type"
      meta["content"] = format("text/html; charset=%s", encoding)
    end

    if (head = at_xpath("//head"))
      head.prepend_child(meta)
    else
      set_metadata_element(meta)
    end
    encoding
  end
end
#title (rw)
Get the title string of this document. Return nil if there is no title tag.
# File 'lib/nokogiri/html4/document.rb', line 70
def title
  (title = at_xpath("//title")) && title.inner_text
end
#title=(text) (rw)
Set the title string of this document.
If a title element is already present, its content is replaced with the given text.
Otherwise, this method tries to create one at an appropriate place, supplying head and/or html elements as necessary: inside a head element if any, right after a meta encoding/charset tag if any, and before any text node or content element (typically <body>) if any.
# File 'lib/nokogiri/html4/document.rb', line 85
def title=(text)
  tnode = XML::Text.new(text, self)
  if (title = at_xpath("//title"))
    title.children = tnode
    return text
  end

  title = XML::Node.new("title", self) << tnode
  if (head = at_xpath("//head"))
    head << title
  elsif (meta = at_xpath("//meta[@charset]") || meta_content_type)
    # better put after charset declaration
    meta.add_next_sibling(title)
  else
    set_metadata_element(title)
  end
end
Instance Method Details
#fragment(tags = nil)
Create a ::Nokogiri::XML::DocumentFragment from tags.
# File 'lib/nokogiri/html4/document.rb', line 149
def fragment(tags = nil)
  DocumentFragment.new(self, tags, root)
end
#meta_content_type (private)
# File 'lib/nokogiri/html4/document.rb', line 60
def meta_content_type
  xpath("//meta[@http-equiv and boolean(@content)]").find do |node|
    node["http-equiv"] =~ /\AContent-Type\z/i
  end
end
#serialize(options = {})
Serialize Node using options. Save options can also be set using a block.
See also ::Nokogiri::XML::Node::SaveOptions and the "Serialization and Generating Output" section of the Node documentation.
These two statements are equivalent:
node.serialize(:encoding => 'UTF-8', :save_with => FORMAT | AS_XML)
or
node.serialize(:encoding => 'UTF-8') do |config|
config.format.as_xml
end
# File 'lib/nokogiri/html4/document.rb', line 142
def serialize(options = {})
  options[:save_with] ||= XML::Node::SaveOptions::DEFAULT_HTML
  super
end
#set_metadata_element(element) (private)
# File 'lib/nokogiri/html4/document.rb', line 103
def set_metadata_element(element) # rubocop:disable Naming/AccessorMethodName
  if (head = at_xpath("//head"))
    head << element
  elsif (html = at_xpath("//html"))
    head = html.prepend_child(XML::Node.new("head", self))
    head.prepend_child(element)
  elsif (first = children.find do |node|
           case node
           when XML::Element, XML::Text
             true
           end
         end)
    # We reach here only if the underlying document model
    # allows <html>/<head> elements to be omitted and does not
    # automatically supply them.
    first.add_previous_sibling(element)
  else
    html = add_child(XML::Node.new("html", self))
    head = html.add_child(XML::Node.new("head", self))
    head.prepend_child(element)
  end
end
#type
The type for this document
# File 'ext/nokogiri/html4_document.c', line 144
static VALUE
rb_html_document_type(VALUE self)
{
  htmlDocPtr doc = noko_xml_document_unwrap(self);
  return INT2NUM(doc->type);
}
#xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig
Returns the document type which determines CSS-to-XPath translation.
See XPathVisitor for more information.
# File 'lib/nokogiri/html4/document.rb', line 159
def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML4
end