Class: Nokogiri::XML::Document
Inherits: Nokogiri::XML::Node
Class Chain: self, Node
Defined in: lib/nokogiri/xml/document.rb, ext/nokogiri/xml_attr.c, ext/nokogiri/xml_document.c
Overview
Document is the main entry point for dealing with XML documents. The Document is created by parsing XML content from a String or an IO object. See .parse for more information on parsing.
Document inherits a great deal of functionality from its superclass Node, so please read that class's documentation as well.
Constant Summary
- IMPLIED_XPATH_CONTEXTS = ["//"].freeze
  Internal use only
  # File 'lib/nokogiri/xml/document.rb', line 507
- NCNAME_CHAR = NCNAME_START_CHAR + "\\-\\.0-9"
  # File 'lib/nokogiri/xml/document.rb', line 19
- NCNAME_RE = /^xmlns(?::([#{NCNAME_START_CHAR}][#{NCNAME_CHAR}]*))?$/
  # File 'lib/nokogiri/xml/document.rb', line 20
- NCNAME_START_CHAR = "A-Za-z_"
  See www.w3.org/TR/REC-xml-names/#ns-decl for more details. Note that we're not attempting to handle Unicode characters, partly because libxml2 doesn't handle Unicode characters in NCNAMEs.
- OBJECT_CLONE_METHOD = Object.instance_method(:clone) (private)
  # File 'lib/nokogiri/xml/document.rb', line 23
- OBJECT_DUP_METHOD = Object.instance_method(:dup) (private)
  # File 'lib/nokogiri/xml/document.rb', line 22
PP::Node
- Included
Searchable
- Included
::Nokogiri::ClassResolver
- Included
Node
- Inherited
ATTRIBUTE_DECL, ATTRIBUTE_NODE, CDATA_SECTION_NODE, COMMENT_NODE, DECONSTRUCT_KEYS, DECONSTRUCT_METHODS, DOCB_DOCUMENT_NODE, DOCUMENT_FRAG_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DTD_NODE, ELEMENT_DECL, ELEMENT_NODE, ENTITY_DECL, ENTITY_NODE, ENTITY_REF_NODE, HTML_DOCUMENT_NODE, IMPLIED_XPATH_CONTEXTS, NAMESPACE_DECL, NOTATION_NODE, PI_NODE, TEXT_NODE, XINCLUDE_END, XINCLUDE_START
Class Method Summary
- .new(version = "1.0") constructor
  Create a new empty Document declaring the XML version given by version.
- .parse(input) {|options| ... } ⇒ Document
  Parse XML input from a String or IO object, and return a new Document.
- .read_io(io, url, encoding, options)
  Create a new document from an IO object.
- .read_memory(string, url, encoding, options)
  Create a new document from a String.
- .wrap(java_document) → Nokogiri::XML::Document
  ⚠ This method is only available when running JRuby.
- .empty_doc?(string_or_io) ⇒ Boolean private
Node
- Inherited
.new | documented in lib/nokogiri/xml/node.rb. |
Instance Attribute Summary
- #encoding rw
  Get the encoding for this Document.
- #encoding=(encoding) rw
  Set the encoding string for this Document.
- #errors rw
  The errors found while parsing a document.
- #namespace_inheritance rw
  When true, reparented elements without a namespace will inherit their new parent's namespace (if one exists).
- #root rw
  Get the root node for this document.
- #root= rw
  Set the root element on this document.
Node
- Inherited
#blank? |
|
#cdata? | Returns true if this is a |
#children | :category: Traversing Document Structure. |
#children= | Set the content for this |
#comment? | Returns true if this is a |
#content | [Returns]. |
#content= | Set the content of this node to |
#default_namespace= | Adds a default namespace supplied as a string #url href, to self. |
#document | :category: Traversing Document Structure. |
#document? | Returns true if this is a |
#elem? | Alias for Node#element?. |
#element? | Returns true if this is an |
#fragment? | Returns true if this is a |
#html? | Returns true if this is an |
#inner_html | Get the inner_html for this node’s Node#children |
#inner_html= | Set the content for this |
#inner_text | Alias for Node#content. |
#lang | Searches the language of a node, i.e. |
#lang= | Set the language of a node, i.e. |
#line |
|
#line= | Sets the line for this |
#name | Alias for Node#node_name. |
#namespace |
|
#namespace= | Set the default namespace on this node (as would be defined with an “xmlns=” attribute in |
#native_content= | Set the content of this node to |
#next | Alias for Node#next_sibling. |
#next= | Alias for Node#add_next_sibling. |
#node_name | Returns the name for this |
#node_name= | Set the name for this |
#parent | |
#parent= | |
#previous | Alias for Node#previous_sibling. |
#previous= | Alias for Node#add_previous_sibling. |
#processing_instruction? | Returns true if this is a |
#read_only? | Is this a read only node? |
#text | Alias for Node#content. |
#text? | Returns true if this is a |
#to_str | Alias for Node#content. |
#xml? | Returns true if this is an |
#prepend_newline?, #data_ptr? |
Instance Method Summary
- #<<(node_or_tags)
  Alias for #add_child.
- #add_child(node_or_tags) (also: #<<)
- #canonicalize(mode = XML_C14N_1_0, inclusive_namespaces = nil, with_comments = false)
  Canonicalize a document and return the results.
- #clone → Nokogiri::XML::Document
  Clone this node.
- #collect_namespaces → Hash<String(Namespace#prefix) ⇒ String(Namespace#href)>
  Recursively get all namespaces from this node and its subtree and return them as a hash.
- #create_cdata(string, &block)
  Create a CDATA Node containing string.
- #create_comment(string, &block)
  Create a Comment Node containing string.
- #create_element(name, *contents_or_attrs, &block) → Nokogiri::XML::Element
- #create_entity(name, type, external_id, system_id, content)
  Create a new entity named name.
- #create_text_node(string, &block)
  Create a Text Node with string.
- #deconstruct_keys(array_of_names) → Hash
  Returns a hash describing the Document, to use in pattern matching.
- #decorate(node)
  Apply any decorators to node.
- #decorators(key)
  Get the list of decorators given key.
- #document
  A reference to self.
- #dup → Nokogiri::XML::Document
  Duplicate this node.
- #fragment(tags = nil)
  Create a DocumentFragment from tags. Returns an empty fragment if tags is nil.
- #name
  The name of this document.
- #namespaces
  Get the hash of namespaces on the root Node.
- #remove_namespaces!
  Remove all namespaces from all nodes in the document.
- #slop!
  Explore a document with shortcut methods.
- #to_java → Java::OrgW3cDom::Document
  ⚠ This method is only available when running JRuby.
- #to_xml(*args, &block)
  Alias for Node#serialize.
- #url
  Get the url name for this document.
- #validate
  Validate this Document against its DTD.
- #version
  Get the XML version for this Document.
- #xpath_doctype → Nokogiri::CSS::XPathVisitor::DoctypeConfig
  Returns the document type which determines CSS-to-XPath translation.
- #inspect_attributes private
- #initialize(*args) ⇒ Document constructor
  Internal use only.
Node
- Inherited
#<< | Add |
#<=> | Compare two |
#== |
|
#[] | Fetch an attribute from this node. |
#[]= | Update the attribute #name to |
#accept | Accept a visitor. |
#add_child | Add |
#add_class | Ensure HTML |
#add_namespace | Alias for Node#add_namespace_definition. |
#add_namespace_definition | :category: Manipulating Document Structure. |
#add_next_sibling | Insert |
#add_previous_sibling | Insert |
#after | Insert |
#ancestors | |
#append_class | Add HTML |
#attr | Alias for Node#[]. |
#attribute | :category: Working With |
#attribute_nodes | :category: Working With |
#attribute_with_ns | :category: Working With |
#attributes | Fetch this node’s attributes. |
#before | Insert |
#canonicalize, | |
#child | :category: Traversing Document Structure. |
#classes | Fetch CSS class names of a |
#clone | Clone this node. |
#create_external_subset | Create an external subset. |
#create_internal_subset | Create the internal subset of a document. |
#css_path | Get the path to this node as a |
#deconstruct_keys | Returns a hash describing the |
#decorate! | Decorate this node with the decorators set up in this node’s |
#delete | Alias for Node#remove_attribute. |
#description | Fetch the |
#do_xinclude | Do xinclude substitution on the subtree below node. |
#dup | Duplicate this node. |
#each | Iterate over each attribute name and value pair for this |
#element_children | [Returns]. |
#elements | Alias for Node#element_children. |
#encode_special_chars | Encode any special characters in |
#external_subset | Get the external subset. |
#first_element_child |
|
#fragment | Create a |
#get_attribute | Alias for Node#[]. |
#has_attribute? | Alias for Node#key?. |
#initialize | |
#internal_subset | Get the internal subset. |
#key? | Returns true if |
#keys | Get the attribute names for this |
#kwattr_add | Ensure that values are present in a keyword attribute. |
#kwattr_append | Add keywords to a Node’s keyword attribute, regardless of duplication. |
#kwattr_remove | Remove keywords from a keyword attribute. |
#kwattr_values | Fetch values from a keyword attribute of a |
#last_element_child |
|
#matches? | Returns true if this |
#namespace_definitions | [Returns]. |
#namespace_scopes |
|
#namespaced_key? | Returns true if |
#namespaces | Fetch all the namespaces on this node and its ancestors. |
#next_element | Returns the next |
#next_sibling | Returns the next sibling node. |
#node_type | Get the type for this |
#parse | Parse |
#path | Returns the path associated with this |
#pointer_id | [Returns]. |
#prepend_child | Add |
#previous_element | Returns the previous |
#previous_sibling | Returns the previous sibling node. |
#remove | Alias for Node#unlink. |
#remove_attribute | Remove the attribute named #name |
#remove_class | Remove HTML |
#replace | Replace this |
#serialize | Serialize Node using |
#set_attribute | Alias for Node#[]=. |
#swap | Swap this |
#to_html | Serialize this |
#to_s | Turn this node in to a string. |
#to_xhtml | Serialize this |
#to_xml | Serialize this |
#traverse | Yields self and all children to |
#type | Alias for Node#node_type. |
#unlink | Unlink this node from its current context. |
#value? | Does this Node’s attributes include <value>. |
#values | Get the attribute values for this |
#wrap | Wrap this |
#write_html_to | Write Node as |
#write_to | Serialize this node or document to |
#write_xhtml_to | Write Node as XHTML to |
#write_xml_to | Write Node as |
#add_child_node_and_reparent_attrs, #add_sibling, | |
#compare | Compare this |
#dump_html | Returns the |
#get | Get the value for |
#html_standard_serialize, | |
#in_context | TODO: DOCUMENT ME. |
#inspect_attributes, #keywordify, | |
#native_write_to | |
#process_xincludes | Loads and substitutes all xinclude elements below the node. |
#set | Set the |
#set_namespace | Set the namespace to |
#to_format, #write_format_to, #add_child_node, #add_next_sibling_node, #add_previous_sibling_node, #replace_node |
::Nokogiri::ClassResolver
- Included
#related_class | Find a class constant within the. |
Searchable
- Included
#% | Alias for Searchable#at. |
#/ | Alias for Searchable#search. |
#> | Search this node’s immediate children using |
#at | Search this object for |
#at_css | Search this object for |
#at_xpath | Search this node for |
#css | Search this object for |
#search | Search this object for |
#xpath | Search this node for |
#css_internal, #css_rules_to_xpath, #xpath_impl, #xpath_internal, #xpath_query_from_css_rule, #extract_params |
PP::Node
- Included
Constructor Details
.new(version = "1.0")
Create a new empty Document declaring the XML version given by version.
#initialize(*args) ⇒ Document
Internal use only.
# File 'lib/nokogiri/xml/document.rb', line 190
def initialize(*args) # :nodoc: # rubocop:disable Lint/MissingSuper
  @errors = []
  @decorators = nil
  @namespace_inheritance = false
end
Class Method Details
.empty_doc?(string_or_io) ⇒ Boolean (private)
# File 'lib/nokogiri/xml/document.rb', line 96
def empty_doc?(string_or_io)
  string_or_io.nil? ||
    (string_or_io.respond_to?(:empty?) && string_or_io.empty?) ||
    (string_or_io.respond_to?(:eof?) && string_or_io.eof?)
end
.parse(input) {|options| ... } ⇒ Document
.parse(input, url:, encoding:, options:) ⇒ Document
Parse XML input from a String or IO object, and return a new Document.
🛡 By default, Nokogiri treats documents as untrusted, and so does not attempt to load DTDs or access the network. See ParseOptions for a complete list of options, and that module's DEFAULT_XML constant for what's set (and not set) by default.
- Required Parameters
  - input (String | IO) The content to be parsed.
- Optional Keyword Arguments
  - url: (String) The base URI for this document.
  - encoding: (String) The name of the encoding that should be used when processing the document. When not provided, the encoding will be determined based on the document content.
  - options: (Nokogiri::XML::ParseOptions) Configuration object that determines some behaviors during parsing. See ParseOptions for more information. The default value is ParseOptions::DEFAULT_XML.
- Yields
  - If a block is given, a Nokogiri::XML::ParseOptions object is yielded to the block which can be configured before parsing. See Nokogiri::XML::ParseOptions for more information.
- Returns
  - Document
# File 'lib/nokogiri/xml/document.rb', line 56
def parse(
  string_or_io,
  url_ = nil, encoding_ = nil, options_ = XML::ParseOptions::DEFAULT_XML,
  url: url_, encoding: encoding_, options: options_
)
  options = Nokogiri::XML::ParseOptions.new(options) if Integer === options
  yield options if block_given?

  url ||= string_or_io.respond_to?(:path) ? string_or_io.path : nil

  if empty_doc?(string_or_io)
    if options.strict?
      raise Nokogiri::XML::SyntaxError, "Empty document"
    else
      return encoding ? new.tap { |i| i.encoding = encoding } : new
    end
  end

  doc = if string_or_io.respond_to?(:read)
    # TODO: should we instead check for respond_to?(:to_path) ?
    if string_or_io.is_a?(Pathname)
      # resolve the Pathname to the file and open it as an IO object, see #2110
      string_or_io = string_or_io.expand_path.open
      url ||= string_or_io.path
    end

    read_io(string_or_io, url, encoding, options.to_i)
  else
    # read_memory pukes on empty docs
    read_memory(string_or_io, url, encoding, options.to_i)
  end

  # do xinclude processing
  doc.do_xinclude(options) if options.xinclude?

  doc
end
.read_io(io, url, encoding, options)
Create a new document from an IO object
# File 'ext/nokogiri/xml_document.c', line 366
static VALUE
noko_xml_document_s_read_io(VALUE rb_class, VALUE rb_io, VALUE rb_url, VALUE rb_encoding, VALUE rb_options)
{
  /* TODO: deprecate this method, parse should be the preferred entry point. then we can make this private. */
  libxmlStructuredErrorHandlerState handler_state;
  VALUE rb_errors = rb_ary_new();

  noko__structured_error_func_save_and_set(&handler_state, (void *)rb_errors, noko__error_array_pusher);

  const char *c_url = NIL_P(rb_url) ? NULL : StringValueCStr(rb_url);
  const char *c_enc = NIL_P(rb_encoding) ? NULL : StringValueCStr(rb_encoding);

  xmlDocPtr c_document = xmlReadIO(
    (xmlInputReadCallback)noko_io_read,
    (xmlInputCloseCallback)noko_io_close,
    (void *)rb_io,
    c_url,
    c_enc,
    (int)NUM2INT(rb_options)
  );

  noko__structured_error_func_restore(&handler_state);

  if (c_document == NULL) {
    xmlFreeDoc(c_document);

    VALUE exception = rb_funcall(cNokogiriXmlSyntaxError, rb_intern("aggregate"), 1, rb_errors);
    if (RB_TEST(exception)) {
      rb_exc_raise(exception);
    } else {
      rb_raise(rb_eRuntimeError, "Could not parse document");
    }
  }

  VALUE rb_document = noko_xml_document_wrap(rb_class, c_document);
  rb_iv_set(rb_document, "@errors", rb_errors);
  return rb_document;
}
.read_memory(string, url, encoding, options)
Create a new document from a String
# File 'ext/nokogiri/xml_document.c', line 415
static VALUE
noko_xml_document_s_read_memory(VALUE rb_class, VALUE rb_input, VALUE rb_url, VALUE rb_encoding, VALUE rb_options)
{
  /* TODO: deprecate this method, parse should be the preferred entry point. then we can make this private. */
  VALUE rb_errors = rb_ary_new();
  xmlSetStructuredErrorFunc((void *)rb_errors, noko__error_array_pusher);

  const char *c_buffer = StringValuePtr(rb_input);
  const char *c_url = NIL_P(rb_url) ? NULL : StringValueCStr(rb_url);
  const char *c_enc = NIL_P(rb_encoding) ? NULL : StringValueCStr(rb_encoding);
  int c_buffer_len = (int)RSTRING_LEN(rb_input);

  xmlDocPtr c_document = xmlReadMemory(c_buffer, c_buffer_len, c_url, c_enc, (int)NUM2INT(rb_options));

  xmlSetStructuredErrorFunc(NULL, NULL);

  if (c_document == NULL) {
    VALUE exception = rb_funcall(cNokogiriXmlSyntaxError, rb_intern("aggregate"), 1, rb_errors);
    if (RB_TEST(exception)) {
      rb_exc_raise(exception);
    } else {
      rb_raise(rb_eRuntimeError, "Could not parse document");
    }
  }

  VALUE document = noko_xml_document_wrap(rb_class, c_document);
  rb_iv_set(document, "@errors", rb_errors);
  return document;
}
.wrap(java_document) → Nokogiri::XML::Document
⚠ This method is only available when running JRuby.
Create a Document using an existing Java DOM document object.
The returned Document shares the same underlying data structure as the Java object, so changes in one are reflected in the other.
- Parameters
  - java_document (Java::OrgW3cDom::Document) (The class Java::OrgW3cDom::Document is also accessible as org.w3c.dom.Document.)
- Returns
  - Document
See also #to_java
# File 'lib/nokogiri/xml/document.rb', line 103
RDoc directive :singleton-method: wrap
Instance Attribute Details
#encoding (rw)
Get the encoding for this Document
# File 'ext/nokogiri/xml_document.c', line 336
static VALUE
encoding(VALUE self)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  if (!doc->encoding) { return Qnil; }
  return NOKOGIRI_STR_NEW2(doc->encoding);
}
#encoding=(encoding) (rw)
Set the encoding string for this Document
# File 'ext/nokogiri/xml_document.c', line 316
static VALUE
set_encoding(VALUE self, VALUE encoding)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  if (doc->encoding) {
    xmlFree(DISCARD_CONST_QUAL_XMLCHAR(doc->encoding));
  }

  doc->encoding = xmlStrdup((xmlChar *)StringValueCStr(encoding));

  return encoding;
}
#errors (rw)
The errors found while parsing a document.
- Returns
  - Array<Nokogiri::XML::SyntaxError>
# File 'lib/nokogiri/xml/document.rb', line 141
attr_accessor :errors
#namespace_inheritance (rw)
When true, reparented elements without a namespace will inherit their new parent's namespace (if one exists). Defaults to false.
- Returns
  - Boolean
Example: Default behavior of namespace inheritance
xml = <<~EOF
<root xmlns:foo="http://nokogiri.org/default_ns/test/foo">
<foo:parent>
</foo:parent>
</root>
EOF
doc = Nokogiri::XML(xml)
parent = doc.at_xpath("//foo:parent", "foo" => "http://nokogiri.org/default_ns/test/foo")
parent.add_child("<child></child>")
doc.to_xml
# => <?xml version="1.0"?>
# <root xmlns:foo="http://nokogiri.org/default_ns/test/foo">
# <foo:parent>
# <child/>
# </foo:parent>
# </root>
Example: Setting namespace inheritance to true
xml = <<~EOF
<root xmlns:foo="http://nokogiri.org/default_ns/test/foo">
<foo:parent>
</foo:parent>
</root>
EOF
doc = Nokogiri::XML(xml)
doc.namespace_inheritance = true
parent = doc.at_xpath("//foo:parent", "foo" => "http://nokogiri.org/default_ns/test/foo")
parent.add_child("<child></child>")
doc.to_xml
# => <?xml version="1.0"?>
# <root xmlns:foo="http://nokogiri.org/default_ns/test/foo">
# <foo:parent>
# <foo:child/>
# </foo:parent>
# </root>
Since v1.12.4
# File 'lib/nokogiri/xml/document.rb', line 188
attr_accessor :namespace_inheritance
#root (rw)
Get the root node for this document.
# File 'ext/nokogiri/xml_document.c', line 294
static VALUE
rb_xml_document_root(VALUE self)
{
  xmlDocPtr c_document;
  xmlNodePtr c_root;

  c_document = noko_xml_document_unwrap(self);

  c_root = xmlDocGetRootElement(c_document);
  if (!c_root) {
    return Qnil;
  }

  return noko_xml_node_wrap(Qnil, c_root);
}
#root= (rw)
Set the root element on this document
# File 'ext/nokogiri/xml_document.c', line 250
static VALUE
rb_xml_document_root_set(VALUE self, VALUE rb_new_root)
{
  xmlDocPtr c_document;
  xmlNodePtr c_new_root = NULL, c_current_root;

  c_document = noko_xml_document_unwrap(self);

  c_current_root = xmlDocGetRootElement(c_document);
  if (c_current_root) {
    xmlUnlinkNode(c_current_root);
    noko_xml_document_pin_node(c_current_root);
  }

  if (!NIL_P(rb_new_root)) {
    if (!rb_obj_is_kind_of(rb_new_root, cNokogiriXmlNode)) {
      rb_raise(rb_eArgError,
               "expected Nokogiri::XML::Node but received %"PRIsVALUE,
               rb_obj_class(rb_new_root));
    }

    Noko_Node_Get_Struct(rb_new_root, xmlNode, c_new_root);

    /* If the new root's document is not the same as the current document,
     * then we need to dup the node in to this document. */
    if (c_new_root->doc != c_document) {
      c_new_root = xmlDocCopyNode(c_new_root, c_document, 1);
      if (!c_new_root) {
        rb_raise(rb_eRuntimeError, "Could not reparent node (xmlDocCopyNode)");
      }
    }
  }

  xmlDocSetRootElement(c_document, c_new_root);

  return rb_new_root;
}
Instance Method Details
#<<(node_or_tags)
Alias for #add_child.
# File 'lib/nokogiri/xml/document.rb', line 449
alias_method :<<, :add_child
#add_child(node_or_tags) Also known as: #<<
# File 'lib/nokogiri/xml/document.rb', line 437
def add_child(node_or_tags)
  raise "A document may not have multiple root nodes." if (root && root.name != "nokogiri_text_wrapper") && !(node_or_tags.comment? || node_or_tags.processing_instruction?)

  node_or_tags = coerce(node_or_tags)
  if node_or_tags.is_a?(XML::NodeSet)
    raise "A document may not have multiple root nodes." if node_or_tags.size > 1

    super(node_or_tags.first)
  else
    super
  end
end
#canonicalize(mode = XML_C14N_1_0, inclusive_namespaces = nil, with_comments = false)
#canonicalize {|obj, parent| ... }
# File 'ext/nokogiri/xml_document.c', line 600
static VALUE
rb_xml_document_canonicalize(int argc, VALUE *argv, VALUE self)
{
  VALUE rb_mode;
  VALUE rb_namespaces;
  VALUE rb_comments_p;
  int c_mode = 0;
  xmlChar **c_namespaces;

  xmlDocPtr c_doc;
  xmlOutputBufferPtr c_obuf;
  xmlC14NIsVisibleCallback c_callback_wrapper = NULL;
  void *rb_callback = NULL;

  VALUE rb_cStringIO;
  VALUE rb_io;

  rb_scan_args(argc, argv, "03", &rb_mode, &rb_namespaces, &rb_comments_p);
  if (!NIL_P(rb_mode)) {
    Check_Type(rb_mode, T_FIXNUM);
    c_mode = NUM2INT(rb_mode);
  }
  if (!NIL_P(rb_namespaces)) {
    Check_Type(rb_namespaces, T_ARRAY);
    if (c_mode == XML_C14N_1_0 || c_mode == XML_C14N_1_1) {
      rb_raise(rb_eRuntimeError, "This canonicalizer does not support this operation");
    }
  }

  c_doc = noko_xml_document_unwrap(self);

  rb_cStringIO = rb_const_get_at(rb_cObject, rb_intern("StringIO"));
  rb_io = rb_class_new_instance(0, 0, rb_cStringIO);
  c_obuf = xmlAllocOutputBuffer(NULL);

  c_obuf->writecallback = (xmlOutputWriteCallback)noko_io_write;
  c_obuf->closecallback = (xmlOutputCloseCallback)noko_io_close;
  c_obuf->context = (void *)rb_io;

  if (rb_block_given_p()) {
    c_callback_wrapper = block_caller;
    rb_callback = (void *)rb_block_proc();
  }

  if (NIL_P(rb_namespaces)) {
    c_namespaces = NULL;
  } else {
    long ns_len = RARRAY_LEN(rb_namespaces);
    c_namespaces = ruby_xcalloc((size_t)ns_len + 1, sizeof(xmlChar *));
    for (int j = 0 ; j < ns_len ; j++) {
      VALUE entry = rb_ary_entry(rb_namespaces, j);
      c_namespaces[j] = (xmlChar *)StringValueCStr(entry);
    }
  }

  xmlC14NExecute(c_doc, c_callback_wrapper, rb_callback,
                 c_mode,
                 c_namespaces,
                 (int)RTEST(rb_comments_p),
                 c_obuf);

  ruby_xfree(c_namespaces);
  xmlOutputBufferClose(c_obuf);

  return rb_funcall(rb_io, rb_intern("string"), 0);
}
#clone → Nokogiri::XML::Document
#clone(level) → Nokogiri::XML::Document
Clone this node.
- Parameters
  - level (optional Integer). 0 is a shallow copy, 1 (the default) is a deep copy.
- Returns
  - The new Document
# File 'lib/nokogiri/xml/document.rb', line 223
def clone(level = 1)
  copy = OBJECT_CLONE_METHOD.bind_call(self)
  copy.initialize_copy_with_args(self, level)
end
#collect_namespaces → Hash<String(Namespace#prefix) ⇒ String(Namespace#href)>
Recursively get all namespaces from this node and its subtree and return them as a hash.
⚠ This method will not handle duplicate namespace prefixes, since the return value is a hash.
Note that this method does an xpath lookup for nodes with namespaces, and as a result the order (and which duplicate prefix “wins”) may be dependent on the implementation of the underlying XML library.
Example: Basic usage
Given this document:
<root xmlns="default" xmlns:foo="bar">
<bar xmlns:hello="world" />
</root>
This method will return:
{"xmlns:foo"=>"bar", "xmlns"=>"default", "xmlns:hello"=>"world"}
Example: Duplicate prefixes
Given this document:
<root xmlns:foo="bar">
<bar xmlns:foo="baz" />
</root>
The hash returned will be something like:
{"xmlns:foo" => "baz"}
# File 'lib/nokogiri/xml/document.rb', line 361
def collect_namespaces
  xpath("//namespace::*").each_with_object({}) do |ns, hash|
    hash[["xmlns", ns.prefix].compact.join(":")] = ns.href if ns.prefix != "xml"
  end
end
#create_cdata(string, &block)
Create a CDATA Node containing string
#create_comment(string, &block)
Create a Comment Node containing string
#create_element(name, *contents_or_attrs, &block) → Nokogiri::XML::Element
Create a new Element with name belonging to this document, optionally setting contents or attributes.
This method is not the most user-friendly option if your intention is to add a node to the document tree. Prefer one of the Node methods like Node#add_child, Node#add_next_sibling, Node#replace, etc. which will both create an element (or subtree) and place it in the document tree.
Arguments may be passed to initialize the element:
- a Hash argument will be used to set attributes
- a non-Hash object that responds to #to_s will be used to set the new node's contents
A block may be passed to mutate the node.
- Parameters
  - name (String)
  - contents_or_attrs (#to_s, Hash)
- Yields
  - node (Nokogiri::XML::Element)
- Returns
  - Nokogiri::XML::Element
Example: An empty element without attributes
doc.create_element("div")
# => <div></div>
Example: An element with contents
doc.create_element("div", "contents")
# => <div>contents</div>
Example: An element with attributes
doc.create_element("div", {"class" => "container"})
# => <div class='container'></div>
Example: An element with contents and attributes
doc.create_element("div", "contents", {"class" => "container"})
# => <div class='container'>contents</div>
Example: Passing a block to mutate the element
doc.create_element("div") { |node| node["class"] = "blue" if before_noon? }
# File 'lib/nokogiri/xml/document.rb', line 276
def create_element(name, *contents_or_attrs, &block)
  elm = Nokogiri::XML::Element.new(name, self, &block)
  contents_or_attrs.each do |arg|
    case arg
    when Hash
      arg.each do |k, v|
        key = k.to_s
        if key =~ NCNAME_RE
          ns_name = Regexp.last_match(1)
          elm.add_namespace_definition(ns_name, v)
        else
          elm[k.to_s] = v.to_s
        end
      end
    else
      elm.content = arg
    end
  end
  if (ns = elm.namespace_definitions.find { |n| n.prefix.nil? || (n.prefix == "") })
    elm.namespace = ns
  end
  elm
end
#create_entity(name, type, external_id, system_id, content)
Create a new entity named name.
type is an integer representing the type of entity to be created, and it defaults to Nokogiri::XML::EntityDecl::INTERNAL_GENERAL. See the constants on EntityDecl for more information.
external_id, system_id, and content set the External ID, System ID, and content respectively. All of these parameters are optional.
# File 'ext/nokogiri/xml_document.c', line 528
static VALUE
noko_xml_document__create_entity(int argc, VALUE *argv, VALUE rb_document)
{
  VALUE rb_name;
  VALUE rb_type;
  VALUE rb_ext_id;
  VALUE rb_sys_id;
  VALUE rb_content;

  rb_scan_args(argc, argv, "14",
               &rb_name,
               &rb_type, &rb_ext_id, &rb_sys_id, &rb_content);

  xmlDocPtr c_document = noko_xml_document_unwrap(rb_document);

  libxmlStructuredErrorHandlerState handler_state;
  VALUE rb_errors = rb_ary_new();
  noko__structured_error_func_save_and_set(&handler_state, (void *)rb_errors, noko__error_array_pusher);

  xmlEntityPtr c_entity = xmlAddDocEntity(
    c_document,
    (xmlChar *)(NIL_P(rb_name) ? NULL : StringValueCStr(rb_name)),
    (int)(NIL_P(rb_type) ? XML_INTERNAL_GENERAL_ENTITY : NUM2INT(rb_type)),
    (xmlChar *)(NIL_P(rb_ext_id) ? NULL : StringValueCStr(rb_ext_id)),
    (xmlChar *)(NIL_P(rb_sys_id) ? NULL : StringValueCStr(rb_sys_id)),
    (xmlChar *)(NIL_P(rb_content) ? NULL : StringValueCStr(rb_content))
  );

  noko__structured_error_func_restore(&handler_state);

  if (NULL == c_entity) {
    VALUE exception = rb_funcall(cNokogiriXmlSyntaxError, rb_intern("aggregate"), 1, rb_errors);
    if (RB_TEST(exception)) {
      rb_exc_raise(exception);
    } else {
      rb_raise(rb_eRuntimeError, "Could not create entity");
    }
  }

  return noko_xml_node_wrap(cNokogiriXmlEntityDecl, (xmlNodePtr)c_entity);
}
#create_text_node(string, &block)
Create a Text Node with string
#deconstruct_keys(array_of_names) → Hash
Returns a hash describing the Document, to use in pattern matching.
Valid keys and their values:
- root → (Node, nil) The root node of the Document, or nil if the document is empty.
In the future, other keys may allow accessing things like doctype and processing instructions. If you have a use case and would like this functionality, please let us know by opening an issue or a discussion on the github project.
Example
doc = Nokogiri::XML.parse(<<~XML)
<?xml version="1.0"?>
<root>
<child>
</root>
XML
doc.deconstruct_keys([:root])
# => {:root=>
# #(Element:0x35c {
# name = "root",
# children = [
# #(Text "\n" + " "),
# #(Element:0x370 { name = "child", children = [ #(Text "\n")] }),
# #(Text "\n")]
# })}
Example of an empty document
doc = Nokogiri::XML::Document.new
doc.deconstruct_keys([:root])
# => {:root=>nil}
Since v1.14.0
# File 'lib/nokogiri/xml/document.rb', line 501
def deconstruct_keys(keys)
  { root: root }
end
#decorate(node)
Apply any decorators to node
# File 'lib/nokogiri/xml/document.rb', line 409
def decorate(node)
  return unless @decorators

  @decorators.each do |klass, list|
    next unless node.is_a?(klass)

    list.each { |mod| node.extend(mod) }
  end
end
#decorators(key)
Get the list of decorators given key
# File 'lib/nokogiri/xml/document.rb', line 368
def decorators(key)
  @decorators ||= {}
  @decorators[key] ||= []
end
#document
A reference to self
# File 'lib/nokogiri/xml/document.rb', line 321
def document
  self
end
#dup → Nokogiri::XML::Document
#dup(level) → Nokogiri::XML::Document
Duplicate this node.
- Parameters
  - level (optional Integer). 0 is a shallow copy, 1 (the default) is a deep copy.
- Returns
  - The new Document
# File 'lib/nokogiri/xml/document.rb', line 207
def dup(level = 1)
  copy = OBJECT_DUP_METHOD.bind_call(self)
  copy.initialize_copy_with_args(self, level)
end
#fragment(tags = nil)
Create a DocumentFragment from tags. Returns an empty fragment if tags is nil.
# File 'lib/nokogiri/xml/document.rb', line 429
def fragment(tags = nil)
  DocumentFragment.new(self, tags, root)
end
#inspect_attributes (private)
# File 'lib/nokogiri/xml/document.rb', line 509
def inspect_attributes
  [:name, :children]
end
#name
The name of this document. Always returns “document”
# File 'lib/nokogiri/xml/document.rb', line 316
def name
  "document"
end
#namespaces
Get the hash of namespaces on the root Node
#remove_namespaces!
Remove all namespaces from all nodes in the document.
This could be useful for developers who either don’t understand namespaces or don’t care about them.
The following example shows a use case, and you can decide for yourself whether this is a good thing or not:
doc = Nokogiri::XML <<-EOXML
<root>
<car xmlns:part="http://general-motors.com/">
<part:tire>Michelin Model XGV</part:tire>
</car>
<bicycle xmlns:part="http://schwinn.com/">
<part:tire>I'm a bicycle tire!</part:tire>
</bicycle>
</root>
EOXML
doc.xpath("//tire").to_s # => ""
doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => "<part:tire>Michelin Model XGV</part:tire>"
doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => "<part:tire>I'm a bicycle tire!</part:tire>"
doc.remove_namespaces!
doc.xpath("//tire").to_s # => "<tire>Michelin Model XGV</tire><tire>I'm a bicycle tire!</tire>"
doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => ""
doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => ""
For more information on why this probably is not a good thing in general, please direct your browser to tenderlovemaking.com/2009/04/23/namespaces-in-xml.html
# File 'ext/nokogiri/xml_document.c', line 507
static VALUE
remove_namespaces_bang(VALUE self)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  recursively_remove_namespaces_from_node((xmlNodePtr)doc);
  return self;
}
#slop!
Explore a document with shortcut methods. See Nokogiri::Slop for details.
Note that any nodes that have been instantiated before #slop! is called will not be decorated with sloppy behavior. So, if you're in irb, the preferred idiom is:
irb> doc = Nokogiri::Slop my_markup
and not
irb> doc = Nokogiri::HTML my_markup
#... followed by irb's implicit inspect (and therefore instantiation of every node) ...
irb> doc.slop!
#... which does absolutely nothing.
# File 'lib/nokogiri/xml/document.rb', line 398
def slop!
  unless decorators(XML::Node).include?(Nokogiri::Decorators::Slop)
    decorators(XML::Node) << Nokogiri::Decorators::Slop
    decorate!
  end

  self
end
#to_java → Java::OrgW3cDom::Document
⚠ This method is only available when running JRuby.
Returns the underlying Java DOM document object for this document.
The returned Java object shares the same underlying data structure as this document, so changes in one are reflected in the other.
- Returns
  - Java::OrgW3cDom::Document (The class Java::OrgW3cDom::Document is also accessible as org.w3c.dom.Document.)
See also .wrap
# File 'lib/nokogiri/xml/document.rb', line 122
rdoc_method :method: to_java
#to_xml(*args, &block)
Alias for Node#serialize.
# File 'lib/nokogiri/xml/document.rb', line 419
alias_method :to_xml, :serialize
#url
Get the url name for this document.
# File 'ext/nokogiri/xml_document.c', line 234
static VALUE
url(VALUE self)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  if (doc->URL) { return NOKOGIRI_STR_NEW2(doc->URL); }

  return Qnil;
}
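#validate
Validate this Document against its DTD. Returns nil when the document has no internal subset to validate against.
# File 'lib/nokogiri/xml/document.rb', line 376
def validate
  return unless internal_subset

  internal_subset.validate(self)
end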
#version
Get the XML version for this Document
# File 'ext/nokogiri/xml_document.c', line 351
static VALUE
version(VALUE self)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  if (!doc->version) { return Qnil; }
  return NOKOGIRI_STR_NEW2(doc->version);
}
#xpath_doctype → Nokogiri::CSS::XPathVisitor::DoctypeConfig
- Returns
  - The document type which determines CSS-to-XPath translation.
See XPathVisitor for more information.
# File 'lib/nokogiri/xml/document.rb', line 457
def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::XML
end