Class: Nokogiri::XML::Document

Relationships & Source Files
Extension / Inclusion / Inheritance Descendants
Subclasses:
Super Chains via Extension / Inclusion / Inheritance
Class Chain:
self, Node
Instance Chain:
Inherits: Nokogiri::XML::Node
Defined in: lib/nokogiri/xml/document.rb,
ext/nokogiri/xml_attr.c,
ext/nokogiri/xml_document.c

Overview

Document is the main entry point for dealing with XML documents. A Document is created by parsing an XML document. See .parse for more information on parsing.

For searching a Document, see Searchable#css and Searchable#xpath

Constant Summary

PP::Node - Included

COLLECTIONS

Searchable - Included

LOOKS_LIKE_XPATH

::Nokogiri::ClassResolver - Included

VALID_NAMESPACES

Node - Inherited

ATTRIBUTE_DECL, ATTRIBUTE_NODE, CDATA_SECTION_NODE, COMMENT_NODE, DECONSTRUCT_KEYS, DECONSTRUCT_METHODS, DOCB_DOCUMENT_NODE, DOCUMENT_FRAG_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DTD_NODE, ELEMENT_DECL, ELEMENT_NODE, ENTITY_DECL, ENTITY_NODE, ENTITY_REF_NODE, HTML_DOCUMENT_NODE, IMPLIED_XPATH_CONTEXTS, NAMESPACE_DECL, NOTATION_NODE, PI_NODE, TEXT_NODE, XINCLUDE_END, XINCLUDE_START

Class Method Summary

Node - Inherited

.new

documented in lib/nokogiri/xml/node.rb.

Instance Attribute Summary

  • #encoding rw

    Get the encoding for this Document.

  • #encoding=(encoding) rw

    Set the encoding string for this Document.

  • #errors rw

    The errors found while parsing a document.

  • #namespace_inheritance rw

    When true, reparented elements without a namespace will inherit their new parent’s namespace (if one exists).

  • #root rw

    Get the root node for this document.

  • #root= rw

    Set the root element on this document.

Node - Inherited

#blank?
Returns

true if the node is an empty or whitespace-only text or cdata node, else false.

#cdata?

Returns true if this is a CDATA.

#children

:category: Traversing Document Structure.

#children=

Set the content for this Node to node_or_tags.

#comment?

Returns true if this is a Comment.

#content

[Returns].

#content=

Set the content of this node to input.

#default_namespace=

Adds a default namespace supplied as a string #url href, to self.

#document

:category: Traversing Document Structure.

#document?

Returns true if this is a Document.

#elem?

Alias for Node#element?.

#element?

Returns true if this is an Element node.

#fragment?

Returns true if this is a DocumentFragment.

#html?

Returns true if this is an ::Nokogiri::HTML4::Document or ::Nokogiri::HTML5::Document node.

#inner_html

Get the inner_html for this node’s Node#children

#inner_html=

Set the content for this Node to node_or_tags.

#inner_text

Alias for Node#content.

#lang

Searches the language of a node, i.e.

#lang=

Set the language of a node, i.e.

#line
Returns

The line number of this Node.

#line=

Sets the line for this Node.

#name

Alias for Node#node_name.

#namespace
Returns

The Namespace of the element or attribute node, or nil if there is no namespace.

#namespace=

Set the default namespace on this node (as would be defined with an “xmlns=” attribute in XML source), as a Namespace object ns.

#native_content=

Set the content of this node to input.

#next
#next=
#node_name

Returns the name for this Node.

#node_name=

Set the name for this Node.

#parent

Get the parent Node for this Node.

#parent=

Set the parent Node for this Node.

#previous
#previous=
#processing_instruction?

Returns true if this is a ProcessingInstruction node.

#read_only?

Is this a read only node?

#text

Alias for Node#content.

#text?

Returns true if this is a Text node.

#to_str

Alias for Node#content.

#xml?

Returns true if this is a Document node.

#prepend_newline?, #data_ptr?

Instance Method Summary

Node - Inherited

#<<

Add node_or_tags as a child of this Node.

#<=>

Compare two Node objects with respect to their Document.

#==

Test to see if this Node is equal to other.

#[]

Fetch an attribute from this node.

#[]=

Update the attribute #name to value, or create the attribute if it does not exist.

#accept

Accept a visitor.

#add_child

Add node_or_tags as a child of this Node.

#add_class

Ensure HTML CSS classes are present on self.

#add_namespace
#add_namespace_definition

:category: Manipulating Document Structure.

#add_next_sibling

Insert node_or_tags after this Node (as a sibling).

#add_previous_sibling

Insert node_or_tags before this Node (as a sibling).

#after

Insert node_or_tags after this node (as a sibling).

#ancestors

Get a list of ancestor Node for this Node.

#append_class

Add HTML CSS classes to self, regardless of duplication.

#attr

Alias for Node#[].

#attribute

:category: Working With Node Attributes.

#attribute_nodes

:category: Working With Node Attributes.

#attribute_with_ns

:category: Working With Node Attributes.

#attributes

Fetch this node’s attributes.

#before

Insert node_or_tags before this node (as a sibling).

#canonicalize,
#child

:category: Traversing Document Structure.

#classes

Fetch CSS class names of a Node.

#clone

Clone this node.

#create_external_subset

Create an external subset.

#create_internal_subset

Create the internal subset of a document.

#css_path

Get the path to this node as a CSS expression.

#deconstruct_keys

Returns a hash describing the Node, to use in pattern matching.

#decorate!

Decorate this node with the decorators set up in this node’s Document.

#delete
#description

Fetch the ::Nokogiri::HTML4::ElementDescription for this node.

#do_xinclude

Do xinclude substitution on the subtree below node.

#dup

Duplicate this node.

#each

Iterate over each attribute name and value pair for this Node.

#element_children

[Returns].

#elements
#encode_special_chars

Encode any special characters in string

#external_subset

Get the external subset.

#first_element_child
Returns

The first child Node that is an element.

#fragment

Create a DocumentFragment containing tags that is relative to this context node.

#get_attribute

Alias for Node#[].

#has_attribute?

Alias for Node#key?.

#initialize

Create a new node with #name that belongs to #document.

#internal_subset

Get the internal subset.

#key?

Returns true if attribute is set.

#keys

Get the attribute names for this Node.

#kwattr_add

Ensure that values are present in a keyword attribute.

#kwattr_append

Add keywords to a Node’s keyword attribute, regardless of duplication.

#kwattr_remove

Remove keywords from a keyword attribute.

#kwattr_values

Fetch values from a keyword attribute of a Node.

#last_element_child
Returns

The last child Node that is an element.

#matches?

Returns true if this Node matches selector

#namespace_definitions

[Returns].

#namespace_scopes
Returns

Array of all the Namespaces on this node and its ancestors.

#namespaced_key?

Returns true if attribute is set with namespace

#namespaces

Fetch all the namespaces on this node and its ancestors.

#next_element

Returns the next Element type sibling node.

#next_sibling

Returns the next sibling node.

#node_type

Get the type for this Node.

#parse

Parse string_or_io as a document fragment within the context of this node.

#path

Returns the path associated with this Node.

#pointer_id

[Returns].

#prepend_child

Add node_or_tags as the first child of this Node.

#previous_element

Returns the previous Element type sibling node.

#previous_sibling

Returns the previous sibling node.

#remove

Alias for Node#unlink.

#remove_attribute

Remove the attribute named #name

#remove_class

Remove HTML CSS classes from this node.

#replace

Replace this Node with node_or_tags.

#serialize

Serialize Node using options.

#set_attribute

Alias for Node#[]=.

#swap

Swap this Node for node_or_tags

#to_html

Serialize this Node to HTML.

#to_s

Turn this node in to a string.

#to_xhtml

Serialize this Node to XHTML using options

#to_xml

Serialize this Node to XML using options.

#traverse

Yields self and all children to block recursively.

#type

Alias for Node#node_type.

#unlink

Unlink this node from its current context.

#value?

Does this Node’s attributes include <value>.

#values

Get the attribute values for this Node.

#wrap

Wrap this Node with the node parsed from markup or a dup of the node.

#write_html_to

Write Node as HTML to io with options.

#write_to

Serialize this node or document to io.

#write_xhtml_to

Write Node as XHTML to io with options

#write_xml_to

Write Node as XML to io with options.

#add_child_node_and_reparent_attrs, #add_sibling,
#compare

Compare this Node to other with respect to their Document.

#dump_html

Returns the Node as html.

#get

Get the value for attribute

#html_standard_serialize,
#in_context

TODO: DOCUMENT ME.

#inspect_attributes, #keywordify,
#native_write_to

Write this Node to io with #encoding and options

#process_xincludes

Loads and substitutes all xinclude elements below the node.

#set

Set the property to value

#set_namespace

Set the namespace to namespace

#to_format, #write_format_to, #add_child_node, #add_next_sibling_node, #add_previous_sibling_node, #replace_node

::Nokogiri::ClassResolver - Included

#related_class

Find a class constant within the.

Searchable - Included

#%

Alias for Searchable#at.

#/
#>

Search this node’s immediate children using the CSS selector selector.

#at

Search this object for paths, and return only the first result.

#at_css

Search this object for CSS rules, and return only the first match.

#at_xpath

Search this node for XPath paths, and return only the first match.

#css

Search this object for CSS rules.

#search

Search this object for paths.

#xpath

Search this node for XPath paths.

#css_internal, #css_rules_to_xpath, #xpath_impl, #xpath_internal, #xpath_query_from_css_rule, #extract_params

PP::Node - Included

Constructor Details

.new(document, name)

Alias for Comment.new. Create a new Attr element on the #document with #name

#initialize(*args) ⇒ Document

This method is for internal use only.

  
# File 'lib/nokogiri/xml/document.rb', line 189

def initialize(*args) # :nodoc: # rubocop:disable Lint/MissingSuper
  @errors     = []
  @decorators = nil
  @namespace_inheritance = false
end

Class Method Details

.empty_doc?(string_or_io) ⇒ Boolean (private)

  
# File 'lib/nokogiri/xml/document.rb', line 95

def empty_doc?(string_or_io)
  string_or_io.nil? ||
    (string_or_io.respond_to?(:empty?) && string_or_io.empty?) ||
    (string_or_io.respond_to?(:eof?) && string_or_io.eof?)
end

.parse(input, url: nil, encoding: nil, options: DEFAULT_XML) {|options| ... } ⇒ Document

Parse XML input from a String or IO object, and return a new Document object.

By default, Nokogiri treats documents as untrusted, and so does not attempt to load DTDs or access the network. See ParseOptions for a complete list of options, and that module’s DEFAULT_XML constant for what’s set (and not set) by default.

See also: ::Nokogiri.XML(), a convenience method that calls this method.

Parameters
  • input (String, IO) The content to be parsed.

Keyword arguments
  • url: (String) The URI where this document is located.

  • encoding: (String) The name of the encoding that should be used when processing the document. (default nil means that the encoding will be determined based on the document content)

  • options: (Nokogiri::XML::ParseOptions) Configuration object that determines some behaviors during parsing, such as Nokogiri::XML::ParseOptions::RECOVER. See Nokogiri::XML::ParseOptions for more information.

Yields

If a block is given, a Nokogiri::XML::ParseOptions object is yielded to the block which can be configured before parsing. See Nokogiri::XML::ParseOptions for more information.

Yields:

  • (options)
  
# File 'lib/nokogiri/xml/document.rb', line 56

def parse(
  string_or_io,
  url_ = nil, encoding_ = nil, options_ = XML::ParseOptions::DEFAULT_XML,
  url: url_, encoding: encoding_, options: options_
)
  options = Nokogiri::XML::ParseOptions.new(options) if Integer === options
  yield options if block_given?

  url ||= string_or_io.respond_to?(:path) ? string_or_io.path : nil

  if empty_doc?(string_or_io)
    if options.strict?
      raise Nokogiri::XML::SyntaxError, "Empty document"
    else
      return encoding ? new.tap { |i| i.encoding = encoding } : new
    end
  end

  doc = if string_or_io.respond_to?(:read)
    if string_or_io.is_a?(Pathname)
      # resolve the Pathname to the file and open it as an IO object, see #2110
      string_or_io = string_or_io.expand_path.open
      url ||= string_or_io.path
    end

    read_io(string_or_io, url, encoding, options.to_i)
  else
    # read_memory pukes on empty docs
    read_memory(string_or_io, url, encoding, options.to_i)
  end

  # do xinclude processing
  doc.do_xinclude(options) if options.xinclude?

  doc
end

.read_io(io, url, encoding, options)

Create a new document from an IO object

  
# File 'ext/nokogiri/xml_document.c', line 366

static VALUE
noko_xml_document_s_read_io(VALUE rb_class,
                            VALUE rb_io,
                            VALUE rb_url,
                            VALUE rb_encoding,
                            VALUE rb_options)
{
  libxmlStructuredErrorHandlerState handler_state;
  VALUE rb_errors = rb_ary_new();

  noko__structured_error_func_save_and_set(&handler_state, (void *)rb_errors, noko__error_array_pusher);

  const char *c_url    = NIL_P(rb_url)      ? NULL : StringValueCStr(rb_url);
  const char *c_enc    = NIL_P(rb_encoding) ? NULL : StringValueCStr(rb_encoding);
  xmlDocPtr c_document = xmlReadIO(
                           (xmlInputReadCallback)noko_io_read,
                           (xmlInputCloseCallback)noko_io_close,
                           (void *)rb_io,
                           c_url,
                           c_enc,
                           (int)NUM2INT(rb_options)
                         );

  noko__structured_error_func_restore(&handler_state);

  if (c_document == NULL) {
    xmlFreeDoc(c_document);

    VALUE exception = rb_funcall(cNokogiriXmlSyntaxError, rb_intern("aggregate"), 1, rb_errors);
    if (RB_TEST(exception)) {
      rb_exc_raise(exception);
    } else {
      rb_raise(rb_eRuntimeError, "Could not parse document");
    }
  }

  VALUE rb_document = noko_xml_document_wrap(rb_class, c_document);
  rb_iv_set(rb_document, "@errors", rb_errors);
  return rb_document;
}

.read_memory(string, url, encoding, options)

Create a new document from a String

  
# File 'ext/nokogiri/xml_document.c', line 413

static VALUE
noko_xml_document_s_read_memory(VALUE rb_class,
                                VALUE rb_input,
                                VALUE rb_url,
                                VALUE rb_encoding,
                                VALUE rb_options)
{
  VALUE rb_errors = rb_ary_new();
  xmlSetStructuredErrorFunc((void *)rb_errors, noko__error_array_pusher);

  const char *c_buffer = StringValuePtr(rb_input);
  const char *c_url    = NIL_P(rb_url)      ? NULL : StringValueCStr(rb_url);
  const char *c_enc    = NIL_P(rb_encoding) ? NULL : StringValueCStr(rb_encoding);
  int c_buffer_len     = (int)RSTRING_LEN(rb_input);
  xmlDocPtr c_document = xmlReadMemory(c_buffer, c_buffer_len, c_url, c_enc, (int)NUM2INT(rb_options));

  xmlSetStructuredErrorFunc(NULL, NULL);

  if (c_document == NULL) {
    VALUE exception = rb_funcall(cNokogiriXmlSyntaxError, rb_intern("aggregate"), 1, rb_errors);
    if (RB_TEST(exception)) {
      rb_exc_raise(exception);
    } else {
      rb_raise(rb_eRuntimeError, "Could not parse document");
    }
  }

  VALUE document = noko_xml_document_wrap(rb_class, c_document);
  rb_iv_set(document, "@errors", rb_errors);
  return document;
}

.wrap(java_document) → Nokogiri::XML::Document

⚠ This method is only available when running JRuby.

Create a Document using an existing Java DOM document object.

The returned Document shares the same underlying data structure as the Java object, so changes in one are reflected in the other.

Parameters
  • java_document (Java::OrgW3cDom::Document) (The class Java::OrgW3cDom::Document is also accessible as org.w3c.dom.Document.)

Returns

Document

See also #to_java

  
# File 'lib/nokogiri/xml/document.rb', line 102

Instance Attribute Details

#encoding (rw)

Get the encoding for this Document

  
# File 'ext/nokogiri/xml_document.c', line 336

static VALUE
encoding(VALUE self)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  if (!doc->encoding) { return Qnil; }
  return NOKOGIRI_STR_NEW2(doc->encoding);
}

#encoding=(encoding) (rw)

Set the encoding string for this Document

  
# File 'ext/nokogiri/xml_document.c', line 316

static VALUE
set_encoding(VALUE self, VALUE encoding)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  if (doc->encoding) {
    xmlFree(DISCARD_CONST_QUAL_XMLCHAR(doc->encoding));
  }

  doc->encoding = xmlStrdup((xmlChar *)StringValueCStr(encoding));

  return encoding;
}

#errors (rw)

The errors found while parsing a document.

Returns

Array<Nokogiri::XML::SyntaxError>

  
# File 'lib/nokogiri/xml/document.rb', line 140

attr_accessor :errors

#namespace_inheritance (rw)

When true, reparented elements without a namespace will inherit their new parent’s namespace (if one exists). Defaults to false.

Returns

Boolean

Example: Default behavior of namespace inheritance

xml = <<~EOF
        <root xmlns:foo="http://nokogiri.org/default_ns/test/foo">
          <foo:parent>
          </foo:parent>
        </root>
      EOF
doc = Nokogiri::XML(xml)
parent = doc.at_xpath("//foo:parent", "foo" => "http://nokogiri.org/default_ns/test/foo")
parent.add_child("<child></child>")
doc.to_xml
# => <?xml version="1.0"?>
#    <root xmlns:foo="http://nokogiri.org/default_ns/test/foo">
#      <foo:parent>
#        <child/>
#      </foo:parent>
#    </root>

Example: Setting namespace inheritance to true

xml = <<~EOF
        <root xmlns:foo="http://nokogiri.org/default_ns/test/foo">
          <foo:parent>
          </foo:parent>
        </root>
      EOF
doc = Nokogiri::XML(xml)
doc.namespace_inheritance = true
parent = doc.at_xpath("//foo:parent", "foo" => "http://nokogiri.org/default_ns/test/foo")
parent.add_child("<child></child>")
doc.to_xml
# => <?xml version="1.0"?>
#    <root xmlns:foo="http://nokogiri.org/default_ns/test/foo">
#      <foo:parent>
#        <foo:child/>
#      </foo:parent>
#    </root>

Since v1.12.4

  
# File 'lib/nokogiri/xml/document.rb', line 187

attr_accessor :namespace_inheritance

#root (rw)

Get the root node for this document.

  
# File 'ext/nokogiri/xml_document.c', line 294

static VALUE
rb_xml_document_root(VALUE self)
{
  xmlDocPtr c_document;
  xmlNodePtr c_root;

  c_document = noko_xml_document_unwrap(self);

  c_root = xmlDocGetRootElement(c_document);
  if (!c_root) {
    return Qnil;
  }

  return noko_xml_node_wrap(Qnil, c_root) ;
}

#root= (rw)

Set the root element on this document

  
# File 'ext/nokogiri/xml_document.c', line 250

static VALUE
rb_xml_document_root_set(VALUE self, VALUE rb_new_root)
{
  xmlDocPtr c_document;
  xmlNodePtr c_new_root = NULL, c_current_root;

  c_document = noko_xml_document_unwrap(self);

  c_current_root = xmlDocGetRootElement(c_document);
  if (c_current_root) {
    xmlUnlinkNode(c_current_root);
    noko_xml_document_pin_node(c_current_root);
  }

  if (!NIL_P(rb_new_root)) {
    if (!rb_obj_is_kind_of(rb_new_root, cNokogiriXmlNode)) {
      rb_raise(rb_eArgError,
               "expected Nokogiri::XML::Node but received %"PRIsVALUE,
               rb_obj_class(rb_new_root));
    }

    Noko_Node_Get_Struct(rb_new_root, xmlNode, c_new_root);

    /* If the new root's document is not the same as the current document,
     * then we need to dup the node in to this document. */
    if (c_new_root->doc != c_document) {
      c_new_root = xmlDocCopyNode(c_new_root, c_document, 1);
      if (!c_new_root) {
        rb_raise(rb_eRuntimeError, "Could not reparent node (xmlDocCopyNode)");
      }
    }
  }

  xmlDocSetRootElement(c_document, c_new_root);

  return rb_new_root;
}

Instance Method Details

#<<(node_or_tags)

Alias for #add_child.

  
# File 'lib/nokogiri/xml/document.rb', line 448

alias_method :<<, :add_child

#add_child(node_or_tags) Also known as: #<<

  
# File 'lib/nokogiri/xml/document.rb', line 436

def add_child(node_or_tags)
  raise "A document may not have multiple root nodes." if (root && root.name != "nokogiri_text_wrapper") && !(node_or_tags.comment? || node_or_tags.processing_instruction?)

  node_or_tags = coerce(node_or_tags)
  if node_or_tags.is_a?(XML::NodeSet)
    raise "A document may not have multiple root nodes." if node_or_tags.size > 1

    super(node_or_tags.first)
  else
    super
  end
end

#canonicalize(mode = XML_C14N_1_0, inclusive_namespaces = nil, with_comments = false)
#canonicalize {|obj, parent| ... }

Canonicalize a document and return the results. Takes an optional block that takes two parameters: the obj and that node’s parent. The obj will be either a Node or a Namespace. The block must return a non-nil, non-false value if the obj passed in should be included in the canonicalized document.

  
# File 'ext/nokogiri/xml_document.c', line 596

static VALUE
rb_xml_document_canonicalize(int argc, VALUE *argv, VALUE self)
{
  VALUE rb_mode;
  VALUE rb_namespaces;
  VALUE rb_comments_p;
  int c_mode = 0;
  xmlChar **c_namespaces;

  xmlDocPtr c_doc;
  xmlOutputBufferPtr c_obuf;
  xmlC14NIsVisibleCallback c_callback_wrapper = NULL;
  void *rb_callback = NULL;

  VALUE rb_cStringIO;
  VALUE rb_io;

  rb_scan_args(argc, argv, "03", &rb_mode, &rb_namespaces, &rb_comments_p);
  if (!NIL_P(rb_mode)) {
    Check_Type(rb_mode, T_FIXNUM);
    c_mode = NUM2INT(rb_mode);
  }
  if (!NIL_P(rb_namespaces)) {
    Check_Type(rb_namespaces, T_ARRAY);
    if (c_mode == XML_C14N_1_0 || c_mode == XML_C14N_1_1) {
      rb_raise(rb_eRuntimeError, "This canonicalizer does not support this operation");
    }
  }

  c_doc = noko_xml_document_unwrap(self);

  rb_cStringIO = rb_const_get_at(rb_cObject, rb_intern("StringIO"));
  rb_io = rb_class_new_instance(0, 0, rb_cStringIO);
  c_obuf = xmlAllocOutputBuffer(NULL);

  c_obuf->writecallback = (xmlOutputWriteCallback)noko_io_write;
  c_obuf->closecallback = (xmlOutputCloseCallback)noko_io_close;
  c_obuf->context = (void *)rb_io;

  if (rb_block_given_p()) {
    c_callback_wrapper = block_caller;
    rb_callback = (void *)rb_block_proc();
  }

  if (NIL_P(rb_namespaces)) {
    c_namespaces = NULL;
  } else {
    long ns_len = RARRAY_LEN(rb_namespaces);
    c_namespaces = ruby_xcalloc((size_t)ns_len + 1, sizeof(xmlChar *));
    for (int j = 0 ; j < ns_len ; j++) {
      VALUE entry = rb_ary_entry(rb_namespaces, j);
      c_namespaces[j] = (xmlChar *)StringValueCStr(entry);
    }
  }

  xmlC14NExecute(c_doc, c_callback_wrapper, rb_callback,
                 c_mode,
                 c_namespaces,
                 (int)RTEST(rb_comments_p),
                 c_obuf);

  ruby_xfree(c_namespaces);
  xmlOutputBufferClose(c_obuf);

  return rb_funcall(rb_io, rb_intern("string"), 0);
}

#clone → Nokogiri::XML::Document
#clone(level) → Nokogiri::XML::Document

Clone this node.

Parameters
  • level (optional Integer). 0 is a shallow copy, 1 (the default) is a deep copy.

Returns

The new Document

  
# File 'lib/nokogiri/xml/document.rb', line 222

def clone(level = 1)
  copy = OBJECT_CLONE_METHOD.bind_call(self)
  copy.initialize_copy_with_args(self, level)
end

#collect_namespaces() → Hash<String(Namespace#prefix) ⇒ String(Namespace#href)>

Recursively get all namespaces from this node and its subtree and return them as a hash.

⚠ This method will not handle duplicate namespace prefixes, since the return value is a hash.

Note that this method does an xpath lookup for nodes with namespaces, and as a result the order (and which duplicate prefix “wins”) may be dependent on the implementation of the underlying XML library.

Example: Basic usage

Given this document:

<root xmlns="default" xmlns:foo="bar">
  <bar xmlns:hello="world" />
</root>

This method will return:

{"xmlns:foo"=>"bar", "xmlns"=>"default", "xmlns:hello"=>"world"}

Example: Duplicate prefixes

Given this document:

<root xmlns:foo="bar">
  <bar xmlns:foo="baz" />
</root>

The hash returned will be something like:

{"xmlns:foo" => "baz"}
  
# File 'lib/nokogiri/xml/document.rb', line 360

def collect_namespaces
  xpath("//namespace::*").each_with_object({}) do |ns, hash|
    hash[["xmlns", ns.prefix].compact.join(":")] = ns.href if ns.prefix != "xml"
  end
end

#create_cdata(string, &block)

Create a CDATA Node containing string

  
# File 'lib/nokogiri/xml/document.rb', line 305

def create_cdata(string, &block)
  Nokogiri::XML::CDATA.new(self, string.to_s, &block)
end

#create_comment(string, &block)

Create a Comment Node containing string

  
# File 'lib/nokogiri/xml/document.rb', line 310

def create_comment(string, &block)
  Nokogiri::XML::Comment.new(self, string.to_s, &block)
end

#create_element(name, *contents_or_attrs, &block) → Nokogiri::XML::Element

Create a new Element with #name belonging to this document, optionally setting contents or attributes.

This method is not the most user-friendly option if your intention is to add a node to the document tree. Prefer one of the Node methods like Node#add_child, Node#add_next_sibling, Node#replace, etc. which will both create an element (or subtree) and place it in the document tree.

Arguments may be passed to initialize the element:

  • a Hash argument will be used to set attributes

  • a non-Hash object that responds to #to_s will be used to set the new node’s contents

A block may be passed to mutate the node.

Parameters
  • #name (String)

  • contents_or_attrs (#to_s, Hash)

Yields

node (Nokogiri::XML::Element)

Returns

Element

Example: An empty element without attributes

doc.create_element("div")
# => <div></div>

Example: An element with contents

doc.create_element("div", "contents")
# => <div>contents</div>

Example: An element with attributes

doc.create_element("div", {"class" => "container"})
# => <div class='container'></div>

Example: An element with contents and attributes

doc.create_element("div", "contents", {"class" => "container"})
# => <div class='container'>contents</div>

Example: Passing a block to mutate the element

doc.create_element("div") { |node| node["class"] = "blue" if before_noon? }
  
# File 'lib/nokogiri/xml/document.rb', line 275

def create_element(name, *contents_or_attrs, &block)
  elm = Nokogiri::XML::Element.new(name, self, &block)
  contents_or_attrs.each do |arg|
    case arg
    when Hash
      arg.each do |k, v|
        key = k.to_s
        if key =~ NCNAME_RE
          ns_name = Regexp.last_match(1)
          elm.add_namespace_definition(ns_name, v)
        else
          elm[k.to_s] = v.to_s
        end
      end
    else
      elm.content = arg
    end
  end
  if (ns = elm.namespace_definitions.find { |n| n.prefix.nil? || (n.prefix == "") })
    elm.namespace = ns
  end
  elm
end

#create_entity(name, type, external_id, system_id, content)

Create a new entity named #name.

type is an integer representing the type of entity to be created, and it defaults to Nokogiri::XML::EntityDecl::INTERNAL_GENERAL. See the constants on EntityDecl for more information.

external_id, system_id, and content set the External ID, System ID, and content respectively. All of these parameters are optional.

  
# File 'ext/nokogiri/xml_document.c', line 524

static VALUE
noko_xml_document__create_entity(int argc, VALUE *argv, VALUE rb_document)
{
  VALUE rb_name;
  VALUE rb_type;
  VALUE rb_ext_id;
  VALUE rb_sys_id;
  VALUE rb_content;

  rb_scan_args(argc, argv, "14",
               &rb_name,
               &rb_type, &rb_ext_id, &rb_sys_id, &rb_content);

  xmlDocPtr c_document = noko_xml_document_unwrap(rb_document);

  libxmlStructuredErrorHandlerState handler_state;
  VALUE rb_errors = rb_ary_new();
  noko__structured_error_func_save_and_set(&handler_state, (void *)rb_errors, noko__error_array_pusher);

  xmlEntityPtr c_entity = xmlAddDocEntity(
                            c_document,
                            (xmlChar *)(NIL_P(rb_name) ? NULL : StringValueCStr(rb_name)),
                            (int)(NIL_P(rb_type) ? XML_INTERNAL_GENERAL_ENTITY : NUM2INT(rb_type)),
                            (xmlChar *)(NIL_P(rb_ext_id) ? NULL : StringValueCStr(rb_ext_id)),
                            (xmlChar *)(NIL_P(rb_sys_id) ? NULL : StringValueCStr(rb_sys_id)),
                            (xmlChar *)(NIL_P(rb_content) ? NULL : StringValueCStr(rb_content))
                          );

  noko__structured_error_func_restore(&handler_state);

  if (NULL == c_entity) {
    VALUE exception = rb_funcall(cNokogiriXmlSyntaxError, rb_intern("aggregate"), 1, rb_errors);
    if (RB_TEST(exception)) {
      rb_exc_raise(exception);
    } else {
      rb_raise(rb_eRuntimeError, "Could not create entity");
    }
  }

  return noko_xml_node_wrap(cNokogiriXmlEntityDecl, (xmlNodePtr)c_entity);
}

#create_text_node(string, &block)

Create a Text Node with string

  
# File 'lib/nokogiri/xml/document.rb', line 300

def create_text_node(string, &block)
  Nokogiri::XML::Text.new(string.to_s, self, &block)
end

#deconstruct_keys(array_of_names) → Hash

Returns a hash describing the Document, to use in pattern matching.

Valid keys and their values:

  • #root → (Node, nil) The root node of the Document, or nil if the document is empty.

In the future, other keys may allow accessing things like doctype and processing instructions. If you have a use case and would like this functionality, please let us know by opening an issue or a discussion on the github project.

Example

doc = Nokogiri::XML.parse(<<~XML)
  <?xml version="1.0"?>
  <root>
    <child>
  </root>
XML

doc.deconstruct_keys([:root])
# => {:root=>
#      #(Element:0x35c {
#        name = "root",
#        children = [
#          #(Text "\n" + "  "),
#          #(Element:0x370 { name = "child", children = [ #(Text "\n")] }),
#          #(Text "\n")]
#        })}

Example of an empty document

doc = Nokogiri::XML::Document.new

doc.deconstruct_keys([:root])
# => {:root=>nil}

Since v1.14.0

  
# File 'lib/nokogiri/xml/document.rb', line 500

def deconstruct_keys(keys)
  { root: root }
end

#decorate(node)

Apply any decorators to node

  
# File 'lib/nokogiri/xml/document.rb', line 408

def decorate(node)
  return unless @decorators

  @decorators.each do |klass, list|
    next unless node.is_a?(klass)

    list.each { |mod| node.extend(mod) }
  end
end

#decorators(key)

Get the list of decorators given key

  
# File 'lib/nokogiri/xml/document.rb', line 367

def decorators(key)
  @decorators ||= {}
  @decorators[key] ||= []
end

#document

A reference to self

  
# File 'lib/nokogiri/xml/document.rb', line 320

def document
  self
end

#dup → Nokogiri::XML::Document
#dup(level) → Nokogiri::XML::Document

Duplicate this node.

Parameters
  • level (optional Integer). 0 is a shallow copy, 1 (the default) is a deep copy.

Returns

The new Document

  
# File 'lib/nokogiri/xml/document.rb', line 206

def dup(level = 1)
  copy = OBJECT_DUP_METHOD.bind_call(self)
  copy.initialize_copy_with_args(self, level)
end

#fragment(tags = nil)

Create a DocumentFragment from tags. Returns an empty fragment if tags is nil.

[ GitHub ]

  
# File 'lib/nokogiri/xml/document.rb', line 428

def fragment(tags = nil)
  DocumentFragment.new(self, tags, root)
end

#inspect_attributes (private)

[ GitHub ]

  
# File 'lib/nokogiri/xml/document.rb', line 508

def inspect_attributes
  [:name, :children]
end

#name

The name of this document. Always returns “document”.

[ GitHub ]

  
# File 'lib/nokogiri/xml/document.rb', line 315

def name
  "document"
end

#namespaces

Get the hash of namespaces on the root Node, or an empty hash if the document has no root.

[ GitHub ]

  
# File 'lib/nokogiri/xml/document.rb', line 421

def namespaces
  root ? root.namespaces : {}
end

#remove_namespaces!

Remove all namespaces from all nodes in the document.

This could be useful for developers who either don’t understand namespaces or don’t care about them.

The following example shows a use case, and you can decide for yourself whether this is a good thing or not:

doc = Nokogiri::XML <<-EOXML
   <root>
     <car xmlns:part="http://general-motors.com/">
       <part:tire>Michelin Model XGV</part:tire>
     </car>
     <bicycle xmlns:part="http://schwinn.com/">
       <part:tire>I'm a bicycle tire!</part:tire>
     </bicycle>
   </root>
   EOXML

doc.xpath("//tire").to_s # => ""
doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => "<part:tire>Michelin Model XGV</part:tire>"
doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => "<part:tire>I'm a bicycle tire!</part:tire>"

doc.remove_namespaces!

doc.xpath("//tire").to_s # => "<tire>Michelin Model XGV</tire><tire>I'm a bicycle tire!</tire>"
doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => ""
doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => ""

For more information on why this probably is not a good thing in general, please direct your browser to tenderlovemaking.com/2009/04/23/namespaces-in-xml.html

[ GitHub ]

  
# File 'ext/nokogiri/xml_document.c', line 503

static VALUE
remove_namespaces_bang(VALUE self)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  recursively_remove_namespaces_from_node((xmlNodePtr)doc);
  return self;
}

#slop!

Explore a document with shortcut methods. See Nokogiri::Slop for details.

Note that any nodes that have been instantiated before #slop! is called will not be decorated with sloppy behavior. So, if you’re in irb, the preferred idiom is:

irb> doc = Nokogiri::Slop my_markup

and not

irb> doc = Nokogiri::HTML my_markup
#... followed by irb's implicit inspect (and therefore instantiation of every node) ...
irb> doc.slop!
#... which does absolutely nothing.

[ GitHub ]

  
# File 'lib/nokogiri/xml/document.rb', line 397

def slop!
  unless decorators(XML::Node).include?(Nokogiri::Decorators::Slop)
    decorators(XML::Node) << Nokogiri::Decorators::Slop
    decorate!
  end

  self
end

#to_java() → Java::OrgW3cDom::Document

⚠ This method is only available when running JRuby.

Returns the underlying Java DOM document object for this document.

The returned Java object shares the same underlying data structure as this document, so changes in one are reflected in the other.

Returns

Java::OrgW3cDom::Document (The class Java::OrgW3cDom::Document is also accessible as org.w3c.dom.Document.)

See also .wrap

[ GitHub ]

  
# File 'lib/nokogiri/xml/document.rb', line 121

rdoc_method :method: to_java

#to_xml(*args, &block)

Alias for Node#serialize.

[ GitHub ]

  
# File 'lib/nokogiri/xml/document.rb', line 418

alias_method :to_xml, :serialize

#url

Get the URL for this document. Returns nil if the document has no URL.

[ GitHub ]

  
# File 'ext/nokogiri/xml_document.c', line 234

static VALUE
url(VALUE self)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  if (doc->URL) { return NOKOGIRI_STR_NEW2(doc->URL); }

  return Qnil;
}

#validate

Validate this Document against its DTD. Returns a list of errors on the document or nil when there is no DTD.

[ GitHub ]

  
# File 'lib/nokogiri/xml/document.rb', line 375

def validate
  return unless internal_subset

  internal_subset.validate(self)
end

#version

Get the XML version for this Document. Returns nil if the document has no version.

[ GitHub ]

  
# File 'ext/nokogiri/xml_document.c', line 351

static VALUE
version(VALUE self)
{
  xmlDocPtr doc = noko_xml_document_unwrap(self);

  if (!doc->version) { return Qnil; }
  return NOKOGIRI_STR_NEW2(doc->version);
}

#xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig

Returns

The document type which determines CSS-to-XPath translation.

See XPathVisitor for more information.

[ GitHub ]

  
# File 'lib/nokogiri/xml/document.rb', line 456

def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::XML
end