
Class: REXML::Document

Relationships & Source Files
Super Chains via Extension / Inclusion / Inheritance
Class Chain:
self, Element, Parent, Child
Instance Chain:
self, Element, Namespace, XMLTokens, Parent, Enumerable, Child, Node
Inherits: REXML::Element
Defined in: lib/rexml/document.rb

Overview

Represents an XML document.

A document may have:

  • A single child that may be accessed via method #root.

  • An XML declaration.

  • A document type.

  • Processing instructions.
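As a rough illustration of the parts listed above, the following sketch parses a small document and reads each part back. The XML content, the DTD name note.dtd, and the variable names are made up for this example:

```ruby
require 'rexml/document'

# Hypothetical sample document: XML declaration, document type,
# one processing instruction, and a single root element.
xml = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<!DOCTYPE note SYSTEM "note.dtd">',
  '<?xml-stylesheet type="text/css" href="style.css"?>',
  '<note><to>Ann</to></note>'
].join

doc = REXML::Document.new(xml)

root_name    = doc.root.name                  # name of the single root element
decl_version = doc.xml_decl.version           # version from the XML declaration
doctype_name = doc.doctype.name               # name from the document type
pi_targets   = doc.instructions.map(&:target) # targets of the processing instructions

puts root_name, decl_version, doctype_name, pi_targets.inspect
```

Note that the XML declaration is not counted among the processing instructions; it is reached through #xml_decl instead.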

In a Hurry?

If you’re somewhat familiar with XML and have a particular task in mind, you may want to see the tasks pages, and in particular, the tasks page for documents.

Constant Summary

XMLTokens - Included

NAME, NAMECHAR, NAME_CHAR, NAME_START_CHAR, NAME_STR, NCNAME_STR, NMTOKEN, NMTOKENS, REFERENCE

Namespace - Included

NAMESPLIT

Element - Inherited

UNDEFINED

Class Attribute Summary

Class Method Summary

Element - Inherited

.new

Returns a new REXML::Element object.

Parent - Inherited

.new

Constructor.

Child - Inherited

.new

Constructor.

Instance Attribute Summary

Element - Inherited

#attributes

Mechanisms for accessing attributes and child elements of this element.

#context

The context holds information about the processing environment, such as whitespace handling.

#elements

Mechanisms for accessing attributes and child elements of this element.

#has_attributes?

Returns true if the element has attributes, false otherwise:

#has_elements?

Returns true if the element has one or more element children, false otherwise:

#has_text?

Returns true if the element has one or more text nodes, false otherwise:

Namespace - Included

#expanded_name

The name of the object, valid if set.

#local_name

Alias for Namespace#name.

#name

The name of the object, valid if set.

#name=

Sets the name and the expanded name.

#prefix

The expanded name of the object, valid if name is set.

Parent - Inherited

Child - Inherited

#next_sibling
#next_sibling=

Sets the next sibling of this child.

#parent

The Parent of this object.

#parent=

Sets the parent of this child to the supplied argument.

#previous_sibling
#previous_sibling=

Sets the previous sibling of this child.

Node - Included

Instance Method Summary

Element - Inherited

#[]

With integer argument index given, returns the child at offset index, or nil if none:

#add_attribute

Adds an attribute to this element, overwriting any existing attribute by the same name.

#add_attributes

Adds zero or more attributes to the element; returns the argument.

#add_element

Adds a child element, optionally setting attributes on the added element; returns the added element.

#add_namespace

Adds a namespace to the element; returns self.

#add_text

Adds text to the element.

#attribute

Returns the string value for the given attribute name.

#cdatas

Returns a frozen array of the CData children of the element:

#clone

Returns a shallow copy of the element, containing the name and attributes, but not the parent or children:

#comments

Returns a frozen array of the Comment children of the element:

#delete_attribute

Removes a named attribute if it exists; returns the removed attribute if found, otherwise nil:

#delete_element

Deletes a child element.

#delete_namespace

Removes a namespace from the element.

#document

If the element is part of a document, returns that document:

#each_element

Calls the given block with each child element:

#each_element_with_attribute

Calls the given block with each child element that meets given criteria.

#each_element_with_text

Calls the given block with each child element that meets given criteria.

#get_elements

Returns an array of the elements that match the given xpath:

#get_text

Returns the first text node child in a specified element, if it exists, nil otherwise.

#ignore_whitespace_nodes

Returns true if whitespace nodes are ignored for the element.

#inspect

Returns a string representation of the element.

#instructions

Returns a frozen array of the Instruction children of the element:

#namespace

Returns the string namespace URI for the element, possibly deriving from one of its ancestors.

#namespaces

Returns a hash of all defined namespaces in the element and its ancestors:

#next_element

Returns the next sibling that is an element if it exists, nil otherwise:

#node_type

Returns symbol :element:

#prefixes

Returns an array of the string prefixes (names) of all defined namespaces in the element and its ancestors:

#previous_element

Returns the previous sibling that is an element if it exists, nil otherwise:

#raw

Returns true if raw mode is set for the element.

#root

Returns the most distant element (not document) ancestor of the element:

#root_node

Returns the most distant ancestor of self.

#text

Returns the text string from the first text node child in a specified element, if it exists, nil otherwise.

#text=

Adds, replaces, or removes the first text node child in the element.

#texts

Returns a frozen array of the Text children of the element:

#whitespace

Returns true if whitespace is respected for this element, false otherwise.

#write

DEPRECATED See Formatters

#xpath

Returns the string xpath to the element relative to the most distant parent:

#__to_xpath_helper,
#each_with_something

A private helper method.

Namespace - Included

#fully_expanded_name

Fully expand the name, even if the prefix wasn’t specified in the source file.

#has_name?

Compares names optionally WITH namespaces.

Parent - Inherited

#<<

Alias for Parent#push.

#[]

Fetches a child at a given index.

#[]=

Set an index entry.

#add,
#children

Alias for Parent#to_a.

#deep_clone

Deeply clones this object.

#delete, #delete_at, #delete_if, #each,
#each_child

Alias for Parent#each.

#each_index,
#index

Fetches the index of a given child of this parent.

#insert_after

Inserts a child after another child: child2 will be inserted after child1 in the child list of the parent.

#insert_before

Inserts a child before another child: child2 will be inserted before child1 in the child list of the parent.

#length

Alias for Parent#size.

#push

Alias for Parent#add.

#replace_child

Replaces one child with another, making sure the nodelist is correct.

#size, #to_a, #unshift

Child - Inherited

#bytes

This doesn’t yet handle encodings.

#document

Returns the document this child belongs to, or nil if this child belongs to no document.

#remove

Removes this child from the parent.

#replace_with

Replaces this object with another object.

Node - Included

#each_recursive

Visit all subnodes of self recursively.

#find_first_recursive

Find (and return) first subnode (recursively) for which the block evaluates to true.

#indent,
#index_in_parent

Returns the position that self holds in its parent’s array, indexed from 1.

#next_sibling_node, #previous_sibling_node,
#to_s
indent

Constructor Details

.new(string = nil, context = {}) ⇒ Document
.new(io_stream = nil, context = {}) ⇒ Document
.new(document = nil, context = {}) ⇒ Document

Returns a new REXML::Document object.

When no arguments are given, returns an empty document:

d = REXML::Document.new
d.to_s # => ""

When argument string is given, it must be a string containing a valid XML document:

xml_string = '<root><foo>Foo</foo><bar>Bar</bar></root>'
d = REXML::Document.new(xml_string)
d.to_s # => "<root><foo>Foo</foo><bar>Bar</bar></root>"
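
When the string is not well-formed, parsing typically fails with REXML::ParseException. A minimal sketch, using a deliberately broken input chosen only for illustration:

```ruby
require 'rexml/document'

# Deliberately malformed input: <foo> is never closed.
bad_xml = '<root><foo>Foo</root>'

error = begin
  REXML::Document.new(bad_xml)
  nil
rescue REXML::ParseException => e
  e # keep the exception so it can be inspected
end

puts error.class
```

Rescuing the exception close to the call to .new keeps untrusted or hand-edited input from aborting the surrounding program.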

When argument io_stream is given, it must be an IO object that is opened for reading, and when read must return a valid XML document:

File.write('t.xml', xml_string)
d = File.open('t.xml', 'r') do |io|
  REXML::Document.new(io)
end
d.to_s # => "<root><foo>Foo</foo><bar>Bar</bar></root>"

When argument document is given, it must be an existing document object, whose context and attributes (but not children) are cloned into the new document:

d = REXML::Document.new(xml_string)
d.children    # => [<root> ... </>]
d.context = {raw: :all, compress_whitespace: :all}
d.add_attributes({'bar' => 0, 'baz' => 1})
d1 = REXML::Document.new(d)
d1.children   # => []
d1.context    # => {:raw=>:all, :compress_whitespace=>:all}
d1.attributes # => {"bar"=>bar='0', "baz"=>baz='1'}

When argument context is given, it must be a hash containing context entries for the document; see Element Context:

context = {raw: :all, compress_whitespace: :all}
d = REXML::Document.new(xml_string, context)
d.context # => {:raw=>:all, :compress_whitespace=>:all}

  
# File 'lib/rexml/document.rb', line 92

def initialize( source = nil, context = {} )
  @entity_expansion_count = 0
  super()
  @context = context
  return if source.nil?
  if source.kind_of? Document
    @context = source.context
    super source
  else
    build(  source )
  end
end

Class Attribute Details

.entity_expansion_limit (rw)

Get the entity expansion limit. By default the limit is set to 10000.

Deprecated. Use Security.entity_expansion_limit instead.


  
# File 'lib/rexml/document.rb', line 415

def Document::entity_expansion_limit
  return Security.entity_expansion_limit
end

.entity_expansion_limit=(val) (rw)

Set the entity expansion limit. By default the limit is set to 10000.

Deprecated. Use Security.entity_expansion_limit= instead.


  
# File 'lib/rexml/document.rb', line 408

def Document::entity_expansion_limit=( val )
  Security.entity_expansion_limit = val
end
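
Because both accessors are deprecated, new code would normally go through REXML::Security directly. A minimal sketch of that, in which the value 500 is purely illustrative and not a recommendation:

```ruby
require 'rexml/document'

# Remember the current limit so it can be restored afterwards.
previous_limit = REXML::Security.entity_expansion_limit

# Illustrative value only; choose a limit that suits your input.
REXML::Security.entity_expansion_limit = 500
lowered_limit = REXML::Security.entity_expansion_limit

# The deprecated class attribute reads the same underlying setting.
via_document = REXML::Document.entity_expansion_limit

# Restore the original limit; the setting is global to the process.
REXML::Security.entity_expansion_limit = previous_limit
```

The limit is process-wide rather than per document, which is why the sketch restores the previous value when it is done.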

.entity_expansion_text_limit (rw)

Get the entity expansion text limit. By default the limit is set to 10240.

Deprecated. Use Security.entity_expansion_text_limit instead.


  
# File 'lib/rexml/document.rb', line 429

def Document::entity_expansion_text_limit
  return Security.entity_expansion_text_limit
end

.entity_expansion_text_limit=(val) (rw)

Set the entity expansion text limit. By default the limit is set to 10240.

Deprecated. Use Security.entity_expansion_text_limit= instead.


  
# File 'lib/rexml/document.rb', line 422

def Document::entity_expansion_text_limit=( val )
  Security.entity_expansion_text_limit = val
end

Class Method Details

.parse_stream(source, listener)


  
# File 'lib/rexml/document.rb', line 401

def Document::parse_stream( source, listener )
  Parsers::StreamParser.new( source, listener ).parse
end
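
This method has no description here, but the source shows that it hands the source and a listener to Parsers::StreamParser. A minimal sketch of how it might be used, assuming a listener that includes REXML::StreamListener; the TagCollector class and its accessors are hypothetical names for this example:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that records start tags and non-blank text.
class TagCollector
  include REXML::StreamListener # supplies no-op versions of the other callbacks

  attr_reader :tags, :texts

  def initialize
    @tags = []
    @texts = []
  end

  # Called once for each opening tag.
  def tag_start(name, _attrs)
    @tags << name
  end

  # Called for each run of character data.
  def text(text)
    @texts << text unless text.strip.empty?
  end
end

listener = TagCollector.new
REXML::Document.parse_stream('<root><foo>Foo</foo><bar>Bar</bar></root>', listener)

puts listener.tags.inspect
puts listener.texts.inspect
```

Stream parsing does not build a document tree, so it can be a reasonable choice when only a pass over the events is needed.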

Instance Attribute Details

#entity_expansion_count (readonly)


  
# File 'lib/rexml/document.rb', line 433

attr_reader :entity_expansion_count

#stand_alone? ⇒ Boolean (readonly)

Returns the XMLDecl standalone value of the document as a string, if it has been set, otherwise the default standalone value:

d = REXML::Document.new('<?xml standalone="yes"?>')
d.stand_alone? # => "yes"
d = REXML::Document.new('')
d.stand_alone? # => nil

  
# File 'lib/rexml/document.rb', line 305

def stand_alone?
  xml_decl().stand_alone?
end

Instance Method Details

#<<(child)

Alias for #add.


  
# File 'lib/rexml/document.rb', line 201

alias :<< :add

#add(xml_decl) ⇒ self
#add(doc_type) ⇒ self
#add(object) ⇒ self
Also known as: #<<

Adds an object to the document; returns self.

When argument xml_decl is given, it must be an XMLDecl object, which becomes the XML declaration for the document, replacing the previous XML declaration if any:

d = REXML::Document.new
d.xml_decl.to_s # => ""
d.add(REXML::XMLDecl.new('2.0'))
d.xml_decl.to_s # => "<?xml version='2.0'?>"

When argument doc_type is given, it must be a DocType object, which becomes the document type for the document, replacing the previous document type, if any:

d = REXML::Document.new
d.doctype.to_s # => ""
d.add(REXML::DocType.new('foo'))
d.doctype.to_s # => "<!DOCTYPE foo>"

When argument object (not an XMLDecl or DocType object) is given it is added as the last child:

d = REXML::Document.new
d.add(REXML::Element.new('foo'))
d.to_s # => "<foo/>"

  
# File 'lib/rexml/document.rb', line 170

def add( child )
  if child.kind_of? XMLDecl
    if @children[0].kind_of? XMLDecl
      @children[0] = child
    else
      @children.unshift child
    end
    child.parent = self
  elsif child.kind_of? DocType
    # Find first Element or DocType node and insert the decl right
    # before it.  If there is no such node, just insert the child at the
    # end.  If there is a child and it is an DocType, then replace it.
    insert_before_index = @children.find_index { |x|
      x.kind_of?(Element) || x.kind_of?(DocType)
    }
    if insert_before_index # Not null = not end of list
      if @children[ insert_before_index ].kind_of? DocType
        @children[ insert_before_index ] = child
      else
        @children[ insert_before_index-1, 0 ] = child
      end
    else  # Insert at end of list
      @children << child
    end
    child.parent = self
  else
    rv = super
    raise "attempted adding second root element to document" if @elements.size > 1
    rv
  end
end
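
The source above raises when a second root element is added. A short sketch of that behavior, with element names invented for the example:

```ruby
require 'rexml/document'

doc = REXML::Document.new
doc.add(REXML::Element.new('first')) # becomes the root element

# Adding another element at the top level triggers the check in #add.
message = begin
  doc.add(REXML::Element.new('second'))
  nil
rescue RuntimeError => e
  e.message
end

puts message
```

The error is a plain RuntimeError raised with a string message, so callers that want to handle it have to rescue RuntimeError (or StandardError).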

#add_element(name_or_element = nil, attributes = nil) ⇒ new_element

Adds an element to the document by calling the inherited instance method REXML::Element#add_element, then raises if the document would have more than one root element:

REXML::Element.add_element(name_or_element, attributes)

  
# File 'lib/rexml/document.rb', line 209

def add_element(arg=nil, arg2=nil)
  rv = super
  raise "attempted adding second root element to document" if @elements.size > 1
  rv
end
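
A minimal usage sketch, assuming an empty document; the element name catalog and the id attribute are hypothetical:

```ruby
require 'rexml/document'

doc = REXML::Document.new

# Create the root element by name and set one attribute on it.
root = doc.add_element('catalog', {'id' => '1'})

same_object = doc.root.equal?(root)   # the returned element is the document root
root_id     = doc.root.attributes['id']

puts same_object, root_id
```

Since a document may hold only one root element, further elements would be added to the returned element rather than to the document.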

#build(source) (private)


  
# File 'lib/rexml/document.rb', line 447

def build( source )
  Parsers::TreeParser.new( source, self ).parse
end

#clone ⇒ Document

Returns the new document resulting from executing Document.new(self). See .new.


  
# File 'lib/rexml/document.rb', line 120

def clone
  Document.new self
end
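
Since #clone simply calls Document.new(self), the copy carries the context but not the children. A brief sketch, with sample XML and a context value chosen only for illustration:

```ruby
require 'rexml/document'

doc = REXML::Document.new('<root><foo>Foo</foo></root>')
doc.context = {raw: :all}

copy = doc.clone

copy_is_new   = !copy.equal?(doc) # a distinct Document object
copy_children = copy.children     # children are not cloned
copy_context  = copy.context      # context is carried over

puts copy_is_new, copy_children.inspect, copy_context.inspect
```

For a copy that includes the children, the inherited Parent#deep_clone listed in the method summary may be the better fit.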

#doctype ⇒ doc_type?

Returns the DocType object for the document, if it exists, otherwise nil:

d = REXML::Document.new('<!DOCTYPE document SYSTEM "subjects.dtd">')
d.doctype.class # => REXML::DocType
d = REXML::Document.new('')
d.doctype.class # => NilClass

  
# File 'lib/rexml/document.rb', line 241

def doctype
  @children.find { |item| item.kind_of? DocType }
end

#document


  
# File 'lib/rexml/document.rb', line 442

def document
  self
end

#encoding ⇒ encoding_string

Returns the XMLDecl encoding of the document, if it has been set, otherwise the default encoding:

d = REXML::Document.new('<?xml version="1.0" encoding="UTF-16"?>')
d.encoding # => "UTF-16"
d = REXML::Document.new('')
d.encoding # => "UTF-8"

  
# File 'lib/rexml/document.rb', line 290

def encoding
  xml_decl().encoding
end

#expanded_name ⇒ empty_string
Also known as: #name

Returns an empty string.


  
# File 'lib/rexml/document.rb', line 129

def expanded_name
  ''
  #d = doc_type
  #d ? d.name : "UNDEFINED"
end

#name

Alias for #expanded_name.


  
# File 'lib/rexml/document.rb', line 134

alias :name :expanded_name

#node_type ⇒ :document

Returns the symbol :document.


  
# File 'lib/rexml/document.rb', line 110

def node_type
  :document
end

#record_entity_expansion


  
# File 'lib/rexml/document.rb', line 435

def record_entity_expansion
  @entity_expansion_count += 1
  if @entity_expansion_count > Security.entity_expansion_limit
    raise "number of entity expansions exceeded, processing aborted."
  end
end

#root ⇒ root_element?

Returns the root element of the document, if it exists, otherwise nil:

d = REXML::Document.new('<root></root>')
d.root # => <root/>
d = REXML::Document.new('')
d.root # => nil

  
# File 'lib/rexml/document.rb', line 225

def root
  elements[1]
  #self
  #@children.find { |item| item.kind_of? Element }
end

#version ⇒ version_string

Returns the XMLDecl version of this document as a string, if it has been set, otherwise the default version:

d = REXML::Document.new('<?xml version="2.0" encoding="UTF-8"?>')
d.version # => "2.0"
d = REXML::Document.new('')
d.version # => "1.0"

  
# File 'lib/rexml/document.rb', line 275

def version
  xml_decl().version
end

#write(output = $stdout, indent = -1, transitive = false, ie_hack = false, encoding = nil)
#write(options = {:output => $stdout, :indent => -1, :transitive => false, :ie_hack => false, :encoding => nil})

Write the XML tree out, optionally with indent. This writes out the entire XML document, including XML declarations, doctype declarations, and processing instructions (if any are given).

A controversial point is whether Document should always write the XML declaration (<?xml version='1.0'?>) whether or not one is given by the user (or source document). REXML does not write one if one was not specified, because it adds unnecessary bandwidth to applications such as XML-RPC.

Accepts either positional arguments or a single options Hash. The options Hash style is recommended whenever one or more arguments are passed.

Examples

Document.new("<a><b/></a>").write

output = ""
Document.new("<a><b/></a>").write(output)

output = ""
Document.new("<a><b/></a>").write(:output => output, :indent => 2)

See also the classes in the rexml/formatters package for the proper way to change the default formatting of XML output.

Examples

output = ""
tr = Transitive.new
tr.write(Document.new("<a><b/></a>"), output)
output

output

An object which supports '<< string'; this is where the document will be written.

indent

An integer. If -1, no indenting will be used; otherwise, the indentation will be twice this number of spaces, and children will be indented an additional amount. For a value of 3, every item will be indented 3 more levels, or 6 more spaces (2 * 3). Defaults to -1

transitive

If transitive is true and indent is >= 0, then the output will be pretty-printed in such a way that the added whitespace does not affect the absolute value of the document – that is, it leaves the value and number of Text nodes in the document unchanged.

ie_hack

This hack inserts a space before the /> on empty tags to address a limitation of Internet Explorer. Defaults to false

encoding

Encoding name as a String. Changes the output encoding to the specified encoding instead of the encoding in the XML declaration. Defaults to nil, which means the encoding in the XML declaration is used.
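
To make the effect of the indent option concrete, the following sketch writes the same document twice into mutable strings using the options Hash style. The variable names are illustrative, and the exact whitespace of the indented form is left to the formatter rather than asserted here:

```ruby
require 'rexml/document'

doc = REXML::Document.new('<a><b/></a>')

# Default formatting (indent of -1): no whitespace is added.
compact = +''
doc.write(output: compact)

# Options Hash style with an indent of 2: output spans several lines.
pretty = +''
doc.write(output: pretty, indent: 2)

puts compact
puts pretty
```

Any object that responds to << can serve as the output, so an open File or a StringIO would work in place of the strings used here.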


  
# File 'lib/rexml/document.rb', line 365

def write(*arguments)
  if arguments.size == 1 and arguments[0].class == Hash
    options = arguments[0]

    output     = options[:output]
    indent     = options[:indent]
    transitive = options[:transitive]
    ie_hack    = options[:ie_hack]
    encoding   = options[:encoding]
  else
    output, indent, transitive, ie_hack, encoding, = *arguments
  end

  output   ||= $stdout
  indent   ||= -1
  transitive = false if transitive.nil?
  ie_hack    = false if ie_hack.nil?
  encoding ||= xml_decl.encoding

  if encoding != 'UTF-8' && !output.kind_of?(Output)
    output = Output.new( output, encoding )
  end
  formatter = if indent > -1
      if transitive
        require_relative "formatters/transitive"
        REXML::Formatters::Transitive.new( indent, ie_hack )
      else
        REXML::Formatters::Pretty.new( indent, ie_hack )
      end
    else
      REXML::Formatters::Default.new( ie_hack )
    end
  formatter.write( self, output )
end

#xml_decl ⇒ xml_decl

Returns the XMLDecl object for the document, if it exists, otherwise the default XMLDecl object:

d = REXML::Document.new('<?xml version="1.0" encoding="UTF-8"?>')
d.xml_decl.class # => REXML::XMLDecl
d.xml_decl.to_s  # => "<?xml version='1.0' encoding='UTF-8'?>"
d = REXML::Document.new('')
d.xml_decl.class # => REXML::XMLDecl
d.xml_decl.to_s  # => ""

  
# File 'lib/rexml/document.rb', line 258

def xml_decl
  rv = @children[0]
  return rv if rv.kind_of? XMLDecl
  @children.unshift(XMLDecl.default)[0]
end