
DOM Navigation

Problem: Find the nearest preceding element of a certain type.

Solution: Use a recursive method that walks backwards through the preceding nodes and descends into each preceding div.block, so the element is found whether it is a sibling or a child of a sibling.

require 'rubygems'
require 'nokogiri'

parent = Nokogiri::HTML.parse(<YOUR HTML GOES HERE>).css('body').first

# start_here is given: A Nokogiri::XML::Element of the <div> with the id 'foo' and the class 'block'
@start_here = parent.at('div.block#foo')

# Search for the previous element named "_style", starting from _start_element
def search_for_previous_element(_start_element, _style)
  unless _start_element.nil?
    # have we already found what we're looking for?
    if _start_element.name == _style
      return _start_element
    end
    # _start_element is a div.block other than the original starting element
    if _start_element[:class] == "block" && _start_element[:id] != @start_here[:id]
      # begin recursion with last child inside div.block
      from_child = search_for_previous_element(_start_element.children.last, _style)
      if from_child
        return from_child
      end
    end
    # continue recursion with the previous sibling
    from_previous = search_for_previous_element(_start_element.previous, _style)
    return from_previous ? from_previous : false
  else
    # ran out of nodes without finding a match
    return false
  end
end

# A Nokogiri::XML::Element of the nearest, previous h1.
previous_element_h1 = search_for_previous_element(@start_here,"h1")

puts previous_element_h1

Automatic HTML Document Hierarchy

Problem: Given an HTML document like this...

  <p>Not sure how to start your day? Let us help!</p>

  <h1>1.0 Getting Started</h1>
  <p>Welcome!</p>

  <h2>1.1 First Things First</h2>
  <p>Get out of bed.</p>

  <h2>1.2 Get Dressed</h2>
  <p>Put on your clothes.</p>

  <h3>1.2.1 First, the undergarments</h3>
  <p>...and then the rest</p>

  <h1>2.0 Eating Breakfast</h1>
  <p>And so on, and so on...</p>

...wrap the content of each 'section' in <div class='section'>...</div> for hierarchical styling (e.g. with CSS such as div.section { margin-left:1em}). The end result looks like this:

  <p>Not sure how to start your day? Let us help!</p>

  <h1>1.0 Getting Started</h1>
  <div class='section'>
     <p>Welcome!</p>

     <h2>1.1 First Things First</h2>
     <div class='section'>
        <p>Get out of bed.</p>
     </div>

     <h2>1.2 Get Dressed</h2>
     <div class='section'>
        <p>Put on your clothes.</p>

        <h3>1.2.1 First, the undergarments</h3>
        <div class='section'>
          <p>...and then the rest</p>
        </div>
     </div>
  </div>

  <h1>2.0 Eating Breakfast</h1>
  <div class='section'>
    <p>And so on, and so on...</p>
  </div>

Solution: Use a stack while walking through the top level of the document, creating and inserting nodes as appropriate.

# Assuming doc is a Nokogiri::HTML::Document
if body = doc.at_css('body') then
  stack = []
  body.children.each do |node|
    # h1-h6 get levels 1-6; non-matching nodes get 0, remapped to 99 below
    level = node.name[ /\Ah([1-6])\z/i, 1 ].to_i
    level = 99 if level == 0

    # close every open section at the same or a shallower heading level
    stack.pop while (top=stack.last) && top[:level]>=level
    # move the node into the innermost open section, if there is one
    stack.last[:div].add_child( node ) if stack.last
    if level<99
      # a heading opens a new section div directly after itself
      div = Nokogiri::XML::Node.new('div',doc)
      div.set_attribute( 'class', 'section' )
      node.add_next_sibling(div)
      stack << { :div=>div, :level=>level }
    end
  end
end
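
The stack logic may be easier to follow without the DOM. The sketch below applies the same idea to a plain array of [level, label] pairs, with 99 again standing in for non-heading content. Section, nest and to_arrays are names invented for this illustration and are not part of Nokogiri.

```ruby
# Plain-Ruby sketch of the stack technique (no Nokogiri required).
# A Section plays the role of a <div class='section'>.
Section = Struct.new(:level, :children)

def nest(items)
  root  = []   # stands in for <body>
  stack = []   # currently open sections, innermost last
  items.each do |level, label|
    # Close every open section at the same or a shallower level.
    stack.pop while stack.last && stack.last.level >= level
    # The item goes into the innermost open section, or the top level.
    target = stack.last ? stack.last.children : root
    target << label
    next if level == 99
    # A heading opens a new section placed right after it.
    section = Section.new(level, [])
    target << section
    stack << section
  end
  root
end

# Convert Sections to nested arrays for easy printing.
def to_arrays(nodes)
  nodes.map { |n| n.is_a?(Section) ? to_arrays(n.children) : n }
end

items = [
  [99, 'intro'],
  [1,  'h1 A'], [99, 'p1'],
  [2,  'h2 A.1'], [99, 'p2'],
  [1,  'h1 B'], [99, 'p3']
]
p to_arrays(nest(items))
# => ["intro", "h1 A", ["p1", "h2 A.1", ["p2"]], "h1 B", ["p3"]]
```

Each nested array corresponds to one section div: content before the first heading stays at the top level, and a new h1 closes every open section before opening its own.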

Other Examples

Questions tagged Nokogiri on stackoverflow.com are another good source of Nokogiri examples.