DOM Navigation
Problem: Find the nearest previous element of a certain type.
Solution: Use a recursive method that walks backwards through the preceding nodes, whether they are siblings of the start element or children of a preceding sibling.
require 'rubygems'
require 'nokogiri'
parent = Nokogiri::HTML.parse(<YOUR HTML GOES HERE>).css('body').first
# start_here is given: A Nokogiri::XML::Element of the <div> with the id 'foo' and the class 'block'
@start_here = parent.at('div.block#foo')
# Search for previous element of kind "_style" starting from _start_element
def search_for_previous_element(_start_element, _style)
  unless _start_element.nil?
    # have we already found what we're looking for?
    if _start_element.name == _style
      return _start_element
    end
    # _start_element is a div.block other than the original @start_here
    if _start_element[:class] == "block" && _start_element[:id] != @start_here[:id]
      # begin recursion with last child inside div.block
      from_child = search_for_previous_element(_start_element.children.last, _style)
      if(from_child)
        return from_child
      end
    end
    # continue recursion with the previous sibling
    from_previous = search_for_previous_element(_start_element.previous, _style)
    return from_previous ? from_previous : false
  else
    return false
  end
end
# A Nokogiri::XML::Element of the nearest, previous h1.
previous_element_h1 = search_for_previous_element(@start_here,"h1")
puts previous_element_h1
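The traversal order is easier to follow on a small stand-in tree that does not need Nokogiri. The sketch below is a minimal model, assuming a hypothetical Node struct (with name, klass, children and a previous link) in place of Nokogiri::XML::Element, and a hypothetical nearest_previous method that passes the original start node along as an argument instead of reading @start_here.

```ruby
# Hypothetical stand-in for Nokogiri::XML::Element; not part of Nokogiri.
Node = Struct.new(:name, :klass, :children, :previous)

# Build a node and link each child to the sibling before it.
def node(name, klass = nil, children = [])
  children.each_cons(2) { |before, after| after.previous = before }
  Node.new(name, klass, children, nil)
end

# Walk backwards from `start`, descending into earlier block containers.
def nearest_previous(start, wanted, origin = start)
  return nil if start.nil?
  return start if start.name == wanted
  # A block other than the original start node: search it from its last child.
  if start.klass == 'block' && !start.equal?(origin)
    found = nearest_previous(start.children.last, wanted, origin)
    return found if found
  end
  nearest_previous(start.previous, wanted, origin)
end

h1_outer = node('h1')
h1_inner = node('h1')
first    = node('div', 'block', [node('p'), h1_inner, node('p')])
foo      = node('div', 'block', [node('p')])
body     = node('body', nil, [h1_outer, first, node('p'), foo])

# The h1 inside the earlier block is nearer than the top-level h1.
puts nearest_previous(foo, 'h1').equal?(h1_inner)   # prints "true"
```

Starting at foo, the walk steps back to the p, then reaches the earlier block, enters it at its last child and finds the inner h1 before it ever gets to the top-level one.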
Automatic HTML Document Hierarchy
Problem: Given an HTML document like this...
<p>Not sure how to start your day? Let us help!</p>
<h1>1.0 Getting Started</h1>
<p>Welcome!</p>
<h2>1.1 First Things First</h2>
<p>Get out of bed.</p>
<h2>1.2 Get Dressed</h2>
<p>Put on your clothes.</p>
<h3>1.2.1 First, the undergarments</h3>
<p>...and then the rest</p>
<h1>2.0 Eating Breakfast</h1>
<p>And so on, and so on...</p>
...wrap the content of each 'section' in <div class='section'>...</div> for hierarchical styling (e.g. with CSS such as div.section { margin-left:1em }). The end result looks like this:
<p>Not sure how to start your day? Let us help!</p>
<h1>1.0 Getting Started</h1>
<div class='section'>
  <p>Welcome!</p>
  <h2>1.1 First Things First</h2>
  <div class='section'>
    <p>Get out of bed.</p>
  </div>
  <h2>1.2 Get Dressed</h2>
  <div class='section'>
    <p>Put on your clothes.</p>
    <h3>1.2.1 First, the undergarments</h3>
    <div class='section'>
      <p>...and then the rest</p>
    </div>
  </div>
</div>
<h1>2.0 Eating Breakfast</h1>
<div class='section'>
  <p>And so on, and so on...</p>
</div>
Solution: Use a stack while walking through the top level of the document, creating and inserting nodes as appropriate.
# Assuming doc is a Nokogiri::HTML::Document
if body = doc.at_css('body') then
  stack = []
  body.children.each do |node|
    # non-heading nodes yield 0 here; give them level 99 so they never close a section
    level = node.name[ /h([1-6])/i, 1 ].to_i
    level = 99 if level == 0
    # close every open section at the same or a deeper level
    stack.pop while (top=stack.last) && top[:level]>=level
    # move the node into the innermost open section, if any
    stack.last[:div].add_child( node ) if stack.last
    if level<99
      # a heading opens a new section div right after itself
      div = Nokogiri::XML::Node.new('div',doc)
      div.set_attribute( 'class', 'section' )
      node.add_next_sibling(div)
      stack << { :div=>div, :level=>level }
    end
  end
end
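The stack logic can also be checked in isolation, without a DOM. The following is a minimal sketch, assuming the top-level nodes are reduced to an array of tag names and using a hypothetical nest_sections helper that returns nested arrays, where each inner array stands for one div.section. Unlike the loop above, the pattern is anchored so that only the exact names h1 to h6 count as headings.

```ruby
# Hypothetical helper: nests a flat list of tag names the same way the
# Nokogiri loop nests nodes. Each inner array stands for a div.section.
def nest_sections(tags)
  root  = []
  stack = []                        # open sections, innermost last
  tags.each do |tag|
    level = tag[/\Ah([1-6])\z/i, 1].to_i
    level = 99 if level.zero?       # non-headings never close a section
    # Close every open section at the same or a deeper level.
    stack.pop while stack.last && stack.last[:level] >= level
    # Place the tag in the innermost open section, or at the top level.
    (stack.last ? stack.last[:items] : root) << tag
    next unless level < 99
    # A heading opens a new section directly after itself.
    section = []
    (stack.last ? stack.last[:items] : root) << section
    stack << { items: section, level: level }
  end
  root
end

p nest_sections(%w[p h1 p h2 p h1 p])
# prints ["p", "h1", ["p", "h2", ["p"]], "h1", ["p"]]
```

The second h1 pops both open sections before it is placed, which is why it lands back at the top level rather than inside the first h1's section.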
Other Examples
Questions tagged nokogiri on stackoverflow.com are another good source of Nokogiri examples.