Module: Nokogiri
Overview
Nokogiri parses and searches XML/HTML very quickly, and also has correctly implemented CSS3 selector support as well as XPath 1.0 support.
Parsing a document returns either a ::Nokogiri::XML::Document or a ::Nokogiri::HTML4::Document, depending on the kind of document you parse.
Here is an example:
require 'nokogiri'
require 'open-uri'
# Get a Nokogiri::HTML4::Document for the page we’re interested in...
doc = Nokogiri::HTML4(URI.open('http://www.google.com/search?q=tenderlove'))
# Do funky things with it using Nokogiri::XML::Node methods...
####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end
See also:
- XML::Searchable#css for more information about CSS searching
- XML::Searchable#xpath for more information about XPath searching
Constant Summary
- JAR_DEPENDENCIES =
  generated by the :vendor_jars rake task
  {
    "isorelax:isorelax" => "20030108",
    "net.sf.saxon:Saxon-HE" => "9.6.0-4",
    "net.sourceforge.htmlunit:neko-htmlunit" => "2.63.0",
    "nu.validator:jing" => "20200702VNU",
    "org.nokogiri:nekodtd" => "0.1.11.noko2",
    "xalan:serializer" => "2.7.3",
    "xalan:xalan" => "2.7.3",
    "xerces:xercesImpl" => "2.12.2",
    "xml-apis:xml-apis" => "1.4.01",
  }.freeze
- NEKO_VERSION =
  # File 'lib/nokogiri/jruby/nokogiri_jars.rb', line 42
  JAR_DEPENDENCIES["net.sourceforge.htmlunit:neko-htmlunit"]
- VERSION =
  The version of Nokogiri you are using
  "1.19.0.dev"
- VERSION_INFO =
  Detailed version info about Nokogiri and the installed extension dependencies.
  VersionInfo.instance.to_hash
- XERCES_VERSION =
  # File 'lib/nokogiri/jruby/nokogiri_jars.rb', line 41
  JAR_DEPENDENCIES["xerces:xercesImpl"]
Class Attribute Summary
- .jruby? ⇒ Boolean readonly Internal use only
- .uses_gumbo? ⇒ Boolean readonly Internal use only
Class Method Summary
- HTML(input, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML, &block) → Nokogiri::HTML4::Document
  Parse HTML.
- HTML4
  Convenience method for HTML4::Document.parse
- HTML5
  Convenience method for HTML5::Document.parse
- .make(input = nil, opts = {}, &blk)
  Create a new ::Nokogiri::XML::DocumentFragment
- .parse(string, url = nil, encoding = nil, options = nil)
- Slop(*args, &block)
  Parse a document and add the Slop decorator.
- XML
  Convenience method for XML::Document.parse
- XSLT
  Convenience method for XSLT.parse
- .install_default_aliases Internal use only
- .libxml2_patches Internal use only
- .uses_libxml?(requirement = nil) ⇒ Boolean Internal use only
Class Attribute Details
.jruby? ⇒ Boolean (readonly)
# File 'lib/nokogiri/version/info.rb', line 206
def self.jruby?
  VersionInfo.instance.jruby?
end
.uses_gumbo? ⇒ Boolean (readonly)
# File 'lib/nokogiri/version/info.rb', line 201
def self.uses_gumbo?
  uses_libxml? # TODO: replace with Gumbo functionality
end
Class Method Details
HTML(input, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML, &block) → Nokogiri::HTML4::Document
Parse HTML. Convenience method for HTML4::Document.parse
# File 'lib/nokogiri/html.rb', line 10
RDoc directive :singleton-method: HTML
HTML4
Convenience method for HTML4::Document.parse
HTML5
Convenience method for HTML5::Document.parse
.install_default_aliases
# File 'lib/nokogiri.rb', line 96
def install_default_aliases
  warn("Nokogiri.install_default_aliases is deprecated. Please call Nokogiri::EncodingHandler.install_default_aliases instead. This will become an error in Nokogiri v1.17.0.", uplevel: 1, category: :deprecated) # deprecated in v1.14.0, remove in v1.17.0
  Nokogiri::EncodingHandler.install_default_aliases
end
.libxml2_patches
# File 'lib/nokogiri/version/info.rb', line 211
def self.libxml2_patches
  if VersionInfo.instance.libxml2_using_packaged?
    Nokogiri::VERSION_INFO["libxml"]["patches"]
  else
    []
  end
end
.make(input = nil, opts = {}, &blk)
Create a new ::Nokogiri::XML::DocumentFragment
.parse(string, url = nil, encoding = nil, options = nil)
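Parse a ::Nokogiri::HTML or ::Nokogiri::XML document. The string argument contains the document. Input that responds to read, or whose first 512 characters contain an <html> tag or <!DOCTYPE html> declaration at the start of a line, is parsed as HTML; anything else is parsed as XML. If a block is given, the parsed document is yielded to it.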
# File 'lib/nokogiri.rb', line 42
def parse(string, url = nil, encoding = nil, options = nil)
  if string.respond_to?(:read) ||
      /^\s*<(?:!DOCTYPE\s+)?html[\s>]/i.match?(string[0, 512])
    # Expect an HTML indicator to appear within the first 512
    # characters of a document. (<?xml ?> + <?xml-stylesheet ?>
    # shouldn't be that long)
    Nokogiri.HTML4(
      string,
      url,
      encoding,
      options || XML::ParseOptions::DEFAULT_HTML,
    )
  else
    Nokogiri.XML(
      string,
      url,
      encoding,
      options || XML::ParseOptions::DEFAULT_XML,
    )
  end.tap do |doc|
    yield doc if block_given?
  end
end
Slop(*args, &block)
Parse a document and add the Slop decorator. The Slop decorator implements method_missing such that methods may be used instead of ::Nokogiri::CSS or XPath. For example:
doc = Nokogiri::Slop(<<-eohtml)
  <html>
    <body>
      <p>first</p>
      <p>second</p>
    </body>
  </html>
eohtml
assert_equal('second', doc.html.body.p[1].text)
# File 'lib/nokogiri.rb', line 91
def Slop(*args, &block)
  Nokogiri(*args, &block).slop!
end
.uses_libxml?(requirement = nil) ⇒ Boolean
# File 'lib/nokogiri/version/info.rb', line 193
def self.uses_libxml?(requirement = nil)
  return false unless VersionInfo.instance.libxml2?
  return true unless requirement
  Gem::Requirement.new(requirement).satisfied_by?(VersionInfo.instance.loaded_libxml_version)
end
XML
Convenience method for XML::Document.parse
XSLT
Convenience method for XSLT.parse