Class: LibXML::XML::HTMLParser
Inherits: Object
Defined in: ext/libxml/ruby_xml_html_parser.c, lib/libxml/html_parser.rb
Overview
The HTML parser implements an HTML 4.0 non-verifying parser with an API compatible with XML::Parser. In contrast with XML::Parser, it can parse “real world” HTML, even if it is severely broken from a specification point of view.
The HTML parser creates an in-memory document object that consists of any number of Node instances. This is a simple and powerful model, but it has the major limitation that the size of the document that can be processed is limited by the amount of memory available.
Using the HTML parser is simple:
  parser = XML::HTMLParser.file('my_file')
  doc = parser.parse
You can also parse documents (see XML::HTMLParser.document), strings (see .string) and IO objects (see .io).
Class Method Summary
- XML::HTMLParser.file(path) ⇒ HTMLParser
  Creates a new parser by parsing the specified file or URI.
- XML::HTMLParser.io(io) ⇒ HTMLParser
  Creates a new parser by parsing the specified IO object.
- XML::HTMLParser.initialize ⇒ parser (constructor)
  Initializes a new parser instance from the given parser context.
- XML::HTMLParser.string(string) ⇒ HTMLParser
  Creates a new parser by parsing the specified string.
Instance Attribute Summary
- #input readonly
  Attributes.
- #file=(value) writeonly Internal use only
- #io=(value) writeonly Internal use only
- #string=(value) writeonly Internal use only
Instance Method Summary
- #parse ⇒ XML::Document
  Parses the input and creates an XML::Document with its content.
Constructor Details
XML::HTMLParser.initialize ⇒ parser
Initializes a new parser instance from a parser context. As the source below shows, an ArgumentError is raised if no context is passed.
# File 'ext/libxml/ruby_xml_html_parser.c', line 39
static VALUE rxml_html_parser_initialize(int argc, VALUE *argv, VALUE self)
{
  VALUE context = Qnil;

  rb_scan_args(argc, argv, "01", &context);

  if (context == Qnil)
  {
    rb_raise(rb_eArgError, "An instance of a XML::Parser::Context must be passed to XML::HTMLParser.new");
  }

  rb_ivar_set(self, CONTEXT_ATTR, context);
  return self;
}
Class Method Details
XML::HTMLParser.file(path) ⇒ HTMLParser
XML::HTMLParser.file(path, encoding: XML::Encoding::UTF_8, options: ...) ⇒ HTMLParser
Creates a new parser by parsing the specified file or uri.
Parameters:
path - Path to the file to parse.
encoding - The document encoding, defaults to nil. Valid values
are the encoding constants defined on XML::Encoding.
options - Parser options. Valid values are the constants defined on
XML::HTMLParser::Options. Multiple options can be combined
by using bitwise OR (|).
XML::HTMLParser.io(io) ⇒ HTMLParser
XML::HTMLParser.io(io, encoding: XML::Encoding::UTF_8, options: ..., base_uri: ...) ⇒ HTMLParser
Creates a new parser by parsing the specified IO object.
Parameters:
io - IO object that contains the HTML to parse.
base_uri - The base URL for the parsed document.
encoding - The document encoding, defaults to nil. Valid values
are the encoding constants defined on XML::Encoding.
options - Parser options. Valid values are the constants defined on
XML::HTMLParser::Options. Multiple options can be combined
by using bitwise OR (|).
XML::HTMLParser.string(string) ⇒ HTMLParser
XML::HTMLParser.string(string, encoding: XML::Encoding::UTF_8, options: ..., base_uri: ...) ⇒ HTMLParser
Creates a new parser by parsing the specified string.
Parameters:
string - String to parse.
base_uri - The base URL for the parsed document.
encoding - The document encoding, defaults to nil. Valid values
are the encoding constants defined on XML::Encoding.
options - Parser options. Valid values are the constants defined on
XML::HTMLParser::Options. Multiple options can be combined
by using bitwise OR (|).
Instance Attribute Details
#file=(value) (writeonly)
#input (readonly)
Attributes
#io=(value) (writeonly)
#string=(value) (writeonly)
Instance Method Details
#parse ⇒ XML::Document
Parses the input and creates an XML::Document with its content. If an error occurs, XML::Parser::ParseError is raised.
# File 'ext/libxml/ruby_xml_html_parser.c', line 62
static VALUE rxml_html_parser_parse(VALUE self)
{
  xmlParserCtxtPtr ctxt;
  VALUE context = rb_ivar_get(self, CONTEXT_ATTR);

  Data_Get_Struct(context, xmlParserCtxt, ctxt);

  if (htmlParseDocument(ctxt) == -1 && !ctxt->recovery)
  {
    rxml_raise(&ctxt->lastError);
  }

  rb_funcall(context, rb_intern("close"), 0);

  return rxml_document_wrap(ctxt->myDoc);
}