Roadmap for API Changes
overhaul serialize/pretty printing API
overhaul and optimize the SAX parsing
- see fairy wing throwdown - SAX parsing is wicked slow.
Node should not be Enumerable; and should have a better attributes API
#679 Mixing in Enumerable has some unintended consequences; plus we want to improve the attributes API
Some ideas for a better attributes API?
improve CSS query parsing
#528 support
:not()
with a nontrivial argument, like:not(div p.c)
#451 chained :not pseudoselectors
better jQuery selector and CSS pseudo-selector support:
#394 nth-of-type is wrong, and possibly other selectors as well
#309 incorrect query being executed
#350 :has is wrong?
DocumentFragment
- there are a few tickets about searches not working properly if you use or do not use the context node as part of the search.
Better Syntax for custom XPath function handler
Better Syntax around Node#xpath and NodeSet#xpath
- look at those methods, and use of Node#extract_params in Node#
css,search
- we should standardize on a hash of options for these and other calls
- what should NodeSet#xpath return?
Encoding
We have a lot of issues open around encoding. How bad are things? Somebody who knows encoding well should head this up.
- Extract EncodingReader as a real object that can be injected https://groups.google.com/forum/#!msg/nokogiri-talk/arJeAtMqvkg/tGihB-iBRSAJ
Reader
It's fundamentally broken, in that we can't stop people from crashing their application if they want to use object reference unsafely.
Class methods that require Document
There are a few methods, like Nokogiri::XML::Comment.new that require a Document object.
We should probably make Document instance methods to wrap this, since it's a non-obvious expectation and thus fails as a convention.
So, instead, let's make alternative methods like
Nokogiri::XML::Document#new_comment
, and recommend those as the
proper convention.
collect_namespaces
is just broken
collect_namespaces
is returning a hash, which means it can't return
namespaces with the same prefix. See this issue for background:
Do we care? This seems like a useless method, but then again I hate XML, so what do I know?
Overhaul ParseOptions
Currently we mirror libxml2's parse options, and then retrofit those options on top of Xerces-J for JRuby.
- I'd like to identify which options work across both parsers,
- And overhaul the parse methods so that these options are easier to use.
By "easier to use" I mean:
- it's unwieldy to create a block to set/unset parse options
- it's unwieldy to create a constant like
MY_PARSE_OPTIONS = Nokogiri::XML::ParseOptions::STRICT | Nokogiri::XML::ParseOptions::RECOVER ...
- some options are named dangerously poorly, like
NOENT
which does the opposite of what it says - semantically some options should be set/unset together, specifically "this is a trusted document" or "this is an untrusted document" should flip the senses of
NONET
andNOENT
andDTDLOAD
together. - we need the ability to invent new parse options, like the one suggested in #1582 that would allow local entities but not external entities.