-
Notifications
You must be signed in to change notification settings - Fork 2
fix: in Nokogiri >= v1.17.0, use SAX::Document#reference #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This works around the issues reported at: - sparklemotion/nokogiri#1926 - sparklemotion/nokogiri#3147 Closes searls#10.
**What problem is this PR intended to solve?** #1926 described an issue wherein the SAX parser was not correctly resolving and replacing internal entities, and was instead reporting an error for each entity reference. This PR includes a fix for that problem. I've removed the unnecessary "SAX tuple" from the SAX implementation, replacing it with the `_private` struct member that libxml2 makes available. Then I set up the parser context structs so that we can use libxml2's standard SAX callbacks where they're useful (which is how I addressed the above issue). This PR also introduces a new feature, a SAX handler callback `Document#reference` which allows callers to get entity-specific name and replacement text information (rather than relying on the `Document#characters` callback). This can be used to solve the original issue in #1926 with this code: searls/eiwa#11 The behavior of the SAX parser with respect to entities is complex enough that I wrote up a short doc in the `XML::SAX::Document` docstring with a table and explanation. I've also added warnings to remind users that `#replace_entities` is not safe to set when parsing untrusted documents. In the Java implementation, I've fixed the `#replace_entities` option in the SAX parser context and set it to the proper default (`false`), fixing #614. I've also corrected the value of the URI argument to `Document#start_element_namespace` which was a blank string when it should have been `nil`. I've added quite a bit of testing around the SAX parser's handling of entities. I added and clarified quite a bit of documentation around SAX parsing generally. Exception messages have been clarified in a couple of places, and made consistent between the C and Java implementations. This should address questions asked in issues #1500 and #1284. Finally, I cleaned up some of the C code that implements SAX parsing, naming functions more explicitly (and moving towards some kind of standard naming convention). Closes #1926. Closes #614. **Have you included adequate test coverage?** Yes! **Does this change affect the behavior of either the C or the Java implementations?** Yes, but the implementations are much more consistent with each other now.
Thanks @flavorjones! Because my app that consumes this is not in active development (and I'm in the "scared to touch it" phase of its lifecycle), I'm going to hold off until this version of the gem releases so I can be sure that I don't break anything Great job! |
Totally makes sense. I'm hoping to cut a release later this month. |
@searls I released Nokogiri v1.17.0 today, so I circled back and pushed an additional commit that:
It's green for me locally! |
Pulled this down and tried running bundle and got this cool message:
… Despite |
Thanks for chasing this down Mike! I just merged and released as 0.2.0! |
The
#reference
callback will be available in Nokogiri v1.17.0. See sparklemotion/nokogiri#3265. That version is not released yet, but this code is backwards-compatible.This works around the issues reported at:
Closes #10.