8000 fix: in Nokogiri >= v1.17.0, use SAX::Document#reference by flavorjones · Pull Request #11 · searls/eiwa · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix: in Nokogiri >= v1.17.0, use SAX::Document#reference #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 10, 2024

Conversation

flavorjones
Copy link
Contributor
@flavorjones flavorjones commented Jul 1, 2024

The #reference callback will be available in Nokogiri v1.17.0. See sparklemotion/nokogiri#3265. That version is not released yet, but this code is backwards-compatible.

This works around the issues reported at:

Closes #10.

flavorjones added a commit to sparklemotion/nokogiri that referenced this pull request Jul 2, 2024
**What problem is this PR intended to solve?**

#1926 described an issue wherein the SAX parser was not correctly
resolving and replacing internal entities, and was instead reporting an
error for each entity reference. This PR includes a fix for that
problem.

I've removed the unnecessary "SAX tuple" from the SAX implementation,
replacing it with the `_private` struct member that libxml2 makes
available. Then I set up the parser context structs so that we can use
libxml2's standard SAX callbacks where they're useful (which is how I
addressed the above issue).

This PR also introduces a new feature, a SAX handler callback
`Document#reference` which allows callers to get entity-specific name
and replacement text information (rather than relying on the
`Document#characters` callback). This can be used to solve the original
issue in #1926 with this code: searls/eiwa#11

The behavior of the SAX parser with respect to entities is complex
enough that I wrote up a short doc in the `XML::SAX::Document` docstring
with a table and explanation. I've also added warnings to remind users
that `#replace_entities` is not safe to set when parsing untrusted
documents.

In the Java implementation, I've fixed the `#replace_entities` option in
the SAX parser context and set it to the proper default (`false`),
fixing #614. I've also corrected the value of the URI argument to
`Document#start_element_namespace` which was a blank string when it
should have been `nil`.

I've added quite a bit of testing around the SAX parser's handling of
entities.

I added and clarified quite a bit of documentation around SAX parsing
generally. Exception messages have been clarified in a couple of places,
and made consistent between the C and Java implementations. This should
address questions asked in issues #1500 and #1284.

Finally, I cleaned up some of the C code that implements SAX parsing,
naming functions more explicitly (and moving towards some kind of
standard naming convention).

Closes #1926.
Closes #614.


**Have you included adequate test coverage?**

Yes!


**Does this change affect the behavior of either the C or the Java
implementations?**

Yes, but the implementations are much more consistent with each other
now.
@searls
Copy link
Owner
searls commented Jul 2, 2024

Thanks @flavorjones! Because my app that consumes this is not in active development (and I'm in the "scared to touch it" phase of its lifecycle), I'm going to hold off until this version of the gem releases so I can be sure that I don't break anything

Great job!

@flavorjones
Copy link
Contributor Author

Totally makes sense. I'm hoping to cut a release later this month.

@flavorjones
Copy link
Contributor Author

@searls I released Nokogiri v1.17.0 today, so I circled back and pushed an additional commit that:

  1. pins to Nokogiri v1.17.0 which implements the reference callback
  2. removes the entity reference handling code in the error callback

It's green for me locally!

@searls
Copy link
Owner
searls commented Dec 10, 2024

Pulled this down and tried running bundle and got this cool message:

○ (flavorjones-flavorjones-sax-entity-fix) ~/code/searls/eiwa $ bundle
Fetching gem metadata from https://rubygems.org/.........
Resolving dependencies...
Resolving dependencies...
Could not find gem 'nokogiri (= 1.17.0)' with platform 'x86_64-linux' in rubygems repository https://rubygems.org/ or installed locally.

The source contains the following gems matching 'nokogiri (= 1.17.0)':
  * nokogiri-1.17.0-aarch64-linux
  * nokogiri-1.17.0-arm-linux
  * nokogiri-1.17.0-arm64-darwin
  * nokogiri-1.17.0-java
  * nokogiri-1.17.0
  * nokogiri-1.17.0-x64-mingw-ucrt
  * nokogiri-1.17.0-x64-mingw32
  * nokogiri-1.17.0-x86-linux
  * nokogiri-1.17.0-x86-mingw32
  * nokogiri-1.17.0-x86_64-darwin
  * nokogiri-1.17.0-x86_64-linux

… Despite nokogiri-1.17.0-x86_64-linux being right there.

@searls searls merged commit 2233574 into searls:master Dec 10, 2024
@searls
Copy link
Owner
searls commented Dec 10, 2024

Thanks for chasing this down Mike! I just merged and released as 0.2.0!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Misc. codes are no longer being read correctly
2 participants
0