-
Notifications
You must be signed in to change notification settings - Fork 163
Enhance parser and name-lookup with better quoting #420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Aargh, Well, that failure's for a completely different reason: I forgot you can't do this in Python 3.8: with (
open(some_file) as in1,
open(some_file) as in2
): (It can be done, but it can't be wrapped in parens and made multi-line until Python 3.10.) |
dot_parser has always "cheated" our quoting logic by passing in values already quoted. That means, for all of the graphs we run through it in testing, it's not really testing our quoting logic at all. Remove the quotes from quoted values during parsing, so that our quoting logic is tested, and so that users who run graphs through `dot -Tdot` and parse them back in won't get data with all of the identifiers changed.
If TEST_ERROR_DIR is set when running the tests, and a failure is encountered in one of the file-driven tests, write out both the original file and the `.to_string()` for the parsed version.
1382663
to
1037179
Compare
Oh! Turns out, the This still shouldn't be merged until we're sure we're ready to start development on Pydot 4.0.0, but it's no longer a WIP as it's now passing all tests, same as |
Oh, right, but I still can't do this in Python < 3.10. with (
open(some_file) as in1,
open(some_file) as in2
): |
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
This pull request has conflicts, please resolve those so that the changes can be evaluated. |
All conflicts have been resolved, thanks! |
As long as the strings are quoted, both graphviz and our parser can handle embedded newlines, so there's no reason to escape them. Makes round-trips through graphviz commands less disruptive.
d3fd169
to
2247170
Compare
This pull request has conflicts, please resolve those so that the changes can be evaluated. |
955c43e
to
a1f522f
Compare
I think this might be done. Even if there's more I could do, it's big enough already that I'd rather save anything further for a separate PR. Oh, I need to add tests for the whole make-sure-the-parser-doesn't-unquote-keyword-names thing I spotted, and fixed, above. Which I'd rather put in |
This pull request has conflicts, please resolve those so that the changes can be evaluated. |
All conflicts have been resolved, thanks! |
This pull request has conflicts, please resolve those so that the changes can be evaluated. |
All conflicts have been resolved, thanks! |
This way they can be shared between core and dot_parser.
I think this branch is now good, but I'm not sure how reviewable it is anymore. I may have to re-build it on top of the current |
This pull request has conflicts, please resolve those so that the changes can be evaluated. |
Hi! I've been going through the blame for the fix to #396, and I've ended up on this draft PR. I'm looking at this stuff because in networkx/networkx#7648, someone found that long node names can break stuff. My understanding is that the changes in this PR should enable us to fix that issue as well (and that until they land, there's not much we can do), but I'm nowhere near qualified enough to say for sure. If you have the time, could you take a quick look at that? |
@Peiffap I'm surprised to see this issue hanging around for a year, it was never reported here in pydot. Thanks for reporting. I'm not sure how up-to-date this PR is. It could be outdated. And I'm assuming you've tried with the newest pydot (v4), right? |
Yes, I'm on v4.0.0. I'm not sure how up to date it is either - but at least GitHub thinks it can be cleanly merged, which is a nice start. |
Yes, you can create an issue, or I'll create it, whatever. Thanks! |
This is Pydot 4.0 work for sure, since its changes are pretty sweeping
Stripping quotes during parsing
In #413 I mentioned that it would enhance round-tripping of our graphs through
dot
and back again if the parser unquoted input strings where possible. This PR starts the work of doing that.Because not all strings can be unquoted, instead of enabling
unquote_results
inpyparsing.QuotedString()
, a new parse actionpossibly_unquote
is attached to thedouble_quoted_string
parser element that handles making that decision. It makes sure that any quoted strings stay quoted that need to: double-quoted strings containing what would otherwise parse as an HTML-like string, or as an ID with port values, etc. If it will confuse the pydot code when sent in unquoted, it's left quoted. Otherwise, it's unquoted, which matchesdot
's own behavior in parsing.In addition, the lookup functions
get_node()
,get_subgraph()
, andget_edge()
are enhanced to also try the quoted versions of their ID arguments, if passed unquoted strings.1 So, we're better protected both coming and going.The
make_quoted
utility function will no longer translate newlines and carriage returns into\n
and\r
— it's unnecessary, asdot
quoted strings can contain the raw control characters without issue, and our parser can consume them as well. So, leaving them be will further enhance our ability to round-trip graphs through graphviz commands with better fidelity.Fixes #413
Notes
For
get_edge()
in particular, that became extremely complex, because each endpoint has to be separately checked for a possible quoted form, and tested with or without a quoted form of the other endpoint.So, for example,
get_edge('abc', 'def')
will try:('abc', 'def')
('"abc"', '"def"')
('abc', 'def')
('"abc"', '"def"')
('def', 'abc')
('"def"', '"abc"')
But when one side is already quoted, the quoted form of the other side still has to be tested as well. So,
get_edge('<<i>html</i>>', 'def')
will try:('<<i>html</i>>', 'def')
('<<i>html</i>>', '"def"')
('<<i>html</i>>', 'def')
('<<i>html</i>>', '"def"')
('def', '<<i>html</i>>')
('"def"', '<<i>html</i>>')
That code is ugly, but I dropped in a long-ass explanatory comment which will hopefully help.
Improved numeric parsing in
dot_parser
-
sign.ID
values, and are legal unquoted in all places where graphviz allows them._
. This was already true in the quoting logic, but now the parser does the right thing too.The upshot of all those changes is that, in
main
:On this branch:
Much more in line with graphviz.
Due to these changes, this PR now...
Fixes: #436
Move constants to
pydot.constants
A new file
constants.py
is added topydot
, containing all of theCONSTANT_LIST_VARIABLES
previously defined inpydot.core
. (The actual reason for this move is, I also need theDOT_KEYWORDS
list indot_parser
now.)Because all of the constants in
pydot.constants
are currently imported intopydot.core
, the__init__.py
file'sfrom pydot.core import *
import will still pick them up, which means that things likepydot.NODE_ATTRIBUTES
andpydot.OUTPUT_FORMATS
will still be defined even after this change.However, I'm tempted to change that (e.g. by having
pydot.core
usefrom pydot import constants
, and refer to the constants asconstants.CONSTANT_NAME
), in order to get the constants lists out of the rootpydot
namespace. They were never really intended to be referenced from thepydot.
level.