8000 Can DFA/NFA construction be offloaded to the regex_automata crate? · Issue #947 · lalrpop/lalrpop · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Can DFA/NFA construction be offloaded to the regex_automata crate? #947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Pat-Lafon opened this issue Sep 11, 2024 · 0 comments
Open

Comments

@Pat-Lafon
Copy link
Contributor

Thanks for the report!

I spent some time profiling this. The hot spot is this function, which looks like it scales very poorly on an increase in the number of NFA states, and based on a cursory reading of this bit, it looks like a star repeat pattern is getting represented as one state, whereas a counted group of N repetitions is N states. So the repetition group hits the poor scaling in transitive_closure really badly and loops a lot.

Googling around, the theoretical algorithmic complexity to calculate a transitive closure is O(n^3), so I doubt we'll get much win there. The NFA states we store don't have internal state to handle a "number of repetitions" counter, so that's why we have to populate the graph with every state.

Based on all that, I don't see any options for a quick fix here. This seems like a design consequence of the high level algorithm, rather than something that can be locally fixed. I do wonder why we're doing all this building NFA states internally instead of handing it off to regex_automata. It's possible that some of this can be offloaded and the specialized library plausible could have better handling here. But at least personally, I'd need quite a bit more high level diving in to the code in order to take that on.

So, looping back to the high level question, I don't think you're doing anything "wrong" persay, but this seems to just be a fundamental issue with how lalrpop is handling regex tokens. My instinct is to say to use the ".*" approach and then check the length in the action code as a workaround.

Originally posted by @dburgener in #941 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant
0