Can DFA/NFA construction be offloaded to the regex_automata crate? · Issue #947 · lalrpop/lalrpop · GitHub
More Web Proxy on the site http://driver.im/
You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I spent some time profiling this. The hot spot is this function, which looks like it scales very poorly on an increase in the number of NFA states, and based on a cursory reading of this bit, it looks like a star repeat pattern is getting represented as one state, whereas a counted group of N repetitions is N states. So the repetition group hits the poor scaling in transitive_closure really badly and loops a lot.
Googling around, the theoretical algorithmic complexity to calculate a transitive closure is O(n^3), so I doubt we'll get much win there. The NFA states we store don't have internal state to handle a "number of repetitions" counter, so that's why we have to populate the graph with every state.
Based on all that, I don't see any options for a quick fix here. This seems like a design consequence of the high level algorithm, rather than something that can be locally fixed. I do wonder why we're doing all this building NFA states internally instead of handing it off to regex_automata. It's possible that some of this can be offloaded and the specialized library plausible could have better handling here. But at least personally, I'd need quite a bit more high level diving in to the code in order to take that on.
So, looping back to the high level question, I don't think you're doing anything "wrong" persay, but this seems to just be a fundamental issue with how lalrpop is handling regex tokens. My instinct is to say to use the ".*" approach and then check the length in the action code as a workaround.
Thanks for the report!
I spent some time profiling this. The hot spot is this function, which looks like it scales very poorly on an increase in the number of NFA states, and based on a cursory reading of this bit, it looks like a star repeat pattern is getting represented as one state, whereas a counted group of N repetitions is N states. So the repetition group hits the poor scaling in transitive_closure really badly and loops a lot.
Googling around, the theoretical algorithmic complexity to calculate a transitive closure is O(n^3), so I doubt we'll get much win there. The NFA states we store don't have internal state to handle a "number of repetitions" counter, so that's why we have to populate the graph with every state.
Based on all that, I don't see any options for a quick fix here. This seems like a design consequence of the high level algorithm, rather than something that can be locally fixed. I do wonder why we're doing all this building NFA states internally instead of handing it off to regex_automata. It's possible that some of this can be offloaded and the specialized library plausible could have better handling here. But at least personally, I'd need quite a bit more high level diving in to the code in order to take that on.
So, looping back to the high level question, I don't think you're doing anything "wrong" persay, but this seems to just be a fundamental issue with how lalrpop is handling regex tokens. My instinct is to say to use the ".*" approach and then check the length in the action code as a workaround.
Originally posted by @dburgener in #941 (comment)
The text was updated successfully, but these errors were encountered: