Description
Thanks for the report!
I spent some time profiling this. The hot spot is this function, which looks like it scales very poorly on an increase in the number of NFA states, and based on a cursory reading of this bit, it looks like a star repeat pattern is getting represented as one state, whereas a counted group of N repetitions is N states. So the repetition group hits the poor scaling in transitive_closure really badly and loops a lot.
Googling around, the theoretical algorithmic complexity to calculate a transitive closure is O(n^3), so I doubt we'll get much win there. The NFA states we store don't have internal state to handle a "number of repetitions" counter, so that's why we have to populate the graph with every state.
Based on all that, I don't see any options for a quick fix here. This seems like a design consequence of the high level algorithm, rather than something that can be locally fixed. I do wonder why we're doing all this building NFA states internally instead of handing it off to regex_automata. It's possible that some of this can be offloaded and the specialized library plausible could have better handling here. But at least personally, I'd need quite a bit more high level diving in to the code in order to take that on.
So, looping back to the high level question, I don't think you're doing anything "wrong" persay, but this seems to just be a fundamental issue with how lalrpop is handling regex tokens. My instinct is to say to use the ".*" approach and then check the length in the action code as a workaround.
Originally posted by @dburgener in #941 (comment)