Can DFA/NFA construction be offloaded to the regex_automata crate?

@dburgener

Thanks for the report!

I spent some time profiling this. The hot spot is this function, which looks like it scales very poorly on an increase in the number of NFA states, and based on a cursory reading of this bit, it looks like a star repeat pattern is getting represented as one state, whereas a counted group of N repetitions is N states. So the repetition group hits the poor scaling in transitive_closure really badly and loops a lot.

Googling around, the theoretical algorithmic complexity to calculate a transitive closure is O(n^3), so I doubt we'll get much win there. The NFA states we store don't have internal state to handle a "number of repetitions" counter, so that's why we have to populate the graph with every state.

Based on all that, I don't see any options for a quick fix here. This seems like a design consequence of the high level algorithm, rather than something that can be locally fixed. I do wonder why we're doing all this building NFA states internally instead of handing it off to regex_automata. It's possible that some of this can be offloaded and the specialized library plausible could have better handling here. But at least personally, I'd need quite a bit more high level diving in to the code in order to take that on.

So, looping back to the high level question, I don't think you're doing anything "wrong" persay, but this seems to just be a fundamental issue with how lalrpop is handling regex tokens. My instinct is to say to use the ".*" approach and then check the length in the action code as a workaround.

Originally posted by @dburgener in #941 (comment)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions