-
Notifications
You must be signed in to change notification settings - Fork 23
Consider using RE2 (aka golang regexp
) for wildcard matches
#80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
// An RE2::Set represents a collection of regexps that can
// be searched for simultaneously.
// Adds pattern to the set using the options passed to the constructor.
// Returns the index that will identify the regexp in the output of Match(),
// or -1 if the regexp cannot be parsed.
// Indices are assigned in sequential order starting from 0.
// Errors do not increment the index; if error is not NULL, *error will hold
// the error message from the parser.
int Add(const StringPiece& pattern, std::string* error); Might be relevant? Source |
First of all, I believe it's not just a Go thing, more or less every regex library has finite automata inside, isn't that the only sane approach? RE2::Set at one level sounds much like what Quamina implements. The interesting thing about Quamina is that you can add as many patterns as you want and it doesn't affect the performance very much. Another useful thing about Quamina is that each pattern you add has an identifier and the Match operation tells you which of the patterns you added matched. Anyhow, I'll have a closer look at the Go regexp library. I wonder if it gives programmatic access to the DFA produced by compiling the regex? If so, that would be incredibly useful. If not, it still might be worthwhile forking regexp in order to steal that compiler. Personally, over in the IETF JSONPATH Working Group we're working on IRegexp, https://datatracker.ietf.org/doc/draft-ietf-jsonpath-iregexp/ and that's the subset I'd like to try to support in Quamina. |
If you read the linked paper, you'll see that both I mostly mentioned re2 (open source as well) and Go's |
Right, and of course having backtracking doesn't mean you're not using DFAs. Sometime I will share the funny/horrible story about how the Java regex implementation blew up JRuby on Rails. Anyhow, as noted, I'll take a close look as to whether we can get at Go regexp's automaton-building code. Of course, that software is aimed at answering the question “Does this regex match this string and where does it start?" while Quamina aims at the question “Which of the many patterns in this instance match this string?” but there still may be value to share. |
I think that from the job-to-be-done point of view, the applicable issue is #66. I looked at RE2 and I don't think it exposes the automaton-level APIs that Quamina would need, but I do think some of the source code could be copied and re-used for this. |
You might be interested to know that the Go regexp library uses finite automata already to implement the
regexp
package. From that package:It should be trivial(1) to replace the current
shellstyle
matcher with a translation to theregexp
module, and to check for performance differences.(1) exercise left to the reader
The one place where
shellstyle
might be faster thanregexp
is if you require start and end matching and allow only one glob in the middle. At that point,shellstyle
becomes a shortcut forstartswith("stuff before *") AND endswith("stuff after *") AND len(string) > len(expression)
.The text was updated successfully, but these errors were encountered: