rzozowski (ruh-zov-ski) is a Rust crate for reasoning about regular expressions in terms of Brzozowski derivatives.
Let's say we have a regular expression
For example, if we have the regular expression
For a more complex example, take the regular expression
We usually think of a regular expression as a finite automaton and the act of matching a string as the act of transitioning between the automaton's states. Instead, the Brzozowski derivative allows us to skip the finite automaton altogether and determine whether a string matches a regular expression by testing the derivative's nullability (i.e., whether it can match the empty string).
The algorithm is very intuitive (below in pseudocode):
R is a regular expression
s is a string
for char c in s:
if R cannot accept c:
s does not match R
else:
R = D_c(R)
if R is nullable:
s matches R
else:
s does not match R
I wrote this because I needed to calculate Brzozowski derivatives and couldn't find any satisfactory crates. Once I had implemented derivatives and parsing, it was only 7 more lines to implement matching. Thus, I had a regex crate.
This crate does not aim to compete with existing regex crates. For most scenarios, you should probably use a more established crate like regex or fancy-regex.
Install with cargo add rzozowski
or add to your Cargo.toml
:
[dependencies]
rzozowski = "0.2.0"
Usage is very simple. rzozowski allows you to:
- Parse a
&str
into aRegex
- Convert a
Regex
into aString
- Calculate the derivatives of a
Regex
- Simplify a
Regex
- Check if a
&str
matches aRegex
Here's a simple example:
use rzozowski::Regex;
fn main() {
let r = Regex::new("ca+b").unwrap();
let s = "caab";
assert!(r.matches(s));
let derivative = r.derivative('c');
assert_eq!(derivative, Regex::new("a+b").unwrap());
}
rzozowski supports the following regex features:
- Literal characters (e.g.,
a
) - Concatenation (e.g.,
ab
) - Alternation (e.g.,
a|b
) - Kleene star (e.g.,
a*
) - Plus (e.g.,
a+
) - Optional (e.g.,
a?
) - Character classes (e.g.,
[a-z123]
,\d
,\w
,\s
) - Counts (e.g.,
a{3}
,a{3,}
, ora{3,5}
) - Parentheses (e.g.,
(ab)+
)
Note that rzozowski currently does not support capture groups, backreferences, or lookaheads. If you need these features, you should use a more established regex crate or submit a pull request to add them here :)
rzozowski is dramatically slower than the standard regex
crate at matching, and faster at parsing.
Benchmarking code can be found in the benches
directory. I intend to perform more thorough benchmarking soon.
Category | rzozowski | regex | Ratio (rzozowski/regex) |
---|---|---|---|
Simple | 1.31 μs | 5.21 μs | 0.25 |
Intermediate | 1.94 μs | 98.97 μs | 0.02 |
Complex | 3.76 μs | 48.85 μs | 0.08 |
Category | rzozowski | regex | Ratio (rzozowski/regex) |
---|---|---|---|
Simple | 1.91 μs | 11.42 ns | 167.63 |
Intermediate | 106.42 μs | 32.57 ns | 3267.25 |
Complex | 114.95 μs | 25.57 ns | 4496.19 |
Category | rzozowski | regex | Ratio (rzozowski/regex) |
---|---|---|---|
Simple | 1.56 μs | 7.89 ns | 198.34 |
Intermediate | 75.87 μs | 13.55 ns | 5598.72 |
Complex | 106.91 μs | 21.67 ns | 4932.49 |
Here are some resources that I found helpful in understanding Brzozowski derivatives:
- Regular Expression Derivatives in Python by Michael Paddon
- Regular-expression derivatives reexamined by Owens et al.
Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request. Alternatively, you can email me at feyles@icloud.com if you'd like to chat.
We use monk for git hooks to ensure code quality. You must run cargo fmt
and cargo clippy -- -D warnings
before committing, and your code may not break any tests before being pushed. You can read the hooks in the monk.yaml
file.