A Python implementation of Thompson's construction algorithm to convert regular expressions to deterministic finite automata (DFA).
This project implements a complete pipeline for converting regular expressions to deterministic finite automata:
- Regular expression parsing
- Conversion to postfix notation (using Shunting Yard algorithm)
- NFA construction (using Thompson's algorithm)
- DFA conversion (using subset construction)
- DFA simulation
- Supports basic regex operations: concatenation, alternation (
|
), Kleene star (*
), plus (+
), and optional (?
) - Handles character escaping with backslash
- Supports direct testing of regex patterns against input strings
- Includes a test framework for validating regex pattern matching
regex_to_postfix.py
: Converts infix regex notation to postfix (Polish) notationnfa_builder.py
: Implements Thompson's construction algorithm to build NFAsnfa_to_dfa.py
: Converts NFAs to DFAs using subset constructionmain.py
: CLI interface for testing regex patterns
Run the main script with a test file:
python main.py [test_file.json]
If no test file is provided, it will use the default test file.
The test file should be a JSON array of objects with the following structure:
[
{
"name": "Test Case 1",
"regex": "a(b|c)*",
"test_strings": [
{"input": "abcbc", "expected": true},
{"input": "ad", "expected": false}
]
}
]
The infix regex is first converted to postfix notation using the Shunting Yard algorithm, with explicit concatenation symbols added. Operator precedence is:
*, +, ?
(highest).
(concatenation)|
(alternation) (lowest)
The postfix regex is converted to an NFA using Thompson's construction algorithm. Each regex operator is handled by specific NFA constructions with ε-transitions used to combine sub-NFAs.
The NFA is converted to a DFA using the subset construction algorithm. This involves computing ε-closures and creating DFA states that represent sets of NFA states.