We want to let users supply their own tokenizer. The mechanism should permit very lightweight specifications but scale to really complex ones. I envision the first step as generating a tokenizer from the terminals that people use (#4), but it would be nice to also permit explicit tokenizer specifications, where people can write custom action code that runs on the strings that have been recognized.
Some things I think we'll want:
- A stack of states for the tokenizer to use
- The ability for action code to yield any number of tokens. I envision that we'll define a trait and require you to return some value that conforms to that trait. This trait will permit:
  - return `Tok` for just one token.
  - if you write a return type of `()`, we expect you to return zero tokens.
  - if you write a return type of `(Tok, Tok)`, you always return two tokens.
  - if you write a return type of `Vec<Tok>`, we expect you to return a dynamic number of tokens.
  - internally, the generated code will keep a queue of generated tokens, and as tokens are requested they are removed from the front, and we only go back to the bytes when that queue is exhausted.
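To make the trait-plus-queue idea concrete, here is a rough sketch of how it might look. Nothing here is a settled design: the trait name `IntoTokens`, the `Tok` variants, the `on_*` action functions, and the hand-written `Tokenizer` (which takes pre-split strings instead of matching bytes) are all made up for illustration.

```rust
use std::collections::VecDeque;

// Hypothetical token type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Newline,
    Indent,
}

// Hypothetical trait: whatever an action returns must know how to
// append its tokens to the tokenizer's internal queue.
trait IntoTokens {
    fn push_tokens(self, queue: &mut VecDeque<Tok>);
}

// Exactly one token.
impl IntoTokens for Tok {
    fn push_tokens(self, queue: &mut VecDeque<Tok>) {
        queue.push_back(self);
    }
}

// Zero tokens.
impl IntoTokens for () {
    fn push_tokens(self, _queue: &mut VecDeque<Tok>) {}
}

// Always two tokens.
impl IntoTokens for (Tok, Tok) {
    fn push_tokens(self, queue: &mut VecDeque<Tok>) {
        queue.push_back(self.0);
        queue.push_back(self.1);
    }
}

// A dynamic number of tokens.
impl IntoTokens for Vec<Tok> {
    fn push_tokens(self, queue: &mut VecDeque<Tok>) {
        queue.extend(self);
    }
}

// Made-up action code, one function per return type.
fn on_space(_s: &str) {} // return type is `()`
fn on_newline(_s: &str) -> (Tok, Tok) {
    (Tok::Newline, Tok::Indent)
}
fn on_ident(s: &str) -> Tok {
    Tok::Ident(s.to_string())
}
fn on_path(s: &str) -> Vec<Tok> {
    s.split('.').map(|p| Tok::Ident(p.to_string())).collect()
}

// Stand-in for the generated tokenizer. It reads already-recognized
// strings rather than raw bytes, to keep the sketch short.
struct Tokenizer<'a> {
    lexemes: std::slice::Iter<'a, &'a str>,
    queue: VecDeque<Tok>,
}

impl<'a> Iterator for Tokenizer<'a> {
    type Item = Tok;

    fn next(&mut self) -> Option<Tok> {
        loop {
            // Serve from the front of the queue first.
            if let Some(tok) = self.queue.pop_front() {
                return Some(tok);
            }
            // Queue exhausted: go back to the input and run one action.
            let lexeme: &str = *self.lexemes.next()?;
            match lexeme {
                " " => on_space(lexeme).push_tokens(&mut self.queue),
                "\n" => on_newline(lexeme).push_tokens(&mut self.queue),
                s if s.contains('.') => on_path(s).push_tokens(&mut self.queue),
                s => on_ident(s).push_tokens(&mut self.queue),
            }
        }
    }
}

// Convenience helper: run the sketch over a list of recognized strings.
fn tokenize_all(lexemes: &[&str]) -> Vec<Tok> {
    Tokenizer { lexemes: lexemes.iter(), queue: VecDeque::new() }.collect()
}
```

With this shape, the loop in `next` never needs to know how many tokens an action produced: an action returning `()` simply leaves the queue empty and the loop moves on to the next piece of input.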
- return