tokenizer support · Issue #10 · lalrpop/lalrpop · GitHub

Open
nikomatsakis opened this issue Jun 24, 2015 · 18 comments

@nikomatsakis
Collaborator

We want the ability to add your own tokenizer. This should permit very lightweight specifications but scale to really complex things. I envision the first step as generating a tokenizer based on the terminals that people use (#4), but it'd be nice to also permit full tokenizer specifications, where people can write custom action code based on the strings that have been recognized.

Some things I think we'll want:

  • A stack of states for the tokenizer to use
  • The ability for action code to yield any number of tokens. I envision that we'll define a trait and require you to return some value that conforms to that trait (a rough sketch follows this list). This trait will permit:
    • if you write a return type of Tok, you return just one token.
    • if you write a return type of (), we expect you to return zero tokens.
    • if you write a return type of (Tok, Tok), you always return two tokens.
    • if you write a return type of Vec<Tok>, we expect you to return a dynamic number of tokens.
    • internally, the generated code will keep a queue of generated tokens; as tokens are requested, they are removed from the front, and we only go back to the input bytes when that queue is exhausted.
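
As a rough sketch of the trait idea in the last bullet, something like the following plain Rust could work. The Tok type and the IntoTokens/TokenQueue names are purely illustrative, not anything LALRPOP defines or generates today:

use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Gt,       // a plain `>`
    GtPrefix, // a `>` known to be followed by another `>`
}

// Anything an action may return: zero, one, or many tokens.
trait IntoTokens {
    fn into_tokens(self, out: &mut VecDeque<Tok>);
}

impl IntoTokens for () {
    fn into_tokens(self, _out: &mut VecDeque<Tok>) {} // zero tokens
}

impl IntoTokens for Tok {
    fn into_tokens(self, out: &mut VecDeque<Tok>) {
        out.push_back(self); // exactly one token
    }
}

impl IntoTokens for (Tok, Tok) {
    fn into_tokens(self, out: &mut VecDeque<Tok>) {
        out.push_back(self.0); // always two tokens
        out.push_back(self.1);
    }
}

impl IntoTokens for Vec<Tok> {
    fn into_tokens(self, out: &mut VecDeque<Tok>) {
        out.extend(self); // a dynamic number of tokens
    }
}

// The generated tokenizer would drain this queue before scanning more bytes.
#[derive(Default)]
struct TokenQueue {
    queue: VecDeque<Tok>,
}

impl TokenQueue {
    fn push_action_result(&mut self, result: impl IntoTokens) {
        result.into_tokens(&mut self.queue);
    }

    fn next_token(&mut self) -> Option<Tok> {
        // An empty queue would trigger scanning more input in the real design.
        self.queue.pop_front()
    }
}
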
@nikomatsakis
Collaborator Author

We currently have some simple tokenizer generation support -- it has no states, and you can't write any custom action code at all. There are some immediate needs for extension described in #14, but it'd be nice to tie this all together in a grand, unified design.

@nikomatsakis
Collaborator Author

My current thought is something like this. You can add match sections to the grammar, with an optional priority annotation. These can contain either literal strings or regular expressions.

#[priority=2]
match {
    "foo" => action, // literal text
    r"bar" => action, // regular expression
    "foo" =>? action, // fallible action
}

The action will be to produce something that supports IntoIterator<Item=TheToken>. Thus you can return None to just skip the token, Some(foo) to produce one token, or vec![...] to produce many tokens (or other such things). If the rule is fallible, you should produce Result<T,E> where E is a ParseError and T is something supporting IntoIterator, as above.
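
Concretely, Option<Tok> and Vec<Tok> both already implement IntoIterator<Item = Tok>, so one bound covers all of these cases. A minimal sketch, with an invented Tok type and emit helper (a fallible action would return Result<T, ParseError> with T bounded the same way):

#[derive(Debug)]
enum Tok {
    Ident(String),
    Shr, // `>>`
}

// Append whatever an action returned -- zero, one, or many tokens -- to the stream.
fn emit(action_result: impl IntoIterator<Item = Tok>, stream: &mut Vec<Tok>) {
    stream.extend(action_result);
}

fn main() {
    let mut stream = Vec::new();
    emit(None, &mut stream);                         // skip: zero tokens
    emit(Some(Tok::Ident("x".into())), &mut stream); // one token
    emit(vec![Tok::Shr, Tok::Shr], &mut stream);     // many tokens
    assert_eq!(stream.len(), 3);
}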

I had imagined that the internal token form would "desugar" to this. One catch is that if you are hand-writing code, you would presumably want to specify your own token type. The current internal tokenizer just generates tuples like (u32, &'input str), where the integer identifies which token it is. This is pretty opaque and not something you could feed by hand (OTOH, the main use for this form is to specify the whitespace to skip, in which case you'd just return None).

The main use case for producing multiple tokens is something like making >>> a single match that then generates 3 tokens, one for each >. (Here, the first two would be a special sort of > that indicates that another > follows afterwards.) This could also be accommodated by incorporating lookahead into the regular expressions, I guess. I'd have to think about how best to do that (or read some papers).

While resolving these annoying questions, I could JUST support this form, to allow skipping comments:

match {
    "//[^\n]*" => None, 
}

(However, we also want to consider states, so we can support nested /* comments. For that I imagined match in state, and returning something like Push(state, tokens), but that also requires some thought about how the states are reflected into rust code, etc. Maybe it should be special syntax. Bah.)

@nikomatsakis
Collaborator Author
nikomatsakis commented Mar 24, 2017

It's time to make progress here. I have a concrete proposal I plan to experiment with. It does not yet handle multiple lexer states, but I'm interested in inferring those automatically based on the grammar instead.

My idea is roughly this. One can define a match block that defines your tokenizer. This consists of multiple sections corresponding to priorities. The regular expressions or strings within a section all have equal priority and hence must not match the same things:

match {
    r"[a-z]\w+", // anything beginning with lowercase letter
    "Foo" // Foo specifically
} else {
    r"[A-Z]\w+", // anything beginning with uppercase letter
    _, // include anything else mentioned in the grammar that doesn't already appear here
};

In addition, these entries can have actions. Right now, these actions can only take two forms:

  • skip (=> ()) -- when we find a match, ignore it entirely
  • rename (=> S where S is some symbol) -- produce this terminal that is used elsewhere in the grammar

The default action we saw above is then equivalent to an identity rename, e.g. "Foo" is equivalent to "Foo" => "Foo". The _ just means "scrape the grammar for literals etc. that don't already appear here and insert them" (e.g., maybe the grammar has r"[0-9]+" for numbers; that would get inserted there).

So, to accommodate Pascal, where the reserved words are case insensitive, I might do:

match {
    r"(?i)begin" => "BEGIN",
    r"(?i)end" => "END",
} else {
    r"[a-zA-Z_][a-zA-Z0-9_]*" => IDENTIFIER,
};

This would declare that:

  1. begin/end have higher priority and
  2. I will refer to them in my grammar as "BEGIN" and "END";
  3. identifiers have lower priority;
  4. and I will refer to them as IDENTIFIER.

Note that you can map to any symbol name, with or without quotes etc.

So then I can write my grammar using those "cleaned-up" names:

BLOCK = "BEGIN" ... "END";
EXPRESSION = IDENTIFIER ...;
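
Putting the two pieces together, a complete grammar along these lines might look roughly like the following. This is only a sketch of the proposal; the Block and Statement nonterminals are invented for illustration:

grammar;

match {
    r"(?i)begin" => "BEGIN",
    r"(?i)end" => "END",
} else {
    r"[a-zA-Z_][a-zA-Z0-9_]*" => IDENTIFIER,
};

pub Block: () = {
    "BEGIN" Statement* "END" => (),
};

Statement: () = {
    IDENTIFIER => (),
};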

I feel good about this.

@nikomatsakis
Collaborator Author

I've been looking at the code a bit. I'm going to take a few notes on a possible implementation plan, though I've run out of "side project" time for this morning:

  • right now, name resolution takes place in two places:
    • for non-terminals, that is resolve/mod.rs. It cares about terminals only insofar as it needs to ignore them. This works by first identifying the set of non-quoted identifiers (e.g., IDENTIFIER) that are actually terminals. Currently, that can only occur if you have an external tokenizer, so it looks for an extern section and extracts the LHS of those pattern declarations.
    • later, in token_check, we walk the terminals. If an external tokenizer is defined, we check that the terminal is in that set. Otherwise we collect the literals ("foo" or r"foo").
  • token-check then constructs an InternTok structure, presuming no external tokenizer has been defined. This basically means we construct the DFA.

To make this system I am describing here work, we have to make changes in a few places.

First and foremost, we have to parse the match declaration above (which I will call a "tokenizer declaration"). When we lower from the parse-tree to the repr, I would then make the invariant that we have either an external tokenizer or an internal tokenizer. If the user did not add an extern or a match section, that is equivalent to an internal tokenizer like match { _ }.

Next, we can now have "bare" terminals even with an internal tokenizer. So, name resolution needs to find the internal tokenizer "match" declaration and check the right-hand side of each =>. So e.g. r"[a-zA-Z_][a-zA-Z0-9_]*" => IDENTIFIER would indicate that IDENTIFIER is actually a terminal. That's easy enough.

In token-check, when we validate a literal like "bar" that appears in the grammar, we will want to check against the tokenizer declaration: if "bar" appears on the RHS of some entry, we're done. Otherwise, if there is a _, we want to collect it in a vector. If there is no _, we want to report an error: this is an unrecognized symbol.

Finally, when we build the DFA, we would build it up using the LHS of each A => B mapping. I think as long as we are careful, no further changes are needed to other parts of LALRPOP.
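
To make that concrete, here is one hypothetical way the lowered tokenizer declaration could be modeled; none of these type names come from LALRPOP's actual repr:

// Invariant after lowering: exactly one of these exists. If the user wrote
// neither an extern nor a match section, we behave as if they had written
// `match { _ }`.
enum TokenizerDef {
    External(ExternTokenizer),
    Internal(MatchDeclaration),
}

struct ExternTokenizer {
    // LHS of the extern pattern declarations; these names are terminals.
    terminals: Vec<String>,
}

struct MatchDeclaration {
    // One entry list per priority level (`match { ... } else { ... }`).
    priority_levels: Vec<Vec<MatchEntry>>,
    // Whether a `_` entry asks us to scrape the grammar for unlisted literals.
    has_catch_all: bool,
}

struct MatchEntry {
    // LHS: the literal or regex the DFA is built from.
    pattern: MatchPattern,
    // RHS: the terminal name used elsewhere in the grammar.
    maps_to: String,
}

enum MatchPattern {
    Literal(String),
    Regex(String),
}

Under that model, name resolution treats every maps_to name as a terminal; token-check looks literals up among the maps_to names and either accepts them, collects them under the _ entry, or reports them as unrecognized; and the DFA is built from the pattern side of each entry.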

@nikomatsakis nikomatsakis added this to the 1.0 milestone Mar 24, 2017
@wagenet
Contributor
wagenet commented Mar 24, 2017

I like this API. I'll see if I can make enough sense of the lalrpop code to make some progress here.

@nikomatsakis
Collaborator Author

So @wagenet made awesome progress here. There is still a bit more work to go before I'm ready to close this issue. In particular, I want to add support for tokenizers that produce zero or multiple tokens in response to an input. I'm envisioning something like this:

match {
    r"&&" => ("&[&]", "&[]"), // produce two tokens in response to 1 regex
} else {
    r"\s+" => (), // skip whitespace
    r"//.*" => (), // skip EOL comments
    r"/\*.*\*/" => (), // skip (non-nested) `/* ... */` comments
}

This would basically let us synthesize any regular (as in regular expressions) lexer. To go beyond regular languages we'd have to add states -- I'm not opposed to that, but I want to think on it a bit more.

One thing that I don't know: right now, we implicitly skip whitespace for you. I'd like to give users control of that, but I'm not sure when to disable the current behavior. For example, should we disable the whitespace default behavior if there are any tokens that map to ()? Or should we have people add a #[no_skip_whitespace] annotation?

Somehow the "no skip whitespace" thing feels related to _ to me -- i.e., the former opts out of default behavior, the latter opts back in -- perhaps they could be combined? i.e., we'd have _ by default unless you wrote #[no_default] or something on the match?

Thoughts?

@8573
8573 commented Mar 30, 2017

For example, should we disable the whitespace default behavior if there are any tokens that map to ()? Or should we have people add a #[no_skip_whitespace] annotation?

In the former case, how would one write a lexer that skips nothing?

@nikomatsakis
Collaborator Author

@8573 yeah, good question. I was thinking that the better "magic" answer would be to maybe look to see if any regex patterns may match whitespace, but it feels...kind of magic.

@hollinwilkins
hollinwilkins commented Apr 4, 2017

Opting for the non-magic option seems like the more predictable choice, and it wouldn't preclude implementing the magic as a fallback when the annotation is not set.

@nikomatsakis
Collaborator Author

Opting for the non-magic option seems like the more predictable choice, and it wouldn't preclude implementing the magic as a fallback when the annotation is not set.

Yes. I think there are two viable options:

  • If there is no _, then disable skipping whitespace and require an explicit line.
  • Some form of annotation like #[no_skip_whitespace]

Right now, I lean towards the former. The annotation just feels awkward, and using _ as a stand-in for "insert default rules" feels ok. Basically _ would be short for:

  • Insert a "foo" => "foo" rule for every terminal that appears in the grammar which is not explicitly listed.
  • Insert r"\s+" => ()

The next question is: what notation should we use for "skip" rules? I am thinking a bit forward here. I eventually expect to support:

  • custom code rules where you can write things like => { /* this will execute */ }
  • probably the ability to transition between states like "/*" => goto COMMENT
    • I had eventually hoped to avoid this but I'm changing my opinion
  • the ability to generate multiple tokens from one regex match (as I already mentioned)
    • but this may want custom code, since I think the number of tokens to generate may sometimes depend on the input
    • and maybe this should just be a custom tokenizer

Considering all these things I am leaning towards either:

  • => (), as originally proposed, or
  • => continue, just use a keyword.

I sort of lean towards the second one right now, but really either is fine.

@pyfisch
Contributor
pyfisch commented Oct 29, 2017

Has there been any progress with "skip" rules? I want to use them for line comments and inline comments without nesting.

What needs to be changed to support "skip" rules?

@ahmedcharles
Contributor

How does this relate to #14? They both seem related to tokenizers and the match functionality. Should one be a duplicate of the other? It's also not obvious what functionality would be sufficient to result in either issue being closed.

@nikomatsakis
Collaborator Author

@ahmedcharles yeah I think they may be basically the same thing. =)

@ahmedcharles
Contributor

Should one of them be closed?

@anchpop
anchpop commented Mar 25, 2019

What is the status of parsing indentation-sensitive languages? This (usually) requires a non-regular lexer to support behavior such as Python ignoring indentation within lists. Is there any documentation on how one can create their own lexer?

@Marwes
Contributor
Marwes commented Mar 25, 2019

@anchpop http://lalrpop.github.io/lalrpop/lexer_tutorial/index.html explains how to write and integrate a simple lexer.
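
For readers landing here: the essence of that tutorial is a hand-written iterator that yields spanned tokens (Result<(Loc, Tok, Loc), Error> triples), which the generated parser then consumes in place of the built-in tokenizer. A minimal sketch, with invented Tok and LexicalError types and a deliberately tiny token set:

#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Word(String),
    Newline,
}

#[derive(Debug)]
pub struct LexicalError {
    pub position: usize,
}

pub type Spanned<T, Loc, E> = Result<(Loc, T, Loc), E>;

pub struct Lexer<'input> {
    chars: std::iter::Peekable<std::str::CharIndices<'input>>,
}

impl<'input> Lexer<'input> {
    pub fn new(input: &'input str) -> Self {
        Lexer { chars: input.char_indices().peekable() }
    }
}

impl<'input> Iterator for Lexer<'input> {
    type Item = Spanned<Tok, usize, LexicalError>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.chars.next()? {
                // Surface newlines as their own token; skip other whitespace.
                (i, '\n') => return Some(Ok((i, Tok::Newline, i + 1))),
                (_, c) if c.is_whitespace() => continue,
                (i, c) if c.is_alphabetic() => {
                    let mut word = c.to_string();
                    let mut end = i + c.len_utf8();
                    while let Some(&(j, c2)) = self.chars.peek() {
                        if !c2.is_alphabetic() {
                            break;
                        }
                        word.push(c2);
                        end = j + c2.len_utf8();
                        self.chars.next();
                    }
                    return Some(Ok((i, Tok::Word(word), end)));
                }
                (i, _) => return Some(Err(LexicalError { position: i })),
            }
        }
    }
}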

@anchpop
anchpop commented Mar 25, 2019

Oh, I didn't notice that, thanks

@anchpop
anchpop commented Mar 30, 2019

I would be happy to work on an implementation of something like what's described in this whitepaper, as it would dramatically simplify some of my code. But I would need some guidance from someone experienced with the library on what they think the best way to implement it would be. In addition, being able to configure the lexer to emit ENDLINE or STARTLINE tokens would allow some languages to be described without much more complexity on LALRPOP's side, I think.
