tokenizer support · Issue #10 · lalrpop/lalrpop · GitHub
tokenizer support #10
Open
@nikomatsakis

Description

We want users to be able to supply their own tokenizer. This should permit very lightweight specifications but scale to really complex things. I envision the first step as generating a tokenizer from the terminals that people use (#4), but it'd be nice to also permit explicit tokenizer specifications, where people can write custom action code based on the strings that have been recognized.

Some things I think we'll want:

  • A stack of states for the tokenizer to use
  • The ability for action code to yield any number of tokens. I envision that we'll define a trait and require action code to return some value that implements that trait. The return type you write determines how many tokens we expect:
    • a return type of Tok yields exactly one token.
    • a return type of () yields zero tokens.
    • a return type of (Tok, Tok) always yields two tokens.
    • a return type of Vec<Tok> yields a dynamic number of tokens.
  • Internally, the generated code will keep a queue of generated tokens. As tokens are requested they are removed from the front, and we only go back to the bytes when that queue is exhausted.
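
To make the trait and queue idea concrete, here is a minimal sketch in Rust. Everything in it is hypothetical and for illustration only: the names `Tok`, `IntoTokens`, `push_into`, `Tokenizer`, and `emit` are assumptions, not an existing LALRPOP API, and the hand-written matching on whitespace-separated words stands in for whatever the generated recognizer would do. The stack of states is left out to keep the example short.

```rust
use std::collections::VecDeque;
use std::str::SplitWhitespace;

/// Hypothetical token type, used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Ident(String),
    Num(i64),
    Indent,
    Dedent,
}

/// Hypothetical trait: any value that action code may return.
/// Each impl appends its tokens to the back of the pending queue.
pub trait IntoTokens {
    fn push_into(self, queue: &mut VecDeque<Tok>);
}

/// `Tok`: exactly one token.
impl IntoTokens for Tok {
    fn push_into(self, queue: &mut VecDeque<Tok>) {
        queue.push_back(self);
    }
}

/// `()`: zero tokens.
impl IntoTokens for () {
    fn push_into(self, _queue: &mut VecDeque<Tok>) {}
}

/// `(Tok, Tok)`: always two tokens, in order.
impl IntoTokens for (Tok, Tok) {
    fn push_into(self, queue: &mut VecDeque<Tok>) {
        queue.push_back(self.0);
        queue.push_back(self.1);
    }
}

/// `Vec<Tok>`: a dynamic number of tokens.
impl IntoTokens for Vec<Tok> {
    fn push_into(self, queue: &mut VecDeque<Tok>) {
        queue.extend(self);
    }
}

/// Toy tokenizer over whitespace-separated words (an assumption made
/// to keep the sketch small; a real one would work on the bytes).
pub struct Tokenizer<'input> {
    words: SplitWhitespace<'input>,
    queue: VecDeque<Tok>,
}

impl<'input> Tokenizer<'input> {
    pub fn new(input: &'input str) -> Self {
        Tokenizer { words: input.split_whitespace(), queue: VecDeque::new() }
    }

    /// Accepts whatever an action returned and queues its tokens.
    fn emit<T: IntoTokens>(&mut self, tokens: T) {
        tokens.push_into(&mut self.queue);
    }
}

impl<'input> Iterator for Tokenizer<'input> {
    type Item = Tok;

    fn next(&mut self) -> Option<Tok> {
        loop {
            // Serve queued tokens first; read more input only when empty.
            if let Some(tok) = self.queue.pop_front() {
                return Some(tok);
            }
            let word = self.words.next()?;
            if word == "--" {
                self.emit(()); // zero tokens
            } else if word.chars().all(|c| c == '}') {
                self.emit(vec![Tok::Dedent; word.len()]); // dynamic count
            } else if let Some(name) = word.strip_suffix(':') {
                self.emit((Tok::Ident(name.to_string()), Tok::Indent)); // two tokens
            } else if let Ok(n) = word.parse::<i64>() {
                self.emit(Tok::Num(n)); // one token
            } else {
                self.emit(Tok::Ident(word.to_string())); // one token
            }
        }
    }
}
```

In this sketch, iterating over `Tokenizer::new("foo: 42 -- }}")` exercises all four return types: `foo:` queues two tokens, `42` queues one, `--` queues none, and `}}` queues one `Dedent` per brace. The parser only ever calls `next`, so it never needs to know how many tokens a given action produced.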
