Allow specifying conditions in external token patterns (contextual keywords) #966
Comments
I agree that this doesn't sound too hard to implement. I'm a little fuzzy on the motivating use case, though. Why not just have the lexer return different token variants for these cases? I think @Pat-Lafon has more experience with custom lexers than I do, so hopefully he can chime in as well.
I tried to track down why … (lalrpop/lalrpop/src/parser/lrgrammar.lalrpop, line 151 at commit 49b3c2c)
I'm curious what other power this might add. Since this condition is presumably arbitrary Rust code, could it be used to access some arbitrary mutable global state? For example:

```
enum Token {
    "(" => Token { kind: TokenKind::LParen, .. },
    "blah" => Token { kind: TokenKind::Id, text } if External::IncrementandCheckNumberofBlah(&text),
    ...
}
```

For some minor bikeshedding, I'm a little sad that the guard/condition goes after the …

(Some relevant stuff I found for my own reference: #112, #14, #10)
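To make the global-state question concrete, here is a plain-Rust sketch of what a token match with such a guard could look like. The `Token`/`TokenKind`/`Terminal` types, `classify`, and the `BLAH_COUNT` counter are hypothetical; `increment_and_check_number_of_blah` only stands in for `External::IncrementandCheckNumberofBlah`, and none of this is LALRPOP's actual generated code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical external token types, loosely modelled on the snippet above.
#[derive(Debug)]
enum TokenKind {
    LParen,
    Id,
}

#[derive(Debug)]
struct Token {
    kind: TokenKind,
    text: String,
}

// Terminals the grammar would see.
#[derive(Debug, PartialEq)]
enum Terminal {
    LParen,
    Blah,
}

// Global mutable state touched by the guard.
static BLAH_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `External::IncrementandCheckNumberofBlah`:
// counts every "blah" it sees and only accepts the first two.
fn increment_and_check_number_of_blah(text: &str) -> bool {
    if text != "blah" {
        return false;
    }
    let seen = BLAH_COUNT.fetch_add(1, Ordering::SeqCst) + 1;
    seen <= 2
}

// A token match with a user-supplied guard; arms are tried in order.
fn classify(tok: &Token) -> Option<Terminal> {
    match tok {
        Token { kind: TokenKind::LParen, .. } => Some(Terminal::LParen),
        Token { kind: TokenKind::Id, text } if increment_and_check_number_of_blah(text) => {
            Some(Terminal::Blah)
        }
        _ => None,
    }
}

fn main() {
    let blah = Token { kind: TokenKind::Id, text: "blah".to_string() };
    // The same token, classified three times.
    let results: Vec<Option<Terminal>> = (0..3).map(|_| classify(&blah)).collect();
    println!("{:?}", results); // [Some(Blah), Some(Blah), None]
}
```

The same token is accepted twice and then rejected, purely because the guard mutates global state. If a guard could run a different number of times than the user expects, results like this would be hard to reason about, which argues for keeping guards side-effect free.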
I suspect the problem with contextual keywords, at least, is the following: for example, in the language I'm working on, … However, I'm not sure I understand how the proposed feature could do anything about it. I feel like what you'd need is a guard on rules when matching said token, not a guard at the token definition site?
With the proposed feature, if you have a contextual keyword "or" like in your example, you can do
So the lexer keeps recognizing "or" as a variable instead of a keyword, but you define a token in the parser that checks the variable's text when matching an "or". Depending on your token definition you may be able to do this just with patterns, e.g.
But this doesn't work when the …
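A plain-Rust sketch of the idea, assuming the generated code maps each external token to a terminal with a single ordered `match` (the issue description says today's generated arms carry an implicit `if true` guard). The `Tok` and `Terminal` types and the `terminal_of` function are hypothetical.

```rust
// Hypothetical external token type: the lexer never produces an `or`
// keyword, so "or" arrives as an ordinary variable.
#[derive(Debug)]
enum Tok {
    Var(String),
    Num(i64),
}

// Terminals as the grammar would see them.
#[derive(Debug, PartialEq)]
enum Terminal {
    Or,
    Var,
    Num,
}

// Token-to-terminal classification with the proposed guard;
// arms are tried top to bottom and the first match wins.
fn terminal_of(tok: &Tok) -> Terminal {
    match tok {
        // Proposed: the "or" terminal matches a variable whose text is "or".
        Tok::Var(s) if s.as_str() == "or" => Terminal::Or,
        // Existing style: matches any variable (an implicit `if true`).
        Tok::Var(_) => Terminal::Var,
        Tok::Num(_) => Terminal::Num,
    }
}

fn main() {
    let toks = vec![Tok::Var("x".to_string()), Tok::Var("or".to_string()), Tok::Num(1)];
    for t in &toks {
        // Prints `Var("x") -> Var`, `Var("or") -> Or`, `Num(1) -> Num`.
        println!("{:?} -> {:?}", t, terminal_of(t));
    }
}
```

Note that `terminal_of` looks only at the token, never at parser state: `Var("or")` becomes `Terminal::Or` wherever it appears. The lexer stays simple, but the decision itself is not contextual, which is the concern raised in the following comment.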
If I've understood the problem statement correctly, it sounds awfully similar to this recent PR adding a documentation example. In that situation, lexer behavior needed to be controlled by parser-level information, and the example shows how parser context can be passed to the lexer. It seems like the "or" problem discussed here could be solved with the same technique. The lexer defines a mode type, and a …

If so, I'm still confused as to how the guard approach proposed here addresses it. Aren't you just moving some lexer duties into the parser? The guard that produces "or" when the string is "or", and a variable otherwise, is unconditional, right? So you've moved lexer duties into the parser without actually taking advantage of the parser's contextual awareness when making the lexing decision. Maybe there's a way to use that awareness through these guards, but I don't see it spelled out above, and it's not currently clear to me how it would work.

If the problem can indeed be solved using the example I mentioned above, I'm not necessarily opposed to also adding this guard feature as another option. I'm just still not seeing how it adds value.
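The mode-passing technique can be sketched roughly as follows. This is not the documentation example mentioned above; `Mode`, `Tok` and `Lexer` are hypothetical, and the lexer simply splits on whitespace. The mode lives in an `Rc<Cell<Mode>>` shared between the lexer and whatever parser actions would update it; here `main` flips it by hand.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical lexer mode, shared between parser actions and the lexer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Normal,   // "or" is an ordinary identifier
    ExpectOr, // the parser is at a point where `or` is a keyword
}

#[derive(Debug, PartialEq)]
enum Tok {
    Or,
    Var(String),
}

struct Lexer<'input> {
    words: std::str::SplitWhitespace<'input>,
    mode: Rc<Cell<Mode>>,
}

impl<'input> Lexer<'input> {
    fn new(input: &'input str, mode: Rc<Cell<Mode>>) -> Self {
        Lexer { words: input.split_whitespace(), mode }
    }
}

impl<'input> Iterator for Lexer<'input> {
    type Item = Tok;

    fn next(&mut self) -> Option<Tok> {
        let word = self.words.next()?;
        // The decision uses the mode at the moment the token is requested.
        match (word, self.mode.get()) {
            ("or", Mode::ExpectOr) => Some(Tok::Or),
            (w, _) => Some(Tok::Var(w.to_string())),
        }
    }
}

fn main() {
    let mode = Rc::new(Cell::new(Mode::Normal));
    let mut lexer = Lexer::new("or or", mode.clone());
    let first = lexer.next(); // lexed in Normal mode
    // A parser action would switch the mode here; we do it by hand.
    mode.set(Mode::ExpectOr);
    let second = lexer.next();
    println!("{:?} {:?}", first, second); // Some(Var("or")) Some(Or)
}
```

One caveat: an LR(1) parser has usually already pulled one lookahead token before it reduces and runs an action, so a mode switch made in an action can arrive one token too late. Where the switch happens matters.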
@osa1 I'm sorry, but I'm having trouble understanding your examples, because your token is already named "or", and thus the condition … Do you actually mean something more like
where the …
@yannham I'm using an external lexer; "or" is just the name I give to the pattern. So in this example (from my previous comment):
I use the "or" terminal to match tokens with the pattern …
Ah, I see. I think I was reading that backward, as I haven't used such token definitions in ages. Thanks for the clarification, it makes sense now.
Currently, an external token declaration like
generates this pattern when matching the token:
I think we should allow the user to specify the `if true` part, with something like:
which would be compiled to
This can be used to handle contextual keywords. Currently, we need to recognize contextual keywords in the lexer and then, wherever the parser expects a variable, also accept those keyword tokens and convert them back to variables.
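A minimal sketch of the current workaround, assuming a hypothetical token type where the lexer already turns "or" and "and" into keyword tokens. `lex_word` and `as_variable` are illustrative names only; `as_variable` stands in for a grammar rule that accepts either a variable or any contextual keyword wherever a variable is expected.

```rust
// Hypothetical token type: the lexer turns contextual keywords
// into keyword tokens unconditionally.
#[derive(Debug, PartialEq)]
enum Tok {
    LParen,
    Or,
    And,
    Var(String),
}

// Hypothetical lexer step: keywords are recognized without any context.
fn lex_word(word: &str) -> Tok {
    match word {
        "(" => Tok::LParen,
        "or" => Tok::Or,
        "and" => Tok::And,
        other => Tok::Var(other.to_string()),
    }
}

// Stand-in for the rule used wherever a variable is expected: it must also
// accept every contextual keyword and convert it back into a name.
fn as_variable(tok: &Tok) -> Option<String> {
    match tok {
        Tok::Var(name) => Some(name.clone()),
        Tok::Or => Some("or".to_string()),
        Tok::And => Some("and".to_string()),
        Tok::LParen => None,
    }
}

fn main() {
    for word in ["x", "or", "("] {
        let tok = lex_word(word);
        println!("{:?} -> {:?}", tok, as_variable(&tok));
    }
}
```

Every new contextual keyword has to be added both to the lexer and to `as_variable` (or its grammar equivalent), which is the duplication this proposal aims to remove.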
I think this should be fairly straightforward to implement. I can give it a try if maintainers think this can be added.