8000 Precedence doesn't seem to work with macros · Issue #951 · lalrpop/lalrpop · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Precedence doesn't seem to work with macros #951

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
stephanemagnenat opened this issue Sep 17, 2024 · 11 comments
Open

Precedence doesn't seem to work with macros #951

stephanemagnenat opened this issue Sep 17, 2024 · 11 comments

Comments

@stephanemagnenat
Copy link

Hello, I have a grammar for an expression-based language like this (simplified):

pub Expr = {
    ...
    // unary operators
    #[precedence(level="2")] #[assoc(side="right")]
    <o: Loc<"-">> <e:Expr> => ...,
   // binary multiplicative operators
    #[precedence(level="3")] #[assoc(side="left")]
    <l:Expr> <o:Loc<"*">> <r:Expr> => ...
    ...
}

and precedence works.

If now I use macros for unary and binary operations like this:

Unary<O>: (Span, Vec<Expr>) = {
    <o:Loc<O>> <e:Expr> => (o, vec![e]),
}

Binary<O>: (Span, Vec<Expr>) = {
    <l:Expr> <o:Loc<O>> <r:Expr> => (o, vec![l, r]),
}

and I modify my Expr to use them:

pub Expr = {
    ...
    #[precedence(level="2")] #[assoc(side="right")]
    Unary<"-"> => ...,
    #[precedence(level="3")] #[assoc(side="left")]
    Binary<"*"> => ...
    ...
}

I get a shift-reduce conflict:

    ... Local ambiguity detected
    The problem arises after having observed the following symbols in the input:
      Loc<"-"> Expr
    At that point, if the next token is a `"*"`, then the parser can proceed in
    two different ways.

    First, the parser could execute the production at
    ./src/parser.lalrpop:29:5: 29:14, which would consume the top 2 token(s) from
    the stack and produce a `Unary<"-">`. This might then yield a parse tree like
      Loc<"-">  Expr Loc<"*"> Expr
      ├─Unary<"-">─┤             │
      ├─Expr1──────┤             │
      ├─Expr───────┘             │
      └─Binary<"*">──────────────┘

    Alternatively, the parser could shift the `"*"` token and later use it to
    construct a `Loc<"*">`. This might then yield a parse tree like
      Expr "*"        ╷ Expr
      │    └─Loc<"*">─┘    │
      └─Binary<"*">────────┘

I tried with #[inline] on the macros but it didn't help. So it seems to me that using macros breaks the precedence, which is a bit surprising as I would assume that given their names macros would just expand and respect the precedence annotations.

So my questions are:

  • Is that the expected behavior?
  • If so, maybe a note in the documentation of full expressions here and/on macro here would be helpful.
  • Is there a known work-around besides manually expanding the macros?
@stephanemagnenat stephanemagnenat changed the title Precedence doesn't seem to work with macro Precedence doesn't seem to work with macros Sep 17, 2024
@stephanemagnenat
Copy link
Author

It seems that #[inline] alse doesn't work with precedence. For example, if I want to put my unary expression in a new rule:

#[inline]
UnaryExpr: Expr = {
    "-" <e:Expr> => ...
}

and use this one in Expr:

pub Expr = {
    ...
    #[precedence(level="2")] #[assoc(side="right")]
    UnaryExpr,
    #[precedence(level="3")] #[assoc(side="left")]
    Binary<"*"> => ...
    ...
}

I get a similar shift-reduce conflict.

@stephanemagnenat
Copy link
Author

I wonder whether the last comment in #667 about macro substitution in relation with precedence relates to the same issue mentioned here.

@yannham
Copy link
Contributor
yannham commented Sep 18, 2024

Argh, I entirely forgot about #667 to be honest. I couldn't invest enough time into it then. IIRC, it's very likely that precedence annotations are substituted before macro expansion, which would fit your observations. One possibility might be to perform elaboration after macro expansion, maybe this is the simplest approach (no need to handle issues with name capture as in #667). Or the #667 approach is another direction.

I can't promise anything, but I'll try to give it a look when I can.

@stephanemagnenat
Copy link
Author

Thank you for your answer! Indeed we certainly do not need the full of #667 to fix this issue.

@yannham
Copy link
Contributor
yannham commented Nov 14, 2024

@stephanemagnenat Sorry I took so long, but for some reason this issue came back to my mind as I'm working on a grammar that makes use of macro + precedence. I think they can both work together quite well, but needs a tweak:

Unary<O, Rule>: (Span, Vec<Rule>) = {
    <o:Loc<O>> <e:ExprRule> => (o, vec![e]),
}

Binary<O, Left, Right>: (Span, Vec<Left>) = { // Left or Right will do in the Vec; we assume the type is the same
    <l: Left> <o:Loc<O>> <r: Right> => (o, vec![l, r]),
}

pub Expr = {
    ...
    #[precedence(level="2")] #[assoc(side="right")]
    Unary<"-", Expr> => ...,
    #[precedence(level="3")] #[assoc(side="left")]
    Binary<"*", Expr, Expr> => ...
    ...
}

Explanation

Before precedence annotation existed

To understand why, you'll have to understand how precedence and annotation rewriting works. Before precedence annotations were introduced, you would have to manually define some levels and carefully make them reference each other in the right order. Taking a slightly modified version of the calculator example from the manual:

use std::str::FromStr;

grammar;

// Alias to avoid having to use the `2` from other rules elsewhere
pub Expr: i32 = Expr2;

pub Expr2: i32 = {
    <l:Expr2> "+" <r:Expr1> => l + r,
    <l:Expr2> "-" <r:Expr1> => l - r,
    Expr1,
};

Expr1: i32 = {
    <l:Expr1> "*" <r:Expr2> => l * r,
    <l:Expr1> "/" <r:Expr2> => l / r,
    Term,
};

Term: i32 = {
    Num,
    "(" <Expr2> ")",
};

Num: i32 = {
    r"[0-9]+" => i32::from_str(<>).unwrap(),
};

Now

This can be more compactly rewritten to the following using annotations:

pub Expr: i32 = {
    #[precedence(level="0")] // Highest precedence
    Term,
    #[precedence(level="1")] #[assoc(side="left")]
    <l:Expr> "*" <r:Expr> => l * r,
    <l:Expr> "/" <r:Expr> => l / r,
    #[precedence(level="2")] #[assoc(side="left")]
    <l:Expr> "+" <r:Expr> => l + r,
    <l:Expr> "-" <r:Expr> => l - r,
};

The gist is that LALRPOP actually rewrites the second example to the first example, prior to macro expansion. In particular, you can't just take one rule outside of Expr as follows:

Mult : i32 = <l:Expr> "*" <r:Expr> => l * r;

pub Expr: i32 = {
    // ...
     #[precedence(level="1")] #[assoc(side="left")]
    Mult,
   // ...
};

Because after rewriting, external references to Expr from Mult will be replaced with Exp2, the latest level of the rewritten grammar, while the explicit level version used Expr1 or Expr2. Those levels aren't accessible explicitly right now from other rules; the only way to access them is to write an explicit Expr occurrence that will be substituted in the Expr rule. So we can try to use a macro:

Mult<ExprN> : i32 = <l: ExprN> "*" <r: ExprN> => l * r;

pub Expr: i32 = {
    #[precedence(level="0")] // Highest precedence
    Term,
    #[precedence(level="1")] #[assoc(side="left")]
    Mult<Expr>,
    <l:Expr> "/" <r:Expr> => l / r,
    #[precedence(level="2")] #[assoc(side="left")]
    <l:Expr> "+" <r:Expr> => l + r,
    <l:Expr> "-" <r:Expr> => l - r,
};

This works better: the Expr in Mult<Expr>, being an occurrence of Expr within itself, will be substituted by the precedence transformation. It's still not what we want: in the original, inlined version, both arguments actually end up with different rules, because of associativity. But since the macro has only one argument, both ExprN are necessarily the same after macro expansion, so you end up with something like <l: Expr1> "*" <r: Expr1>.

The trick is thus to have two independent occurrences of ExprN in the macro as well:

Mult<ExprLeft, ExprRight> : i32 = <l: ExprLeft> "*" <r: ExprRight> => l * r,

pub Expr: i32 = {
    #[precedence(level="0")] // Highest precedence
    Term,
    #[precedence(level="1")] #[assoc(side="left")]
    Mult<Expr, Expr>,
    <l:Expr> "/" <r:Expr> => l / r,
    #[precedence(level="2")] #[assoc(side="left")]
    <l:Expr> "+" <r:Expr> => l + r,
    <l:Expr> "-" <r:Expr> => l - r,
};

Now, the precedence transformation will see two occurrences in Mult<Expr, Expr> in Expr and rewrite them taking associativity into account to the right level, giving the desired <l: Expr1> "*" <r: Expr1>.

Summary

Precedence and associativity works through a rewriting process. This process rewrites the self referential occurrences of a rule Rule within itself, by:

  1. Split Rule into anonymous, intermediate levels Rule1, ..., RuleN as needed. The last level is called Rule and is the only one that can be referred to explicitly; if you use Rule in some other rule, this will reference the last level.
  2. Put each alternative at the right level and substitute their self-referential occurrences of Rule with the right levels, according to precedence and associativity. In particular, several occurrences of Rule, while seemingly the same, might end up being instantiated with different rules after the transformation.

This is why some seemingly valid refactoring don't work. Once you aren't in Expr anymore, any reference to Expr will always refer to the last level. If you want to split your stuff into simpler rules, you need to take Expr as a parameter, and if the latter appears several times in your subrule, you probably want to take as many identical parameters because associativity might rewrite them to different things.

@dburgener
Copy link
Contributor

Thanks for the detailed write-up!

Would switching the order of precedence and macro resolution fix this? Swapping their order does pass the current test suite, although I'm not sure how comprehensive our test coverage of precedence and macro interaction is. In a fairly short amount of time thinking about it, I can't think of any new issues that the switch would introduce, but that's not something I'm super confident in.

@yannham
Copy link
Contributor
yannham commented Nov 15, 2024

Hmm, this is a good question. I don't know how #[inline] is handled (as part of macro expansion, as a zero-ary macro? Or at another stage of the pipeline). For macros, on the top of my head, I think your proposal is a reasonable variant as well. When I implemented precedence, I don't remember having a very strong reason to do it before macro expansion, and both solutions look acceptable IMHO.

Changing the order would solve the artificial need of having macros that are "linear" (in the sense of substructural logics: each argument is used exactly once), and would be backward compatible with the previous practice (that wouldn't mess up such linear macros, as the result would be the same). If you used a non-linear macro that duplicates its argument in a block with precedence annotations on purpose (albeit I'm not sure why anyone would want to do that) to avoid multiple substitution, I think you can always recover the old behavior by setting assoc=none (or I don't remember the syntax, but there's definitely a "non assoc" value that substitutes all occurrences with the same rule).

Hence, my first impression is that swapping the order of macro resolution and precedence would be ok.

@yannham
Copy link
Contributor
yannham commented Nov 15, 2024

Also, we could mention #667, which wants precedence annotations to work within macros. Your proposal would probably be incompatible with that (as first expanding a macro and then substituting would be quite different that first substituting and then doing macro expansion, if the macro is used in a block with precedence annotations). That being said, I'm not sure we want to do #667, and I'm not sure about its original motivation either.

@dburgener
Copy link
Contributor

I don't know how #[inline] is handled (as part of macro expansion, as a zero-ary macro?

Inlining is its own step in normalizing, after both precedence and macro expansion

Also, we could mention #667, which wants precedence annotations to work within macros. Your proposal would probably be incompatible with that (as first expanding a macro and then substituting would be quite different that first substituting and then doing macro expansion, if the macro is used in a block with precedence annotations). That being said, I'm not sure we want to do #667, and I'm not sure about its original motivation either.

This is a good point. In the abstract, letting users do precedence inside macros seems desirable, from a "least surprise" principle (ie Is there a reason users should expect precedence to not work inside a macro def?). I guess your hesitation is the practical points about how the two different rewrites interact? (And particular the associativity point raise in #667)

Ideally, I'd think we'd like to ultimately get to a place where the original code here works and precedence works inside macros. I don't really have my head fully enough around the precedence rewriting at this point to reason too deeply about that though.

BTW, poking around at the normalize code, I did discover that we do have a test for the sort of macro/precedence interaction with the extra args that you mentioned above. If we end up making changes in this area, we should probably expand our test coverage some more (eg a self-contained version of @stephanemagnenat's example in this issue).

@yannham
Copy link
Contributor
yannham commented Nov 15, 2024

Ideally, I'd think we'd like to ultimately get to a place where the original code here works and precedence works inside macros. I don't really have my head fully enough around the precedence rewriting at this point to reason too deeply about that though.

If we want that, then playing with the order of the two phases won't be enough. We would need to:

  1. Rewrite/Elaborate precedence annotations in the body of macros
  2. Perform macro expansion
  3. Rewrite/Elaborate precedence annotations in the rest of the rules

However, I find both approaches (making this issue work and precedence annotation works in macro) to be somehow opposed, at least on a philosophical level.

To make the present issue work, the solution is perform macro expansion first then precedence elaboration. The philosophy could thus be summed up as macro are substituted verbatim before precedence elaboration takes place. Following this simple semantics, we might expect that the last example could be rewritten as:

MultLike<ExprLeft, ExprRight> : i32 = {
    #[precedence(level="1")] #[assoc(side="left")]
    <l: ExprLeft> "*" <r: ExprRight> => l * r,
    <l: ExprLeft> "/" <r: ExprRight> => l / r,
};

pub Expr: i32 = {
    #[precedence(level="0")] // Highest precedence
    Term,
    MultLike<Expr, Expr>,
    <l:Expr> "/" <r:Expr> => l / r,
    #[precedence(level="2")] #[assoc(side="left")]
    <l:Expr> "+" <r:Expr> => l + r,
    <l:Expr> "-" <r:Expr> => l - r,
};

Indeed, if macro expansions take place before, then we could expect that the precedence annotation is inline into Expr and only takes effect there. On the other hand, supporting precedence in macros is the dual philosophy: first elaborate, then perform macro expansion. This would give an entirely different meaning to the example above.

Supporting the two feels a bit like an unnatural mix, which reflects in the fact that you can't express the transformation pipeline with two nicely separated and non-interfering phases, but need to interleave them somehow.

We can still do it. And if we want to choose one or the other, I don't have a strong opinion either way - but in general when you need to perform case analysis to explain the semantics or the interaction of some constructs, it's a sign that said semantics might be a bit too ad-hoc IMHO. Or at least, it needs to be strongly motivated by practical examples both ways.

@dburgener
Copy link
Contributor

Yeah, I think that way of thinking about it makes a ton of sense.

I guess in that case, at least a helpful error message on the case(s) we don't support would be helpful. That leaves the question open of which (if any) of these two use cases we do want to support. I'm not immediately sure on that, and I haven't looked at #667 much. I'll try to give it some thought and develop a personal opinion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
0