FR: Documentation Clarification re: logos #905
Hmm, I may need some more clarification on what the issue is here. For example, what compilation error are you hitting? What diff did you make to the lexer example? I tried to play around with the lexer example by removing the callback, which gave the following diff:

diff --git a/doc/lexer/src/grammar.lalrpop b/doc/lexer/src/grammar.lalrpop
index fa3d6d7..3852407 100644
--- a/doc/lexer/src/grammar.lalrpop
+++ b/doc/lexer/src/grammar.lalrpop
@@ -1,7 +1,7 @@
use crate::tokens::{Token, LexicalError};
use crate::ast;
-grammar;
+grammar<'input>;
// ...
@@ -9,10 +9,10 @@ extern {
type Location = usize;
type Error = LexicalError;
- enum Token {
+ enum Token<'input> {
"var" => Token::KeywordVar,
"print" => Token::KeywordPrint,
- "identifier" => Token::Identifier(<String>),
+ "identifier" => Token::Identifier(<&'input str>),
"int" => Token::Integer(<i64>),
"(" => Token::LParen,
")" => Token::RParen,
@@ -32,7 +32,7 @@ pub Script: Vec<ast::Statement> = {
pub Statement: ast::Statement = {
"var" <name:"identifier"> "=" <value:Expression> ";" => {
- ast::Statement::Variable { name, value }
+ ast::Statement::Variable {name: name.to_string(), value }
},
"print" <value:Expression> ";" => {
ast::Statement::Print { value }
@@ -81,7 +81,7 @@ pub Term: Box<ast::Expression> = {
Box::new(ast::Expression::Integer(val))
},
<name:"identifier"> => {
- Box::new(ast::Expression::Variable(name))
+ Box::new(ast::Expression::Variable(name.to_string()))
},
"(" <e:Expression> ")" => e
}
\ No newline at end of file
diff --git a/doc/lexer/src/lexer.rs b/doc/lexer/src/lexer.rs
index d77b015..722fefa 100644
--- a/doc/lexer/src/lexer.rs
+++ b/doc/lexer/src/lexer.rs
@@ -6,7 +6,7 @@ pub type Spanned<Tok, Loc, Error> = Result<(Loc, Tok, Loc), Error>;
pub struct Lexer<'input> {
// instead of an iterator over characters, we have a token iterator
- token_stream: SpannedIter<'input, Token>,
+ token_stream: SpannedIter<'input, Token<'input>>,
}
impl<'input> Lexer<'input> {
@@ -19,7 +19,7 @@ impl<'input> Lexer<'input> {
}
impl<'input> Iterator for Lexer<'input> {
- type Item = Spanned<Token, usize, LexicalError>;
+ type Item = Spanned<Token<'input>, usize, LexicalError>;
fn next(&mut self) -> Option<Self::Item> {
self.token_stream
diff --git a/doc/lexer/src/tokens.rs b/doc/lexer/src/tokens.rs
index a11b127..7c2e024 100644
--- a/doc/lexer/src/tokens.rs
+++ b/doc/lexer/src/tokens.rs
@@ -17,14 +17,14 @@ impl From<ParseIntError> for LexicalError {
#[derive(Logos, Clone, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f]+", skip r"#.*\n?", error = LexicalError)]
-pub enum Token {
+pub enum Token<'a> {
#[token("var")]
KeywordVar,
#[token("print")]
KeywordPrint,
- #[regex("[_a-zA-Z][_0-9a-zA-Z]*", |lex| lex.slice().to_string())]
- Identifier(String),
+ #[regex("[_a-zA-Z][_0-9a-zA-Z]*")]
+ Identifier(&'a str),
#[regex("[1-9][0-9]*", |lex| lex.slice().parse())]
Integer(i64),
@@ -47,7 +47,7 @@ pub enum Token {
OperatorDiv,
}
-impl fmt::Display for Token {
+impl<'a> fmt::Display for Token<'a> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{:?}", self)
}

I'll close this issue. Let me know if there is more here and we can reopen it.
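Condensed from the diff above, the core of the change is a token type that borrows its identifier text from the input instead of owning a String. A minimal standalone sketch of that shape (the logos derive attributes from the real example are omitted so it compiles without the crate):

```rust
use std::fmt;

// Borrowed-slice token type, mirroring the post-diff `tokens.rs`
// (logos derive attributes omitted so this compiles standalone).
#[derive(Clone, Debug, PartialEq)]
pub enum Token<'a> {
    KeywordVar,
    Identifier(&'a str), // borrows from the input instead of owning a String
    Integer(i64),
}

impl<'a> fmt::Display for Token<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{:?}", self)
    }
}

fn main() {
    let input = String::from("var x = 1;");
    // An identifier token can borrow directly from the input buffer.
    let tok = Token::Identifier(&input[4..5]);
    println!("{}", tok);
}
```

The cost of this shape, as the diff shows, is that the `'input` lifetime then threads through the grammar, the lexer wrapper, and the AST construction (hence the `.to_string()` calls in the grammar actions).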
Thanks for the reply. The thing I am looking for here is the ability to access the slice for a given token, as opposed to storing the slice in the token itself. This allows for nice tests with logos, like the following:

#[derive(Logos, Debug, PartialEq, Clone)]
pub enum Token {
#[regex(r"\(\*[a-zA-Z0-9 ]*\*\)")]
BlockComment,
#[regex("[0-9]+")]
Int,
#[regex(r"[0-9]+\.[0-9]*")]
Float,
#[regex(r#""[^"]*""#)]
String_,
Extra,
}

#[cfg(test)]
mod tests {
use logos::Logos;
use crate::tokens::Token;
use crate::tokens::Token::{BlockComment, Float, Int};
fn assert_tokens(s: &str, tokens: Vec<Token>) {
let mut lex = Token::lexer(s);
let tokens_ = lex.into_iter().map(|v| {
if v.is_err() {
eprintln!("{:#?}", v);
}
v.unwrap_or(Token::Extra)
}).collect::<Vec<Token>>();
assert_eq!(tokens_, tokens)
}
#[test]
fn int() {
assert_tokens("123", vec![Int])
}
#[test]
fn float() {
assert_tokens("123.0", vec![Float]);
assert_tokens("123.", vec![Float])
}
#[test]
fn block_comment() {
assert_tokens("(* hello *)", vec![BlockComment]);
assert_tokens("(* hello world *)", vec![BlockComment]);
assert_tokens("(* hello world 123 *)", vec![BlockComment])
}
#[test]
fn string() {
assert_tokens(r#""hello""#, vec![Token::String_])
}
}

Do we know if there might be another way to approach this?
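For what it's worth, the "access the slice without storing it in the token" idea can be expressed as an iterator that yields (token, slice) pairs. The sketch below is hand-rolled rather than logos-based so that it is self-contained; with logos, the analogous move would be slicing the input with the spans produced by `Token::lexer(input).spanned()` (check the exact item type for your logos version):

```rust
// Toy lexer illustrating the "yield (token, slice)" pattern: the matched
// text rides alongside the token rather than inside it.
#[derive(Debug, PartialEq)]
enum Token { Int, Word }

struct Lexer<'input> {
    input: &'input str,
    pos: usize,
}

impl<'input> Iterator for Lexer<'input> {
    // The slice is paired with the token, not stored in it.
    type Item = (Token, &'input str);

    fn next(&mut self) -> Option<Self::Item> {
        // Toy rule: whitespace-separated words; all-digit words are Ints.
        let rest = &self.input[self.pos..];
        let trimmed = rest.trim_start();
        self.pos += rest.len() - trimmed.len();
        if trimmed.is_empty() { return None; }
        let end = trimmed.find(' ').unwrap_or(trimmed.len());
        let slice = &self.input[self.pos..self.pos + end];
        self.pos += end;
        let tok = if slice.chars().all(|c| c.is_ascii_digit()) { Token::Int } else { Token::Word };
        Some((tok, slice))
    }
}

fn main() {
    let lexer = Lexer { input: "var 123", pos: 0 };
    let toks: Vec<_> = lexer.collect();
    assert_eq!(toks, vec![(Token::Word, "var"), (Token::Int, "123")]);
}
```

The tests above would then assert on just the `Token` half of each pair, keeping the token enum free of owned data.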
Related: #803
I got this working with the following:

use crate::ast::Ast;
use crate::ast::f64;
use crate::tokens::Token;
use crate::tokens::Parse;
use crate::tokens;
grammar<'input>;
pub Exprs: Vec<Ast> = {
<v:(<Expr>)*> => v,
}
pub Expr: Ast = {
<info: @L> <val:"int"> => Ast::Int(<tokens::Token as Parse<i64>>::parse(&val, info.1)),
<info: @L> <val:"float"> => Ast::Float(<tokens::Token as Parse<f64>>::parse(&val, info.1)),
<info: @L> <val:"string"> => Ast::String_(<tokens::Token as Parse<String>>::parse(&val, info.1)),
}
extern {
type Location = (usize, &'input str);
type Error = ();
enum Token {
"int" => Token::Int,
"float" => Token::Float,
"string" => Token::String_,
}
}

Besides the gymnastics of the `<tokens::Token as Parse<_>>::parse(...)` calls…
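The `Parse` trait used in the grammar above isn't shown in the thread; a hypothetical reconstruction (the names `Parse` and `parse` come from the comment, the body is guesswork) might look like:

```rust
use std::str::FromStr;

// Hypothetical reconstruction of the `Parse` trait referenced in the
// grammar: given a token kind and the slice it matched, produce a value.
trait Parse<T> {
    fn parse(&self, slice: &str) -> T;
}

#[derive(Debug, Clone, PartialEq)]
enum Token { Int, Float, String_ }

impl<T: FromStr> Parse<T> for Token
where
    T::Err: std::fmt::Debug,
{
    fn parse(&self, slice: &str) -> T {
        // The token kind selects *what* to parse; the slice carries the text,
        // smuggled through the grammar's `(usize, &'input str)` Location type.
        slice.parse().expect("lexer guarantees a well-formed slice")
    }
}

fn main() {
    let n: i64 = <Token as Parse<i64>>::parse(&Token::Int, "42");
    let x: f64 = <Token as Parse<f64>>::parse(&Token::Float, "1.5");
    assert_eq!(n, 42);
    assert_eq!(x, 1.5);
}
```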
One of the things that comes to mind, which is an FR, and probably a major one:

// not currently a feature
extern {
enum Token {
"int" => (Token::Int, String)
}
}
grammar<'input>;
pub Exprs: Vec<Ast> = {
<v:(<Expr>)*> => v,
}
pub Expr: Ast = {
<val:"int"> => Ast::Int(<tokens::Token as Parse<i64>>::parse(&val.0, val.1)),
...
}

This could be cool, but I'm not sure of use cases beyond this one. I could also refactor my test code for logos to get something like this, instead of expecting it from logos (though it would be cool for this case, anyway).
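Even without such a feature, the pairing can be emulated today in the custom lexer wrapper by making the iterator's token type itself the pair. A standalone sketch (toy lexer in place of logos; the `Spanned` alias copied from the tutorial's lexer module):

```rust
// lalrpop-style spanned item, as in the tutorial's lexer module.
pub type Spanned<Tok, Loc, Error> = Result<(Loc, Tok, Loc), Error>;

#[derive(Debug, Clone, PartialEq)]
enum Token { Int }

struct Lexer<'input> {
    input: &'input str,
    pos: usize,
}

impl<'input> Iterator for Lexer<'input> {
    // The "token" handed to the parser is a (kind, text) pair, which is
    // roughly what the proposed `"int" => (Token::Int, String)` form would do.
    type Item = Spanned<(Token, String), usize, ()>;

    fn next(&mut self) -> Option<Self::Item> {
        // Toy rule: each whitespace-separated digit run is one Int token.
        let rest = self.input[self.pos..].trim_start();
        let start = self.input.len() - rest.len();
        if rest.is_empty() { return None; }
        let len = rest.find(' ').unwrap_or(rest.len());
        self.pos = start + len;
        let text = rest[..len].to_string();
        Some(Ok((start, (Token::Int, text), start + len)))
    }
}

fn main() {
    let toks: Vec<_> = Lexer { input: "1 23", pos: 0 }.collect();
    assert_eq!(
        toks,
        vec![
            Ok((0, (Token::Int, "1".to_string()), 1)),
            Ok((2, (Token::Int, "23".to_string()), 4)),
        ]
    );
}
```

The grammar's extern block would then match on the `Token` half of the pair, and actions would read the text from the other half.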
Another thing I can think of that could help here is being able to access the lexer inline in the parser. Again, an FR; it might look something like this:

// not a feature right now
<val:"int"> => Ast::Int(<tokens::Token as Parse<i64>>::parse(&val, $lexer.slice())),
tl;dr: Is it possible to define a tokenizer that does not require a callback using logos with lalrpop?
In this tutorial http://lalrpop.github.io/lalrpop/lexer_tutorial/005_external_lib.html, a token for lexing identifiers is declared:
in addition to a parser that identifies lexed identifiers:
What I am noticing is that if a callback is not offered to logos' `regex` macro, `name` in the parser binds the token itself, as opposed to its value. But offering a callback is not required – in theory – because the lexer can return the slice a given token matches (a token returned by logos' lexer includes its lexed text). For example:

Is it possible to define a tokenizer that does not require a callback using logos with lalrpop?