Implement collecting errors while tokenizing #2911
Conversation
```diff
@@ -41,37 +41,42 @@ fn reparse_token<'node>(
     root: &'node SyntaxNode,
     edit: &AtomTextEdit,
 ) -> Option<(GreenNode, TextRange)> {
-    let token = algo::find_covering_element(root, edit.delete).as_token()?.clone();
-    match token.kind() {
+    let prev_token = algo::find_covering_element(root, edit.delete).as_token()?.clone();
```
Spent some time figuring out what is going on in this method to properly integrate changes from the lexer. The core implementation here didn't change; I just renamed some variables for clarity...
Good names do a lot for code readability 👍
```diff
@@ -97,6 +102,9 @@ fn reparse_block<'node>(
 fn get_text_after_edit(element: SyntaxElement, edit: &AtomTextEdit) -> String {
     let edit =
         AtomTextEdit::replace(edit.delete - element.text_range().start(), edit.insert.clone());
```

```rust
// Note: we could move this match to a method or even further: use the enum_dispatch crate
// https://crates.io/crates/enum_dispatch
```
This is my proposal for future refactoring; I want to hear your opinion on it.
Honestly, I just liked the idea of static enum dispatch: it relieves you from writing object-safe traits and replaces dynamic dispatch.
It even reduces the boilerplate of such matches.
Though you may consider this to be unnecessary overhead, and I won't argue too much)
I think there's some consensus that we already have a bit too many dependencies, so adding another one, especially a procedural macro, is probably not a good thing if we want to keep build times down.
I actually feel pretty strongly about not using "helper" crates that try to bolt idioms onto a language. I feel that the accidental complexity introduced by them is almost always larger than any boilerplate savings. The notable exception is the `impl_froms` macro which we use throughout the hir, and which I feel is justified, in that it's a very simple idea, and the amount of boilerplate saved is significant.
The reason why we need an explicit match here is that the two `.text()` calls are very different things in this case: the first one is a `&str`, and the second one is a tree structure -- a view into the node. If we had a generic trait `Text`, then making a `type NodeOrTokenText<'a> = NodeOrToken<SyntaxText, &'a str>; impl Text for NodeOrTokenText<'a>` would be reasonable, but we don't have such a trait yet, and designing it right is really hard.
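For a sense of what designing that trait would involve, here is a minimal sketch — every name in it is invented, and a real design would also have to cover `SyntaxText`'s lazy, chunked storage, which is where the difficulty lives:

```rust
// Hypothetical `Text` trait; none of these names are actual rowan API.
trait Text {
    fn text_len(&self) -> usize;
    fn contains_char(&self, c: char) -> bool;
}

// A token's text is a plain string slice, so this impl is trivial;
// the node side would need a SyntaxText-backed impl over non-contiguous chunks.
impl Text for &str {
    fn text_len(&self) -> usize {
        self.len()
    }
    fn contains_char(&self, c: char) -> bool {
        self.contains(c)
    }
}

fn main() {
    let token_text: &str = "fn";
    assert_eq!(token_text.text_len(), 2);
    assert!(!token_text.contains_char('\n'));
}
```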
Okay, guys, thank you for the opinions)
```rust
// FIXME: Location should be just `Location(TextRange)`.
// The TextUnit enum member just unnecessarily complicates things;
// we shouldn't treat it specially, it is just a `TextRange { start: x, end: x + 1 }`.
// See `location_to_range()` in ra_ide/src/diagnostics.
```
I think this should be a target for future refactoring. I am not entirely sure why we don't just use `TextRange` as the location of a `SyntaxError` instead. Or at least make the location a newtype of `TextRange`? What do you think?
Seems reasonable. In the end, when used to emit a diagnostic, it gets turned into a TextRange anyways: https://github.com/rust-analyzer/rust-analyzer/blob/d1330a4a65f0113c687716a5a679239af4df9c11/crates/ra_ide/src/diagnostics.rs#L118-L123
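The linked conversion is essentially this flattening (a sketch of the snippet as of that revision; the constructor names come from the `text_unit` crate, so treat them as approximate):

```rust
fn location_to_range(location: Location) -> TextRange {
    match location {
        // A bare offset is just a degenerate one-character range...
        Location::Offset(offset) => TextRange::offset_len(offset, 1.into()),
        // ...so the enum variant only exists to be flattened away here.
        Location::Range(range) => range,
    }
}
```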
Yeah, I think this should just be a `TextRange`.
Yeah, I just referred to that snippet in the comment)
That you did 😆
crates/ra_syntax/src/syntax_error.rs
Outdated
```rust
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
    let msg = match self {
        TokenizeError::EmptyInt => "Missing digits after integer base prefix",
        TokenizeError::EmptyExponent => "Missing digits after the exponent symbol",
```
I wish `rustfmt` was not so picky; initially I wrote this match with each arm homogeneously and nicely wrapped into curly braces, but the formatting tool demands the shorthand syntax here ...
You could always `#[rustfmt::skip]` this one function. We do the same for a few match statements elsewhere.
I am ok with sticking `#[rustfmt::skip]` on top in similar cases.
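For illustration, a minimal sketch of the suggested attribute usage, assuming a trimmed-down two-variant enum (the real one has more variants):

```rust
use std::fmt;

enum TokenizeError {
    EmptyInt,
    EmptyExponent,
}

impl fmt::Display for TokenizeError {
    // Opting this one function out of rustfmt lets the arms keep their braces
    // and manual alignment.
    #[rustfmt::skip]
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let msg = match self {
            TokenizeError::EmptyInt      => { "Missing digits after integer base prefix" }
            TokenizeError::EmptyExponent => { "Missing digits after the exponent symbol" }
        };
        write!(f, "{}", msg)
    }
}
```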
```rust
pub struct SyntaxTreeBuilder {
    errors: Vec<SyntaxError>,
    inner: GreenNodeBuilder<'static>,
}

impl Default for SyntaxTreeBuilder {
```
I am not sure why the `Default` trait was implemented here by hand. Could you please elaborate, @matklad?
At the time this was written, I forgot to add a `Default` impl to `GreenBuilder`, and was too lazy to publish a new version of rowan just for that: rust-analyzer/rowan@a3692a9#diff-1a2b23ceb6e534bd16f63d2e55e537aaR154.
In general, if something looks like it doesn't make sense, it probably indeed doesn't make sense :)
I've stumbled over enough things that didn't make sense to me but actually existed for a reason, so I'd rather ask the author before amending them. Anyway, thank you for the clarification!
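For the record, the hand-written impl amounts to what a derive would have produced once rowan gained the missing impl (a sketch, assuming `GreenNodeBuilder::new()` as the constructor):

```rust
impl Default for SyntaxTreeBuilder {
    fn default() -> SyntaxTreeBuilder {
        SyntaxTreeBuilder {
            errors: Vec::new(),
            // Written out by hand only because GreenNodeBuilder had no
            // Default impl in rowan at the time.
            inner: GreenNodeBuilder::new(),
        }
    }
}
```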
crates/ra_syntax/src/tests.rs
Outdated
```diff
@@ -10,7 +10,8 @@ use crate::{fuzz, SourceFile};
 #[test]
 fn lexer_tests() {
     dir_tests(&test_data_dir(), &["lexer"], |text, _| {
-        let tokens = crate::tokenize(text);
+        // FIXME: add tests for errors (their format is up to discussion)
+        let tokens = crate::tokenize(text).tokens;
```
I'd like to make these tests data-driven, but I am not sure which format would be the most suitable...
Your input is very important here, guys!
You could probably just update `dump_tokens` to take `ParsedTokens` and then also output the errors like we do the tokens. If you don't print anything when there are no errors, you shouldn't even need to touch the existing tests.
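A rough sketch of that suggestion (the function name and the `Token { kind, len }` shape are assumptions based on this thread, not the actual patch):

```rust
use std::fmt::Write;

fn dump_tokens_and_errors(parsed: &ParsedTokens, text: &str) -> String {
    let mut acc = String::new();
    let mut offset = 0;
    for token in &parsed.tokens {
        // Tokens carry only a length, so the text is recovered by walking offsets.
        let len = token.len.to_usize();
        let token_text = &text[offset..offset + len];
        offset += len;
        writeln!(acc, "{:?} {} {:?}", token.kind, len, token_text).unwrap();
    }
    // Errors go after the tokens, so dumps for error-free files stay unchanged.
    for err in &parsed.errors {
        writeln!(acc, "> error: {}", err).unwrap();
    }
    acc
}
```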
Just dumping errors in `dump_tokens` after the tokens themselves seems good to me.
Unrelated, but perhaps a slightly more information-rich way to write this would be:
`let ParsedTokens { tokens, errors: _ } = crate::tokenize(text)`
I.e., explicitly naming the thing you are ignoring (which usually is important for errors).
Okay guys, will do!
Also, @matklad, good point on explicitly dropping errors. Do you think we should use this pattern in all places where `tokenize*()`/`single_token()`/`first_token()` are called?
I think so! In general, I just always try to destructure tuples that way:

```
13:49:36|~/projects/rust-analyzer|HEAD✓
λ rg 'let .*, _\w+\)'
xtask/src/codegen/gen_syntax.rs
167:    let punctuation_values = grammar.punct.iter().map(|(token, _name)| {
crates/ra_hir_def/src/generics.rs
59:    let (params, _source_map) = GenericParams::new(db, def.into());
crates/ra_hir/src/code_model.rs
1072:    let (adt, _subst) = self.ty.value.as_adt()?;
crates/ra_batch/src/lib.rs
53:    let (crate_graph, _crate_names) =
147:    let (host, _roots) = load_cargo(path).unwrap();
crates/ra_hir_ty/src/utils.rs
141:    let (_total, parent_len, _child) = self.len_split();
crates/ra_parser/src/grammar/expressions.rs
27:    let (cm, _block_like) = expr(p);
crates/ra_hir_def/src/nameres/collector.rs
947:    let (db, _file_id) = TestDB::with_single_file(&code);
crates/ra_hir_ty/src/infer/coerce.rs
253:    let (last_field_id, _data) = fields.next_back()?;
crates/ra_parser/src/grammar/expressions/atom.rs
560:    let (completed, _is_block) =
```
Overall this looks good. A few small nits to the actual code, but nothing major.
```rust
match single_token(new_name)?.token.kind {
    SyntaxKind::IDENT | SyntaxKind::UNDERSCORE => (),
    _ => return None,
```
I love how this turned out 👍
```rust
    /// In general `self.errors.len() <= self.tokens.len()`
    pub errors: Vec<SyntaxError>,
}
impl ParsedTokens {
```
Nit: Add a blank line above this.
```rust
        .map(|error| SyntaxError::new(SyntaxErrorKind::TokenizeError(error), token_range)),
};

type ParsedSyntaxKind = (SyntaxKind, Option<TokenizeError>);
```
Defining the alias here probably takes up more space than just writing it out four times. Don't really care too much either way.
Okay, I'll try to refactor it according to this comment
```rust
// FIXME: it seems this initialization statement is unnecessary (see `edit` in the outer scope).
// Investigate whether it should really be removed.
let edit = AtomTextEdit { delete: range, insert: replace_with.to_string() };
```
Neither `range` nor `replace_with` change, and `edit` is always passed as a reference, so I'm inclined to agree that this line could probably be removed.
```rust
// FIXME: the obvious pattern of this enum dictates that the following enum variants
// should be wrapped into something like `SemanticError(SemanticError)`
// or `ValidateError(ValidateError)` or `SemanticValidateError(...)`
```
Let's leave it at the tokenizer changes, at least for this PR. Then you can do this as a separate PR if it seems useful.
Yes, I described the summary and future todos in a comment under the original issue. I'll see what should be done in the next PR.
Left a bunch of nitpicky comments!
I don't really have strong opinions on them, but, as this is partially refactoring work, I'd like to pay more than the usual attention to detail, to make sure we have a somewhat shared understanding of what the "rust-analyzer" style is.
Overall, LGTM!
```rust
    pub error: Option<TokenizeError>,
}
impl ParsedToken {
    pub const fn new(token: Token, error: Option<TokenizeError>) -> Self {
```
Do we actually use this function? In general, I try to avoid adding trivial `new`s for structs where all fields are public, as a struct literal is more readable at the call-site (as fields have names). The exception is if the struct is created in a lot of different places, where the method call might be more concise.
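The call-site difference being argued, as a self-contained sketch with stand-in types:

```rust
struct Token { kind: u16, len: u32 }               // stand-ins for the real types
struct ParsedToken { token: Token, error: Option<String> }

impl ParsedToken {
    fn new(token: Token, error: Option<String>) -> ParsedToken {
        ParsedToken { token, error }
    }
}

fn main() {
    // Positional constructor: the reader must recall which argument is which.
    let _a = ParsedToken::new(Token { kind: 0, len: 2 }, None);
    // Struct literal: the field names document the call site for free.
    let _b = ParsedToken { token: Token { kind: 0, len: 2 }, error: None };
}
```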
Similarly for the `Token` above.
Also, in the case of `ParsedToken`, I would probably use `type ParsedToken = (Token, Option<TokenizeError>)`, because semantically it is just a pair of a token and an error.
Didn't expect you would dig into the commits; I did rethink this and removed those functions in later commits.
```rust
// We drop some useful information here (see the patterns with double dots `..`).
// Storing that info in `SyntaxKind` is not possible due to its layout requirements:
// it must be a `u16` that comes from the `rowan::SyntaxKind` type, and changes to
// `rowan::SyntaxKind` would mean a hell of a rewrite.
```
We don't do this, not because we are too lazy to rewrite (I think the syntax trees were rewritten something like four times?), but because dumb, semantics-less nodes are an explicit design decision :)
Sorry, I think I made this comment a bit misleading. I just wanted to explain why we drop some info here, not to propose changing the design!
```rust
        TK::Caret => ok(CARET),
        TK::Percent => ok(PERCENT),
        TK::Unknown => ok(ERROR),
    };
```
I would probably avoid the `ok`, `ok_if` functions and do something like this:

```rust
// (scrutinee name assumed; the original comment elided it)
let kind = match rustc_token_kind {
    TK::LineComment => COMMENT,
    TK::BlockComment { terminated: false } => return (COMMENT, Some(TE::UnterminatedBlockComment)),
    TK::BlockComment { terminated: true } => COMMENT,
};
(kind, None)
```

That is:

- move the `ok` to the happy path, and use `return` for errors
- pull the `TextUnit::from_usize(token_text.len())` bit out of the function, as it doesn't depend on the specific kind at all
Hmm, nice catch, I'll apply this.
```rust
pub fn first_token(text: &str) -> Option<ParsedToken> {
    // Checking for emptiness because of the `rustc_lexer::first_token()` invariant (see its body)
    if text.is_empty() {
        None
```
```diff
-        None
+        return None;
```

and remove the `else`. We use early returns by default.
```diff
@@ -46,8 +46,7 @@ fn reparse_token<'node>(
         WHITESPACE | COMMENT | IDENT | STRING | RAW_STRING => {
             if token.kind() == WHITESPACE || token.kind() == COMMENT {
                 // removing a new line may extend the previous token
-                if token.text().to_string()[edit.delete - token.text_range().start()].contains('\n')
-                {
+                if token.text()[edit.delete - token.text_range().start()].contains('\n') {
```
👍
```diff
@@ -84,6 +84,9 @@ pub enum SyntaxErrorKind {
     ParseError(ParseError),
     EscapeError(EscapeError),
     TokenizeError(TokenizeError),
+    // FIXME: the obvious pattern of this enum dictates that the following enum variants
+    // should be wrapped into something like `SemanticError(SemanticError)`
+    // or `ValidateError(ValidateError)` or `SemanticValidateError(...)`
```
Yeah, in general, I am not sure that using an `enum` for SyntaxErrors is a good idea. We don't really match on them, so perhaps just an `OpaqueError(String, TextRange)` would be a better representation.
Okay, let's leave it for the refactoring to be done soon. Since this crate is called `ra_syntax`, let's keep the `SyntaxError` name for that and not some other `OpaqueError`.
If this crate is used exclusively by rust-analyzer, I am definitely for the generic `(String, TextRange)` error shape.
crates/ra_syntax/src/syntax_error.rs
Outdated
```diff
         }
         TokenizeError::UnstartedRawString => {
-            "Missing `\"` symbol after `#` symbols to begin the raw string literal"
+            "Missing \" symbol after # symbols to begin the raw string literal"
```
I usually like to place some kind of quotes around snippets, because that's helpful when a snippet contains trailing/leading whitespace or other invisible characters. But we should just stick to what rustc is doing.
Which... I am not sure what that is, exactly! For this code

```
λ cat main.rs
fn main() {
    let s = r###;
}
```

I get this error. Note how `#` is uselessly in quotes, but the empty string between `:` and `;` isn't.
@estebank what are the current guidelines around quotes?
Heh, I initially wrapped the symbols in backticks (hoping they would be rendered as Markdown), but once I started VSCode and saw that the messages were displayed as raw text, I removed them: 34d72aa
We try to consistently surround code and tokens (including identifiers) with backticks. No tool is actually parsing them, but having them bare makes messages harder to read (consider "found , but expected ." vs. "found `,` but expected `.`").
Thank you for the review and for elaborating on the "rust-analyzer" style; I will take all that into account!
is this still a wip?
@matklad please don't merge yet, I'll add tests today. Then I'll remove the WIP label. Or do you have another understanding of the WIP label?
Sometimes WIP labels are left there accidentally, just checking that that's not the case :)
@kiljacken, @matklad I also need to rebase this branch; I'll do it before the merge, just let me know when it is time.
```rust
let (tokens, errors) = tokenize(text);
assert_errors_are_present(&errors, path);
dump_tokens_and_errors(&tokens, &errors, text)
});
```
👍
bors r+

Bors is stuck, we probably need that rebase. @Veetaha Would you mind rebasing? :)

Sure, I'll do that later today!
@matklad bors r+ ?

bors r+
2911: Implement collecting errors while tokenizing r=matklad a=Veetaha

Now we are collecting errors from `rustc_lexer` and returning them in `ParsedToken { token, error }` and `ParsedTokens { tokens, errors }` structures **([UPD]: this is now simplified, see updates below)**.

The main changes are introduced in `ra_syntax/parsing/lexer.rs`. It now exposes the following functions and types:

```rust
pub fn tokenize(text: &str) -> ParsedTokens;
pub fn tokenize_append(text: &str, parsed_tokens_to_append_to: &mut ParsedTokens);
pub fn first_token(text: &str) -> Option<ParsedToken>; // allows any number of tokens in text
pub fn single_token(text: &str) -> Option<ParsedToken>; // allows only a single token in text

pub struct ParsedToken { pub token: Token, pub error: Option<SyntaxError> }
pub struct ParsedTokens { pub tokens: Vec<Token>, pub errors: Vec<SyntaxError> }

pub enum TokenizeError { /* Simple enum which reflects rustc_lexer tokenization errors */ }
```

In the first commit I implemented it with iterators, but then decided that since this crate is ad hoc for `rust-analyzer` and we clearly see the places of its usage, it would be better to simplify it to vectors.

This is currently WIP, because I want to add tests for error messages generated by the lexer. I'd like to hear your thoughts on how to define these tests in the `ra_syntax/test-data` dir.

Related issues: #223

**[UPD]** After the PR review the API was simplified:

```rust
pub fn tokenize(text: &str) -> (Vec<Token>, Vec<SyntaxError>);

// Both lex functions do not check for unescape errors
pub fn lex_single_syntax_kind(text: &str) -> Option<(SyntaxKind, Option<SyntaxError>)>;
pub fn lex_single_valid_syntax_kind(text: &str) -> Option<SyntaxKind>;

// This will be removed in the next PR in favour of simplifying `SyntaxError` to `(String, TextRange)`
pub enum TokenizeError { /* Simple enum which reflects rustc_lexer tokenization errors */ }

// this is private, but may be made public if such demand exists in the future (least privilege principle)
fn lex_first_token(text: &str) -> Option<(Token, Option<SyntaxError>)>;
```

Co-authored-by: Veetaha <[email protected]>
Build succeeded
3026: ra_syntax: reshape SyntaxError for the sake of removing redundancy r=matklad a=Veetaha

Followup of #2911; it also puts some crosses on the todo list of #223.

**ACHTUNG!** A big part of the diff of this PR is test data file changes.

Simplified `SyntaxError`, which was `SyntaxError { kind: { /* big enum */ }, location: Location }`, to `SyntaxError(String, TextRange)`. I am not sure whether the tuple struct here is the best fit; I am inclined to add names to the fields, because I already provide the getters `SyntaxError::message()`, `SyntaxError::range()`. I also removed `Location` altogether ...

This is currently WIP, because the following is not done:

- [ ] ~~Add tests to the `test_data` dir for unescape errors *// I don't know where to put these errors in particular, because they are out of the scope of the lexer and parser. However, I have an idea in mind that we move all validators we have right now to the parsing stage, but this is up to discussion...*~~ **[UPD]** I came to the conclusion that tree validation logic, which unescape errors are a part of, should be rethought. We currently have no tests and no place to put tests for tree validations. So I'd like to extract the potential redesign (maybe a move of tree validation to ra_parser) and adding tests for this into a separate task.

Co-authored-by: Veetaha <[email protected]>
Co-authored-by: Veetaha <[email protected]>