-
-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
review: feat: java lexer for better position detection #5753
base: master
Are you sure you want to change the base?
Conversation
I had a nightmare that @SirYwell was writing a complete lexer. Luckily, this did not happen or? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a good chunk, I hope I didn't miss anything important :) Have fun telling me why the comments are invalid :P
contents, | ||
start, | ||
Math.max(start, end) + 1, //move end after the last char | ||
explicitModifiersByKind, | ||
(modStart, modEnd) -> cf.createSourcePosition(cu, modStart, modEnd, cu.getLineSeparatorPositions()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contents, | |
start, | |
Math.max(start, end) + 1, //move end after the last char | |
explicitModifiersByKind, | |
(modStart, modEnd) -> cf.createSourcePosition(cu, modStart, modEnd, cu.getLineSeparatorPositions()) | |
contents, | |
start, | |
Math.max(start, end) + 1, //move end after the last char | |
explicitModifiersByKind, | |
(modStart, modEnd) -> cf.createSourcePosition(cu, modStart, modEnd, cu.getLineSeparatorPositions()) |
and why is there a Math.max
? Is end
ever smaller than start
? I think this either deserves an explanation or should be end + 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some unholy reason, end
can be -1
sometimes. That can probably be fixed, but I don't want to include that here.
i += 5; | ||
if (this.positionRemap == null) { | ||
this.positionRemap = createPositionRemap(chars); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment stating why this is 6
? And maybe a short comment at the top saying that you are first building a map from index -> skip value to next char and then accumulate it at the bottom or something in that spirit?
if (this.content[i] == '\\') { | ||
if (escape) { | ||
escape = false; | ||
} else if (this.end > i + 1 && this.content[i + 1] == '\\') { | ||
escape = true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure what this is doing. Maybe rename escape
to escapeBackslash
or just handle it inline and increment i
by one in that branch and skip the loop iteration? At present I am not exactly sure what this is doing.
package spoon.support.util.internal.lexer; | ||
|
||
/** | ||
* Valid Java (contextual) keywords |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But VAR
for example is missing, so not all contextual?
readHexadecimalsAndUnderscore(); | ||
} | ||
} else if (peek() == 'b' || peek() == 'B') { | ||
next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next(); | |
// binary literal | |
next(); |
// check if hexadecimal notation | ||
if (first == '0') { | ||
if (peek() == 'x' || peek() == 'X') { | ||
next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next(); | |
// hexadecimal literal | |
next(); |
readDigitsOrUnderscore(); | ||
} | ||
} else { | ||
readDigitsOrUnderscore(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readDigitsOrUnderscore(); | |
// decimal literal | |
readDigitsOrUnderscore(); |
?
return false; | ||
} | ||
|
||
private void readNumericLiteral(char first) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this lex 1e20f
? Can you add a test for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I pressed the wrong button :(
@SirYwell Can you have a look at the remaining comments? |
Fixes #4705.
As mentioned in the linked PR, the current approach to find source locations in spoon is insufficient. The naive approach on only caring about spaces is not enough as there are other separators. By implementing a lexer, we can deal with that properly and also support unicode sequences.
This solution will hopefully - besides correctness - be more future-proof.
I tested the lexer on all
.java
files of the OpenJDK. By manually inspecting failures, it looks like only actual invalid code is rejected. So I'm confident this is a robust implementation.There is more code in the PositionBuilder relying on the whitespace-based methods. It might be possible to replace those usages in future too.