-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change how diagnostics are ordered #4778
base: trunk
Are you sure you want to change the base?
Conversation
toolchain/parse/context.h
Outdated
@@ -396,7 +401,7 @@ class Context { | |||
llvm::raw_ostream* vlog_stream_; | |||
|
|||
// The current position within the token buffer. | |||
Lex::TokenIterator position_; | |||
Lex::TokenIterator* const position_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This adds extra indirection to a lot of accesses to position_
. Do we have an easy way to benchmark to see if this makes any difference to performance? (I think we could alternatively make Parse
ask the Context
for the address of its position and pass that into the converter, or we could make the Context
own the converter, tracker, and emitter.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could check with compile_benchmark, but not bothering since I think it's clearer to make Context the owner.
@@ -9,7 +9,21 @@ namespace Carbon::Check { | |||
|
|||
auto SemIRDiagnosticConverter::ConvertLoc(SemIRLoc loc, | |||
ContextFnT context_fn) const | |||
-> DiagnosticLoc { | |||
-> std::pair<DiagnosticLoc, int32_t> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered adding the offset as a field in DiagnosticLoc
? I'm not sure whether that'd be reasonable or not, so if this doesn't work out for some reason I think it's fine as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A given Diagnostic can have multiple DiagnosticMessages, each with their own DiagnosticLoc. The DiagnosticLoc will often point at a line in a different file as compared to the file the last_byte_offset points into. I believe putting the last_byte_offset on the Diagnostic avoids the potential for confusion about what a last_byte_offset on a note in a different file means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also wondering the same, since they are created and travel together so much. Might it help if the DiagnosticLoc field was an "ordering key" instead of an explicit "offset" which may be from a different file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a diagnostic message such as:
// CHECK:STDERR: fail_private_field_access.carbon:[[@LINE+7]]:21: error: cannot access private member `radius` of type `Circle` [ClassInvalidMemberAccess]
// CHECK:STDERR: let radius: i32 = circle.radius;
// CHECK:STDERR: ^~~~~~~~~~~~~
The portions provided by DiagnosticLoc are everything that relates to location:
- filename (
fail_private_field_access.carbon
) - line number (
[[@LINE+7]]
) - column number (
21
) - line (
let radius: i32 = circle.radius;
) - length (
^~~~~~~~~~~~~
)
I think I have a different perspective here. It's more than an "ordering key". We have a few functions returning DiagnosticLoc
+ last_byte_offset
, but the code with DiagnosticLoc
only I am not touching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, if there's a strong preference, I can add a wrapper struct instead of using std::pair; but I do think DiagnosticLoc should remain separate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one more note. DiagnosticLoc
is part of the intended-to-be-public API for diagnostic processors. Per Hyrum's Law, I think we should be cautious about what we put on a public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's fair. I also thought about SortedDiagonsticLoc in place of the pair, or whatever would make sense. A bare int32_t
in a pair is fairly non-descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a wrapper struct sounds like a nice approach, as it'd give us a field name and a place to put documentation of the value separate from the ConvertLoc
function.
@@ -23,7 +37,7 @@ auto SemIRDiagnosticConverter::ConvertLoc(SemIRLoc loc, | |||
"If we get invalid locations here, we may need to more " | |||
"thoroughly track ImportDecls."); | |||
|
|||
DiagnosticLoc in_import_loc; | |||
std::pair<DiagnosticLoc, int32_t> in_import_loc; | |||
auto import_loc_id = cursor_ir->insts().GetLocId(import_ir.decl_id); | |||
if (import_loc_id.is_node_id()) { | |||
// For imports in the current file, the location is simple. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the calls to ConvertLocInFile
here can produce byte offsets into files other than the current source file. In that case maybe we should set the byte offset to -1 and allow ConvertLoc
to pick a better one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in ConvertLoc
toolchain/lex/tokenized_buffer.h
Outdated
explicit TokenDiagnosticConverter(const TokenizedBuffer* buffer) | ||
: buffer_(buffer) {} | ||
explicit TokenDiagnosticConverter(const TokenizedBuffer* tokens) | ||
: tokens_(tokens) {} | ||
|
||
// Map the given token into a diagnostic location. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also describe the int32_t in the output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing comment to note it's just implementing a virtual function, to avoid duplicating/diverging documentation.
@@ -9,7 +9,21 @@ namespace Carbon::Check { | |||
|
|||
auto SemIRDiagnosticConverter::ConvertLoc(SemIRLoc loc, | |||
ContextFnT context_fn) const | |||
-> DiagnosticLoc { | |||
-> std::pair<DiagnosticLoc, int32_t> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also wondering the same, since they are created and travel together so much. Might it help if the DiagnosticLoc field was an "ordering key" instead of an explicit "offset" which may be from a different file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good. Not merging in case you want to add the wrapper struct for DiagnosticLoc
+ the offset.
This change deliberately breaks away from the line/column ordering, and instead focuses on a last byte offset corresponding to the final token processed as part of producing the message. Where that's equal, this maintains stable ordering in order to reflect the order that diagnostics were produced.
The intent of this approach is that lex, parse, and check diagnostics are interleaved based on where they are produced, but that subexpressions still have diagnostics emitted prior to containing expressions. In particular, the prior line/column sort essentially sorted on the start of where a diagnostic was associated, and this is closer to sorting based on the end. As a consequence, something like
F(1 2)
will have the error for1 2
emitted before a diagnostic forF(1 2)
not matching parameters, instead of after.In check, we track the last handled node. This provides a last_byte_offset separate from where a diagnostic is associated. The intent is that this creates an ordering of diagnostics which may be associated with earlier code, to cause the diagnostics to be emitted later. An example consequence of this is the change in ordering of modifier diagnostics: we are diagnosing those from the same place, but they have the same last_byte_offset, so we print them out in the order produced.
I've added similar tracking to parse, but cannot identify any test which is affected by it (note the separate commit, I thought about this late). I'm not sure whether we have good out-of-order errors we could produce for this.
A significant number of tests have reordered diagnostics as a consequence of this change, so this change does not add further testing.