Change how diagnostics are ordered #4778

jonmeow · 2025-01-09T00:06:20Z

This change deliberately breaks away from the line/column ordering, and instead focuses on a last byte offset corresponding to the final token processed as part of producing the message. Where that's equal, this maintains stable ordering in order to reflect the order that diagnostics were produced.

The intent of this approach is that lex, parse, and check diagnostics are interleaved based on where they are produced, but that subexpressions still have diagnostics emitted prior to containing expressions. In particular, the prior line/column sort essentially sorted on the start of where a diagnostic was associated, and this is closer to sorting based on the end. As a consequence, something like F(1 2) will have the error for 1 2 emitted before a diagnostic for F(1 2) not matching parameters, instead of after.

In check, we track the last handled node. This provides a last_byte_offset separate from where a diagnostic is associated. The intent is that this creates an ordering of diagnostics which may be associated with earlier code, to cause the diagnostics to be emitted later. An example consequence of this is the change in ordering of modifier diagnostics: we are diagnosing those from the same place, but they have the same last_byte_offset, so we print them out in the order produced.

I've added similar tracking to parse, but cannot identify any test which is affected by it (note the separate commit, I thought about this late). I'm not sure whether we have good out-of-order errors we could produce for this.

A significant number of tests have reordered diagnostics as a consequence of this change, so this change does not add further testing.

zygoloid · 2025-01-09T01:24:14Z

toolchain/parse/context.h

@@ -396,7 +401,7 @@ class Context {
  llvm::raw_ostream* vlog_stream_;

  // The current position within the token buffer.
-  Lex::TokenIterator position_;
+  Lex::TokenIterator* const position_;


This adds extra indirection to a lot of accesses to position_. Do we have an easy way to benchmark to see if this makes any difference to performance? (I think we could alternatively make Parse ask the Context for the address of its position and pass that into the converter, or we could make the Context own the converter, tracker, and emitter.)

We could check with compile_benchmark, but not bothering since I think it's clearer to make Context the owner.

zygoloid · 2025-01-09T02:05:12Z

toolchain/check/sem_ir_diagnostic_converter.cpp

@@ -9,7 +9,21 @@ namespace Carbon::Check {

 auto SemIRDiagnosticConverter::ConvertLoc(SemIRLoc loc,
                                          ContextFnT context_fn) const
-    -> DiagnosticLoc {
+    -> std::pair<DiagnosticLoc, int32_t> {


Have you considered adding the offset as a field in DiagnosticLoc? I'm not sure whether that'd be reasonable or not, so if this doesn't work out for some reason I think it's fine as-is.

A given Diagnostic can have multiple DiagnosticMessages, each with their own DiagnosticLoc. The DiagnosticLoc will often point at a line in a different file as compared to the file the last_byte_offset points into. I believe putting the last_byte_offset on the Diagnostic avoids the potential for confusion about what a last_byte_offset on a note in a different file means.

I was also wondering the same, since they are created and travel together so much. Might it help if the DiagnosticLoc field was an "ordering key" instead of an explicit "offset" which may be from a different file?

In a diagnostic message such as:

// CHECK:STDERR: fail_private_field_access.carbon:[[@LINE+7]]:21: error: cannot access private member `radius` of type `Circle` [ClassInvalidMemberAccess] // CHECK:STDERR: let radius: i32 = circle.radius; // CHECK:STDERR: ^~~~~~~~~~~~~

The portions provided by DiagnosticLoc are everything that relates to location:

filename (fail_private_field_access.carbon)

line number ([[@LINE+7]])

column number (21)

line (let radius: i32 = circle.radius;)

length (^~~~~~~~~~~~~)

I think I have a different perspective here. It's more than an "ordering key". We have a few functions returning DiagnosticLoc + last_byte_offset, but the code with DiagnosticLoc only I am not touching.

Note, if there's a strong preference, I can add a wrapper struct instead of using std::pair; but I do think DiagnosticLoc should remain separate

Sorry, one more note. DiagnosticLoc is part of the intended-to-be-public API for diagnostic processors. Per Hyrum's Law, I think we should be cautious about what we put on a public API.

Yeah that's fair. I also thought about SortedDiagonsticLoc in place of the pair, or whatever would make sense. A bare int32_t in a pair is fairly non-descriptive.

Adding a wrapper struct sounds like a nice approach, as it'd give us a field name and a place to put documentation of the value separate from the ConvertLoc function.

zygoloid · 2025-01-09T02:05:15Z

toolchain/check/sem_ir_diagnostic_converter.cpp

@@ -23,7 +37,7 @@ auto SemIRDiagnosticConverter::ConvertLoc(SemIRLoc loc,
                 "If we get invalid locations here, we may need to more "
                 "thoroughly track ImportDecls.");

-    DiagnosticLoc in_import_loc;
+    std::pair<DiagnosticLoc, int32_t> in_import_loc;
    auto import_loc_id = cursor_ir->insts().GetLocId(import_ir.decl_id);
    if (import_loc_id.is_node_id()) {
      // For imports in the current file, the location is simple.


It looks like the calls to ConvertLocInFile here can produce byte offsets into files other than the current source file. In that case maybe we should set the byte offset to -1 and allow ConvertLoc to pick a better one?

Done in ConvertLoc

danakj · 2025-01-09T16:34:18Z

toolchain/lex/tokenized_buffer.h

-  explicit TokenDiagnosticConverter(const TokenizedBuffer* buffer)
-      : buffer_(buffer) {}
+  explicit TokenDiagnosticConverter(const TokenizedBuffer* tokens)
+      : tokens_(tokens) {}

  // Map the given token into a diagnostic location.


This could also describe the int32_t in the output?

Changing comment to note it's just implementing a virtual function, to avoid duplicating/diverging documentation.

danakj · 2025-01-09T16:47:51Z

toolchain/check/sem_ir_diagnostic_converter.cpp

@@ -9,7 +9,21 @@ namespace Carbon::Check {

 auto SemIRDiagnosticConverter::ConvertLoc(SemIRLoc loc,
                                          ContextFnT context_fn) const
-    -> DiagnosticLoc {
+    -> std::pair<DiagnosticLoc, int32_t> {


I was also wondering the same, since they are created and travel together so much. Might it help if the DiagnosticLoc field was an "ordering key" instead of an explicit "offset" which may be from a different file?

zygoloid

Thanks, looks good. Not merging in case you want to add the wrapper struct for DiagnosticLoc + the offset.

jonmeow added 2 commits January 8, 2025 15:23

Change how diagnostics are ordered

7e1b1f6

Track location in parse

8b5ef96

github-actions bot requested a review from zygoloid January 9, 2025 00:06

github-actions bot added the toolchain label Jan 9, 2025

zygoloid reviewed Jan 9, 2025

View reviewed changes

danakj reviewed Jan 9, 2025

View reviewed changes

jonmeow added 3 commits January 9, 2025 09:40

comments

4e36dda

owner

b9b6437

crossfile

d2b69e7

zygoloid approved these changes Jan 10, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change how diagnostics are ordered #4778

Change how diagnostics are ordered #4778

jonmeow commented Jan 9, 2025 •

edited

Loading

zygoloid Jan 9, 2025

jonmeow Jan 9, 2025

zygoloid Jan 9, 2025

jonmeow Jan 9, 2025

danakj Jan 9, 2025

jonmeow Jan 9, 2025 •

edited

Loading

jonmeow Jan 9, 2025

jonmeow Jan 9, 2025

danakj Jan 9, 2025

zygoloid Jan 10, 2025

zygoloid Jan 9, 2025

jonmeow Jan 9, 2025

danakj Jan 9, 2025

jonmeow Jan 9, 2025

danakj Jan 9, 2025

zygoloid left a comment •

edited

Loading

Change how diagnostics are ordered #4778

Are you sure you want to change the base?

Change how diagnostics are ordered #4778

Conversation

jonmeow commented Jan 9, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jonmeow Jan 9, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zygoloid left a comment • edited Loading

Choose a reason for hiding this comment

jonmeow commented Jan 9, 2025 •

edited

Loading

jonmeow Jan 9, 2025 •

edited

Loading

zygoloid left a comment •

edited

Loading