Refactor the injection marker type #12561

the-mikedavis · 2025-01-16T17:13:37Z

This is a rewrite of the InjectionLanguageMarker type with two purposes:

- Add a new LanguageId variant that can be used to look up languages by language_id. This is used instead of Name for (#set! injection.language "lang").
- Name, Filename and Shebang variants can all use RopeSlice as their inner type and avoid allocations in the common case.

The first change should be the more impactful one: (#set! injection.language "language-id") is far more common than capturing @injection.language, which is used for things like markdown code fences. Previously we treated both as the Name variant, meaning that we looked languages up by the longest matching injection_regex. Instead we can look them up by checking equality with the language_id field, which is much cheaper. There's really no need to be flexible in what we accept for the (#set! injection.language ..) version. For me locally this change noticeably reduces editing lag on a large Markdown file like the CHANGELOG.md.

This splits the `InjectionLanguageMarker::Name` into two: one that performs the previous behavior (using the language configurations' `injection_regex` fields and performing a match) and a new variant that looks up directly by `language_id` with equality. The old variant is used when capturing the injection language like we do in the markdown queries for code fences. That captured text is part of the document being highlighted, so we might need a regex to recognize a language like JavaScript as either "js" or "javascript". But the text passed in the `(#set! injection.language "name")` property can be looked up directly. This property is in the query code, so there's no need to be flexible in what we accept: we can require that the `(#set! injection.language ..)` properties refer to languages by their configured ID. This should save a noticeable amount of work for the common case of injections: `(#set! injection.language)` is used much more often than `@injection.language`.
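
To illustrate the split described above, here is a minimal, dependency-free sketch, not the actual implementation. `LanguageConfig`, `injection_aliases`, `longest_alias_match` and `lookup` are hypothetical names, and a plain alias list with substring matching stands in for the real `injection_regex` so the example needs no regex crate. Only the shape of the two lookups is meant to carry over.

```rust
/// Hypothetical, pared-down language configuration.
struct LanguageConfig {
    language_id: &'static str,
    /// Stand-in for `injection_regex`: names this language answers to.
    injection_aliases: &'static [&'static str],
}

/// Simplified marker with only the two variants discussed here.
enum InjectionLanguageMarker<'a> {
    /// Text from `(#set! injection.language "id")`: compared for equality.
    LanguageId(&'a str),
    /// Text captured by `@injection.language`: matched flexibly.
    Name(&'a str),
}

/// Length of the longest alias found in `name`, if any alias matches.
fn longest_alias_match(config: &LanguageConfig, name: &str) -> Option<usize> {
    config
        .injection_aliases
        .iter()
        .filter(|alias| name.contains(**alias))
        .map(|alias| alias.len())
        .max()
}

fn lookup<'c>(
    configs: &'c [LanguageConfig],
    marker: &InjectionLanguageMarker<'_>,
) -> Option<&'c LanguageConfig> {
    match marker {
        // Cheap path: one string comparison per configuration.
        InjectionLanguageMarker::LanguageId(id) => {
            configs.iter().find(|config| config.language_id == *id)
        }
        // Flexible path: try every configuration's matcher and keep the
        // configuration with the longest match.
        InjectionLanguageMarker::Name(name) => configs
            .iter()
            .filter_map(|config| longest_alias_match(config, name).map(|len| (len, config)))
            .max_by_key(|(len, _)| *len)
            .map(|(_, config)| config),
    }
}
```

With a `javascript` configuration whose aliases are `js` and `javascript`, `Name("js")` resolves to that configuration while `LanguageId("js")` finds nothing, which is the stricter behavior proposed for `(#set! injection.language ..)`.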

The `Name` variant's inner type can be switched to `RopeSlice` since the parent commit removed the usage of `&str`. In doing this we need to switch from a regular `Regex` to a `rope::Regex`, which is mostly a matter of renaming the type. The `Filename` and `Shebang` variants can also switch to `RopeSlice` which avoids allocations in cases where the text doesn't reside on different chunks of the rope. Previously `Filename`'s `Cow` was always the owned variant because of the conversion to a `PathBuf`.
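
The allocation point can be sketched with a toy chunked text type. This is an assumption-laden illustration rather than the rope library's API: `ChunkedText` and `slice` are hypothetical, ranges are byte offsets, and the input is assumed to be ASCII so that slicing never lands inside a multi-byte character. It borrows when the requested range sits inside one chunk and allocates only when the range crosses a chunk boundary.

```rust
use std::borrow::Cow;

/// Toy stand-in for a rope: the text is stored as separate chunks.
struct ChunkedText {
    chunks: Vec<String>,
}

impl ChunkedText {
    /// Returns the bytes in `start..end`.
    /// Borrows if the range lies within a single chunk, otherwise copies
    /// the pieces into a new `String`.
    fn slice(&self, start: usize, end: usize) -> Cow<'_, str> {
        let mut offset = 0; // byte offset of the current chunk
        let mut owned = String::new();
        for chunk in &self.chunks {
            let chunk_end = offset + chunk.len();
            // Overlap between the requested range and this chunk.
            let lo = start.max(offset);
            let hi = end.min(chunk_end);
            if lo < hi {
                let piece = &chunk[lo - offset..hi - offset];
                if lo == start && hi == end {
                    // The whole range is in this chunk: no allocation.
                    return Cow::Borrowed(piece);
                }
                owned.push_str(piece);
            }
            offset = chunk_end;
        }
        Cow::Owned(owned)
    }
}
```

A short shebang or file name will usually fall inside one chunk, so the borrowed path should be the common one.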

RoloEdits · 2025-01-16T19:15:20Z

Would this close #3072?

David-Else · 2025-01-16T19:29:29Z

> For me locally this change noticeably reduces editing lag on a large Markdown file like the CHANGELOG.md

I tried it and had the same result, it feels about twice as fast as before, but still a bit laggy.

Great work!

the-mikedavis · 2025-01-17T00:06:19Z

This should make a nice difference for #3072 but doesn't fully solve it. There's still some lag caused by running the injections.scm query over the file on every edit, so we'd still need #12546.

the-mikedavis added 2 commits January 16, 2025 11:44

the-mikedavis mentioned this pull request Jan 16, 2025

perf(syntax): short-circuit if name matches language_id #12407

Open

the-mikedavis added the A-tree-sitter Area: Tree-sitter label Jan 16, 2025
