Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the injection marker type #12561

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

the-mikedavis
Copy link
Member

This is a rewrite of the LanguageInjectionMarker type with two purposes:

  • Add a new LanguageId variant that can be used to look up languages by language_id. This is used instead of Name for (#set! injection.language "lang").
  • Name, Filename and Shebang variants can all use RopeSlice as their inner type and avoid allocations in the common case.

The first change should be the more impactful one: (#set! injection.language "language-id") is far more common than capturing @injection.language, which is used for things like markdown code fences. Previously we treated both as the Name variant meaning that we looked the languages up by the longest matching injection_regex. Instead we can lookup by checking equality with the language_id field which is much cheaper. There's really no need to be flexible in what we accept for the (#set! injection.language ..) version. For me locally this change noticeably reduces editing lag on a large Markdown file like the CHANGELOG.md.

This splits the `InjectionLanguageMarker::Name` into two: one that
preforms the previous behavior (using the language configurations'
`injection_regex` fields and performing a match) and a new variant that
looks up directly by `language_id` with equality.

The old variant is used when capturing the injection language like we
do in the markdown queries for codefences. That captured text is part of
the document being highlighted so we might need a regex to recognize a
language like JavaScript as either "js" or "javascript". But the text
passed in the `(#set! injection.language "name")` property can be
looked up directly. This property is in the query code so there's no
need to be flexible in what we accept: we can require that the
`(#set! injection.language ..)` properties refer to languages by their
configured ID. This should save a noticeable amount of work for the
common case of injections: `(#set! injection.language)` is used much
more often than `@injection.language`.
The `Name` variant's inner type can be switched to `RopeSlice` since
the parent commit removed the usage of `&str`. In doing this we need to
switch from a regular `Regex` to a `rope::Regex`, which is mostly a
matter of renaming the type.

The `Filename` and `Shebang` variants can also switch to `RopeSlice`
which avoids allocations in cases where the text doesn't reside on
different chunks of the rope. Previously `Filename`'s `Cow` was always
the owned variant because of the conversion to a `PathBuf`.
@RoloEdits
Copy link
Contributor

Would this close #3072?

@David-Else
Copy link
Contributor

David-Else commented Jan 16, 2025

For me locally this change noticeably reduces editing lag on a large Markdown file like the CHANGELOG.md

I tried it and had the same result, it feels about twice as fast as before, but still a bit laggy.

Great work!

@the-mikedavis
Copy link
Member Author

This should make a nice difference for #3072 but doesn't fully solve it. There's still some lag caused by running the injections.scm query over the file per-edit so we'd need #12546

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tree-sitter Area: Tree-sitter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants