impl Lexer for rope #9
FWIW, it looks like this would require reimplementing … I think, however, that I probably would have needed to reimplement that eventually, in order to have access to spans for …
@ltratt any chance you could have a look at the branch https://github.com/ratmice/nimbleparse_lsp/tree/issue9 because
Question: besides the above questions, there is a simple question about whether
I'm not sure I've got enough context to understand the question. It might be relevant to note that the reason for the unwieldy
Sorry, these were mostly notes to self until I ran into the need for a private API; I'll try to explain more clearly. A rope is a tree-like data structure for strings: instead of storing a string in one contiguous buffer, it stores slices of the string in a tree. In that way, regarding … Some background: so this was aimed at just lexing directly on the
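To make the rope description concrete, here is a minimal sketch in Rust. It is hypothetical: it is not the rope crate nimbleparse_lsp actually uses, and the names `Rope`, `concat`, `chunks` and `byte_at` are made up for illustration. Leaves hold string slices and internal nodes cache the byte length of their left subtree, so concatenation doesn't copy text and indexing walks down the tree:

```rust
/// A minimal, hypothetical rope: leaves own string slices; internal nodes own
/// two subtrees and cache the byte length of the left one.
#[derive(Debug, Clone)]
pub enum Rope {
    Leaf(String),
    Node {
        left: Box<Rope>,
        right: Box<Rope>,
        left_len: usize,
    },
}

impl Rope {
    /// Build a single-slice rope.
    pub fn leaf(s: &str) -> Rope {
        Rope::Leaf(s.to_string())
    }

    /// Join two ropes under a new node; neither side's text is copied.
    pub fn concat(left: Rope, right: Rope) -> Rope {
        let left_len = left.len();
        Rope::Node {
            left: Box::new(left),
            right: Box::new(right),
            left_len,
        }
    }

    /// Total length in bytes (walks the right spine, using cached left lengths).
    pub fn len(&self) -> usize {
        match self {
            Rope::Leaf(s) => s.len(),
            Rope::Node { right, left_len, .. } => *left_len + right.len(),
        }
    }

    /// The stored slices in document order; the text is never one contiguous buffer.
    pub fn chunks(&self) -> Vec<&str> {
        let mut out = Vec::new();
        self.collect_chunks(&mut out);
        out
    }

    fn collect_chunks<'a>(&'a self, out: &mut Vec<&'a str>) {
        match self {
            Rope::Leaf(s) => out.push(s.as_str()),
            Rope::Node { left, right, .. } => {
                left.collect_chunks(out);
                right.collect_chunks(out);
            }
        }
    }

    /// Byte at `idx`, found by descending the tree with the cached lengths.
    pub fn byte_at(&self, idx: usize) -> Option<u8> {
        match self {
            Rope::Leaf(s) => s.as_bytes().get(idx).copied(),
            Rope::Node { left, right, left_len } => {
                if idx < *left_len {
                    left.byte_at(idx)
                } else {
                    right.byte_at(idx - *left_len)
                }
            }
        }
    }
}
```

The point that matters for lexing shows up in `chunks`: the text only exists as a sequence of separate slices, so a lexer that expects one contiguous `&str` can't run over it directly.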
Ahh, I get it now: it doesn't interpret the span as lines, but pads the beginning and end of the returned string out to line boundaries. I totally missed the parenthetical in the docs, which is very clear. Apologies.
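A rough sketch of that padding behaviour, written as a hypothetical free function `span_lines` over a plain `&str` rather than lrpar's actual implementation, and assuming byte offsets that fall on char boundaries:

```rust
/// Hypothetical helper: given a byte span `start..end` into `src`, return the
/// text of every line the span touches, i.e. the span padded outwards to the
/// nearest line boundaries. Assumes `start <= end <= src.len()` and that both
/// offsets lie on char boundaries.
pub fn span_lines(src: &str, start: usize, end: usize) -> &str {
    // Beginning of the line containing `start`: just after the previous '\n'.
    let line_start = src[..start].rfind('\n').map_or(0, |i| i + 1);
    // End of the line containing `end`: the next '\n' at or after `end`
    // (excluded from the result), or the end of the input.
    let line_end = src[end..].find('\n').map_or(src.len(), |i| end + i);
    &src[line_start..line_end]
}
```

For example, with three lines `let a = 1;`, `let b = 2;`, `let c = 3;`, a span covering only `b = 2` returns the whole second line, and a span running from the second line into the third returns both lines.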
Edit: removed an old comment, as it doesn't seem relevant anymore. I've kind of given up for the time being on doing this without modifying the lrlex library, e.g. by implementing lrlex's traits from within nimbleparse_lsp. It doesn't seem possible without stabilizing a bunch of things that don't want to be stabilized. In between, though, I did produce a nice small branch of lrlex with a rope feature. I still need to do performance comparisons and sanity testing, but so far it appears to work.
So at this point we are better off converting to a string and doing a single copy up front. There are some comments linking to bug reports, which in turn link to others' work on
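A minimal sketch of the single-copy approach, assuming the rope can hand out its chunks as `&str` slices (modelled here as a plain slice). `flatten_chunks` and `lex_words` are hypothetical names, and `lex_words` merely stands in for a real `&str`-based lexer:

```rust
/// Flatten rope chunks into one contiguous String so an ordinary `&str`-based
/// lexer can borrow it. Pre-sizing the buffer means one allocation and each
/// byte copied exactly once.
pub fn flatten_chunks(chunks: &[&str]) -> String {
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    let mut out = String::with_capacity(total);
    for chunk in chunks {
        out.push_str(chunk);
    }
    out
}

/// Stand-in for a `&str`-based lexer: tokens are whitespace-separated words.
pub fn lex_words(src: &str) -> Vec<&str> {
    src.split_whitespace().collect()
}
```

The standard library's `chunks.concat()` produces the same string in one call; the explicit loop just makes the single copy visible.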
So, after a working but performance-regressing implementation, I have what I consider to be 3 options:
A few quick thoughts (these might be irrelevant or obvious, but just in case):
So I had a thought about this one which is probably worth trying, if only to tell us how much improvement lexing ropes might bring without a bunch of regex changes. If the rope implementation we used ended slices on a word boundary, e.g. between a word and the whitespace after it, we should be able to lex over individual slices, because the vast majority of regular expressions would no longer span two slices. It probably entails building a rope with this property, but that seems a lot simpler than building a regex engine that can continue matching across slices 🤷
Edit: I guess the difficulty here may be finding a good boundary when performing an edit operation. It could be that you end up splitting an edit across slices to enforce the slice-boundary property, in a way that forces an extra split.
Edit2: Incremental regular expressions.
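A small experiment along these lines, as a hypothetical Rust sketch: a toy lexer that treats every maximal run of non-whitespace as one token stands in for the real regex-based lexer, and the rope is modelled as a list of `&str` chunks. Lexing chunk by chunk gives the same tokens as lexing the whole text only when every split satisfies the boundary property; `next_safe_boundary` (also a made-up name) sketches one way to push an edit's split point forward until it does:

```rust
use std::ops::Range;

// Toy lexer (hypothetical stand-in for the real one): every maximal run of
// non-whitespace is a token; whitespace is skipped. Tokens are recorded as
// byte ranges into the whole text, offset by `base`.
fn lex_into(chunk: &str, base: usize, out: &mut Vec<Range<usize>>) {
    let mut start: Option<usize> = None;
    for (i, c) in chunk.char_indices() {
        match (c.is_whitespace(), start) {
            (false, None) => start = Some(i),
            (true, Some(s)) => {
                out.push(base + s..base + i);
                start = None;
            }
            _ => {}
        }
    }
    // A token still open at the end of the chunk ends at the chunk boundary.
    if let Some(s) = start {
        out.push(base + s..base + chunk.len());
    }
}

/// Lex the whole text at once (what a contiguous copy allows).
pub fn lex_whole(text: &str) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    lex_into(text, 0, &mut out);
    out
}

/// Lex each rope slice independently, never looking across slice boundaries.
pub fn lex_chunked(chunks: &[&str]) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    let mut base = 0;
    for chunk in chunks {
        lex_into(chunk, base, &mut out);
        base += chunk.len();
    }
    out
}

/// The slice-boundary property: splitting `text` at byte `at` is safe for this
/// lexer if either neighbouring char is whitespace, or `at` is at either end.
pub fn is_safe_boundary(text: &str, at: usize) -> bool {
    let before = text[..at].chars().next_back();
    let after = text[at..].chars().next();
    match (before, after) {
        (Some(b), Some(a)) => b.is_whitespace() || a.is_whitespace(),
        _ => true,
    }
}

/// When an edit wants to split at `at` (assumed to be a char boundary), move
/// the split forward until the property holds; the end of the text always does.
pub fn next_safe_boundary(text: &str, mut at: usize) -> usize {
    while !is_safe_boundary(text, at) {
        // Step over one whole char so `at` stays on a char boundary.
        at += text[at..].chars().next().map_or(1, char::len_utf8);
    }
    at
}
```

With the text `let x = 42;`, splitting as `let` / ` x =` / ` 42;` produces the same token spans as lexing the whole string, whereas splitting inside `let` does not.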
If you want to support whitespace-sensitive languages (e.g. Python or Haskell) then this might not work. Parsing is... fun ;)
Indeed, it is just an optimization for a particular configuration of the lexer regexes; for any other configuration we'd fall back to the full string copy we currently use. But I don't think it works with whitespace-sensitive languages currently, because generally you'll require actions to increment counters. Unfortunately, I think C++-style comments probably run afoul of this particular configuration of the regexes, at least if we want both forward and backward scans to work. Anyhow, I think it is probably too difficult to be worthwhile to try to detect this dynamically from any given set of regexes. But I do think it would apply to quite a few languages.
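A hypothetical sketch of both points: a toy lexer with a line-comment rule (`//` up to the newline) and a word rule, plus a `lex_rope` wrapper that lexes slice by slice only when the caller asserts the rules are whitespace-safe, and otherwise falls back to a single up-front copy. The boolean flag is an assumption standing in for the hard problem of deciding this automatically from the regexes:

```rust
// Hypothetical toy lexer with two rules: a C++-style line comment (`//` up to
// the next newline) and a word (a maximal run of non-whitespace). Whitespace
// between tokens is skipped. Unlike words, a comment token contains spaces.
pub fn lex(src: &str) -> Vec<String> {
    let mut toks = Vec::new();
    let mut rest = src;
    loop {
        rest = rest.trim_start();
        if rest.is_empty() {
            break;
        }
        let len = if rest.starts_with("//") {
            rest.find('\n').unwrap_or(rest.len())
        } else {
            rest.find(char::is_whitespace).unwrap_or(rest.len())
        };
        toks.push(rest[..len].to_string());
        rest = &rest[len..];
    }
    toks
}

/// Lex slice by slice only when the caller vouches that no rule can match
/// across whitespace; otherwise fall back to one up-front copy of the text.
pub fn lex_rope(chunks: &[&str], rules_are_whitespace_safe: bool) -> Vec<String> {
    if rules_are_whitespace_safe {
        chunks.iter().flat_map(|chunk| lex(chunk)).collect()
    } else {
        lex(&chunks.concat())
    }
}
```

For the chunks `x // a` and ` comment\ny`, the split falls on whitespace, yet per-slice lexing cuts the comment into `// a` and `comment`; the fallback path yields the single token `// a comment`.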
I've read through the issue again and I think I might understand it slightly better now (but I might be wrong!). I think what you're hitting is a semi-implemented aspect of lrpar. The ultimate aim is that you should be able to parse a file by implementing just … So it might be interesting to work out how to do parsing with just the
Yeah, this issue is going to be tough, because it is really hitting unfinished work on both the regex front and the lexer front. So it's probably worth trying this first in an ad-hoc lexer with an ad-hoc regex crate specifically for that purpose, and only then, once I've got something working, try to think about how we could integrate it into upstream crates like grmtools and regex?
That does seem like a sensible plan. I will have a look to see if I can easily update lrpar to do "the right thing", but it might exceed my time budget. |
I've had a look into this and, sort of amusingly,
Currently, when we go to lex, we make a `&str` from the rope and then use the usual `Lexer::new`. Especially when working with large files, it'd be better to impl `Lexer`, or perhaps more appropriately `NonStreamingLexer`, for `Rope` directly (cloning it is cheap). It is also likely that instead of being implemented for `Rope`, these need to be implemented for `Chars` or, in the case of `NonStreamingLexer`, using `Lines`. Generally this just affects what we need to newtype because of orphan rules.
I'm guessing it is actually `Chars`
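Orphan rules roughly only allow a trait impl when the trait or the type is defined in the current crate, so implementing a grmtools trait for a type from a rope crate needs a local wrapper. A sketch under stated assumptions: `Vec<String>` stands in for the foreign rope type, std's `fmt::Display` stands in for a foreign trait such as `NonStreamingLexer`, and `RopeText`/`RopeChars` are hypothetical names:

```rust
use std::fmt;

// `Vec<String>` stands in for a foreign rope type and `fmt::Display` for a
// foreign trait. Writing `impl fmt::Display for Vec<String>` directly is
// rejected by the orphan rule (error E0117): neither the trait nor the type
// is defined in this crate. Wrapping the foreign type in a local newtype
// makes the impl legal.
pub struct RopeText(pub Vec<String>);

impl fmt::Display for RopeText {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Write each chunk in turn without first joining them into one String.
        for chunk in &self.0 {
            f.write_str(chunk)?;
        }
        Ok(())
    }
}

// The same pattern for a `Chars`-style iterator over the rope: a local
// struct can carry whatever trait impls are needed.
pub struct RopeChars<'a> {
    chunks: std::slice::Iter<'a, String>,
    current: std::str::Chars<'a>,
}

impl<'a> RopeChars<'a> {
    pub fn new(rope: &'a RopeText) -> Self {
        RopeChars { chunks: rope.0.iter(), current: "".chars() }
    }
}

impl<'a> Iterator for RopeChars<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        loop {
            if let Some(c) = self.current.next() {
                return Some(c);
            }
            // Current chunk exhausted: move to the next one, or stop.
            self.current = self.chunks.next()?.chars();
        }
    }
}
```

Whether the wrapper ends up around the rope itself or around an iterator like `Chars` only changes which type gets the newtype, not the shape of the workaround.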
I'm not really worried about copies of the lex/yacc files themselves, e.g. producing `LRNonStreamingLexerDef` (or `YaccGrammar`, for that matter) from a rope; the traits don't seem to support that anyway.