Update dependencies #329
Ah, oops: I just noticed there's another PR that does the same thing (#320), although mine seems to have a smaller diff. There may be good bits from both that we can take.
```diff
@@ -111,23 +111,38 @@ impl TryFrom<Hir> for Mir {

             Ok(Mir::Alternation(alternation))
         }
-        HirKind::Literal(literal) => Ok(Mir::Literal(literal)),
+        HirKind::Literal(literal) => {
+            let s = std::str::from_utf8(&*literal.0).unwrap();
```
Note: `HirKind::Literal` can now contain multiple characters. I'm not sure whether it's guaranteed to be valid UTF-8; if it's allowed to hold arbitrary bytes, this may need a different approach.
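If the literal's bytes are not guaranteed to be UTF-8, one option is to fall back to byte-level units instead of calling `unwrap()`. The sketch below assumes that logos could represent raw bytes at all, which is not confirmed here; `Unit` and `literal_units` are hypothetical names, not part of logos or `regex-syntax`.

```rust
/// Hypothetical element type: a literal unit is either a Unicode scalar value
/// or a raw byte (assumption: the caller can handle both).
#[derive(Debug, PartialEq)]
enum Unit {
    Char(char),
    Byte(u8),
}

/// Split a literal's bytes into units. Decodes as UTF-8 when possible and
/// otherwise falls back to one `Unit::Byte` per byte, instead of panicking.
fn literal_units(bytes: &[u8]) -> Vec<Unit> {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.chars().map(Unit::Char).collect(),
        Err(_) => bytes.iter().copied().map(Unit::Byte).collect(),
    }
}
```

This falls back for the whole literal if any byte is invalid; a finer-grained variant could decode the valid prefix using `Utf8Error::valid_up_to()`.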
```diff
@@ -192,7 +194,7 @@ mod tests {
     fn priorities() {
         let regexes = [
             ("[a-z]+", 1),
-            ("a|b", 2),
+            ("a|b", 1),
```
Note: the HIR representation here changes from `Alternate(['a', 'b'])` to `Class('a'..='b')`, which is probably the result of an upstream optimisation. If the test wants this to be an alternation, it's probably better to use `a|c`.
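To see why the expected value moves from 2 to 1, here is a sketch of a priority function under assumed rules: literal = 2, class = 1, concatenation sums, alternation takes the minimum branch, and `+` takes the inner priority. These rules were chosen to reproduce the test values above and are a simplification, not logos' actual implementation; `Node` and `priority` are hypothetical.

```rust
/// Hypothetical, simplified regex IR (not logos' real `Mir`).
#[allow(dead_code)]
enum Node {
    Literal(char),
    Class(Vec<(char, char)>),
    Concat(Vec<Node>),
    Alternation(Vec<Node>),
    OneOrMore(Box<Node>),
}

/// Assumed priority rules, for illustration only.
fn priority(node: &Node) -> usize {
    match node {
        Node::Literal(_) => 2,
        Node::Class(_) => 1,
        // A sequence is as specific as all of its parts combined.
        Node::Concat(nodes) => nodes.iter().map(priority).sum(),
        // An alternation is only as specific as its weakest branch.
        Node::Alternation(nodes) => nodes.iter().map(priority).min().unwrap_or(0),
        Node::OneOrMore(inner) => priority(inner),
    }
}
```

Under these rules, the same pattern `a|b` scores 2 as an alternation of two literals but 1 once it is represented as a class, so the score depends on the representation rather than on the matched language.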
As discussed in #320, I don't think we should just adapt our code to every possible optimisation. Otherwise this could become a never-ending game, where each new optimisation in `regex-syntax` leads to a breaking change in `logos` :'-)
I see your point, but then the only sensible way forward seems to be forking `regex-syntax`: some optimisations will not be invertible.
Alternatively, the priority function could be based on the number of nodes/edges in a canonical DFA derived from `regex-syntax`'s output, which I assume would be stable across optimisations?
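The canonical-DFA idea can be illustrated by counting the states of a minimal DFA with Moore's partition refinement. This is a sketch under stated assumptions: the `Dfa` type and `minimal_state_count` are hypothetical, the DFAs are built by hand, and how logos would actually derive a DFA from `regex-syntax` output is left open.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical toy DFA: `trans[state][symbol]` is the next state, state 0 is
/// the start state, and `accepting[state]` marks final states.
struct Dfa {
    trans: Vec<Vec<usize>>,
    accepting: Vec<bool>,
}

/// Count the states of the minimal equivalent DFA (Moore's partition
/// refinement). Assumes every state is reachable from the start state.
fn minimal_state_count(dfa: &Dfa) -> usize {
    // Initial partition: non-accepting (0) vs accepting (1).
    let mut class: Vec<usize> = dfa.accepting.iter().map(|&a| a as usize).collect();
    loop {
        let before = class.iter().collect::<HashSet<_>>().len();
        // Two states stay in the same block only if they are already in the
        // same block and every symbol leads them into the same block.
        let mut ids: HashMap<(usize, Vec<usize>), usize> = HashMap::new();
        let next: Vec<usize> = (0..dfa.trans.len())
            .map(|s| {
                let succ: Vec<usize> = dfa.trans[s].iter().map(|&t| class[t]).collect();
                let fresh = ids.len();
                *ids.entry((class[s], succ)).or_insert(fresh)
            })
            .collect();
        if ids.len() == before {
            return before; // No block was split: the partition is stable.
        }
        class = next;
    }
}
```

With the alphabet {`a`, `b`, other}, a naive DFA for `a|b` has a start state, separate accepting states after `a` and after `b`, and a dead state. The two accepting states are equivalent, so it minimises to 3 states, the same count as a DFA built directly from `[ab]`.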
Hmm, I don't think optimisations need to be lossy. To me, an optimisation preserves the same behavior, and behavior is what the priority should compute.
I've edited my earlier comment to say “non-invertible” instead of “lossy”, which is what I meant: optimisations in `regex-syntax` may rewrite a pattern in a way that leaves us without enough information to get back to the form we expect.
However, we should always be able to get to the same canonical DFA.
```rust
let mut chars = s.chars().map(Mir::Literal).peekable();
let c = chars.next().expect("a literal cannot be empty");
if chars.peek().is_some() {
    Ok(Mir::Concat(std::iter::once(c).chain(chars).collect()))
```
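For reference, the fragment above can be completed into a standalone function roughly as follows. `Mir` here is a hypothetical two-variant stand-in for logos' type, and the `else` branch (returning the single literal directly rather than a one-element `Concat`) is an assumption about how the elided code continues.

```rust
/// Hypothetical, simplified stand-in for logos' `Mir`.
#[derive(Debug, PartialEq)]
enum Mir {
    Literal(char),
    Concat(Vec<Mir>),
}

/// Convert a non-empty literal string into one `Mir::Literal` per char,
/// wrapped in a `Mir::Concat` when there is more than one.
fn literal_to_mir(s: &str) -> Mir {
    let mut chars = s.chars().map(Mir::Literal).peekable();
    let c = chars.next().expect("a literal cannot be empty");
    if chars.peek().is_some() {
        // More than one char: put the first back in front of the rest.
        Mir::Concat(std::iter::once(c).chain(chars).collect())
    } else {
        // Exactly one char: no Concat wrapper needed.
        c
    }
}
```

Peeking avoids allocating a `Vec` just to check the length, at the cost of re-chaining the first element.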
I suspect this is why my PR differs from #320 and doesn't require changing the class priority: I'm producing multiple `Mir::Literal`s for one `HirKind::Literal`.
That could be an alternative solution. Maybe you should mention it in #320 directly?
This PR updates a few of the dependencies so that upstream fixes can be pulled in. It also brings in some new `Hir` variants, such as lookahead/lookbehind and captures, which could be used to address some existing issues.