Update dependencies #329

@therealbnut therealbnut commented Jul 28, 2023

This PR updates a few of the dependencies, so that upstream fixes can be imported. It also introduces some new Hir variants like lookahead/lookbehind and captures, which could be used to address some existing issues.

@therealbnut therealbnut marked this pull request as ready for review July 28, 2023 05:29
@therealbnut
Author

Ah, oops, I just noticed there's another PR (#320) that does the same thing, although mine seems to have a smaller diff. There may be good bits in both that we can take.

@@ -111,23 +111,38 @@ impl TryFrom<Hir> for Mir {

                 Ok(Mir::Alternation(alternation))
             }
-            HirKind::Literal(literal) => Ok(Mir::Literal(literal)),
+            HirKind::Literal(literal) => {
+                let s = std::str::from_utf8(&*literal.0).unwrap();
Author

Note: HirKind::Literal now holds multiple characters rather than a single one. I'm not sure whether its contents are required to be valid UTF-8; if arbitrary bytes are allowed, this will need a different approach.
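
One way to avoid the panic raised by `unwrap()` is to surface invalid UTF-8 as an error. The following is a minimal sketch, not the code in this PR: the helper name `literal_chars` is hypothetical, and it only assumes that the literal is available as a byte slice, as the `&*literal.0` expression in the diff suggests.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: decode the raw bytes of a literal into characters.
/// Invalid UTF-8 is returned as an error instead of panicking via `unwrap()`.
fn literal_chars(bytes: &[u8]) -> Result<Vec<char>, Utf8Error> {
    // `std::str::from_utf8` validates the whole slice before returning a `&str`.
    let s = std::str::from_utf8(bytes)?;
    Ok(s.chars().collect())
}
```

A caller could then map the `Utf8Error` into whatever error type the conversion already returns, so a non-UTF-8 literal becomes a reported error rather than a crash.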

@@ -192,7 +194,7 @@ mod tests {
     fn priorities() {
         let regexes = [
             ("[a-z]+", 1),
-            ("a|b", 2),
+            ("a|b", 1),
Author

Note: the HIR representation here changes from Alternate(['a', 'b']) to Class('a'..='b'), which is probably the result of an upstream optimisation. If the test wants this to be an alternation, it's probably better to use a|c.
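
To illustrate why the expected value in the test moves from 2 to 1, here is a small sketch. The `Mir` enum is a simplified, hypothetical stand-in, and the priority rules (literal counts 2, class counts 1, alternation takes the minimum of its branches) are assumptions chosen to match the numbers in the test diff, not a copy of the real implementation.

```rust
/// Hypothetical, simplified stand-in for the `Mir` tree.
#[allow(dead_code)]
enum Mir {
    Literal(char),
    /// Inclusive character range, e.g. 'a'..='b'.
    Class(char, char),
    Alternation(Vec<Mir>),
}

/// Assumed priority rules, for illustration only.
fn priority(mir: &Mir) -> usize {
    match mir {
        Mir::Literal(_) => 2,
        Mir::Class(_, _) => 1,
        // An alternation is only as specific as its least specific branch.
        Mir::Alternation(alts) => alts.iter().map(priority).min().unwrap_or(0),
    }
}
```

Under these assumptions, the alternation of two literals scores 2 while the equivalent class scores 1, even though both match exactly the same inputs.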

Collaborator

As discussed in #320, I don't think we should just adapt our code to every possible optimization. Otherwise, this can become a never-ending game where each optimizer change in regex-syntax leads to a breaking change in logos :'-)

Author

@therealbnut therealbnut Aug 2, 2023

I see your point, but it seems like the only sensible way forward from that is to fork regex-syntax. Some optimisations will not be invertible.

Alternatively, the priority function could be based on the number of nodes/edges in a regex-syntax derived canonical DFA, which I assume will be stable across optimisations?

Collaborator

Hmm, I don't think optimisations need to be lossy. To me, an optimisation results in the same behavior, and that behavior is what the priority should be computed from.

Author

I've clarified that by “lossy” I meant “non-invertible”: optimisations in regex-syntax may rewrite the HIR in a way that leaves us without enough information to get back to the form we expect.

However we should always be able to get to the same canonical DFA.

+                let mut chars = s.chars().map(Mir::Literal).peekable();
+                let c = chars.next().expect("a literal cannot be empty");
+                if chars.peek().is_some() {
+                    Ok(Mir::Concat(std::iter::once(c).chain(chars).collect()))
Author

@therealbnut therealbnut Jul 28, 2023

I suspect this is why my PR differs from #320 and doesn't require changing the class priority: I'm producing multiple Mir::Literal nodes for one HirKind::Literal.
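
The expansion described here can be sketched in isolation as follows. The `Mir` enum is a reduced, hypothetical stand-in, `expand_literal` is an illustrative name, the single-character branch is an assumption (the diff above is cut off before it), and the priority rules (2 per literal, sum over a concatenation) are assumed for illustration.

```rust
/// Hypothetical, reduced stand-in for the `Mir` tree.
#[derive(Debug, PartialEq)]
enum Mir {
    Literal(char),
    Concat(Vec<Mir>),
}

/// Expand a multi-character literal into one `Mir::Literal` per character.
/// Returns `None` for an empty string instead of panicking.
fn expand_literal(s: &str) -> Option<Mir> {
    let mut chars = s.chars().map(Mir::Literal).peekable();
    let first = chars.next()?;
    if chars.peek().is_some() {
        // More than one character: wrap all of them in a concatenation.
        Some(Mir::Concat(std::iter::once(first).chain(chars).collect()))
    } else {
        // Exactly one character: keep the bare literal (assumed behaviour).
        Some(first)
    }
}

/// Assumed priority rules: 2 per literal, summed over a concatenation.
fn priority(mir: &Mir) -> usize {
    match mir {
        Mir::Literal(_) => 2,
        Mir::Concat(items) => items.iter().map(priority).sum(),
    }
}
```

Because each character still contributes its own literal node, a per-character priority rule keeps giving the same total for a multi-character literal, which is consistent with not having to touch the class priority.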

Collaborator

That could be an alternative solution. Maybe you should mention it in #320 directly?

@jeertmans
Collaborator

Closing as superseded by #320 and #368.

@jeertmans jeertmans closed this Feb 7, 2024