Regex does not support unicode category "Currency Symbol" #359

Closed
wzzzzd opened this issue Jan 9, 2024 · 4 comments


wzzzzd commented Jan 9, 2024

I'm writing a lexer for the Cypher language. Following its ANTLR grammar, I define the token for unescaped identifiers as follows:

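use logos::Logos;

#[derive(Logos, Debug)]
pub enum Token {
    // An identifier start (or connector punctuation) followed by
    // identifier-continue characters or currency symbols.
    #[regex(r"[\p{ID_Start}\p{Pc}][\p{ID_Continue}\p{Sc}]*")]
    UnescapedSymbolicName,

    ...
}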

It fails to compile due to a regex parse error:

error: regex parse error:
           [\p{ID_Start}\p{Pc}][\p{ID_Continue}\p{Sc}]*
                                               ^^^^^^
       error: Unicode property not found
   --> src/token.rs:236:13
    |
236 |     #[regex(r"[\p{ID_Start}\p{Pc}][\p{ID_Continue}\p{Sc}]*")]
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The problem seems to lie in the dependency regex-syntax = 0.6. I tried parsing \p{Sc} with it directly:

#[test]
fn test() {
    let mut parser = regex_syntax::Parser::new();
    let hir = parser.parse(r"\p{Sc}");
    println!("{:?}", hir);
}

It raises the same error:

Err(Translate(Error { kind: UnicodePropertyNotFound, pattern: "\\p{Sc}", span: Span(Position(o: 0, l: 1, c: 1), Position(o: 6, l: 1, c: 7)) }))

The problem might be solved by updating regex-syntax. For example, with regex-syntax = 0.8.2, the test above succeeds:

Ok(Class({'$'..='$', '¢'..='¥', '֏'..='֏', '؋'..='؋', '߾'..='߿', '৲'..='৳', '৻'..='৻', '૱'..='૱', '௹'..='௹', '฿'..='฿', '៛'..='៛', '₠'..='⃀', '꠸'..='꠸', '﷼'..='﷼', '﹩'..='﹩', '＄'..='＄', '￠'..='￡', '￥'..='￦', '𑿝'..='𑿠', '𞋿'..='𞋿', '𞲰'..='𞲰'}))
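
For anyone stuck on the older dependency in the meantime, a hand-written predicate over the ranges printed above might serve as a stopgap. This is only a sketch: `is_currency_symbol` is a hypothetical helper name, not part of logos or regex-syntax, and the ranges are copied from the regex-syntax 0.8.2 output above, so they reflect the Unicode version bundled with that release.

```rust
/// Hypothetical helper: returns true if `c` falls in one of the ranges
/// that regex-syntax 0.8.2 printed for `\p{Sc}` (Currency Symbol).
/// The ranges are copied from that output, not derived independently.
pub fn is_currency_symbol(c: char) -> bool {
    matches!(
        c,
        '\u{0024}'                      // dollar sign
            | '\u{00A2}'..='\u{00A5}'   // cent sign through yen sign
            | '\u{058F}'
            | '\u{060B}'
            | '\u{07FE}'..='\u{07FF}'
            | '\u{09F2}'..='\u{09F3}'
            | '\u{09FB}'
            | '\u{0AF1}'
            | '\u{0BF9}'
            | '\u{0E3F}'
            | '\u{17DB}'
            | '\u{20A0}'..='\u{20C0}'   // Currency Symbols block (includes the euro sign)
            | '\u{A838}'
            | '\u{FDFC}'
            | '\u{FE69}'                // small dollar sign
            | '\u{FF04}'                // fullwidth dollar sign
            | '\u{FFE0}'..='\u{FFE1}'   // fullwidth cent and pound signs
            | '\u{FFE5}'..='\u{FFE6}'   // fullwidth yen and won signs
            | '\u{11FDD}'..='\u{11FE0}'
            | '\u{1E2FF}'
            | '\u{1ECB0}'
    )
}
```

The same ranges could presumably be spelled out as an explicit character class in the `#[regex(...)]` attribute instead of `\p{Sc}`, though I have not checked that against regex-syntax 0.6.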
@jeertmans
Collaborator

Hello, #320 already solved this issue by upgrading the regex-syntax dependency. The PR is stalled because I am waiting for a reply from the author of this crate, as I cannot publish new releases on my own :/


wzzzzd commented Jan 10, 2024

> Hello, #320 already solved this issue by upgrading the regex-syntax dependency. The PR is stalled because I am waiting for a reply from the author of this crate, as I cannot publish new releases on my own :/

Nice to hear that. Hope the PR can be merged soon :)

@jeertmans
Collaborator

@wzzzzd the issue is not the PR being merged, because I can do that anytime I want. But I cannot publish a new release :/

@jeertmans
Collaborator

Fixed with 0.14.
