Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No support for 's' flag #30

Open
conartist6 opened this issue Jan 7, 2021 · 6 comments
Open

No support for 's' flag #30

conartist6 opened this issue Jan 7, 2021 · 6 comments
Labels

Comments

@conartist6
Copy link

The 's' flag should have implications for the behavior of ret, specifically what it outputs for the . expression, but it always outputs ranges as if s flag were not specified.

@fent
Copy link
Owner

fent commented Jan 8, 2021

still not sure how to handle this. should ret output a different token for the . set when the s flag is set? or should flags be handled by what ever uses ret, and handles . differently?

right now, the latter is the case for the i flag, character tokens are represented as the original lowercased or uppercased character. so i'm leaning towards something similar to handle . with s, maybe adding a special type or flag for it.

@fent fent added the feature label Jan 8, 2021
@conartist6
Copy link
Author

I'd argue that ret should handle the flags. Flags are part of the expression. There's no mechanism (in native regex at least) to evaluate an expression with different flags. But you'd have to do something like change value to values for the CHAR type, which would of course be breaking.

I think I'm going to fork ret and do some work on it. There's quite a few improvements that I think are possible. Here are the ones I'm considering:

  • The flags improvements we've just discussed
  • Improve speed by avoiding the use of native regex in parsing.
  • Eliminate possibility that options is not present. Instead offer an array of one option.
  • Change type to be a simple lowercase string.
  • Add the missing support for new language features
  • Captures should store their index. The root expression captures at index 0
  • Combine root and group types into a single expression type
  • General terminology cleanup to more closely match regex docs, e.g. remember should be capture and followedBy is better known as lookahead. Actually lookahead deserves its own type too since engines will have to implement it differently than any other kind of group.

If I do all this work (which would basically amount to a rewrite), would you be interested in merging them under a new major version number or would you prefer that I publish my fork as a separate package?

@conartist6
Copy link
Author

conartist6 commented Jan 8, 2021

If you're interested in my context, I've just finished up the core architecture for my first regex engine: @iter-tools/regex.

@fent
Copy link
Owner

fent commented Jan 8, 2021

If I do all this work (which would basically amount to a rewrite), would you be interested in merging them under a new major version number or would you prefer that I publish my fork as a separate package?

yes! i'd be interesting in merging and working together. i agree on most of those changes, i'll go ahead and comment my thoughts below

  • The flags improvements we've just discussed
    the parser currently takes a regex string, not a regex object which would include flags. another library i use for this, randexp, evaluates CHAR types differently when i is set. a regex engine could work similarly with any flag.
  • Improve speed by avoiding the use of native regex in parsing.

yeah... there was recently a PR that changed part of that code. i made a comment that i regret coding it that way

  • Eliminate possibility that options is not present. Instead offer an array of one option.

yep, i've made this comment before too

  • Change type to be a simple lowercase string.

i suppose this could help with debuggig if someone prints the tokens. i don't think this would be breaking either, assuming people are using ret.types when comparing

  • Add the missing support for new language features

👍

  • Captures should store their index. The root expression captures at index 0

don't understand this one

  • Combine root and group types into a single expression type

i agree that they should both contain one options property, and no stack. not sure about giving ROOT the rest of the properties of GROUP that they don't have

  • General terminology cleanup to more closely match regex docs, e.g. remember should be capture and followedBy is better known as lookahead. Actually lookahead deserves its own type too since engines will have to implement it differently than any other kind of group.

if it matches well known, widely used, official specs then yeah 👍

@conartist6
Copy link
Author

If expressions take over for root and group, it is useful to know what index they are. When you have the index you know where they belong in the array returned by match, which backreference they match, etc. This is super useful for engines like mine, which store captures in a sparse immutable linked list, so the index of the captured content in the list doesn't necessarily equate to its capture index.

@conartist6
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants