rxvm_fsearch is only line-based. Make this configurable #11

Open
eriknyquist opened this issue Oct 18, 2017 · 0 comments

@eriknyquist
Owner

Currently, the BMH (Boyer-Moore-Horspool) portion of rxvm_fsearch uses the following heuristic to search for a fixed substring taken from an expression:

  • Run BMH string search using fixed substring
  • On a match, back up from the first character of the match to the last newline character
  • Run vm_execute (full regexp matching) from here
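
For illustration, the steps above could be sketched roughly as follows. This is not the actual rxvm code: `bmh_find`, `line_start`, `candidate_start` and `NOT_FOUND` are hypothetical names, and the final call to vm_execute is left out since its signature isn't shown here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)

/* Boyer-Moore-Horspool: offset of the first occurrence of 'needle' in 'hay'
 * at or after 'start', or NOT_FOUND. (Hypothetical helper, not rxvm's.) */
static size_t bmh_find(const unsigned char *hay, size_t hay_len,
                       const unsigned char *needle, size_t n_len,
                       size_t start)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i;

    assert(n_len > 0);
    if (hay_len < n_len)
        return NOT_FOUND;

    /* Bad-character table: default shift is the whole needle length */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = n_len;
    for (i = 0; i + 1 < n_len; i++)
        shift[needle[i]] = n_len - 1 - i;

    /* Slide the window, skipping by the shift for its last character */
    for (i = start; i <= hay_len - n_len; i += shift[hay[i + n_len - 1]]) {
        if (memcmp(hay + i, needle, n_len) == 0)
            return i;
    }
    return NOT_FOUND;
}

/* Back up from 'pos' to the first character after the previous newline
 * (or to offset 0 if there is no previous newline). */
static size_t line_start(const unsigned char *buf, size_t pos)
{
    while (pos > 0 && buf[pos - 1] != '\n')
        pos--;
    return pos;
}

/* Offset from which full regexp matching would be started, or NOT_FOUND
 * if the fixed substring does not occur in the text at all. */
static size_t candidate_start(const char *text, const char *fixed)
{
    const unsigned char *t = (const unsigned char *)text;
    size_t hit = bmh_find(t, strlen(text),
                          (const unsigned char *)fixed, strlen(fixed), 0);

    return (hit == NOT_FOUND) ? NOT_FOUND : line_start(t, hit);
}
```

With no newline anywhere before the hit, `line_start` walks all the way back to offset 0, which is the slow case this issue is about.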

Now, this doesn't cause as many issues as it may sound at first. While compiling the expression, rxvm_compile keeps track of any fixed substrings potentially suitable for BMH, and there are several things that will get an expression marked as "unsuitable for BMH" (for more details, see the tests for this behaviour in tests/src/test_rxvm_lfix_heuristic.c). One of those things is a literal newline character in the expression: if the expression spans multiple lines, I won't know how far to back up. It's technically possible if I track the number of newlines in the expression, but I think that can be a future optimisation.
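
As a rough sketch of that possible future optimisation: if the compiler recorded how many literal newlines an expression contains, the back-up step could skip over that many newlines before stopping. `backup_lines` is a hypothetical helper, and this assumes literal newlines are the only part of the expression that can match a newline character.

```c
#include <stddef.h>

/* Back up from 'pos', crossing at most 'newlines' newline characters, and
 * stop just after the next newline beyond those (or at offset 0).
 * 'newlines' would be the number of literal newlines in the expression. */
static size_t backup_lines(const char *buf, size_t pos, size_t newlines)
{
    while (pos > 0) {
        if (buf[pos - 1] == '\n') {
            if (newlines == 0)
                break;      /* reached the earliest possible line start */
            newlines--;     /* this newline may be part of the match */
        }
        pos--;
    }
    return pos;
}
```

With `newlines` set to 0 this behaves the same as the current single-line back-up.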

One issue, however, is that you can't really use rxvm_fsearch successfully on a file that contains no newline characters, and on a huge file with very few newlines it can be very slow. So I need to either provide a way to completely disable this line-based BMH substring method, or figure out some way to do it without newline characters at all. Or, I guess, provide some other alternative, for example, if the user is willing to specify:

1) the maximum size of any possible matching text, or
2) an alternate, frequently-occurring character, besides newline, that can instead be "backed up to"

Then we could still use BMH effectively for a full regexp in a file that is not line-based.
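
To make the two options concrete, here is a minimal sketch of what a configurable back-up step might look like. None of this exists in rxvm today: `backup_mode`, `backup_opts` and `backup_start` are hypothetical names. For option 1, the bound relies on the fact that a match no longer than `max_match_len` which contains the fixed substring cannot start earlier than `hit + sub_len - max_match_len`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-facing options; not part of the current rxvm API */
enum backup_mode {
    BACKUP_DELIM,    /* back up to a user-chosen delimiter character */
    BACKUP_MAX_LEN   /* back up using a maximum match length */
};

struct backup_opts {
    enum backup_mode mode;
    unsigned char delim;    /* used when mode == BACKUP_DELIM */
    size_t max_match_len;   /* used when mode == BACKUP_MAX_LEN */
};

/* Given a BMH hit at offset 'hit' for a fixed substring of 'sub_len' bytes,
 * return the earliest offset at which a full match could begin. */
static size_t backup_start(const unsigned char *buf, size_t hit,
                           size_t sub_len, const struct backup_opts *opts)
{
    size_t pos = hit;

    if (opts->mode == BACKUP_DELIM) {
        /* Same idea as the newline back-up, with a configurable character */
        while (pos > 0 && buf[pos - 1] != opts->delim)
            pos--;
        return pos;
    }

    /* BACKUP_MAX_LEN: the match must contain the whole fixed substring */
    assert(opts->max_match_len >= sub_len);
    if (hit + sub_len > opts->max_match_len)
        return hit + sub_len - opts->max_match_len;
    return 0;
}
```

Either way, the amount of text handed to full regexp matching is bounded by something the user controls rather than by where newlines happen to fall.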

@eriknyquist eriknyquist self-assigned this Oct 18, 2017