Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Improve cucumber expression creation performance #202

Merged
merged 15 commits into from
Jan 15, 2023

Conversation

jkronegg
Copy link
Contributor

@jkronegg jkronegg commented Jan 5, 2023

🤔 What's changed?

Improved CucumberExpression.escapeRegex(String) performance as proposed in #200. Calls to escapeRegex are about 7 times faster and the overall performance improvement on CucumberExpression new instance is about 7%.

⚡️ What's your motivation?

Fixes #200

🏷️ What kind of change is this?

  • 🏦 Refactoring/debt/DX (improvement to code design, tooling, documentation etc. without changing behaviour)

♻️ Anything particular you want feedback on?

Unit tests have been executed on the previous implementation and proposed implementation to prevent regressions.

📋 Checklist:

  • I agree to respect and uphold the Cucumber Community Code of Conduct
  • I've changed the behaviour of the code
    • I have added/updated tests to cover my changes.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

@jkronegg jkronegg requested a review from mpkorstanje January 5, 2023 06:43
Replace issue link by PR link.
@jkronegg
Copy link
Contributor Author

Hi @mpkorstanje, is anything missing in the PR ? Should I provide more information about the algorithm ?

@mpkorstanje
Copy link
Contributor

No I'm just backlogged for a bit.

int length = text.length();
StringBuilder sb = new StringBuilder(length * 2); // prevent resizes
int maxChar = CHAR_TO_ESCAPE.length;
for (int i = 0; i < length; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in practice expressions with characters that need to be escaped are rare. Would it make sense to check the entire string for characters that need to be replaced first? It would save an array copy.

Copy link
Contributor Author

@jkronegg jkronegg Jan 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we rarely escape in practice, the situation is a bit different. We need to benchmark the variants based on the escaping frequency, not only on pure escaping performance.

I've implemented two other variants, so to summarize:

Variant Description
escapeRegex0 Pattern (cucumber 7.11.0)
escapeRegex1 append(char) with 1 array allocation and 1 array copy
escapeRegex2 append(String) with 1 array allocation and 1 array copy
escapeRegex3 append(char) with 1 array allocation and 0..1 array copy
escapeRegex4 append(String) with 0..1 array allocation and 0..1 array copy

I benchmarked these variants with JMH for varying escaping frequencies:

Cucumber_escapeRegex_performance

The variant escapeRegex4 outperforms the other variants for all escaping frequencies. Depending on escaping frequency, this variant is 1.2 to 3 times faster the cucumber 7.11.0 implementation. The code is a bit more complex, but still understandable, so I think we can use this variant:

public static String escapeRegex4(String text) {
    int length = text.length();
    StringBuilder sb = null;
    int blocStart=0;
    int maxChar = CHAR_TO_ESCAPE.length;
    for (int i = 0; i < length; i++) {
        char currentChar = text.charAt(i);
        if (currentChar < maxChar && CHAR_TO_ESCAPE[currentChar]) {
            if (sb == null) {
                sb = new StringBuilder(length * 2);
            }
            if (i > blocStart) {
                // flush previous block
                sb.append(text, blocStart, i);
            }
            sb.append('\\');
            sb.append(currentChar);
            blocStart=i+1;
        }
    }
    if (sb != null) {
        if (length > blocStart) {
            // flush remaining characters
            sb.append(text, blocStart, length);
        }
        return sb.toString();
    }
    return text;
}

In real conditions, on my project with about ~400 test scenarios runs in about 8.5 seconds. The InteliJ profiler says escapeRegex0 takes 3.37% of the total CPU time (=280 ms), while escapeRegex4 takes 0.97% of the total (=80 ms). That's a 200 ms improvement (2.3%): not a lot on one build (I was hoping for more), but worth the effort when considering the number of builds on all cucumber users.

For completeness, the escapeRegex3 variant is:

public static String escapeRegex3(String text) {
    int length = text.length();
    StringBuilder sb = new StringBuilder(length * 2); // prevent resizes
    int maxChar = CHAR_TO_ESCAPE.length;
    boolean escaped = false;
    for (int i = 0; i < length; i++) {
        char currentChar = text.charAt(i);
        if (currentChar < maxChar && CHAR_TO_ESCAPE[currentChar]) {
            sb.append('\\');
            escaped = true;
        }
        sb.append(currentChar);
    }
    return escaped ? sb.toString() : text;
}

@jkronegg jkronegg requested a review from mpkorstanje January 14, 2023 10:09
Copy link
Contributor

@mpkorstanje mpkorstanje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some nitpicks!

@mpkorstanje
Copy link
Contributor

@jkronegg would you be interested in helping out with cucumber/cucumber-jvm#2542. It will eventually enable us to discover all step definitions once, and because of that build all cucumber expressions once (per language) which should significantly improve performance. Though I can imagine that a larger contribution with a longer commitment isn't quite what you are looking for, so if you'd decline that is perfectly understandable.

Copy link
Contributor

@mpkorstanje mpkorstanje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge and release this soonish.

@mpkorstanje mpkorstanje changed the title CucumberExpression.escapeRegex performance improvement [Java] Improve cucumber expression creation performance Jan 15, 2023
@mpkorstanje mpkorstanje merged commit 3c080c5 into main Jan 15, 2023
@mpkorstanje mpkorstanje deleted the jkronegg-patch-200 branch January 15, 2023 19:51
@jkronegg
Copy link
Contributor Author

jkronegg commented Feb 6, 2023

@jkronegg would you be interested in helping out with cucumber/cucumber-jvm#2542. It will eventually enable us to discover all step definitions once, and because of that build all cucumber expressions once (per language) which should significantly improve performance. Though I can imagine that a larger contribution with a longer commitment isn't quite what you are looking for, so if you'd decline that is perfectly understandable.

@mpkorstanje sorry for the long response delay. My employer (https://www.mobiliere.ch/) allows me to work up to 10% of my labor time for the Cucumber framework. I still need to manage the details, but I will be happy to help the Cucumber community.

@mpkorstanje
Copy link
Contributor

Cheers. I'll see if I can write something up to help you get started. I'm unfortunately a bit short on time at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bad CucumberExpression creation performance
2 participants