Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance: use ArrayList instead of ArrayBuffer #286

Closed
wants to merge 8 commits into from

Conversation

mpollmeier
Copy link
Contributor

@mpollmeier mpollmeier commented Jan 6, 2025

Scala's ArrayBuffer suffers from poor performance because Array.copy uses
the slowcopy (element by element) rather than the fast System.arraycopy.
See scala/scala#10962

Profiling showed that the majority of the time is spent in slowcopy.
That part is removed now, and the import time for a large cpg is reduced
from 300s to 280s.
There's likely other places to improve, but this is a large chunk driven
by a rather simple change.

Moving ahead it probably makes sense to replace every ArrayBuffer
with ArrayList... thoughts?

Flamegraph before:
grafik

Flamegraph with this PR:
grafik

@mpollmeier mpollmeier requested a review from bbrehm January 6, 2025 17:54
@bbrehm
Copy link
Contributor

bbrehm commented Jan 7, 2025

I think we should look for easier performance wins first.

This is super ugly, and it looks like scala/scala#10962 will merge soonish.

@mpollmeier
Copy link
Contributor Author

Hmm, the scala PR might merge soonish, but it'll take a while until it's all released.

Can you be more specific re 'super ugly'? IMO what's ugly about this is that we now use both ArrayBuffer and ArrayList in this build, or rather mix scala and java collections which have different APIs (and I prefer the scala collection api). Anything else?

@bbrehm
Copy link
Contributor

bbrehm commented Jan 8, 2025

My main problem is the mixing of scala/java collections, and that this spills all over the place (which makes it super annoying to eventually revert the change).

I would be OK with using reflection to grab the underlying array of the arraybuffer and calling System.arraycopy, though -- that would be our own private isolated faster arraybuffer-to-array function.

Then only one specific place of the code is changed, and can get a nice comment why it is necessary and when it will be healed.

@som-snytt
Copy link

If you're just building an array and then operating on the result, consider using ArrayBuilder; I see ArrayBuilder.ofRef uses java.util.Arrays.copyOf.

@mpollmeier
Copy link
Contributor Author

Ok so now it gets interesting. I didn't use a 'real benchmark' for grownups here, and didn't even bother to run it multiple times when I first created this PR - so sorry...

Anyway, it actually looks like the main speedup for my test case (importing a rather large dataset) doesn't actually derive from the change to use System.arraycopy, but the change to java's ArrayList...

Overview:

  • flatgraph master (using Scala's ArrayBuffer with slowcopy): 300s
  • this branch (using Java's ArrayList with a mix of System.arraycopy and slowcopy): 280s
  • using Scala's ArrayBuffer and grabbing the underlying array via reflection to then use System.arraycopy (https://github.com/joernio/flatgraph/tree/michael/array-speedup-take2): 295s -> this one surprised me, I expected it to be more of an impact

I'll try out two more variants: ArrayBuilder as suggested by @som-snytt (thanks for the pointer!) and using ArrayList with steal-array-via-reflection mode (mostly since I'm curious).

@mpollmeier
Copy link
Contributor Author

Ok, so ArrayBuilder has the same performance as ArrayBuffer, i.e. it does use the fast System.arraycopy, but for this specific use case it doesn't really make a difference. Raw collection performance of java's ArrayList is superior for this one.

Meanwhile @bbrehm made another proposal to improve performance using parallelisation, which I tested both as-is and then applied it on top of this PR, i.e. using ArrayList plus parallelisation.

Results (incl. the ones from above):

@bbrehm
Copy link
Contributor

bbrehm commented Jan 9, 2025

Also try out ShiftLeftSecurity/codepropertygraph#1798

@mpollmeier
Copy link
Contributor Author

Nice one! ShiftLeftSecurity/codepropertygraph#1798 gives us massive gains: we're down to 114s with that one (applied on top of this branch, i.e. with ArrayList and parallelisation).

Kinda makes sense that we're losing time somewhere else, given that the arraycopy part doesn't seem to have as big as an impact as we thought.

@mpollmeier
Copy link
Contributor Author

I'll run the test a few more times to solidify the timing result, and then test with Scala's Arraybuffer, i.e. keep using the nicer api.

@mpollmeier
Copy link
Contributor Author

Updated version of the previous comment:

As expected, it's slightly (not significantly) slower with Scala's ArrayBuffer:

This branch (java ArrayList and parallelisation) plus batching in proto loading ShiftLeftSecurity/codepropertygraph#1798: 114s

Parallelisation (#288) plus ShiftLeftSecurity/codepropertygraph#1798: 117s

Discussion points:

  1. IMO we should definitely merge improve proto conversion batching ShiftLeftSecurity/codepropertygraph#1798, given that is is small and high impact - agreed?
  2. do we want to bring in the parallelisation effort from performance: parallelize diffgraph application #288 ? It's about 3-5% performance improvement (runtime, not less cpu cycles...), but at the cost of far higher complexity, see e.g. performance: parallelize diffgraph application #288 (comment) -> I'm tempted to say 'no' here...
  3. do we want to bring in the change from ArrayBuffer -> ArrayList from this PR? It's about 5-8% performance improvement. I'm tempted to say 'yes': I'll replace all other usages of ArrayBuffer with ArrayList as well, so we don't have this wild mix everywhere. And add a few extension methods that will make it blend in with Scala's collections (or use the existing ones from scala's stdlib).

@bbrehm
Copy link
Contributor

bbrehm commented Jan 10, 2025

I agree on merging (1), that's a no-brainer (and already merged as of now)

I have no strong opinion on merging (2). The parallelization has been planned for from the very beginning, that's why it's such a small patch -- but no need to bring it in if we don't need to.

I'm still very suspicious about merging (3) -- we have a large codebase, including codepropertygraph, joern, the various frontends for joern-cli, the proprietary stuff in codescience & ocular, etc.

Our standard is to use scala.collection.mutable.ArrayBuffer everywhere we can. Changing to java.util.ArrayList is quite invasive, we only want to do that if it's really required, or if the code is really hot! This really opens pandoras box...

@mpollmeier
Copy link
Contributor Author

Ok then, let's stay with ArrayBuffer. It's not worth opening any boxes (or cans fwiw) given the observed 5% performance improvement for this use case. Other points for this take:

  1. we don't really understand why ArrayList is faster - it's supposedly doing the same thing for us. Maybe it's growth function suits this specific use case better?
  2. this is only one of many use cases to consider - we'd need a more complete and reproducable suite of benchmarks to make a really informed decision.

@mpollmeier mpollmeier closed this Jan 10, 2025
@mpollmeier
Copy link
Contributor Author

scala/scala#10981
🎉 thanks @bbrehm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants