Optimizing hashing performance #136
Comments
I spent a few minutes trying to investigate this and here are some numbers. Bun
Node.js 16.20.1
Node.js 18.18.1
Node.js 20.9.0
Node.js 21.1.0
About Bun, the Apparently, Instead of EVP_* functions (code), I updated the benchmark to:
const crypto = require('node:crypto');
const value = '/Users/fabio/path/to/some/random/file.js';
const hash = crypto.createHash("sha1");
const start = performance.now();
hash.update(value);
console.log(`End: ${performance.now() - start}`);
Bun: 0.0250780000000006ms So, even calling only |
But is it even advisable to use non-EVP calls? |
By the way, it is crazy how slow it is to SHA-256 HMAC a string. It is something I can't optimize in Probot, and it limits a Probot server to about 6k req/s. |
I have no idea. In the documentation page, they deprecated some overloads of So, I think it is safe to use.
Once I cached all the string and then performed only one |
There are various known performance issues related to OpenSSL 3. It's possible that they negatively affect performance of hashing through
The FWIW, I proposed |
Are we using OpenSSL 3.0 or 3.1? |
@ronag We are basically cut off from direct OpenSSL updates. |
So basically we are stuck with this until we update OpenSSL. |
Anyone considered switching to BoringSSL? |
I guess a PR to switch to BoringSSL would result in a lot of negative feedback. Also BoringSSL does not guarantee stable ABI. https://boringssl.googlesource.com/boringssl/+/HEAD/PORTING.md#porting-from-openssl-to-boringssl
E.g. @mcollina mentioned that a stable ABI is important for him. So I assume it is also important for others. |
Is this a theoretical issue or a practical one? Do they actually break the ABI that often? I think QUIC support is better in BoringSSL, so at the end of the day it might be easier to use? |
https://boringssl.googlesource.com/boringssl/+/HEAD/README.md
|
BoringSSL also lacks various features that users of Node.js might rely on and that we'd have to polyfill somehow, e.g., certain algorithms. |
@nodejs/performance @nodejs/crypto |
At a glance, this does not look like a hashing performance issue. |
Try the following... Start with this script... (call it
import crypto from 'node:crypto';
const value = '/Users/fabio/path/to/some/random/file.js';
console.time ( 'crypto' );
for ( let i = 0; i < 1000_000; i++ ) {
crypto.createHash("sha1").update(value).digest("hex");
}
console.timeEnd ( 'crypto' );
(I picked the script as-is, but I extended the loop to 1000_000 repetitions.) The conjecture is that the hashing is slow, and by that, presumably, we mean the
import crypto from 'node:crypto';
console.time ( 'crypto' );
for ( let i = 0; i < 1000_000; i++ ) {
crypto.createHash("sha1");
}
console.timeEnd ( 'crypto' );
If hashing done by update is the bottleneck, this second script should be very fast compared to the previous one. Let us see... I use Node.js v20.10.0.
Yep. It is not hashing. Only 0.4 s are spent on hashing. The bulk of the time is spent on Let us try with Bun now...
So Bun spends 0.64 s on hashing... roughly as much as Node.js. The big difference between Node and Bun is that Bun is 10 times faster on the So let us try something else... let us call
import crypto from 'node:crypto';
const value = '/Users/fabio/path/to/some/random/file.js';
console.time ( 'crypto' );
const hasher = crypto.createHash("sha1");
for ( let i = 0; i < 1000_000; i++ ) {
hasher.update(value);
}
console.timeEnd ( 'crypto' );
I get...
Yep. So we have narrowed it down to So let us profile the
So we are spending time allocating memory and doing some threading. These are expensive jobs... And now Bun...
My conclusion is that though the problem might come from OpenSSL, and hashing... it does not appear to be so. As you can see, just Maybe we should get the advice of someone who deeply understands the C++ architecture: @joyeecheung
Note: I have not disproved that Bun could have an advantage when hashing. It probably does. I am just saying that on this particular benchmark, on the processor I used (Linux+x64), with a small string, hashing is not the bottleneck. Node.js has lost the race before it has even begun. |
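The three scripts in the comment above can be folded into a single harness that times each phase separately. This is only a sketch: `timeIt` and `profilePhases` are hypothetical helper names, the iteration count is arbitrary, and wall-clock numbers from a loop like this say nothing about where inside `createHash()` the time goes (that still needs a profiler).

```javascript
const crypto = require('node:crypto');
const { performance } = require('node:perf_hooks');

// Run `fn` `iterations` times and return the elapsed milliseconds.
function timeIt(iterations, fn) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return performance.now() - start;
}

// Hypothetical helper: separate object creation from the hashing work.
function profilePhases(value, iterations = 100_000) {
  // Phase 1: only create Hash objects, never feed them data.
  const createOnly = timeIt(iterations, () => crypto.createHash('sha1'));

  // Phase 2: reuse one Hash object and only call update().
  const shared = crypto.createHash('sha1');
  const updateOnly = timeIt(iterations, () => shared.update(value));

  // Phase 3: the full create + update + digest round trip.
  const full = timeIt(iterations, () =>
    crypto.createHash('sha1').update(value).digest('hex'));

  return { createOnly, updateOnly, full };
}

const phases = profilePhases('/Users/fabio/path/to/some/random/file.js');
console.log(phases);
```

If `createOnly` accounts for most of `full`, the cost is in setting up the Hash object rather than in the digest computation, which is the claim being made here.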
I did some profiling to see where the bottleneck is - it seems #136 (comment) was based on only self-time, but it's still important to look at the call graph.
const { createHash } = require('node:crypto');
const array = [];
for ( let i = 0; i < 1000_000; i++ ) {
array.push(null);
}
const start_time = performance.now();
for ( let i = 0; i < 1000_000; i++ ) {
array[i] = createHash("sha1");
}
console.log(performance.now() - start_time);
console.log(array.length);
On main the profile of the snippet above looks like this (simplified to only the important bits; the number is total time in the process being profiled, and the % outside these is mostly setup/teardown/GC stuff, or just samples with a slightly different stack)
With the Oilpan prototype in nodejs/node#51017 I get a somewhat different profile that looks like this
With a different allocation scheme it sheds off a chunk of the overhead, though there is still a lot to be improved:
I think another thing that can be done is caching the |
Actually regarding the OpenSSL overhead the solution (at least a partial solution) seems surprisingly simple - we are just under-initializing OpenSSL: nodejs/node#51019 |
Actually #136 (comment) was incorrect, I was just looking at the numbers from the Oilpan prototype, initializing it with On another note, it may be a good idea to just move away from
|
Don't the OpenSSL docs say something about avoiding lookup functions like EVP_get_digestbyname? I think I read something about this, along the lines that OpenSSL 3 is fully object-oriented, so that, for example, you register digests in an object at runtime. Calling something like EVP_get_digestbyname then basically means that OpenSSL goes through each name and checks whether it finds the proper digest. |
Yes, though I think we'll need a bit of internal refactoring to move on to explicit fetching because so far most of the crypto classes don't think about freeing the EVP_MD (there is also a somewhat weird (undocumented?) signature that allows the algorithm to be another Hash object, I am not sure what should happen in terms of life cycle management in that case) |
About the InstanceOf - it seems mostly coming from the On a side note, I see that there are still a lot of |
I had another "surprisingly simple" (hopefully not just being naive) idea of migrating to EVP_MD_fetch and caching the EVP_MD. With nodejs/node#51034 nodejs/node#51026 and nodejs/node#51017 I am getting this from the createHash() microbenchmark pasted above:
At this point the most significant bottleneck is garbage collection (specifically scavenge) from With a microbenchmark like this:
'use strict';
const { createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');
const kHashCount = parseInt(process.env.HASH_COUNT) || 10000;
const kInputLength = parseInt(process.env.INPUT_LENGTH) || 100;
const kUpdateCount = parseInt(process.env.UPDATE_COUNT) || 1;
const array = [];
for ( let i = 0; i < kHashCount; i++ ) {
array.push(null);
}
const input = readFileSync('./test/fixtures/snapshot/typescript.js').slice(0, kInputLength);
const start_time = performance.now();
for ( let i = 0; i < kHashCount; i++ ) {
const hash = createHash('sha1');
for (let j = 0; j < kUpdateCount; j++) {
hash.update(input);
}
array[i] = hash.digest('hex');
}
console.log(performance.now() - start_time);
console.log(`${kHashCount} hashes, ${kUpdateCount} updates, input length ${input.length}, ${array.length} results`);
Using a buffer as input, there are only performance differences between my local branch, the main branch, and Bun when we create a lot of hashes: the bottleneck is the performance of createHash() as well as GC. When hashing a big chunk of input in one go, or hashing fewer, bigger streamed inputs, they perform similarly.
And if we add a
|
As I understand it V8 keeps track of whether a string is encoded in latin1 or not, because if it's using latin1 it can just allocate 1 byte per character rather than 2. In that case, for latin1 strings, if the rope is flattened and whatever else may be needed, shouldn't decoding be essentially free because the slice of memory that the string points to can just be hashed directly? Maybe it's more complicated than that 🤔 |
V8 does not hand over the raw bytes to embedders, only a copy. Also it's only a pure copy for ASCII-only strings. For strings that are Latin1 but not ASCII there's still a bit of conversion to split each code point into two bytes. |
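A small illustration of the point about Latin1 versus ASCII, using only `Buffer` and `createHash`. The sample strings are made up for the example. It shows that a string whose characters all fit in Latin1 can still grow when encoded as UTF-8, and that hashing a string gives the same digest as hashing its UTF-8 bytes, not its Latin1 bytes.

```javascript
const { createHash } = require('node:crypto');

const ascii = 'cafe';
const latin1 = 'caf\u00e9'; // "café": every character fits in one Latin1 byte

// Both strings have the same number of UTF-16 code units...
console.log(ascii.length, latin1.length);
// ...but the non-ASCII character takes two bytes once encoded as UTF-8.
console.log(Buffer.byteLength(ascii, 'utf8'), Buffer.byteLength(latin1, 'utf8'));

// update() with a string and no explicit encoding hashes the UTF-8 bytes.
const fromString = createHash('sha1').update(latin1).digest('hex');
const fromUtf8 = createHash('sha1').update(Buffer.from(latin1, 'utf8')).digest('hex');
const fromLatin1 = createHash('sha1').update(Buffer.from(latin1, 'latin1')).digest('hex');

console.log(fromString === fromUtf8);   // same bytes, same digest
console.log(fromString === fromLatin1); // different bytes, different digest
```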
What I am saying is that I do agree with you, btw, that it makes a lot of sense to do a one-shot hashing... But what I am also saying is that |
Yeah, there could be some micro-optimization that only starts an object-based implementation upon the second update, or when the first update contains a big enough input (say >1MB), and otherwise falls back to a one-shot digest implementation. Though the prerequisite would still be having a one-shot digest implementation... |
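A userland sketch of that idea, to make the control flow concrete. Everything here is an assumption for illustration: `LazyHash`, `oneShot` and `ONE_SHOT_LIMIT` are hypothetical names, the 1 MB threshold is just the figure floated above, and the real optimization would live inside Node.js rather than in a wrapper class. The one-shot path uses `crypto.hash()` when the runtime provides it and otherwise falls back to a throwaway Hash object, in which case the wrapper gains nothing.

```javascript
const crypto = require('node:crypto');

// Assumed threshold, taken from the ">1MB" suggestion in the discussion.
const ONE_SHOT_LIMIT = 1024 * 1024;

// One-shot digest: prefer crypto.hash() if this runtime has it.
function oneShot(algorithm, data, encoding) {
  if (typeof crypto.hash === 'function') {
    return crypto.hash(algorithm, data, encoding);
  }
  return crypto.createHash(algorithm).update(data).digest(encoding);
}

// Hypothetical wrapper, not a Node.js API.
class LazyHash {
  constructor(algorithm) {
    this.algorithm = algorithm;
    this.pending = null; // first small input, held back
    this.hasher = null;  // real Hash object, created only when needed
  }

  update(data) {
    if (this.hasher === null) {
      const size = typeof data === 'string'
        ? Buffer.byteLength(data)
        : data.byteLength;
      // First update with a small input: just remember it.
      if (this.pending === null && size <= ONE_SHOT_LIMIT) {
        this.pending = data;
        return this;
      }
      // Second update, or a large first input: switch to the object path.
      this.hasher = crypto.createHash(this.algorithm);
      if (this.pending !== null) this.hasher.update(this.pending);
      this.pending = null;
    }
    this.hasher.update(data);
    return this;
  }

  digest(encoding = 'hex') {
    if (this.hasher !== null) return this.hasher.digest(encoding);
    return oneShot(this.algorithm, this.pending ?? '', encoding);
  }
}

console.log(new LazyHash('sha1').update('abc').digest());
```

One caveat of holding back the first input: if the caller passes a Buffer and mutates it before calling `digest()`, the result changes, so a real implementation would need to copy or document that.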
PR-URL: #51026 Refs: nodejs/performance#136 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
Precisely. ❤️ ❤️ The idea is that if you propose a new function, not everyone will rush to adopt it. But it might be possible to provide similar benefits to people who use the conventional API. 💨 |
@joyeecheung If you want a real world example where Setup:
git clone https://github.com/yarnpkg/berry
cd berry
git checkout db6210f48355d2986e965f90009b22f18d3b6342
yarn build:cli --no-minify
Running the following command will call
rm -f .yarn/install-state.gz
YARN_IGNORE_PATH=1 CI=true node ./packages/yarnpkg-cli/bundles/yarn.js
and the following will, at the time of writing, call it 90340 times:
about 1853 of those should be hashing streamed files and the rest strings. |
On OpenSSL 3, migrate from EVP_get_digestbyname() to EVP_MD_fetch() to get the implementation and use a per-Environment cache for it. The EVP_MDs are freed during Environment cleanup. Drive-by: declare the smart pointer for EVP_MD_CTX as EVPMDCtxPointer instead of EVPMDPointer to avoid confusion with EVP_MD pointers. PR-URL: #51034 Refs: https://www.openssl.org/docs/man3.0/man7/crypto.html#Explicit-fetching Refs: nodejs/performance#136 Reviewed-By: James M Snell <[email protected]>
I looked into the "buffering input" idea for |
PR-URL: #51044 Refs: nodejs/performance#136 Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
This patch introduces a helper crypto.hash() that computes a digest from the input at one shot. This can be 1.2-1.6x faster than the object-based createHash() for smaller inputs (<= 5MB) that are readily available (not streamed) and incur less memory overhead since no intermediate objects will be created. PR-URL: #51044 Refs: nodejs/performance#136 Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
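For readers who have not seen the helper, a minimal comparison of the two call styles. The input string is borrowed from the benchmark earlier in the thread; the `typeof` guard is an assumption added so the snippet also runs on runtimes that predate `crypto.hash()`.

```javascript
const crypto = require('node:crypto');

const input = '/Users/fabio/path/to/some/random/file.js';

// Object-based API: allocates a Hash object for every digest.
const viaObject = crypto.createHash('sha1').update(input).digest('hex');

// One-shot helper, guarded because older runtimes do not ship it.
const viaOneShot = typeof crypto.hash === 'function'
  ? crypto.hash('sha1', input, 'hex')
  : viaObject;

// Both paths produce the same digest.
console.log(viaObject === viaOneShot);
```

The one-shot form only fits input that is already fully in memory; streamed data still needs `createHash()`.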
Since nodejs/node#51044 was merged, I think we can close this issue. Any other discussion about OpenSSL and its perf, I think we should keep at #72 |
What is the problem this feature will solve?
Making the hash functions significantly faster.
What is the feature you are proposing to solve the problem?
I'm not sure what the best option for this is, but running the following file:
I see the following output:
Basically Node's sha1 function seems at least 3x slower than Bun's.
Hashing is at the core of many important things, so I'd argue it's important to hash as fast as possible since that would speed up a cascade of use cases.
I'm not sure what the best solution is here, but if Bun is 3x faster than Node here presumably there's a lot of room for improvement.
What alternatives have you considered?
No response