Support using AVX and even AVX512 #11
Not sure whether the reference implementation already takes advantage of AVX, but on my machine it seems much faster than this Rust implementation:
(This is a difference of more than 30%, while the difference shown in the README is less than 20%.)
It seems the reference implementation doesn't take advantage of AVX either. Even if compiled with
So I discovered this afternoon that codegenning AVX is a simple matter of:
$ export RUSTFLAGS='-C target-feature=+avx'
$ cargo clean
$ cargo bench --features simd
$ objdump -d target/release/argon2rs-* | grep vpalignr | head -n 2 # confirm.
1644d: c4 e3 51 0f e7 08 vpalignr $0x8,%xmm7,%xmm5,%xmm4
16453: c4 e3 41 0f ed 08 vpalignr $0x8,%xmm5,%xmm7,%xmm5
which does wonders for the cross-swap operations in the Argon2 permutation function.
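To make the cross-swap point concrete, here is a minimal scalar sketch of what a `vpalignr $0x8` pair does when a four-lane row of 64-bit words is split across two 128-bit registers. The function names `alignr8` and `rotate_row_by_one` are made up for illustration and are not taken from argon2rs; the code only models the byte-align semantics and does not use real SIMD intrinsics.

```rust
/// Scalar model of `palignr $8` on two 128-bit registers, each holding two
/// 64-bit lanes (index 0 is the low lane). The instruction concatenates
/// `hi:lo`, shifts right by 8 bytes (one lane) and keeps the low 128 bits.
pub fn alignr8(hi: [u64; 2], lo: [u64; 2]) -> [u64; 2] {
    [lo[1], hi[0]]
}

/// Rotate a four-lane row left by one lane. The row [x0, x1, x2, x3] is
/// stored as `lo = [x0, x1]` and `hi = [x2, x3]`, so the rotation needs two
/// align operations, matching the pair of `vpalignr` lines in the objdump.
pub fn rotate_row_by_one(lo: [u64; 2], hi: [u64; 2]) -> ([u64; 2], [u64; 2]) {
    (alignr8(hi, lo), alignr8(lo, hi))
}
```

For example, `rotate_row_by_one([0, 1], [2, 3])` returns `([1, 2], [3, 0])`, which is the lane rotation a Blake2-style round applies when it moves between column and diagonal steps.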
In fact, the ref-impl does try to exploit AVX. See, for instance, blamka. The Unfortunately, cargon's
And on the topic of
23:11:48 ~/argon2rs/> RUSTFLAGS='-C target-feature=+avx' cargo bench --features simd
Running target/release/versus_cargon-9211de8e436df972
running 3 tests
test ensure_identical_hashes ... ignored
test bench_argon2rs_i ... bench: 8,488,636 ns/iter (+/- 32,292)
test bench_cargon_i ... bench: 10,011,768 ns/iter (+/- 491,314)
test result: ok. 0 passed; 0 failed; 1 ignored; 2 measured
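For comparing two `ns/iter` figures like the ones in this run, a small helper is enough. This is only a sketch; `relative_saving` and `speedup` are hypothetical names, not part of the benchmark harness.

```rust
/// Fraction of time saved by `candidate_ns` relative to `baseline_ns`
/// (0.10 means the candidate needs 10% less time per iteration).
pub fn relative_saving(baseline_ns: u64, candidate_ns: u64) -> f64 {
    (baseline_ns as f64 - candidate_ns as f64) / baseline_ns as f64
}

/// How many times faster the candidate is than the baseline.
pub fn speedup(baseline_ns: u64, candidate_ns: u64) -> f64 {
    baseline_ns as f64 / candidate_ns as f64
}
```

With the figures from this run, `relative_saving(10_011_768, 8_488_636)` comes out at roughly 0.15, meaning argon2rs took about 15% less time per iteration than cargon here.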
It seems that enabling AVX doesn't change things much.
Although it is still not as fast as the reference implementation here, it is indeed much faster than before. Good job!
Looks like the only difference is not calling crossbeam when not needed :)
Nowadays, most mainstream x86 CPUs support AVX, and those with AVX2 can do integer computation on 256-bit vectors. I believe that would further improve the performance.
Intel may start shipping CPUs which support AVX-512 in the coming year. AVX-512, as its name indicates, operates on 512-bit vectors. This should probably be considered as well.
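As a rough illustration of how a crate could choose between these instruction sets, here is a sketch using the standard library's `is_x86_feature_detected!` macro and the `cfg!(target_feature = ...)` check. The `SimdLevel` enum and the function names are assumptions made up for this example; nothing here is argon2rs's actual dispatch code.

```rust
/// Widest SIMD level reported by the running CPU (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SimdLevel {
    Scalar,
    Avx,
    Avx2,
    Avx512f,
}

/// Runtime detection; falls back to `Scalar` on non-x86 targets.
pub fn detect_simd_level() -> SimdLevel {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx512f") {
            return SimdLevel::Avx512f;
        }
        if is_x86_feature_detected!("avx2") {
            return SimdLevel::Avx2;
        }
        if is_x86_feature_detected!("avx") {
            return SimdLevel::Avx;
        }
    }
    SimdLevel::Scalar
}

/// Widest integer operand each level offers, in bits. Plain AVX only
/// widens floating-point operations to 256 bits; integer SIMD stays at
/// 128 bits until AVX2.
pub fn integer_vector_bits(level: SimdLevel) -> u32 {
    match level {
        SimdLevel::Scalar => 64,
        SimdLevel::Avx => 128,
        SimdLevel::Avx2 => 256,
        SimdLevel::Avx512f => 512,
    }
}

/// True when the binary itself was compiled with AVX enabled, for example
/// via RUSTFLAGS='-C target-feature=+avx'.
pub fn compiled_with_avx() -> bool {
    cfg!(target_feature = "avx")
}
```

Runtime detection lets one binary run on older CPUs, while the compile-time flag lets the compiler emit VEX-encoded instructions everywhere without any dispatch.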