missing gather/scatter #11
I also just ran into this shortcoming and would like to see proper gather/scatter functionality. Right now the closest thing is the generator constructor of std::simd<T>:
    auto gathered = std::simd<T>([&](auto i) { return mem[idx[i]]; });  // gather
    (void) std::simd<T>([&](auto i) { mem[idx[i]] = simd[i]; return T{}; });  // scatter
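To make the generator-constructor workaround concrete, here is a minimal sketch that compiles on any C++17 compiler. It does not use the real std::simd: the hypothetical fixed_vec type only mimics its generator constructor (one call per lane, passing a std::integral_constant index), and gather/scatter are illustrative free functions. One caveat worth checking against the spec: the Parallelism TS describes the generator calls as unsequenced, so the scatter trick with duplicate indices may not behave deterministically there, whereas this stand-in runs them in lane order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for std::simd<T> with a fixed lane count N.
// Its generator constructor mimics the std::simd one: lane I is initialized
// with gen(std::integral_constant<std::size_t, I>{}).
template <class T, std::size_t N>
class fixed_vec {
public:
    fixed_vec() = default;

    template <class G, class = std::enable_if_t<
                           std::is_invocable_v<G&, std::integral_constant<std::size_t, 0>>>>
    explicit fixed_vec(G&& gen) : fixed_vec(gen, std::make_index_sequence<N>{}) {}

    T operator[](std::size_t i) const {
        assert(i < N);
        return lanes_[i];
    }

private:
    // One gen call per lane; a braced list evaluates them in lane order.
    template <class G, std::size_t... Is>
    fixed_vec(G& gen, std::index_sequence<Is...>)
        : lanes_{{static_cast<T>(gen(std::integral_constant<std::size_t, Is>{}))...}} {}

    std::array<T, N> lanes_{};
};

// Gather: lane i <- mem[idx[i]].
template <class T, std::size_t N>
fixed_vec<T, N> gather(const T* mem, const std::array<int, N>& idx) {
    return fixed_vec<T, N>([&](auto i) { return mem[idx[i]]; });
}

// Scatter: mem[idx[i]] <- v[i]. The generator runs only for its side effect;
// the vector it builds (all T{}) is discarded.
template <class T, std::size_t N>
void scatter(const fixed_vec<T, N>& v, T* mem, const std::array<int, N>& idx) {
    (void)fixed_vec<T, N>([&](auto i) { mem[idx[i]] = v[i]; return T{}; });
}
```

The lambda's index i is an integral_constant, which converts implicitly to std::size_t, so idx[i] and v[i] work without a cast.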
@crtrott We implemented extensions to … We added static member methods like … Then we added a layer of non-member functions for all kinds of loads and stores:
    simd<T, Abi> load(T const*, simd_mask<T, Abi>);                       // masked load
    simd<T, Abi> load(T const*, simd_index<T, Abi>, simd_mask<T, Abi>);   // masked gather
    void store(simd<T, Abi>, T*, simd_mask<T, Abi>);                      // masked store
    void store(simd<T, Abi>, T*, simd_index<T, Abi>, simd_mask<T, Abi>);  // masked scatter
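A rough sketch of how those four overloads could behave, written against hypothetical stand-in types (vec, mask, index_vec) in place of simd, simd_mask and simd_index, with scalar loops standing in for hardware instructions. Masked-off lanes are never read or written, which is what makes a masked load safe at the tail of an array; leaving those lanes value-initialized in the loaded result is an assumption of this sketch, not something stated in the thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for simd<T, Abi>, simd_mask<T, Abi> and simd_index<T, Abi>.
template <class T, std::size_t N> struct vec { std::array<T, N> lane{}; };
template <std::size_t N> struct mask { std::array<bool, N> lane{}; };
template <std::size_t N> struct index_vec { std::array<int, N> lane{}; };

// Masked load: lane i <- p[i] where m is set; other lanes stay value-initialized.
template <class T, std::size_t N>
vec<T, N> load(const T* p, const mask<N>& m) {
    vec<T, N> r;
    for (std::size_t i = 0; i < N; ++i)
        if (m.lane[i]) r.lane[i] = p[i];
    return r;
}

// Masked gather: lane i <- p[idx[i]] where m is set.
template <class T, std::size_t N>
vec<T, N> load(const T* p, const index_vec<N>& idx, const mask<N>& m) {
    vec<T, N> r;
    for (std::size_t i = 0; i < N; ++i)
        if (m.lane[i]) {
            assert(idx.lane[i] >= 0);
            r.lane[i] = p[idx.lane[i]];
        }
    return r;
}

// Masked store: p[i] <- lane i where m is set; other elements are untouched.
template <class T, std::size_t N>
void store(const vec<T, N>& v, T* p, const mask<N>& m) {
    for (std::size_t i = 0; i < N; ++i)
        if (m.lane[i]) p[i] = v.lane[i];
}

// Masked scatter: p[idx[i]] <- lane i where m is set.
template <class T, std::size_t N>
void store(const vec<T, N>& v, T* p, const index_vec<N>& idx, const mask<N>& m) {
    for (std::size_t i = 0; i < N; ++i)
        if (m.lane[i]) {
            assert(idx.lane[i] >= 0);
            p[idx.lane[i]] = v.lane[i];
        }
}
```

Because each overload differs in arity, plain overloading on load and store is enough to tell contiguous and indexed access apart.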
Would you mind adding a link to your code?
I recently went back and tried to make this more consistent with the "where expressions" in the TS. Here is the sort of thing I'm aiming for:
    where(mask, result).copy_from(ptr, gather(indices));
This requires adding a … Then, the …
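A minimal sketch of how the `where(mask, result).copy_from(ptr, gather(indices))` spelling could be wired up, assuming gather() just returns a tag object wrapping the indices so that copy_from can be overloaded on it. Every type and function here (vec, mask, gather_indices, where_expression, where, gather) is a hypothetical stand-in, not the TS's where_expression.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <class T, std::size_t N> struct vec { std::array<T, N> lane{}; };
template <std::size_t N> struct mask { std::array<bool, N> lane{}; };

// Hypothetical tag carrying per-lane indices, so copy_from can overload on it.
template <std::size_t N> struct gather_indices { std::array<int, N> idx; };

template <std::size_t N>
gather_indices<N> gather(const std::array<int, N>& idx) { return {idx}; }

// Hypothetical where-expression: refers to a mask and a target vector and
// writes only the lanes whose mask bit is set.
template <class T, std::size_t N>
struct where_expression {
    const mask<N>& m;
    vec<T, N>& v;

    // Masked contiguous load: lane i <- ptr[i].
    void copy_from(const T* ptr) {
        for (std::size_t i = 0; i < N; ++i)
            if (m.lane[i]) v.lane[i] = ptr[i];
    }

    // Masked gather: lane i <- ptr[idx[i]].
    void copy_from(const T* ptr, const gather_indices<N>& g) {
        for (std::size_t i = 0; i < N; ++i)
            if (m.lane[i]) {
                assert(g.idx[i] >= 0);
                v.lane[i] = ptr[g.idx[i]];
            }
    }
};

template <class T, std::size_t N>
where_expression<T, N> where(const mask<N>& m, vec<T, N>& v) { return {m, v}; }
```

The tag approach keeps a single copy_from name for both contiguous and indexed access, which matches the goal of staying close to the TS's where-expression style.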
I'm not sure about them being separate types, since their implementations are identical, but it was nice to see the words "scatter" and "gather" somewhere. Either adding a new …
I'm switching to …
Some discussion here: https://isocpp.org/files/papers/P2664R2.html#memory_permutes. Comments welcome.
The standard omits gather/scatter (g/s) operations. I think class simd should offer them, and that implementing g/s as member functions is the cleanest solution. I have proposed code here, which relies on class simd's constructor taking a generator function. This approach should give the compiler enough information to optimize it into hardware g/s instructions, and it can be specialized to force those instructions where the optimization fails.
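One way the member-function approach could look, sketched on a hypothetical fixed_vec stand-in rather than class simd: a static gather member built on the generator constructor, and a scatter member that writes lane by lane. The names gather and scatter and their signatures are illustrative assumptions, not the code linked above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

template <class T, std::size_t N>
class fixed_vec {
public:
    fixed_vec() = default;

    // Generator constructor: lane I <- gen(std::integral_constant<std::size_t, I>{}).
    template <class G, class = std::enable_if_t<
                           std::is_invocable_v<G&, std::integral_constant<std::size_t, 0>>>>
    explicit fixed_vec(G&& gen) : fixed_vec(gen, std::make_index_sequence<N>{}) {}

    T operator[](std::size_t i) const {
        assert(i < N);
        return lanes_[i];
    }

    // Gather as a static member: lane i <- mem[idx[i]], via the generator constructor.
    template <class I>
    static fixed_vec gather(const T* mem, const std::array<I, N>& idx) {
        return fixed_vec([&](auto i) { return mem[idx[i]]; });
    }

    // Scatter as a member: mem[idx[i]] <- lane i. Lanes are written in order,
    // so with duplicate indices the highest lane wins.
    template <class I>
    void scatter(T* mem, const std::array<I, N>& idx) const {
        for (std::size_t i = 0; i < N; ++i) mem[idx[i]] = lanes_[i];
    }

private:
    template <class G, std::size_t... Is>
    fixed_vec(G& gen, std::index_sequence<Is...>)
        : lanes_{{static_cast<T>(gen(std::integral_constant<std::size_t, Is>{}))...}} {}

    std::array<T, N> lanes_{};
};
```

Whether the generator form is actually lowered to a hardware gather is up to the optimizer; a specialization for particular T and N could call intrinsics directly, as the comment suggests.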