missing gather/scatter #11

Open
kfjahnke opened this issue Oct 19, 2019 · 8 comments

Comments

@kfjahnke

The standard omits gather/scatter (g/s) operations. I think class simd should offer them, and coding g/s as member functions seems the cleanest solution. I have proposed code here, which makes use of class simd's constructor taking a generator function. This route should give the compiler enough information to optimize the code into hardware g/s instructions, and it can be specialized to enforce this where the optimization fails.

@bernhardmgruber

I also just encountered this shortcoming and would like to see appropriate gather/scatter functionality with std::simd. For the time being, I will use your suggestion @kfjahnke to implement gather/scatter via the generator constructor:

std::simd<T>([&](auto i) { return mem[idx[i]]; }); // gather
(void) std::simd<T>([&](auto i) { mem[idx[i]] = simd[i]; return T{}; }); // scatter
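The two one-liners above can be tried out without a Parallelism TS implementation using the following minimal sketch. `toy_simd` is a hypothetical fixed-width stand-in, not `std::simd`; only its generator constructor and `operator[]` are modelled on the TS interface, and the helper names `gather` and `scatter` are assumptions for illustration. Whether a real compiler turns the equivalent `std::simd` code into hardware gather/scatter instructions is not something this sketch can show.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for std::simd<T, Abi> with N lanes.
template <class T, std::size_t N>
class toy_simd {
public:
    // Generator constructor: lane i is initialised with
    // gen(std::integral_constant<std::size_t, i>{}), as in the TS.
    // The defaulted template parameter rejects non-callable arguments.
    template <class G,
              class = decltype(std::declval<G&>()(
                  std::integral_constant<std::size_t, 0>{}))>
    explicit toy_simd(G&& gen) : toy_simd(gen, std::make_index_sequence<N>{}) {}

    T operator[](std::size_t i) const { return lanes_[i]; }

private:
    template <class G, std::size_t... I>
    toy_simd(G& gen, std::index_sequence<I...>)
        : lanes_{{static_cast<T>(gen(std::integral_constant<std::size_t, I>{}))...}} {}

    std::array<T, N> lanes_;
};

// Gather: lane i receives mem[idx[i]].
template <class T, std::size_t N>
toy_simd<T, N> gather(const T* mem, const std::array<int, N>& idx) {
    return toy_simd<T, N>([&](auto i) { return mem[idx[i]]; });
}

// Scatter: mem[idx[i]] receives lane i. The temporary built by the
// generator constructor exists only for its side effects and is discarded.
template <class T, std::size_t N>
void scatter(const toy_simd<T, N>& v, T* mem, const std::array<int, N>& idx) {
    (void)toy_simd<T, N>([&](auto i) { mem[idx[i]] = v[i]; return T{}; });
}

inline void demo_gather_scatter() {
    const float src[8] = {0, 10, 20, 30, 40, 50, 60, 70};
    const std::array<int, 4> from = {{7, 0, 3, 3}};
    const toy_simd<float, 4> v = gather(src, from);  // lanes: 70, 0, 30, 30
    assert(v[0] == 70 && v[1] == 0 && v[2] == 30 && v[3] == 30);

    float dst[8] = {};
    const std::array<int, 4> to = {{1, 2, 5, 6}};
    scatter(v, dst, to);
    assert(dst[1] == 70 && dst[2] == 0 && dst[5] == 30 && dst[6] == 30);
}
```

Note that the scatter variant relies on the generator being invoked exactly once per lane; if two lanes carry the same index, which lane's value ends up in memory depends on the invocation order.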

@ibaned

ibaned commented May 7, 2021

@crtrott We implemented extensions to std::simd for gather/scatter and would like to present them here and second the motion to standardize these. The first part of these extensions is a type simd_index<T, Abi>, where T is still the corresponding floating-point type. We used this instead of simd<int, Abi> because the number of indices in the vector depends on how many floating-point values fit in a floating-point vector, not on how many integers can be packed into a vector register. In our case simd_index always represents 32-bit indices, but we could consider an additional template parameter indicating the desired integer type. Like simd, simd_index<T, Abi> supports basic mathematical operators, and its comparisons produce simd_mask<T, Abi>.

We added static member functions such as simd<T, Abi>::masked_load(T const*, simd_mask<T, Abi>) and simd<T, Abi>::masked_gather(T const*, simd_index<T, Abi>, simd_mask<T, Abi>) for masked load and gather instructions, and non-static member functions simd<T, Abi>::masked_store(T*, simd_mask<T, Abi>) and simd<T, Abi>::masked_scatter(T*, simd_index<T, Abi>, simd_mask<T, Abi>) for masked store and scatter.

Then we added a layer of non-member functions for all types of loads and stores:

simd<T, Abi> load(T const*, simd_mask<T, Abi>); // masked load
simd<T, Abi> load(T const*, simd_index<T, Abi>, simd_mask<T, Abi>); // masked gather
void store(simd<T, Abi>, T*, simd_mask<T, Abi>); // masked store
void store(simd<T, Abi>, T*, simd_index<T, Abi>, simd_mask<T, Abi>); // masked scatter
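To make the intended lane-wise behaviour of these four overloads concrete, here is a scalar reference sketch. The types `toy_simd`, `toy_mask` and `toy_index` are hypothetical stand-ins parameterised on a lane count rather than an ABI tag, and two details are assumptions not stated in the thread: inactive lanes of a load result are value-initialised, and inactive lanes never touch memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; a plain lane count N replaces the ABI tag.
template <class T, std::size_t N> struct toy_simd  { std::array<T, N> lane; };
template <class T, std::size_t N> struct toy_mask  { std::array<bool, N> lane; };
// One 32-bit index per lane of toy_simd<T, N>, independent of sizeof(T).
template <class T, std::size_t N> struct toy_index { std::array<std::int32_t, N> lane; };

// Masked load: lane i reads p[i] only where the mask is set.
template <class T, std::size_t N>
toy_simd<T, N> load(const T* p, const toy_mask<T, N>& m) {
    toy_simd<T, N> r{};  // assumption: inactive lanes are value-initialised
    for (std::size_t i = 0; i < N; ++i)
        if (m.lane[i]) r.lane[i] = p[i];
    return r;
}

// Masked gather: lane i reads p[idx[i]] only where the mask is set.
template <class T, std::size_t N>
toy_simd<T, N> load(const T* p, const toy_index<T, N>& idx, const toy_mask<T, N>& m) {
    toy_simd<T, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        if (m.lane[i]) r.lane[i] = p[idx.lane[i]];
    return r;
}

// Masked store: p[i] is written only where the mask is set.
template <class T, std::size_t N>
void store(const toy_simd<T, N>& v, T* p, const toy_mask<T, N>& m) {
    for (std::size_t i = 0; i < N; ++i)
        if (m.lane[i]) p[i] = v.lane[i];
}

// Masked scatter: p[idx[i]] is written only where the mask is set.
template <class T, std::size_t N>
void store(const toy_simd<T, N>& v, T* p, const toy_index<T, N>& idx,
           const toy_mask<T, N>& m) {
    for (std::size_t i = 0; i < N; ++i)
        if (m.lane[i]) p[idx.lane[i]] = v.lane[i];
}

inline void demo_masked_memory_ops() {
    const double src[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    const toy_mask<double, 4> m{{true, false, true, true}};
    // Lane 1 is masked off, so its out-of-range index is never dereferenced.
    const toy_index<double, 4> from{{6, 1000, 1, 0}};

    const toy_simd<double, 4> v = load(src, from, m);
    assert(v.lane[0] == 16 && v.lane[1] == 0 && v.lane[2] == 11 && v.lane[3] == 10);

    double dst[8] = {-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0};
    const toy_index<double, 4> to{{0, 2, 4, 6}};
    store(v, dst, to, m);
    assert(dst[0] == 16 && dst[2] == -1 && dst[4] == 11 && dst[6] == 10);
}
```

The demo uses four `double` lanes with four 32-bit indices, which is the situation the simd_index<T, Abi> type is meant to capture: the index count follows the lane count of the value vector, not the number of integers that would fill a register of the same width.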

@kfjahnke
Author

kfjahnke commented May 8, 2021

Would you mind adding a link to your code?

@ibaned

ibaned commented May 10, 2021

@ibaned

ibaned commented Jan 5, 2022

I recently went back and tried to make this more consistent with the "where expressions" in the TS. Here is the sort of thing I'm aiming for:

where(mask, result).copy_from(ptr, gather(indices));

This requires adding a simd_index<T, Abi> type.
I also added types called gather<T, Abi> and scatter<T, Abi> that just wrap around a simd_index<T, Abi>.

Then, the gather and scatter types are added to the list of valid SIMD Flag types for copy_to and copy_from.
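A rough sketch of how such a call site could fit together is given below. All types are hypothetical stand-ins keyed on a lane count instead of an ABI tag. The comment above describes class templates named gather and scatter; this sketch instead uses helper functions of those names returning `gather_flag` and `scatter_flag` wrappers, purely so that the call reads the same without class template argument deduction. Lanes that are masked off keep their previous value, as masked assignment through a TS where expression does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for simd, simd_mask and simd_index.
template <class T, std::size_t N> struct toy_simd  { std::array<T, N> lane; };
template <class T, std::size_t N> struct toy_mask  { std::array<bool, N> lane; };
template <class T, std::size_t N> struct toy_index { std::array<std::int32_t, N> lane; };

// Flag-like wrappers around an index vector (names are assumptions).
template <class T, std::size_t N> struct gather_flag  { toy_index<T, N> idx; };
template <class T, std::size_t N> struct scatter_flag { toy_index<T, N> idx; };

template <class T, std::size_t N>
gather_flag<T, N> gather(const toy_index<T, N>& idx) { return {idx}; }

template <class T, std::size_t N>
scatter_flag<T, N> scatter(const toy_index<T, N>& idx) { return {idx}; }

// Minimal where expression: a mask plus a reference to the vector it guards.
template <class T, std::size_t N>
class where_expr {
public:
    where_expr(const toy_mask<T, N>& m, toy_simd<T, N>& v) : mask_(m), value_(v) {}

    // Gathering copy_from: only active lanes are overwritten.
    void copy_from(const T* p, const gather_flag<T, N>& g) {
        for (std::size_t i = 0; i < N; ++i)
            if (mask_.lane[i]) value_.lane[i] = p[g.idx.lane[i]];
    }

    // Scattering copy_to: only active lanes are written to memory.
    void copy_to(T* p, const scatter_flag<T, N>& s) const {
        for (std::size_t i = 0; i < N; ++i)
            if (mask_.lane[i]) p[s.idx.lane[i]] = value_.lane[i];
    }

private:
    toy_mask<T, N> mask_;
    toy_simd<T, N>& value_;
};

template <class T, std::size_t N>
where_expr<T, N> where(const toy_mask<T, N>& m, toy_simd<T, N>& v) {
    return {m, v};
}

inline void demo_where_gather() {
    const float src[6] = {1, 2, 3, 4, 5, 6};
    toy_simd<float, 4> result{{-1.0f, -1.0f, -1.0f, -1.0f}};
    const toy_mask<float, 4> mask{{true, true, false, true}};
    const toy_index<float, 4> indices{{5, 0, 2, 2}};

    where(mask, result).copy_from(src, gather(indices));
    // Lane 2 is inactive and keeps its old value.
    assert(result.lane[0] == 6 && result.lane[1] == 1 &&
           result.lane[2] == -1 && result.lane[3] == 3);

    float dst[6] = {};
    const toy_index<float, 4> targets{{0, 1, 2, 3}};
    where(mask, result).copy_to(dst, scatter(targets));
    assert(dst[0] == 6 && dst[1] == 1 && dst[2] == 0 && dst[3] == 3);
}
```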

@ibaned

ibaned commented Jan 5, 2022

I'm not sure about them being separate types since their implementations are identical, but it was nice to see the words "scatter" and "gather" somewhere. Either adding a new Flag type or adding new methods like gather_from and scatter_to would make sense to me.

@ibaned

ibaned commented Jan 5, 2022

I'm switching to gather_from and scatter_to methods on simd<T, Abi> and where_expression.

@danieltowner
Collaborator

Some discussion here: https://isocpp.org/files/papers/P2664R2.html#memory_permutes. Comments welcome.

@danieltowner added this to the P2664R3 milestone May 9, 2023