Load/Store, masked set and counting operations #2430

mazimkhan · 2025-01-06T16:10:06Z

Introduces:

Variants of load and store operations, including masked variants (MaskedLoadU, LoadHigher, StoreTruncated).
Counting functions that report information about the data in each lane of a vector (MaskedLeadingZeroCountOrZero, AllOnes, AllZeros).
Masked vector instantiation operations (SetOr, SetOrZero).

"OrZero" operations return zero in lanes where the mask is false, whereas standard masked operations return the corresponding lane of a passed-in vector.

All introduced operations are implemented in generic_ops-inl.h, and additionally in arm_sve-inl.h where there is a performance gain to be made. Tests are included for all new operations.
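The difference between the standard and "OrZero" masking behavior can be sketched with a scalar model. This is only an illustration under assumptions: `Vec`, `Mask`, `SetOrModel` and `SetOrZeroModel` are hypothetical stand-ins written with `std::array`, not Highway types or functions, and the argument order of the `SetOrZero` model is a guess.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar stand-ins for a 4-lane vector and its mask (illustrative only;
// these are not Highway types).
constexpr std::size_t kLanes = 4;
using Vec = std::array<int32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Models SetOr(no, m, a): lane i is `a` where m[i] is true, else no[i].
Vec SetOrModel(const Vec& no, const Mask& m, int32_t a) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i) out[i] = m[i] ? a : no[i];
  return out;
}

// Models SetOrZero: lane i is `a` where m[i] is true, else zero.
// The argument order is an assumption made for this sketch.
Vec SetOrZeroModel(const Mask& m, int32_t a) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i) out[i] = m[i] ? a : 0;
  return out;
}
```

With `no = {10, 20, 30, 40}`, mask `{true, false, true, false}` and `a = 7`, the first model yields `{7, 20, 7, 40}` and the second yields `{7, 0, 7, 0}`.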

google-cla · 2025-01-06T16:10:11Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

jan-wassenberg · 2025-01-06T16:57:46Z

g3doc/quick_reference.md

@@ -429,6 +429,10 @@ for comparisons, for example `Lt` instead of `operator<`.
    the result, with `t0` in the least-significant (lowest-indexed) lane of each
    128-bit block and `tK` in the most-significant (highest-indexed) lane of
    each 128-bit block: `{t0, t1, ..., tK}`
+*   <code>V **SetOr**(V no, M m, T a)</code>: returns N-lane vector with lane


How about renaming to MaskedSetOr() and MaskedSet()?

jan-wassenberg · 2025-01-06T16:59:02Z

g3doc/quick_reference.md

@@ -1029,6 +1037,12 @@ Per-lane variable shifts (slow if SSSE3/SSE4, or 16-bit, or Shr i64 on AVX2):
    ```HighestValue<MakeSigned<TFromV<V>>>()``` is returned in the
    corresponding result lanes.

+*   <code>bool **AllOnes**(D, V v)</code>: returns whether all bits in `v[i]`


This could be read as checking whether all lane values are 1.0. Would AllBits1 or AllBitsOne be clearer? Same below.
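The ambiguity is easy to demonstrate for floating-point lanes: the value 1.0 does not have all bits set. The sketch below assumes IEEE-754 binary32 `float`; `BitsOf`, `FloatFromBits` and `AllBitsSet` are hypothetical helper names for this example, not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the raw bit pattern of a float (memcpy is well-defined punning).
uint32_t BitsOf(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

// Builds a float from a raw bit pattern.
float FloatFromBits(uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// True if every bit of the lane is 1, which is what the op under review
// tests, as opposed to the lane *value* being 1.0.
bool AllBitsSet(float f) { return BitsOf(f) == 0xFFFFFFFFu; }
```

Under the stated assumption, `AllBitsSet(1.0f)` is false, while a lane whose bits are all set is a NaN rather than 1.0.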

jan-wassenberg · 2025-01-06T17:00:00Z

g3doc/quick_reference.md

@@ -1508,6 +1522,9 @@ aligned memory at indices which are not a multiple of the vector length):

 *   <code>Vec&lt;D&gt; **LoadU**(D, const T* p)</code>: returns `p[i]`.

+*   <code>Vec&lt;D&gt; **MaskedLoadU**(D, M m, const T* p)</code>: returns `p[i]`


MaskedLoad already exists for this purpose, we can remove this :) Note that all the load ops except Load() itself do not require alignment.

jan-wassenberg · 2025-01-06T17:01:58Z

g3doc/quick_reference.md

@@ -1538,6 +1555,10 @@ aligned memory at indices which are not a multiple of the vector length):
    lanes from `p` to the first (lowest-index) lanes of the result vector and
    fills the remaining lanes with `no`. Like LoadN, this does not fault.

+*   <code> Vec&lt;D&gt; **LoadHigher**(D d, V v, T* p)</code>: Loads `Lanes(d)/2` lanes from


Let's rename to *Upper for consistency.
This could be misunderstood as loading the upper half of a vector starting at p; it seems instead to load directly from p. How about something like InsertIntoUpper, perhaps with v as the last argument?

jan-wassenberg · 2025-01-06T17:07:10Z

g3doc/quick_reference.md

+*   <code>void **StoreTruncated**(Vec&lt;DFrom&gt; v, DFrom d, To* HWY_RESTRICT
+    p)</code>: Truncates elements of `v` to type `To` and stores on `p`. It is
+    similar to performing `TruncateTo` followed by `StoreN` where
+    `max_lanes_to_store` is `Lanes(d)`.


I'm curious why we mention the StoreN here? It seems like natural store semantics that it stores an entire vector (in this case after truncating to smaller elements), so we could remove the mention of max_lanes and "does not modify".
Generally we define the D argument that comes before a pointer to describe what is being stored. So it would be D, not DFrom here.
Also, we have CompressStore which emphasizes that Compress happens first. Seems that TruncateStore would be a more consistent naming?
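As a rough scalar picture of the semantics being discussed (truncate each element, then store the whole vector), consider the following sketch. It assumes 32-bit source lanes narrowed to 8 bits and that truncation keeps the low bits rather than saturating; `TruncateStoreModel` is a hypothetical name used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model: truncates each 32-bit element to its low 8 bits and stores
// all `lanes` results contiguously at `p`. No saturation is applied.
void TruncateStoreModel(const uint32_t* v, std::size_t lanes, uint8_t* p) {
  for (std::size_t i = 0; i < lanes; ++i) {
    p[i] = static_cast<uint8_t>(v[i] & 0xFFu);
  }
}
```

For example, the input `{0x12345678, 0x1FF, 0x80, 0}` is stored as the bytes `{0x78, 0xFF, 0x80, 0x00}`.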

jan-wassenberg · 2025-01-06T17:07:33Z

hwy/ops/arm_sve-inl.h

+// ------------------------------ SetOr/SetOrZero
+
+#define HWY_SVE_SET_OR(BASE, CHAR, BITS, HALF, NAME, OP)                     \
+  HWY_API HWY_SVE_V(BASE, BITS) NAME(HWY_SVE_V(BASE, BITS) inactive,         \


Let's rename inactive to no per convention.

jan-wassenberg · 2025-01-06T17:11:06Z

hwy/ops/generic_ops-inl.h

+#endif
+template <class D, typename T, class V = VFromD<D>(), HWY_IF_LANES_GT_D(D, 1)>
+HWY_API V LoadHigher(D d, V a, T* p) {
+  const V b = LoadU(d, p);


Users may be surprised that there's a full-length load here, this might crash.
Alternative:

    Half<D> dh;
    const VFromD<decltype(dh)> b = LoadU(dh, p);
    return Combine(d, b, LowerHalf(a));
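The intent of the suggested alternative can be modeled in scalar code: only half a vector's worth of elements is read from `p`, so a buffer holding exactly that many elements is safe. This is a sketch under assumptions, with an 8-lane `std::array` standing in for a vector; `InsertIntoUpperModel` borrows the name floated earlier in this review and is not an existing function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;
using Vec = std::array<int32_t, kLanes>;

// Scalar model: keeps the lower half of `a` and fills the upper half with
// the first kLanes / 2 elements at `p`. Reads exactly kLanes / 2 elements.
Vec InsertIntoUpperModel(const Vec& a, const int32_t* p) {
  Vec out = a;
  for (std::size_t i = 0; i < kLanes / 2; ++i) {
    out[kLanes / 2 + i] = p[i];
  }
  return out;
}
```

A full-length load from the same pointer would read `kLanes` elements, which is the out-of-bounds risk the comment above raises when the buffer holds only half a vector.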

jan-wassenberg · 2025-01-06T17:13:07Z

hwy/ops/generic_ops-inl.h

+template <class V>
+HWY_API bool AllOnes(V a) {
+  DFromV<V> d;
+  return AllTrue(d, Eq(Not(a), Zero(d)));


Likely a bit more efficient: we can BitCast to unsigned, then compare to Set(du, hwy::HighestValue(du)).
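Both formulations test the same condition, which a scalar model makes explicit. The sketch assumes 16-bit unsigned lanes in a 4-lane `std::array`; `AllOnesViaNot` and `AllOnesViaMax` are hypothetical names, and the model says nothing about which version is faster on real hardware.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

using Vec = std::array<uint16_t, 4>;

// Models the version in the diff: invert each lane and compare with zero.
bool AllOnesViaNot(const Vec& a) {
  for (uint16_t x : a) {
    if (static_cast<uint16_t>(~x) != 0) return false;
  }
  return true;
}

// Models the suggestion: compare each lane with the highest unsigned value.
bool AllOnesViaMax(const Vec& a) {
  for (uint16_t x : a) {
    if (x != std::numeric_limits<uint16_t>::max()) return false;
  }
  return true;
}
```

The second form needs one comparison per lane against a constant, whereas the first needs an inversion followed by a comparison.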

Load/Store, masked set and counting operations

c1f7768

jan-wassenberg requested changes Jan 6, 2025
