Zero masked arithmetic operations #2426

mazimkhan · 2025-01-06T15:07:47Z

Introduces:

MaskedMaxOrZero(m, a, b): returns Max(a, b)[i] or 0 if m[i] is false.
MaskedAddOrZero(m, a, b): returns a[i] + b[i] or 0 if m[i] is false.
MaskedSubOrZero(m, a, b): returns a[i] - b[i] or 0 if m[i] is false.
MaskedMulOrZero(m, a, b): returns a[i] * b[i] or 0 if m[i] is false.
MaskedDivideOrZero(m, a, b): returns a[i] / b[i] or 0 if m[i] is false.
MaskedSaturatedAddOrZero(m, a, b): returns a[i] + b[i] saturated to the minimum/maximum representable value, or 0 if m[i] is false.
MaskedSaturatedSubOrZero(m, a, b): returns a[i] - b[i] saturated to the minimum/maximum representable value, or 0 if m[i] is false.
MaskedMulFixedPoint15OrZero(m, a, b): returns the result of multiplying the two Q1.15 fixed-point numbers a[i] and b[i], or 0 if m[i] is false.
MaskedMulAddOrZero(m, a, b, c): returns a[i] * b[i] + c[i] or 0 if m[i] is false.
MaskedNegMulAddOrZero(m, a, b, c): returns -a[i] * b[i] + c[i] or 0 if m[i] is false.
MaskedWidenMulPairwiseAddOrZero(d, m, a, b): widens a and b to TFromD<D> and computes a[2*i+1]*b[2*i+1] + a[2*i+0]*b[2*i+0], or 0 if m[i] is false.

Tests are included for all operations; each test covers both the masking and the underlying operation.
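For readers who want the per-lane semantics spelled out, here is a minimal scalar sketch of three of the ops listed above. It is not Highway code: `Lanes`, `MaskLanes` and the `Ref*` functions are hypothetical stand-ins that model a vector as a `std::array`, and the saturated and widening variants are written for `int16_t` inputs only.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical scalar stand-ins for a vector and a mask; not Highway types.
template <typename T, std::size_t N>
using Lanes = std::array<T, N>;
template <std::size_t N>
using MaskLanes = std::array<bool, N>;

// Models MaskedAddOrZero: a[i] + b[i] where m[i] is true, otherwise 0.
// Overflow is not handled here; callers pass values whose sum fits in T.
template <typename T, std::size_t N>
Lanes<T, N> RefMaskedAddOrZero(const MaskLanes<N>& m, const Lanes<T, N>& a,
                               const Lanes<T, N>& b) {
  Lanes<T, N> out{};  // value-initialized, so every lane starts at 0
  for (std::size_t i = 0; i < N; ++i) {
    if (m[i]) out[i] = static_cast<T>(a[i] + b[i]);
  }
  return out;
}

// Models MaskedSaturatedAddOrZero for int16_t lanes: the sum is formed in a
// wider type and clamped to the int16_t range.
template <std::size_t N>
Lanes<int16_t, N> RefMaskedSaturatedAddOrZero(const MaskLanes<N>& m,
                                              const Lanes<int16_t, N>& a,
                                              const Lanes<int16_t, N>& b) {
  Lanes<int16_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    if (!m[i]) continue;
    const int32_t sum = int32_t{a[i]} + int32_t{b[i]};
    out[i] = static_cast<int16_t>(
        std::clamp<int32_t>(sum, std::numeric_limits<int16_t>::min(),
                            std::numeric_limits<int16_t>::max()));
  }
  return out;
}

// Models MaskedWidenMulPairwiseAddOrZero for int16_t -> int32_t: the inputs
// have 2*N lanes, the mask and the result have N lanes.
template <std::size_t N>
Lanes<int32_t, N> RefMaskedWidenMulPairwiseAddOrZero(
    const MaskLanes<N>& m, const Lanes<int16_t, 2 * N>& a,
    const Lanes<int16_t, 2 * N>& b) {
  Lanes<int32_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    if (!m[i]) continue;
    out[i] = int32_t{a[2 * i + 1]} * int32_t{b[2 * i + 1]} +
             int32_t{a[2 * i + 0]} * int32_t{b[2 * i + 0]};
  }
  return out;
}
```

Note that the mask in the widening op is indexed by the output lane, so one mask bit covers a pair of input lanes.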

google-cla · 2025-01-06T15:07:52Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

jan-wassenberg · 2025-01-06T16:43:00Z

g3doc/quick_reference.md

+All ops in this section return `0` for `mask=false` lanes. These are equivalent
+to, and potentially more efficient than, `IfThenElseZero(m, Add(a, b));` etc.
+
+*   <code>V **MaskedMaxOrZero**(M m, V a, V b)</code>: returns `Max(a, b)[i]`


As in the other PR, how about just MaskedMax(m, a, b)?

jan-wassenberg · 2025-01-06T16:43:19Z

g3doc/quick_reference.md

+    or `0` if `m[i]` is false.
+*   <code>V **MaskedMulOrZero**(M m, V a, V b)</code>: returns `a[i] * b[i]`
+    or `0` if `m[i]` is false.
+*   <code>V **MaskedDivideOrZero**(M m, V a, V b)</code>: returns `a[i] / b[i]`


MaskedDiv for consistency?

jan-wassenberg · 2025-01-06T16:45:23Z

g3doc/quick_reference.md

+    false.
+*   <code>V **MaskedMulAddOrZero**(M m, V a, V b, V c)</code>: returns `a[i] *
+    b[i] + c[i]` or `0` if `m[i]` is false.
+*   <code>V **MaskedNegMulAddOrZero**(M m, V a, V b, V c)</code>: returns


It looks like we are adding Masked variants for almost all ops, but not MulSub. Do we actually have use-cases for all of these? I'd prefer to add as we go, rather than pre-emptively add ops just in case.

jan-wassenberg · 2025-01-06T16:45:58Z

hwy/ops/arm_sve-inl.h

+    return sv##OP##_##CHAR##BITS##_m(m, a, b);                             \
+  }
+// User-specified mask. Mask=false value is zero.
+#define HWY_SVE_RETV_ARGMVVZ(BASE, CHAR, BITS, HALF, NAME, OP)             \


Underscore before the Z (MVV_Z) for consistency?

jan-wassenberg · 2025-01-06T16:47:14Z

hwy/ops/arm_sve-inl.h

+#define HWY_NATIVE_ZERO_MASKED_ARITH
+#endif
+
+#define HWY_SVE_FMAZ(BASE, CHAR, BITS, HALF, NAME, OP)                     \


We could move this next to HWY_SVE_RETV_ARGMVVV and call it HWY_SVE_RETV_ARGMVVV_Z?
(Note also the P/M naming suggestion from PR 2428)

jan-wassenberg · 2025-01-06T16:48:37Z

hwy/ops/arm_sve-inl.h

+
+#undef HWY_SVE_FMAZ
+
+template <class V, class M>


For ops that just call detail::op, we can move them out of namespace detail and avoid the wrapper function.

jan-wassenberg · 2025-01-06T16:53:51Z

hwy/ops/arm_sve-inl.h

+HWY_SVE_FOREACH(HWY_SVE_RETV_ARGMVV_M, MaskedMul_M, mul);
+}
+
+// ------------------------------ MulLower


Not documented yet?
Also, Lower() is generally the lower half. There is a GetLane which returns the lowest. How about naming this MulLane?
Finally, this can be efficiently implemented via MaskedMulOr (which we have) plus FirstN(d, 1) or perhaps a new First1(d) op which wraps the svptrue. That seems like it would be more widely useful and no less efficient - how about we add that instead?
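To make the suggestion concrete, here is a scalar sketch of how a lowest-lane multiply falls out of a masked multiply with a fallback vector plus a first-lane mask. It is a model only, not Highway code: `RefFirstN`, `RefMaskedMulOr` and `RefMulLane` are hypothetical names, and it assumes the remaining lanes should pass through `a`, which the hunk above does not show.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar stand-ins for a vector and a mask; not Highway types.
template <typename T, std::size_t N>
using Lanes = std::array<T, N>;
template <std::size_t N>
using MaskLanes = std::array<bool, N>;

// Models FirstN(d, n): the first n lanes are true, the rest false.
template <std::size_t N>
MaskLanes<N> RefFirstN(std::size_t n) {
  MaskLanes<N> m{};
  for (std::size_t i = 0; i < N && i < n; ++i) m[i] = true;
  return m;
}

// Models MaskedMulOr(no, m, a, b): a[i] * b[i] where m[i] is true,
// otherwise no[i].
template <typename T, std::size_t N>
Lanes<T, N> RefMaskedMulOr(const Lanes<T, N>& no, const MaskLanes<N>& m,
                           const Lanes<T, N>& a, const Lanes<T, N>& b) {
  Lanes<T, N> out = no;
  for (std::size_t i = 0; i < N; ++i) {
    if (m[i]) out[i] = static_cast<T>(a[i] * b[i]);
  }
  return out;
}

// Hypothetical MulLane: multiplies only lane 0 and keeps a[i] elsewhere
// (assumed pass-through behavior), built from the two helpers above.
template <typename T, std::size_t N>
Lanes<T, N> RefMulLane(const Lanes<T, N>& a, const Lanes<T, N>& b) {
  return RefMaskedMulOr(a, RefFirstN<N>(1), a, b);
}
```

With this composition, a dedicated op only needs to supply the mask; the multiply itself reuses the existing masked op.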

jan-wassenberg · 2025-01-06T16:55:55Z

hwy/tests/masked_arithmetic_test.cc

+  ForAllTypes(ForPartialVectors<TestMulAddOrZero>());
+}
+
+struct TestWidenMulPairwiseAddOrZero {


There's a lot of duplication here. What do you think of adding the MaskedNegMulAdd() call into the existing test for NegMulAdd? That avoids duplicating the test setup.
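A rough illustration of the shape this could take, written against scalar stand-ins rather than the real test harness: the inputs are built once, and both the unmasked and the masked result are checked in the same test body. All names here (`RefNegMulAdd`, `RefMaskedNegMulAddOrZero`, `TestNegMulAddWithMasked`) are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;
using Lanes = std::array<float, kLanes>;
using MaskLanes = std::array<bool, kLanes>;

// Models NegMulAdd: -a[i] * b[i] + c[i].
Lanes RefNegMulAdd(const Lanes& a, const Lanes& b, const Lanes& c) {
  Lanes out{};
  for (std::size_t i = 0; i < kLanes; ++i) out[i] = -a[i] * b[i] + c[i];
  return out;
}

// Models the masked variant: same result where m[i] is true, otherwise 0.
Lanes RefMaskedNegMulAddOrZero(const MaskLanes& m, const Lanes& a,
                               const Lanes& b, const Lanes& c) {
  const Lanes full = RefNegMulAdd(a, b, c);
  Lanes out{};
  for (std::size_t i = 0; i < kLanes; ++i) out[i] = m[i] ? full[i] : 0.0f;
  return out;
}

// One test body: shared setup, then the unmasked and the masked check.
bool TestNegMulAddWithMasked() {
  const Lanes a{1.0f, 2.0f, 3.0f, 4.0f};
  const Lanes b{2.0f, 2.0f, 2.0f, 2.0f};
  const Lanes c{10.0f, 10.0f, 10.0f, 10.0f};
  const MaskLanes m{true, false, true, false};

  const Lanes expected{8.0f, 6.0f, 4.0f, 2.0f};
  const Lanes expected_masked{8.0f, 0.0f, 4.0f, 0.0f};

  return RefNegMulAdd(a, b, c) == expected &&
         RefMaskedNegMulAddOrZero(m, a, b, c) == expected_masked;
}
```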

Zero masked arithmetic operations

be6179e

jan-wassenberg requested changes Jan 6, 2025
