Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero masked arithmetic operations #2426

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mazimkhan
Copy link

Introduces:

  • MaskedMaxOrZero(m, a, b): returns Max(a, b)[i] or zero if m[i] is false
  • MaskedAddOrZero(m, a, b): returns a[i] + b[i] or 0 if m[i] is false.
  • MaskedSubOrZero(m, a, b): returns a[i] - b[i] or 0 if m[i] is false.
  • MaskedMulOrZero( m, a, b): returns a[i] * b[i] or 0 if m[i] is false.
  • MaskedDivideOrZero(m, a, b): returns a[i] / b[i] or 0 if m[i] is false.
  • MaskedSaturatedAddOrZero(m, a, b): returns a[i] + b[i] saturated to the minimum/maximum representable value, or 0 if m[i]` is false.
  • MaskedSaturatedSubOrZero(m, a, b): returns a[i] - b[i] saturated to the minimum/maximum representable value, or 0 if m[i]` is false.
  • MaskedMulFixedPoint15OrZero(m, a): returns returns the result of multiplying two Q1.15 fixed-point numbers, or 0 if m[i] is false.
  • MaskedMulAddOrZero(m, a, b, c): returns a[i] * b[i] + c[i] or 0 if m[i] is false.
  • MaskedNegMulAddOrZero(m, a, b, c): returns -a[i] * b[i] + c[i] or 0 if m[i] is false.
  • MaskedWidenMulPairwiseAddOrZero(d, m, a, b): widens a and b to TFromD<D> and computes a[2*i+1]*b[2*i+1] + a[2*i+0]*b[2*i+0], or 0 if m[i] is false.

Testing is included for all operations where both the masking and the underlying operation is tested.

Copy link

google-cla bot commented Jan 6, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

All ops in this section return `0` for `mask=false` lanes. These are equivalent
to, and potentially more efficient than, `IfThenElseZero(m, Add(a, b));` etc.

* <code>V **MaskedMaxOrZero**(M m, V a, V b)</code>: returns `Max(a, b)[i]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in the other PR, how about just MaskedMax(m, a, b)?

or `0` if `m[i]` is false.
* <code>V **MaskedMulOrZero**(M m, V a, V b)</code>: returns `a[i] * b[i]`
or `0` if `m[i]` is false.
* <code>V **MaskedDivideOrZero**(M m, V a, V b)</code>: returns `a[i] / b[i]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MaskedDiv for consistency?

false.
* <code>V **MaskedMulAddOrZero**(M m, V a, V b, V c)</code>: returns `a[i] *
b[i] + c[i]` or `0` if `m[i]` is false.
* <code>V **MaskedNegMulAddOrZero**(M m, V a, V b, V c)</code>: returns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we are adding Masked variants for almost all ops, but not MulSub. Do we actually have use-cases for all of these? I'd prefer to add as we go, rather than pre-emptively add ops just in case.

return sv##OP##_##CHAR##BITS##_m(m, a, b); \
}
// User-specified mask. Mask=false value is zero.
#define HWY_SVE_RETV_ARGMVVZ(BASE, CHAR, BITS, HALF, NAME, OP) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underscore before the Z (MVV_Z) for consistency?

#define HWY_NATIVE_ZERO_MASKED_ARITH
#endif

#define HWY_SVE_FMAZ(BASE, CHAR, BITS, HALF, NAME, OP) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move this next to HWY_SVE_RETV_ARGMVVV and call it HWY_SVE_RETV_ARGMVVV_Z?
(Note also the P/M naming suggestion from PR 2428)


#undef HWY_SVE_FMAZ

template <class V, class M>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ops that just call detail::op, we can move them out of namespace detail and avoid the wrapper function.

HWY_SVE_FOREACH(HWY_SVE_RETV_ARGMVV_M, MaskedMul_M, mul);
}

// ------------------------------ MulLower
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not documented yet?
Also, Lower() is generally the lower half. There is a GetLane which returns the lowest. How about naming this MulLane?
Finally, this can be efficiently implemented via MaskedMulOr (which we have) plus FirstN(d, 1) or perhaps a new First1(d) op which wraps the svptrue. That seems like it would be more widely useful and no less efficient - how about we add that instead?

ForAllTypes(ForPartialVectors<TestMulAddOrZero>());
}

struct TestWidenMulPairwiseAddOrZero {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of duplication here. What do you think of adding the MaskedNegMulAdd() call into the existing test for NegMulAdd? That avoids duplicating the test setup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants