compile-time-simd-blend-mask

Compile-time blend masks that unify _mm256_blendv_epi8, _mm256_blend_epi16, and _mm256_blend_epi32 using the C++ library boost::hana.

Introduction

The intrinsic functions _mm256_blendv_epi8, _mm256_blend_epi16, and _mm256_blend_epi32 serve a similar purpose, but they take different arguments. The first function takes the blend mask as a SIMD vector of type __m256i, whereas the latter two take the blend mask as an int. The value of that int must be known at compile time (unlike the __m256i mask of the first function).

Example

This project implements a common API that can be used like this

    auto mask = hana::make_tuple(
      hana::false_c,  hana::false_c,
      hana::false_c, hana::false_c, 
      hana::true_c, hana::true_c, 
      hana::true_c, hana::true_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 

      hana::true_c, hana::true_c, 
      hana::true_c, hana::true_c, 
      hana::true_c, hana::true_c, 
      hana::true_c, hana::true_c
    );
    __m256i a = _mm256_set1_epi8(0);
    __m256i b = _mm256_setr_epi8( 1,  2,  3,  4,  5,  6,  7,  8,
                                  9, 10, 11, 12, 13, 14, 15, 16,
                                 17, 18, 19, 20, 21, 22, 23, 24,
                                 25, 26, 27, 28, 29, 30, 31, 32);
    auto blend_result = blend256(a, b, mask);

At compile time, the mask is analyzed and the fastest applicable intrinsic function is chosen (in this case _mm256_blend_epi32()).
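
To make the expected result of the example concrete, here is a minimal scalar model of a byte-wise blend. It is only a sketch and not the code in src/blend256.h: the names `blend_reference`, `example_result`, and `check_example_result` are hypothetical, and it assumes that a `true` mask entry selects the byte from `b` and a `false` entry selects the byte from `a`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 32>;  // stands in for the 32 bytes of a __m256i
using Mask  = std::array<bool, 32>;          // one flag per byte

// Hypothetical scalar model. Assumption: mask[i] == true takes byte i from b,
// mask[i] == false takes it from a.
inline Bytes blend_reference(const Bytes& a, const Bytes& b, const Mask& mask) {
    Bytes out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = mask[i] ? b[i] : a[i];
    }
    return out;
}

// Rebuilds the inputs of the README example and blends them with the model.
inline Bytes example_result() {
    Bytes a{};  // corresponds to _mm256_set1_epi8(0)
    Bytes b{};
    for (std::size_t i = 0; i < b.size(); ++i) {
        b[i] = static_cast<std::uint8_t>(i + 1);  // 1..32, as in _mm256_setr_epi8
    }
    Mask mask{};  // all false
    for (std::size_t i = 4; i < 8; ++i) mask[i] = true;    // bytes 4..7
    for (std::size_t i = 24; i < 32; ++i) mask[i] = true;  // bytes 24..31
    return blend_reference(a, b, mask);
}

// Spot-checks the model: selected bytes come from b, all others stay 0.
inline void check_example_result() {
    const Bytes r = example_result();
    assert(r[3] == 0 && r[4] == 5 && r[7] == 8 && r[8] == 0);
    assert(r[23] == 0 && r[24] == 25 && r[31] == 32);
}
```

Under that assumption, the blended vector holds 5 to 8 in bytes 4 to 7, 25 to 32 in bytes 24 to 31, and 0 everywhere else.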

The selection falls back in order: _mm256_blend_epi32() is used if the mask allows it, otherwise _mm256_blend_epi16() if the mask allows it, and otherwise _mm256_blendv_epi8().
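
The fallback order can be sketched with plain `constexpr` functions. This is an illustration, not the boost::hana based implementation in src/blend256.h: `BlendKind`, `uniform_groups`, `lanes_match`, `choose_blend`, and `epi32_imm8` are hypothetical names, and the polarity (a `true` entry selects from `b`) is an assumption. The sketch treats a mask as usable with _mm256_blend_epi32() when every aligned group of 4 bytes shares one flag, and as usable with _mm256_blend_epi16() when every aligned pair of bytes shares one flag and both 128-bit lanes have the same pattern, since that intrinsic applies one 8-bit immediate to both lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Mask = std::array<bool, 32>;  // one flag per byte; true = take from b (assumed)

enum class BlendKind { Epi32, Epi16, Epi8 };

// True when every aligned group of `width` bytes carries a single flag value.
constexpr bool uniform_groups(const Mask& m, std::size_t width) {
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (m[i] != m[i - i % width]) return false;
    }
    return true;
}

// _mm256_blend_epi16 reuses its 8-bit immediate for both 128-bit lanes,
// so the upper 16 bytes must repeat the pattern of the lower 16 bytes.
constexpr bool lanes_match(const Mask& m) {
    for (std::size_t i = 0; i < 16; ++i) {
        if (m[i] != m[i + 16]) return false;
    }
    return true;
}

// Tries the intrinsics in the order described above.
constexpr BlendKind choose_blend(const Mask& m) {
    if (uniform_groups(m, 4)) return BlendKind::Epi32;
    if (uniform_groups(m, 2) && lanes_match(m)) return BlendKind::Epi16;
    return BlendKind::Epi8;
}

// Immediate for _mm256_blend_epi32: bit j is set when 32-bit element j is selected.
constexpr int epi32_imm8(const Mask& m) {
    assert(uniform_groups(m, 4));
    int imm = 0;
    for (std::size_t j = 0; j < 8; ++j) {
        if (m[4 * j]) imm |= 1 << j;
    }
    return imm;
}

// The mask from the example: bytes 4..7 and 24..31 are selected.
constexpr Mask example_mask{{
    false, false, false, false, true,  true,  true,  true,
    false, false, false, false, false, false, false, false,
    false, false, false, false, false, false, false, false,
    true,  true,  true,  true,  true,  true,  true,  true}};

static_assert(choose_blend(example_mask) == BlendKind::Epi32, "example fits epi32");
static_assert(epi32_imm8(example_mask) == 0xC2, "elements 1, 6 and 7 are selected");
```

Because everything is evaluated at compile time, the chosen variant and its immediate can be passed directly to an intrinsic that requires a constant argument. The example mask does not qualify for _mm256_blend_epi16() in this sketch, because its two 128-bit lanes differ.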

Implementation details

The file src/blend256.h contains the implementation of the blend256() function. The implementation makes use of the C++ library boost::hana.