Array of struct of fixed arrays

Rewrite the array of structures as an array of structures of fixed-size arrays, e.g.

#define PSIZE 8
struct Particle {
    double x[PSIZE];
    double px[PSIZE];
    ...
};

and use work-groups and work-items to dispatch the work.
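A minimal sketch of this layout in plain C, assuming a simple drift update `x += length * px` as a stand-in for a tracking function. The names `ParticleBlock`, `drift_block` and `drift_all`, and the update itself, are illustrative and not taken from the code base. One possible mapping, also an assumption, is that the outer loop over blocks is what gets distributed over work-groups, while the inner loop over the PSIZE lanes corresponds to the work-items.

```c
#include <assert.h>
#include <stddef.h>

#define PSIZE 8

/* One block holds PSIZE particles; each coordinate is a small contiguous array. */
typedef struct {
    double x[PSIZE];
    double px[PSIZE];
} ParticleBlock;

/* Hypothetical drift update on one block: x += length * px for every lane. */
static void drift_block(ParticleBlock *b, double length)
{
    for (int lane = 0; lane < PSIZE; ++lane) {
        b->x[lane] += length * b->px[lane];
    }
}

/* Apply the update to every block in the array. */
static void drift_all(ParticleBlock *blocks, size_t nblocks, double length)
{
    assert(blocks != NULL || nblocks == 0);
    for (size_t i = 0; i < nblocks; ++i) {
        drift_block(&blocks[i], length);
    }
}
```

The fixed-size inner loop has a compile-time trip count and unit-stride accesses, which is the property that should make it easy for a compiler to vectorize.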

Explore usage of double2, double16

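According to clpeak (https://github.com/krrishnarraj/clpeak), the P100 reaches peak FP64 performance only with double2 or wider data types. The Xeon CPU keeps scaling with the vector width up to double16, while the R9 280X is essentially flat across all widths: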

  Tesla P100-SXM2-16GB
  Compute units   : 56
  Clock frequency : 1480 MHz
  Double-precision compute (GFLOPS)
  double   : 3883.69
  double2  : 5323.60
  double4  : 5312.50
  double8  : 5288.60
  double16 : 5236.51

  Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  Compute units   : 40
  Clock frequency : 1442 MHz
  Double-precision compute (GFLOPS)
  double   : 31.77
  double2  : 63.73
  double4  : 126.43
  double8  : 245.86
  double16 : 354.48

  R9 280X
  Compute units   : 32
  Clock frequency : 1100 MHz
  Double-precision compute (GFLOPS)
  double   : 1078.46
  double2  : 1075.55
  double4  : 1053.28
  double8  : 1039.24
  double16 : 1021.07
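To experiment with vector widths on the CPU side, something like the following sketch could be used. It emulates OpenCL's double2 in plain C through the GCC/Clang `vector_size` extension, which is an assumption: the extension is compiler-specific and not part of standard C. The function `drift2` is a hypothetical stand-in for a tracking kernel, not code from the project.

```c
#include <assert.h>
#include <stddef.h>

/* Emulation of OpenCL's double2 with the GCC/Clang vector extension
   (two doubles, 16 bytes). */
typedef double double2 __attribute__((vector_size(16)));

/* Hypothetical kernel: x += length * px, two lanes per operation. */
static void drift2(double2 *x, const double2 *px, size_t n2, double length)
{
    assert(x != NULL && px != NULL);
    const double2 len = { length, length };
    for (size_t i = 0; i < n2; ++i) {
        x[i] += len * px[i];  /* element-wise multiply and add */
    }
}
```

Wider types (double4, double8, double16) would follow the same pattern with `vector_size(32)`, `vector_size(64)` and `vector_size(128)`.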

Verify that the CPU code vectorizes the tracking functions

Also check the impact of allocating aligned memory and of using the OpenMP simd aligned(...) clause, to see whether the vectorized code is faster.
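A sketch of what such a test could look like, assuming C11 `aligned_alloc`, a 64-byte alignment, and a hypothetical drift update as a stand-in for a tracking function (`alloc_aligned_doubles` and `drift_aligned` are illustrative names). The pragma only takes effect when OpenMP SIMD support is enabled, for example with `-fopenmp-simd` on GCC or Clang; otherwise it is ignored. Vectorization reports such as GCC's `-fopt-info-vec` or Clang's `-Rpass=loop-vectorize` can then be used to check whether the loop was actually vectorized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ALIGNMENT 64  /* assumed alignment in bytes */

/* Allocate n doubles aligned to ALIGNMENT. The byte count is rounded up
   because aligned_alloc expects a multiple of the alignment. */
static double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    if (rounded == 0) {
        rounded = ALIGNMENT;
    }
    return aligned_alloc(ALIGNMENT, rounded);
}

/* Hypothetical tracking step: x += length * px. Both pointers must really
   be ALIGNMENT-aligned, since the aligned clause is a promise to the compiler. */
static void drift_aligned(double *x, const double *px, size_t n, double length)
{
    assert(x != NULL && px != NULL);
    #pragma omp simd aligned(x, px : ALIGNMENT)
    for (size_t i = 0; i < n; ++i) {
        x[i] += length * px[i];
    }
}
```

Timing this against the same loop on memory from plain `malloc` and without the pragma would show whether alignment makes a measurable difference on a given machine.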
