Array of struct of fixed arrays
Riccardo De Maria edited this page May 21, 2017 · 1 revision
#define PSIZE 8
struct Particle {
    double x[PSIZE];
    double px[PSIZE];
    ...
Store the particles as an array of such structs, and use work-groups and work-items to dispatch the work.
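The dispatch can be sketched on the host in plain C. This is only an emulation under stated assumptions: the struct is reduced to the two fields shown above (the elided ones are omitted), one work-group is assumed per struct with PSIZE work-items per work-group, and the functions drift_lane and drift_all and the drift update itself are hypothetical names chosen for illustration. In a real OpenCL kernel the two loop indices would come from get_group_id(0) and get_local_id(0).

```c
#include <assert.h>
#include <stddef.h>

#define PSIZE 8

/* Reduced struct: only the two fields visible in the text above. */
struct Particle {
    double x[PSIZE];
    double px[PSIZE];
};

/* Hypothetical per-work-item update (a simple drift: x += length * px).
   In an OpenCL kernel, `group` would be get_group_id(0) and
   `lane` would be get_local_id(0). */
static void drift_lane(struct Particle *p, size_t group, size_t lane,
                       double length)
{
    p[group].x[lane] += length * p[group].px[lane];
}

/* Host-side emulation of the dispatch: one work-group per struct,
   PSIZE work-items per work-group (both are assumptions). */
static void drift_all(struct Particle *p, size_t ngroups, double length)
{
    assert(p != NULL);
    for (size_t group = 0; group < ngroups; ++group)
        for (size_t lane = 0; lane < PSIZE; ++lane)
            drift_lane(p, group, lane, length);
}
```

With this layout, particle number i lives in struct i / PSIZE at lane i % PSIZE, so the PSIZE work-items of one work-group touch contiguous doubles.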
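It looks like P100 cards and Xeon CPUs reach peak FP64 performance only with vector data types, as measured with clpeak (https://github.com/krrishnarraj/clpeak). The P100 saturates at double2, the Xeon keeps improving up to double16, and the R9 280X performs about the same at every width: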
Tesla P100-SXM2-16GB
  Compute units   : 56
  Clock frequency : 1480 MHz
  Double-precision compute (GFLOPS)
    double   : 3883.69
    double2  : 5323.60
    double4  : 5312.50
    double8  : 5288.60
    double16 : 5236.51
Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  Compute units   : 40
  Clock frequency : 1442 MHz
  Double-precision compute (GFLOPS)
    double   : 31.77
    double2  : 63.73
    double4  : 126.43
    double8  : 245.86
    double16 : 354.48
R9 280X
  Compute units   : 32
  Clock frequency : 1100 MHz
  Double-precision compute (GFLOPS)
    double   : 1078.46
    double2  : 1075.55
    double4  : 1053.28
    double8  : 1039.24
    double16 : 1021.07
To do: also check the impact of allocating aligned memory and of using openmp simd aligned... to see whether the vectorized code is faster.
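A possible starting point for that check, as a minimal sketch: it assumes a C11 toolchain that provides aligned_alloc, a 64-byte alignment (one cache line), and a compiler that honours the pragma when built with -fopenmp or -fopenmp-simd (otherwise the pragma is ignored and the loop stays scalar unless auto-vectorized). The helpers alloc_aligned_doubles, is_aligned and drift_simd are hypothetical names, and the drift update is only a stand-in workload.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64  /* assumed alignment in bytes (one cache line) */

/* Allocate n doubles on an ALIGNMENT-byte boundary with C11 aligned_alloc.
   The byte count is rounded up to a multiple of the alignment. */
static double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    return aligned_alloc(ALIGNMENT, rounded);
}

/* Return 1 if ptr sits on an ALIGNMENT-byte boundary, 0 otherwise. */
static int is_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % ALIGNMENT) == 0;
}

/* Stand-in workload: x[i] += length * px[i].
   The aligned clause promises the compiler that x and px are
   64-byte aligned, so callers must pass aligned pointers. */
static void drift_simd(double *x, const double *px, size_t n, double length)
{
    assert(is_aligned(x) && is_aligned(px));
    #pragma omp simd aligned(x, px : 64)
    for (size_t i = 0; i < n; ++i)
        x[i] += length * px[i];
}
```

Timing drift_simd against the same loop without the pragma, and against buffers from plain malloc, would show whether alignment and explicit vectorization matter on a given machine.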