\cchapter{SIMD}{SIMD}
\label{chap:simd}
Single instruction, multiple data (SIMD) is a form of parallel execution
in which the same operation is performed on multiple data elements
independently in hardware vector processing units (VPUs), also called SIMD units.
The addition of two vectors to form a third vector is a SIMD operation.
Many processors have SIMD (vector) units, each of which can perform
2, 4, 8 or more executions of the same operation simultaneously.
Loops without loop-carried backward dependences (or with dependences preserved using
\kcode{ordered simd}) are candidates for vectorization by the compiler for
execution with SIMD units. In addition, with state-of-the-art vectorization
technology and the \kcode{declare simd} directive extensions for function vectorization
in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
The basic idea is that a scalar function call in a loop can be replaced by a vector version
of the function, so that the loop is vectorized by combining loop
vectorization (\kcode{simd} directive on the loop) with function
vectorization (\kcode{declare simd} directive on the function).
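
As a minimal sketch of this combination in C, a \kcode{declare simd} directive
on a scalar function and a \kcode{simd} directive on the calling loop might look
as follows. The names \texttt{scale\_add} and \texttt{vec\_scale\_add} are
hypothetical and chosen only for illustration, and the arrays are assumed not to overlap.
Whether the loop is actually vectorized depends on the compiler and the target.

```c
/* Hypothetical helper: "declare simd" asks the compiler to generate a
   vector version of this scalar function in addition to the scalar one. */
#pragma omp declare simd
static float scale_add(float x, float y)
{
    return 2.0f * x + y;
}

/* Computes c[i] = 2*a[i] + b[i]. The simd directive on the loop allows the
   compiler to replace the scalar call with the vector version of scale_add.
   Assumption: a, b and c do not overlap. */
void vec_scale_add(int n, const float *a, const float *b, float *c)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        c[i] = scale_add(a[i], b[i]);
}
```

If the code is compiled without OpenMP support, the directives are ignored and the
loop runs as ordinary scalar code with the same result.
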

A \kcode{simd} construct specifies that SIMD operations are to be performed on the
data within the loop. A number of clauses are available to provide
data-sharing attributes (\kcode{private}, \kcode{linear}, \kcode{reduction} and
\kcode{lastprivate}). Other clauses provide vector length preferences or restrictions
(\kcode{simdlen} / \kcode{safelen}), loop collapsing (\kcode{collapse}), and data
alignment (\kcode{aligned}).
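
The following C sketch illustrates three of these clauses under stated assumptions:
\kcode{reduction} and \kcode{simdlen} on a dot product, and \kcode{safelen} on a
loop whose dependence distance is known. The function names \texttt{dot} and
\texttt{shifted\_add} and the constants 4 and 8 are hypothetical choices for
illustration, not values taken from the specification.

```c
/* Sum of products. reduction(+:sum) gives each SIMD lane a private partial
   sum that is combined at the end; simdlen(4) states a preferred vector
   length of 4, which the compiler may or may not honor. */
double dot(int n, const double *x, const double *y)
{
    double sum = 0.0;
    #pragma omp simd reduction(+:sum) simdlen(4)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Each iteration reads the element written 8 iterations earlier.
   safelen(8) tells the compiler that at most 8 consecutive iterations
   may be executed concurrently, which keeps this dependence intact. */
void shifted_add(int n, double *a, const double *b)
{
    #pragma omp simd safelen(8)
    for (int i = 8; i < n; i++)
        a[i] = a[i - 8] + b[i];
}
```

Note that a SIMD reduction may add the terms in a different order than the scalar
loop, so floating-point results can differ in the last bits for general inputs.
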

The \kcode{declare simd} directive designates
that a vector version of the function should also be constructed for
execution within loops that contain the function and have a \kcode{simd}
directive. Clauses provide argument specifications (\kcode{linear},
\kcode{uniform}, and \kcode{aligned}), a requested vector length
(\kcode{simdlen}), and designate whether the function is always/never
called conditionally in a loop (\kcode{inbranch}/\kcode{notinbranch}).
The \kcode{inbranch} and \kcode{notinbranch} clauses are for optimizing performance.
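
A small C sketch of these clauses is given below. The functions
\texttt{load\_scaled}, \texttt{flip\_sign} and \texttt{process} are hypothetical
and serve only to show where each clause could apply: \kcode{uniform} for an
argument that is the same in every SIMD lane, \kcode{linear} for an argument that
advances by a fixed step per lane, \kcode{notinbranch} for a function called on
every iteration, and \kcode{inbranch} for a function called only under a condition.

```c
/* Called on every iteration, so notinbranch lets the compiler omit the
   masked variant. base is identical in all lanes (uniform); i increases
   by 1 from one lane to the next (linear with step 1). */
#pragma omp declare simd uniform(base) linear(i:1) notinbranch
static double load_scaled(const double *base, int i)
{
    return 0.5 * base[i];
}

/* Called only inside an if statement, so inbranch requests a masked
   vector version that is applied to the active lanes only. */
#pragma omp declare simd inbranch
static double flip_sign(double v)
{
    return -v;
}

/* Stores |0.5 * in[i]| in out[i]. Assumption: in and out do not overlap. */
void process(int n, const double *in, double *out)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        double v = load_scaled(in, i);
        if (v < 0.0)
            v = flip_sign(v);
        out[i] = v;
    }
}
```
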

The \kcode{simd} construct can also be combined with the worksharing loop
constructs (\kcode{for simd} in C/C++ and \kcode{do simd} in Fortran) so that
multiple threads execute simultaneously, each using its own SIMD unit.
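
As an illustration, the sketch below applies \kcode{for simd} inside a
\kcode{parallel} region in C. The function name \texttt{saxpy} is a hypothetical
choice, and the arrays are assumed not to overlap.

```c
/* Computes y[i] = alpha * x[i] + y[i].
   The for simd construct first divides the iterations among the threads
   of the team; each thread's chunk can then be executed with SIMD
   instructions. Assumption: x and y do not overlap. */
void saxpy(int n, float alpha, const float *x, float *y)
{
    #pragma omp parallel
    {
        #pragma omp for simd
        for (int i = 0; i < n; i++)
            y[i] = alpha * x[i] + y[i];
    }
}
```

For small trip counts the cost of starting threads may outweigh the benefit, in
which case a plain \kcode{simd} construct on the loop may be the better choice.
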
%Hence, the \code{simd} construct can be
%used alone on a loop to direct vectorization (SIMD execution), or in
%combination with a parallel loop construct to include thread parallelism
%(a parallel loop sequentially followed by a \code{simd} construct,
%or a combined construct such as \code{parallel do simd} or
%\code{parallel for simd}).
%===== Examples Sections =====
\input{SIMD/SIMD}
\input{SIMD/linear_modifier}