forked from boegel/MICA
MICA: Microarchitecture-Independent Characterization of Applications
====================================================================
version 0.40
Kenneth Hoste & Lieven Eeckhout (Ghent University, Belgium)
with contributions by:
- Hamid Fadishei (multi-process support)
- Petr Tuma (code cleanup)
website: http://boegel.kejo.be/ELIS/MICA (http://www.elis.ugent.be/~kehoste/mica)
A set of tutorial slides on MICA, presented at IISWC-2007, is
available from the MICA website.
* Disclaimer
------------
This software was only tested on Linux/x86. Anyone who wants to use it on a different
platform supported by Pin is free to do so, but should expect problems...
Any problem reports or questions are welcome at [email protected] .
* Compilation
--------------
The easiest way to compile MICA is to unzip/untar mica_vXYZ.tar.gz in the source/tools
directory of the Pin kit you are using. If you wish to place MICA in a different
directory, you'll have to adjust the included makefile accordingly.
Running 'make' should produce the 'mica_v0-X' shared library.
By default, MICA is built using the GCC C++ compiler (g++).
Since Pin kit 39599 (March 2nd 2011), building Pin tools with the Intel compilers is
also supported. To build MICA using the Intel C++ compiler, run "make CXX=icpc".
Make sure /opt/intel/lib is added to the LD_LIBRARY_PATH environment variable to
use MICA built using the Intel compilers.
* Specifying type of analysis
-----------------------------
MICA supports various types of microarchitecture-independent characteristics.
It also allows measuring the characteristics either for the entire execution, or
per interval of N dynamic instructions.
Specifying the parameters is done using the mica.conf configuration file.
A sample mica.conf file is provided with the distribution, and details
on how to specify the parameters are found below.
analysis_type: all | ilp | ilp_one | itypes | ppm | reg | stride | memfootprint | memreusedist | custom
interval_size: full | <size>
[ilp_size: <size>]
[block_size: <2^size>]
[page_size: <2^size>]
[itypes_spec_file: <file>]
example:
analysis_type: all
interval_size: 100000000
block_size: 6
page_size: 12
itypes_spec_file: itypes_default.spec
This specifies that all supported characteristics are measured per interval of 100,000,000
instructions, with a block size of 64 bytes (2^6), a page size of 4KB (2^12), and using the
instruction mix categories described in the file itypes_default.spec.
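As an illustration of how the block_size and page_size exponents translate into
byte sizes, here is a minimal Python sketch. It is not MICA's own configuration
parser; the function name parse_mica_conf and the simple "key: value" handling
are assumptions made for this example only.

```python
def parse_mica_conf(text):
    """Parse 'key: value' lines into a dict (hypothetical helper, not part of MICA)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


EXAMPLE = """\
analysis_type: all
interval_size: 100000000
block_size: 6
page_size: 12
itypes_spec_file: itypes_default.spec
"""

conf = parse_mica_conf(EXAMPLE)
# block_size and page_size are exponents: the actual size is 2^value
block_bytes = 1 << int(conf["block_size"])
page_bytes = 1 << int(conf["page_size"])
print(block_bytes, page_bytes)  # 64 4096
```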
* Usage
-------
Using MICA is very easy; just run:
pin -t mica.so -- <program> [<parameter>]
The type of analysis is specified in the mica.conf file, and some
logging is written to mica.log.
* Output files
---------------
(I realize the output file names are a bit strange, but that's just the way I
chose them... It's easy to adjust them yourself!)
ilp:
full: ilp_full_int_pin.out
interval: ilp_phases_int_pin.out
ilp_one:
full: ilp<size>_full_int_pin.out
interval: ilp<size>_phases_int_pin.out
itypes:
full: itypes_full_int_pin.out
interval: itypes_phases_int_pin.out
ppm:
full: ppm_full_int_pin.out
interval: ppm_phases_int_pin.out
reg:
full: reg_full_int_pin.out
interval: reg_phases_int_pin.out
stride:
full: stride_full_int_pin.out
interval: stride_phases_int_pin.out
memfootprint:
full: memfootprint_full_int_pin.out
interval: memfootprint_phases_int_pin.out
memreusedist:
full: memreusedist_full_int_pin.out
interval: memreusedist_phases_int_pin.out
* Full execution metrics
-----------------------------------
+++ ilp +++
Instruction-Level Parallelism (ILP) available for four different instruction
window sizes (32, 64, 128, 256).
This is measured by assuming perfect caches, perfect branch prediction, etc.
The only limitations are the instruction window size and the data dependences.
analysis_type: ilp
Besides measuring these four window sizes at once, MICA also supports
specifying a single window size, which is specified as follows (for
characterizing the full run using an instruction window of 32 entries):
analysis_type: ilp_one
interval_size: full
ilp_size: 32
You can adjust the block size with the block_size configuration parameter.
+++ itypes +++
analysis_type: itypes
Instruction mix.
The instruction mix is evaluated by categorizing the executed instructions.
Because the x86 architecture isn't a load-store architecture, we count memory
reads/writes separately. The following categories are used by default (in order
of output):
- memory read (instructions which read from memory)
- memory write (instructions which write to memory)
- control flow
- arithmetic
- floating-point
- stack
- shift
- string
- sse
- other
- nop
It is possible to redefine the instruction mix categories, by creating a specification
file and mentioning it in the mica.conf file (itypes_spec_file).
+++ ppm +++
analysis_type: ppm
Branch predictability.
The branch predictability of the conditional branches in the program is
evaluated using a Prediction-by-Partial-Match (PPM) predictor, in 4 different
configurations (global/local branch history, shared/separate prediction
table(s)), using 3 different history lengths (4, 8, 12 bits). Additionally,
average taken and transition counts are also measured.
+++ reg +++
analysis_type: reg
Register traffic.
The register traffic is analyzed in different aspects:
- average number of register operands
- average degree of use
- dependency distances (prob. <= D)
Dependency distances are chosen in powers of 2, i.e. 1, 2, 4, 8, 16, 32, 64
+++ stride +++
analysis_type: stride
Data stream strides.
The distances between subsequent memory accesses are characterised by:
- local load (memory read) strides
- global load (memory read) strides
- local store (memory write) strides
- global store (memory write) strides
Local means per static instruction accesses, global means over all
instructions. The strides are characterized by powers of 8 (prob. <= 0, 8, 64,
512, 4096, 32768, 262144)
+++ memfootprint +++
analysis_type: memfootprint
Instruction and data memory footprint.
The size of the instruction and data memory footprint is characterized by
counting the number of blocks (64-byte) and pages (4KB) touched. This
is done separately for data and instruction addresses.
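The counting described above can be sketched as follows. This is a minimal
Python illustration, not MICA's implementation; the function name footprint and
the sample addresses are made up for the example.

```python
def footprint(addresses, block_bits=6, page_bits=12):
    """Return (number of unique blocks, number of unique pages) touched.

    block_bits=6 gives 64-byte blocks, page_bits=12 gives 4KB pages.
    """
    blocks = {addr >> block_bits for addr in addresses}
    pages = {addr >> page_bits for addr in addresses}
    return len(blocks), len(pages)


# Hypothetical data addresses: the first two fall in the same 64-byte block,
# the first three in the same 4KB page.
data_addrs = [0x1000, 0x1008, 0x1040, 0x2000]
print(footprint(data_addrs))  # (3, 2)
```

The same function would be applied to instruction addresses and data addresses
separately to obtain the four reported numbers.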
+++ memreusedist +++
analysis_type: memreusedist
Memory reuse distances.
This is a highly valuable set of numbers to characterize the cache behavior
of the application of interest. For each memory read, the corresponding
64-byte cache block is determined. For each cache block accessed, the number
of unique cache blocks accessed since the last time it was referenced is
determined, using an LRU stack.
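To make the notion of reuse distance concrete, here is a small Python sketch of
an LRU stack. It is a deliberately simple list-based version (linear search per
access) written for illustration; MICA's actual data structure may differ, and
the function name reuse_distances is an assumption.

```python
def reuse_distances(addresses, block_bits=6):
    """For each access, return the number of unique blocks accessed since the
    previous access to the same block, or None for a cold (first) reference."""
    stack = []   # LRU stack: most recently used block is at the end
    result = []
    for addr in addresses:
        block = addr >> block_bits   # 64-byte cache block for block_bits=6
        if block in stack:
            pos = stack.index(block)
            # every block above 'pos' was accessed since the last reference
            result.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            result.append(None)      # cold reference
        stack.append(block)          # block becomes most recently used
    return result


# Blocks accessed: 0, 1, 2, 0, 0, 1
print(reuse_distances([0, 64, 128, 0, 0, 64]))
# [None, None, None, 2, 0, 2]
```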
The reuse distances for all memory reads are reported in buckets. The first
bucket is used for so-called 'cold references'. The subsequent buckets capture reuse
distances of [2^n, 2^(n+1)[, where n ranges from 0 to 18. The first of these
actually captures [0,2[ (not [1,2[), while the last bucket, [2^18, 2^19[, captures all
reuse distances larger than or equal to 2^18, so it is in fact [2^18, oo[.
In total, this delivers 20 buckets, plus the total number of memory accesses
(the first number in the output), thus 21 numbers.
For example, the fifth bucket corresponds to accesses with a reuse distance
between 2^3 and 2^4 (i.e. from 8 up to 16 64-byte cache blocks).
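The bucket layout described above can be expressed as a small mapping function.
The following Python sketch is illustrative only; the name bucket_index, the
0-based numbering, and the use of None to denote a cold reference are choices
made for this example.

```python
def bucket_index(distance):
    """Map a reuse distance to a 0-based bucket index in the range 0..19.

    Index 0 holds cold references (distance is None), index 1 holds [0,2[,
    index k holds [2^(k-1), 2^k[, and index 19 holds everything >= 2^18.
    """
    if distance is None:
        return 0
    if distance < 2:
        return 1
    n = distance.bit_length() - 1   # floor(log2(distance))
    return min(n, 18) + 1


print(bucket_index(8))        # 4, i.e. the fifth bucket: [2^3, 2^4[
print(bucket_index(2 ** 20))  # 19, i.e. the last bucket: [2^18, oo[
```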
Note: because memory addresses vary over different executions of the same
program, these numbers may vary slightly across multiple runs. Please be aware
of this when using these metrics for research purposes.
To track the progress of the MICA analysis being run, see the mica_progress.txt file,
which shows how many dynamic instructions have been analyzed. This can be disabled
by removing the -DVERBOSE flag in the Makefile and rebuilding MICA.
* Interval metrics
-------------------
Besides characterizing total program execution, the tool is also capable of
characterizing interval behavior. The analyses are identical to the ones
above, but reset part of the state for each new interval, as listed below.
+++ ilp +++
RESET: instruction and cycle counters (per interval), free memory used for
memory address stuff (to avoid huge memory requirements for large workloads)
DON'T TOUCH: instruction window contents; global instruction and cycle counters
+++ itypes +++
RESET: instruction type counters
+++ ppm +++
RESET: misprediction counts, taken/transition counts
DON'T TOUCH: branch history tables
+++ reg +++
RESET: operand counts, register use distribution and register age distribution
DON'T TOUCH: register use counts (i.e. keep track of register use counts across
interval boundaries); register definition addresses
+++ stride +++
RESET: instruction counts (mem.read, mem.write, interval), distribution counts
DON'T TOUCH: last (global/local) read/write memory addresses
+++ memfootprint +++
RESET: reference counters, free memory used for memory address stuff (to avoid
huge memory requirements for large workloads)
DON'T TOUCH: -
+++ memreusedist +++
RESET: bucket counts (including cold reference and memory access counts)
DON'T TOUCH: LRU stack (keep track of reuse distances over interval boundaries)
* Measured in integer values, convert to floating-point
-------------------------------------------------------
For historical reasons (problems with printing out floating-point
numbers in certain situations with previous Pin kits), we only print out
integer values and convert to floating-point metrics offline. This also allows
aggregating data measured per interval to larger intervals or full execution
for most characteristics.
S: interval size
N: number of intervals
I: number of instructions
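Because the output consists of integer counts, aggregating to larger intervals
can be done by summing columns over consecutive lines. The Python sketch below
shows the idea under the assumption that every column of the characteristic in
question is a plain additive count; the function name aggregate and the sample
rows are made up. Check per characteristic whether summing is meaningful.

```python
def aggregate(rows, k):
    """Sum each column over groups of k consecutive interval rows.

    The final group may contain fewer than k rows.
    """
    merged = []
    for start in range(0, len(rows), k):
        group = rows[start:start + k]
        merged.append([sum(column) for column in zip(*group)])
    return merged


# Four hypothetical intervals with two integer columns each
rows = [[100, 40], [100, 50], [100, 25], [30, 10]]
print(aggregate(rows, 2))          # [[200, 90], [130, 35]]
print(aggregate(rows, len(rows)))  # [[330, 125]] (full execution)
```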
+++ ilp +++
FORMAT:
instruction_count<space>cycle_count_win_size_1<space>cycle_count_win_size_2<space>...<space>cycle_count_win_size_n
CONVERSION:
instruction_count/cycle_count
i.e.
1 to (N-1)th line: S/cycle_count_win_size_i
Nth line: (I-(N-1)*S)/cycle_count_win_size_i
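A minimal Python sketch of this conversion for a single output line is shown
below. It uses the instruction count printed on the line itself; the function
name ilp_from_line and the sample numbers are invented for the example.

```python
def ilp_from_line(line):
    """Convert 'instruction_count cycle_count_1 ... cycle_count_n' to ILP values."""
    fields = [int(x) for x in line.split()]
    instruction_count, cycle_counts = fields[0], fields[1:]
    return [instruction_count / cycles for cycles in cycle_counts]


# Hypothetical line: 1000 instructions, cycle counts for four window sizes
print(ilp_from_line("1000 500 400 250 200"))  # [2.0, 2.5, 4.0, 5.0]
```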
+++ itypes +++
FORMAT:
instruction_cnt<space>mem_read_cnt<space>mem_write_cnt<space>control_cnt<space>arith_cnt<space>fp_cnt<space>stack_cnt<space>shift_cnt<space>string_cnt<space>sse_cnt<space>system_cnt<space>nop_cnt<space>other_cnt
CONVERSION:
mem_read_cnt/instruction_cnt
mem_write_cnt/instruction_cnt
...
other_cnt/instruction_cnt
NOTE
Note that simply adding the (n-1) last numbers won't necessarily yield the first number.
First of all, the memory read and write counts shouldn't be added to the total, because
the x86 architecture is not a load/store architecture (e.g. an instruction can both read
memory and be a floating-point instruction).
Secondly, some instructions may fit in multiple categories, and therefore simply adding the
counts for the various categories will cause instructions to be counted double.
Also note that the (sum of) instruction_cnt value(s) will not match the instruction count
printed at the last line of the output file ("number of instructions: <int>"). This is because
in the former, each iteration of a REP-prefixed instruction is counted, while in the latter
a REP-prefixed instruction is only counted once.
The other_cnt contains the number of instructions that did not fit in any of the other categories
(excluding mem_read and mem_write). More details on which kind of instructions this includes can
be found in the itypes_other_group_categories.txt output file.
+++ ppm +++
FORMAT:
instr_cnt<space>GAg_mispred_cnt_4bits<space>PAg_mispred_cnt_4bits<space>GAs_mispred_cnt_4bits<space>PAs_mispred_cnt_4bits<space>...<space>PAs_mispred_cnt_12bits
CONVERSION:
GAg_mispred_cnt_Kbits/instr_cnt
...
PAs_mispred_cnt_Kbits/instr_cnt
+++ reg +++
FORMAT:
instr_cnt<space>total_oper_cnt<space>instr_reg_cnt<space>total_reg_use_cnt<space>total_reg_age<space>reg_age_cnt_1<space>reg_age_cnt_2<space>reg_age_cnt_4<space>...<space>reg_age_cnt_64
CONVERSION:
total_oper_cnt/instr_cnt
total_reg_use_cnt/instr_reg_cnt
reg_age_cnt_1/total_reg_age
reg_age_cnt_2/total_reg_age
...
reg_age_cnt_64/total_reg_age
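The conversion above could be scripted along the following lines. This is a
Python sketch only: the function name reg_metrics, the dictionary keys, and the
sample line are assumptions, and the mapping of the ratios to "average number of
operands" and "average degree of use" follows the order in which the
characteristics are listed earlier in this README.

```python
def reg_metrics(line):
    """Convert one line of reg output (12 integers) to floating-point metrics."""
    f = [int(x) for x in line.split()]
    instr_cnt, total_oper_cnt, instr_reg_cnt, total_reg_use_cnt, total_reg_age = f[:5]
    age_counts = f[5:]   # reg_age_cnt_1, _2, _4, ..., _64
    return {
        "avg_operands": total_oper_cnt / instr_cnt,
        "avg_degree_of_use": total_reg_use_cnt / instr_reg_cnt,
        "age_fractions": [cnt / total_reg_age for cnt in age_counts],
    }


# Hypothetical output line (values invented for illustration)
sample = "1000 2000 800 1600 400 100 200 300 350 380 395 400"
metrics = reg_metrics(sample)
print(metrics["avg_operands"], metrics["avg_degree_of_use"])  # 2.0 2.0
print(metrics["age_fractions"][:3])                           # [0.25, 0.5, 0.75]
```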
+++ stride +++
FORMAT:
mem_read_cnt<space>mem_read_local_stride_0<space>mem_read_local_stride_8<space>...<space>mem_read_local_stride_262144<space>mem_read_global_stride_0<space>...<space>mem_read_global_stride_262144<space>mem_write_cnt<space>mem_write_local_stride_0<space>...<space>mem_write_global_stride_262144
CONVERSION:
mem_read_local_stride_0/mem_read_cnt
...
mem_read_global_stride_262144/mem_read_cnt
mem_write_local_stride_0/mem_write_cnt
...
mem_write_global_stride_262144/mem_write_cnt
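Since a stride output line packs four distributions and two totals into one row,
a small helper makes the field layout explicit. The Python sketch below assumes
exactly the 30-field layout given in FORMAT above; the function name
stride_metrics and the sample values are invented for the example.

```python
STRIDE_LIMITS = [0, 8, 64, 512, 4096, 32768, 262144]


def stride_metrics(line):
    """Split one line of stride output into four lists of fractions."""
    f = [int(x) for x in line.split()]
    n = len(STRIDE_LIMITS)
    read_cnt = f[0]
    write_cnt = f[1 + 2 * n]   # follows the local and global read counts

    def fractions(counts, total):
        # guard against intervals without any reads or writes
        return [c / total if total else 0.0 for c in counts]

    return {
        "read_local": fractions(f[1:1 + n], read_cnt),
        "read_global": fractions(f[1 + n:1 + 2 * n], read_cnt),
        "write_local": fractions(f[2 + 2 * n:2 + 3 * n], write_cnt),
        "write_global": fractions(f[2 + 3 * n:2 + 4 * n], write_cnt),
    }


# Hypothetical counts: 100 reads and 50 writes
fields = ([100] + [10, 20, 30, 40, 50, 60, 70] + [5, 10, 15, 20, 25, 30, 35]
          + [50] + [5, 10, 15, 20, 25, 30, 35] + [1, 2, 3, 4, 5, 6, 7])
result = stride_metrics(" ".join(str(x) for x in fields))
print(result["read_local"][0], result["write_local"][0])  # 0.1 0.1
```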
+++ memfootprint +++
Integer output (no conversion needed).
FORMAT:
num_64-byte_blocks_data<space>num_4KB_pages_data<space>num_64-byte_blocks_instr<space>num_4KB_pages_instr
+++ memreusedist +++
FORMAT:
mem_access_cnt<space>cold_ref_cnt<space>acc_cnt_0-2<space>acc_cnt_2-2^2<space>acc_cnt_2^2-2^3<space>...<space>acc_cnt_2^17-2^18<space>acc_cnt_over_2^18
CONVERSION:
cold_ref_cnt/mem_access_cnt
acc_cnt_0-2/mem_access_cnt
...
acc_cnt_2^17-2^18/mem_access_cnt
acc_cnt_over_2^18/mem_access_cnt
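For post-processing, it can help to attach a label to each of the 20 buckets.
The Python sketch below does this for one line of output, assuming the 21-field
layout given in FORMAT above; the names BUCKET_LABELS and reuse_fractions and
the sample counts are made up for illustration.

```python
# One label per bucket: cold references, [0,2[, [2^1,2^2[, ..., [2^17,2^18[, [2^18,oo[
BUCKET_LABELS = (["cold", "[0,2["]
                 + ["[2^%d,2^%d[" % (n, n + 1) for n in range(1, 18)]
                 + ["[2^18,oo["])


def reuse_fractions(line):
    """Return a list of (label, fraction) pairs for one line of memreusedist output."""
    f = [int(x) for x in line.split()]
    if len(f) != 21:
        raise ValueError("expected 21 integers, got %d" % len(f))
    mem_access_cnt, counts = f[0], f[1:]
    return [(label, cnt / mem_access_cnt if mem_access_cnt else 0.0)
            for label, cnt in zip(BUCKET_LABELS, counts)]


# Hypothetical line: 200 accesses, 20 cold, 10 in each bucket except the last
sample = " ".join(str(x) for x in [200, 20] + [10] * 18 + [0])
for label, fraction in reuse_fractions(sample)[:3]:
    print(label, fraction)
# cold 0.1
# [0,2[ 0.05
# [2^1,2^2[ 0.05
```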
* Multi-process binaries
-----------------------------------
If you want to use MICA on multiprocess binaries which call fork and execv, you should
specify this entry in the MICA configuration file:
append_pid: yes
This tells MICA to append the current process ID to the report file names, so that
multiple processes do not overwrite each other's output.
Additionally, you should pass the "-follow_execv 1" parameter to pin in order to trace
multiprocess applications.