Skip to content

Commit

Permalink
Merge pull request #17665 from jketema/printir-doc
Browse files Browse the repository at this point in the history
C++: Add some documentation on the printed IR
  • Loading branch information
jketema authored Oct 8, 2024
2 parents 5a4cd1c + ed266da commit 3d6965a
Show file tree
Hide file tree
Showing 3 changed files with 318 additions and 0 deletions.
106 changes: 106 additions & 0 deletions cpp/ql/lib/semmle/code/cpp/ir/implementation/aliased_ssa/PrintIR.qll
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,112 @@
* uses, however, it is better to write a query that imports `PrintIR.qll`, extends
* `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
* to dump.
*
* Anatomy of a printed IR instruction
*
* An instruction:
*
* ```
* # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17
* ```
*
* The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281.
* Scrolling up in the printed output, one will eventually find the name of the file to which the line
* belongs.
*
* `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so
* there should be no later uses of the result; see below for other possible values). The `2281_19` is a
* unique ID for the result. This is usually just the line number plus a small integer suffix to make it
* unique within the function. The type of the result is `void`. In this case, it is `void`, because
* `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type,
* but it will sometimes be a type like `glval<int>`, which means result holds a glvalue, which at the
* IR level works like a pointer. In other words, in the source code the type was `int`, but it is really
* more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue.
*
* `Call` is the opcode of the instruction. Common opcodes include:
*
* * Arithmetic operations: `Add`, `Sub`, `Mul`, etc.
* * Memory access operations: `Load`, `Store`.
* * Function calls: `Call`.
* * Literals: `Constant`.
* * Variable addresses: `VariableAddress`.
* * Function entry points: `EnterFunction`.
* * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a
* `Store` to a special `#return` variable.
* * Stack unwinding for C++ function that throw and where the exception escapes the function: `Unwind`.
* * Common exit point for `Unwind` and `Return`: `ExitFunction`.
* * SSA-related opcodes: `Phi`, `Chi`.
*
* `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case
* for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it
* is the name of the variable that loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and
* `VariableAddress`, the information between brackets does not occur earlier.
*
* `func:r2281_18` and `this:r28281_17` are the operands of the instruction. The `func:` prefix denotes the operand
* that holds the address of the called function. The `this:` prefix denotes the argument to the special `this`
* parameter of an instance member function. `r2281_18`, `r2281_17` are the unique IDs of the operands. Each of these
* matches the ID of a previously seen result, showing where that value came from. The `r` means that these are
* "register" operands (see below).
*
* Result and operand kinds:
*
* Every result and operand is one of these three kinds:
*
* * `r` "register". These operands are not stored in any particular memory location. We can think of them as
* temporary values created during the evaluation of an expression. A register operand almost always has one
* use, often in the same block as its definition.
* * `m` "memory". These operands represents accesses to a specific memory location. The location could be a
* local variable, a global variable, a field of an object, an element of an array, or any memory that we happen
* to have a pointer to. These only occur as the result of a `Store`, the source operand of a `Load` or on the
* SSA instructions (`Phi`, `Chi`).
* * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix
* so we know that there is no actual value there.
*
* Branches in the IR:
*
* The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible
* control flow successors of the block.
*
* ```
* # 44| v44_3(void) = ConditionalBranch : r44_2
* #-----| False -> Block 4
* #-----| True -> Block 3
* ```
* Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending
* on whether the condition is true or false.
*
* SSA instructions:
*
* We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple
* control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge,
* and the resulting definition is then used wherever that variable is accessed later on.
*
* When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might
* or might not have been updated by a store, depending on the actual address that was written to. For example, take:
*
* ```cpp
* int x = 5;
* int y = 7;
* int* p = condition ? &x : &y;
* *p = 6;
* return x;
* ```
*
* At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether
* `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6,
* because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`:
*
* ```
* r2(int) = Constant[6]
* r3(int*) = <<value of p>>
* m4(int) = Store : &r3, r2 // Stores the constant 6 to *p
* m5(unknown) = Chi : total:m1, partial:m4
* ```
* The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous
* contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the
* `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly
* which parts of that memory were overwritten, but it does model that any of that memory could have been modified, so
* that later instructions do not assume that the memory was unchanged.
*/

private import internal.IRInternal
Expand Down
106 changes: 106 additions & 0 deletions cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/PrintIR.qll
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,112 @@
* uses, however, it is better to write a query that imports `PrintIR.qll`, extends
* `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
* to dump.
*
* Anatomy of a printed IR instruction
*
* An instruction:
*
* ```
* # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17
* ```
*
* The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281.
* Scrolling up in the printed output, one will eventually find the name of the file to which the line
* belongs.
*
* `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so
* there should be no later uses of the result; see below for other possible values). The `2281_19` is a
* unique ID for the result. This is usually just the line number plus a small integer suffix to make it
* unique within the function. The type of the result is `void`. In this case, it is `void`, because
* `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type,
* but it will sometimes be a type like `glval<int>`, which means result holds a glvalue, which at the
* IR level works like a pointer. In other words, in the source code the type was `int`, but it is really
* more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue.
*
* `Call` is the opcode of the instruction. Common opcodes include:
*
* * Arithmetic operations: `Add`, `Sub`, `Mul`, etc.
* * Memory access operations: `Load`, `Store`.
* * Function calls: `Call`.
* * Literals: `Constant`.
* * Variable addresses: `VariableAddress`.
* * Function entry points: `EnterFunction`.
* * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a
* `Store` to a special `#return` variable.
* * Stack unwinding for C++ function that throw and where the exception escapes the function: `Unwind`.
* * Common exit point for `Unwind` and `Return`: `ExitFunction`.
* * SSA-related opcodes: `Phi`, `Chi`.
*
* `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case
* for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it
* is the name of the variable that loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and
* `VariableAddress`, the information between brackets does not occur earlier.
*
* `func:r2281_18` and `this:r28281_17` are the operands of the instruction. The `func:` prefix denotes the operand
* that holds the address of the called function. The `this:` prefix denotes the argument to the special `this`
* parameter of an instance member function. `r2281_18`, `r2281_17` are the unique IDs of the operands. Each of these
* matches the ID of a previously seen result, showing where that value came from. The `r` means that these are
* "register" operands (see below).
*
* Result and operand kinds:
*
* Every result and operand is one of these three kinds:
*
* * `r` "register". These operands are not stored in any particular memory location. We can think of them as
* temporary values created during the evaluation of an expression. A register operand almost always has one
* use, often in the same block as its definition.
* * `m` "memory". These operands represents accesses to a specific memory location. The location could be a
* local variable, a global variable, a field of an object, an element of an array, or any memory that we happen
* to have a pointer to. These only occur as the result of a `Store`, the source operand of a `Load` or on the
* SSA instructions (`Phi`, `Chi`).
* * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix
* so we know that there is no actual value there.
*
* Branches in the IR:
*
* The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible
* control flow successors of the block.
*
* ```
* # 44| v44_3(void) = ConditionalBranch : r44_2
* #-----| False -> Block 4
* #-----| True -> Block 3
* ```
* Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending
* on whether the condition is true or false.
*
* SSA instructions:
*
* We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple
* control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge,
* and the resulting definition is then used wherever that variable is accessed later on.
*
* When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might
* or might not have been updated by a store, depending on the actual address that was written to. For example, take:
*
* ```cpp
* int x = 5;
* int y = 7;
* int* p = condition ? &x : &y;
* *p = 6;
* return x;
* ```
*
* At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether
* `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6,
* because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`:
*
* ```
* r2(int) = Constant[6]
* r3(int*) = <<value of p>>
* m4(int) = Store : &r3, r2 // Stores the constant 6 to *p
* m5(unknown) = Chi : total:m1, partial:m4
* ```
* The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous
* contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the
* `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly
* which parts of that memory were overwritten, but it does model that any of that memory could have been modified, so
* that later instructions do not assume that the memory was unchanged.
*/

private import internal.IRInternal
Expand Down
106 changes: 106 additions & 0 deletions cpp/ql/lib/semmle/code/cpp/ir/implementation/unaliased_ssa/PrintIR.qll
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,112 @@
* uses, however, it is better to write a query that imports `PrintIR.qll`, extends
* `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
* to dump.
*
* Anatomy of a printed IR instruction
*
* An instruction:
*
* ```
* # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17
* ```
*
* The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281.
* Scrolling up in the printed output, one will eventually find the name of the file to which the line
* belongs.
*
* `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so
* there should be no later uses of the result; see below for other possible values). The `2281_19` is a
* unique ID for the result. This is usually just the line number plus a small integer suffix to make it
* unique within the function. The type of the result is `void`. In this case, it is `void`, because
* `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type,
* but it will sometimes be a type like `glval<int>`, which means result holds a glvalue, which at the
* IR level works like a pointer. In other words, in the source code the type was `int`, but it is really
* more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue.
*
* `Call` is the opcode of the instruction. Common opcodes include:
*
* * Arithmetic operations: `Add`, `Sub`, `Mul`, etc.
* * Memory access operations: `Load`, `Store`.
* * Function calls: `Call`.
* * Literals: `Constant`.
* * Variable addresses: `VariableAddress`.
* * Function entry points: `EnterFunction`.
* * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a
* `Store` to a special `#return` variable.
* * Stack unwinding for C++ function that throw and where the exception escapes the function: `Unwind`.
* * Common exit point for `Unwind` and `Return`: `ExitFunction`.
* * SSA-related opcodes: `Phi`, `Chi`.
*
* `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case
* for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it
* is the name of the variable that loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and
* `VariableAddress`, the information between brackets does not occur earlier.
*
* `func:r2281_18` and `this:r28281_17` are the operands of the instruction. The `func:` prefix denotes the operand
* that holds the address of the called function. The `this:` prefix denotes the argument to the special `this`
* parameter of an instance member function. `r2281_18`, `r2281_17` are the unique IDs of the operands. Each of these
* matches the ID of a previously seen result, showing where that value came from. The `r` means that these are
* "register" operands (see below).
*
* Result and operand kinds:
*
* Every result and operand is one of these three kinds:
*
* * `r` "register". These operands are not stored in any particular memory location. We can think of them as
* temporary values created during the evaluation of an expression. A register operand almost always has one
* use, often in the same block as its definition.
* * `m` "memory". These operands represents accesses to a specific memory location. The location could be a
* local variable, a global variable, a field of an object, an element of an array, or any memory that we happen
* to have a pointer to. These only occur as the result of a `Store`, the source operand of a `Load` or on the
* SSA instructions (`Phi`, `Chi`).
* * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix
* so we know that there is no actual value there.
*
* Branches in the IR:
*
* The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible
* control flow successors of the block.
*
* ```
* # 44| v44_3(void) = ConditionalBranch : r44_2
* #-----| False -> Block 4
* #-----| True -> Block 3
* ```
* Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending
* on whether the condition is true or false.
*
* SSA instructions:
*
* We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple
* control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge,
* and the resulting definition is then used wherever that variable is accessed later on.
*
* When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might
* or might not have been updated by a store, depending on the actual address that was written to. For example, take:
*
* ```cpp
* int x = 5;
* int y = 7;
* int* p = condition ? &x : &y;
* *p = 6;
* return x;
* ```
*
* At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether
* `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6,
* because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`:
*
* ```
* r2(int) = Constant[6]
* r3(int*) = <<value of p>>
* m4(int) = Store : &r3, r2 // Stores the constant 6 to *p
* m5(unknown) = Chi : total:m1, partial:m4
* ```
* The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous
* contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the
* `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly
* which parts of that memory were overwritten, but it does model that any of that memory could have been modified, so
* that later instructions do not assume that the memory was unchanged.
*/

private import internal.IRInternal
Expand Down

0 comments on commit 3d6965a

Please sign in to comment.