-
Notifications
You must be signed in to change notification settings - Fork 16
Three address instruction list IR
JcRawInstList
represents a raw list representation of three-address Java bytecode instructions. "Raw" means that
this representation does not depict information about
types and control flow of the program. This representation is one-to-one
matching of JVM bytecode instructions to the list of three-address instructions.
The base class of this representation is JcRawInstList
. It produces a list-like
collection of instructions, so you can modify this list, iterate over instructions or access them by
index.
JcRawInst
is the base interface for the raw instructions. All instructions are identified
by the object, they are not comparable using equals
.
Here is the list of JcRawInst
implementations:
-
JcAssignInst
— Assignment instruction. Left-hand side of the instruction can only beJcRawValue
, right-hand side can be any expression (JcRawExpr
). -
JcRawEnterMonitorInst
,JcRawExitMonitorInst
— Monitor instructions. Correspond directly to their existing analogs. Themonitor
property can only beJcRawSimpleValue
. -
JcRawCallInst
— Call instruction. Represents a method that does not save its return variable to any local variable. Method calls that return a value are represented viaJcRawAssignInst
. -
JcRawLabelInst
— Label instruction. Used to mark some program points in the code. Mainly required to be used in the branching instructions. A label is identified by name, all the references to a label are represented viaJcRawLabelRef
class. -
JcRawReturnInst
— Return instruction. ThereturnValue
property isnull
when a method does not return anything. -
JcRawThrowInst
— Throw instruction. -
JcRawCatchInst
— Catch instruction. Represents an entry for atry...catch
block in the code. Does not map directly to a bytecode instruction but represents aTryCatchBlock
of a method. Stores a value that corresponds to a caughtthrowable
exception and a range of the instructions that it catches fromstartInclusive until endExclusive
. -
JcRawGotoInst
— Jump instruction. -
JcRawIfInst
— Conditional jump instruction. The condition of the instruction should necessarily beJcRawConditionExpr
because not all the conditional expressions that we use in higher-level programming languages can be easily expressed in JVM bytecode. -
JcRawSwitchInst
— Switch instruction. A combined representation ofLookupSwitch
andTableSwitch
bytecode instructions.
JcRawExpr
is a base interface for all the expression types and value types that can be
expressed in JVM bytecode. JcRawExpr
stores its type as a TypeName
object which is just a Java type name
represented as a string (that's why it is "raw").
Here is the list of JcRawExpr
implementations:
-
JcRawBinaryExpr
— Binary expression that allows to implement all the arithmetic expressions (e.g.JcRawAdd
,JcRawMul
, etc.), conditional expressions (JcRawEq
,JcRawGt
, etc.), logical expressions (JcRawAnd
,JcRawOr
,JcRawXor
).-
JcRawConditionExpr
— Conditional expressions. Can be used as a condition inJcRawIfInst
.
-
-
JcRawLengthExpr
— Array length expression. -
JcRawNegExpr
— Negation expression. -
JcRawCastExpr
— Cast expression. Can be used to cast both reference types and primitive types. -
JcRawNewExpr
— New expression. Creates a single object. -
JcRawNewArrayExpr
— New array expression. Creates a (multi)array of a given type. -
JcRawInstanceOfExpr
—Instanceof
check. -
JcRawCallExpr
— Method call expression.-
JcRawDynamicCallExpr
— Aninvokedynamic
instruction representation. Preserves all the info. JcRawVirtualCallExpr
JcRawInterfaceCallExpr
JcRawStaticCallExpr
JcRawSpecialCallExpr
-
-
JcRawValue
— Representation of a single value.-
JcRawSimpleValue
— Representation of a simple value that does not have any sub-values.JcRawThis
JcRawArgument
JcRawLocal
JcRawConstant
-
JcRawComplexValue
— Complex value that has subvalues.-
JcRawFieldRef
— Field reference. Can be used both as a field read access (e.g.a = x.y
) and field store access (e.g.x.y = a
). -
JcRawArrayAccess
— Array element reference. Can be used both as an array read access (e.g.a = x[y]
) and array store access (e.g.x[y] = a
)
-
-
To get a three-address instruction list representation of a method you need to call JcMethod::instructionList
.
Instruction list building requires JcClasspath
because some stages use subtyping information.
RawInstListBuilder
is used to build a JcRawInstList
from bytecode representation (i.e. from MethodNode
).
To build an instruction list representation, MethodNode
should contain frame information (i.e. the FrameNode
instances) and should not contain JSR
(jump subroutine) instructions. Thus, each time a ClassNode
is
created, we invoke ClassNode.computeFrames()
extension function that computes frame information
for each method. The computeFrames
function uses ASM functionality for computing frames: it converts
the ClassNode
back to bytecode using ClassWriter
(which actually performs frame computing during
the conversion) and reads that bytecode again using ClassReader
. This is not the most effective way of
doing things, but the easiest one: manually computing frames is quite hard.
Another thing is inlining JSR
instructions. We do this by calling MethodNode.jsrInlined
extension that
returns a new MethodNode
instance. It uses ASM JSRInlinerAdapter
utility to create a new method node with
inlined JRS instructions.
RawInstListBuilder
converts JVM bytecode instruction list into a three-address instruction list. Most of
the conversion process is simple: bytecode instructions map to three-address expressions and instructions.
The most complex part is frame merging. JVM frame describes the state of the virtual machine at each instruction, i.e.
the declared local variables and stack state. When an instruction has multiple predecessors, we need to merge several
incoming frames into one. Sometimes JVM adds a special frame node before an instruction to describe how the
frame should look like after merging.
Four situations are possible during the frame merging process.
(1) There is only one incoming frame, and it is fully defined (i.e. we have already collected all the frame information) when converting the previous stages. In that case, everything is quite simple, and we can just copy the frame info. However, we can also refine type information for local variables. Consider following bytecode:
NEW java/lang/ArrayList
ASTORE 0
...
FRAME FULL [ java/lang/List ]
...
NEW java/lang/LinkedList
ASTORE 0
When converting the first two instructions, we created a local variable %0
for cell 0 with java.lang.ArrayList
type.
However, when converting frame information, we see that JVM considers the type of cell 0 to be java.lang.List
and then uses cell 0 to store other implementations of Java List
. We can refine the type of %0
to be
java.lang.List
and then replace all the occurrences of %0
with the new version. That is performed by
using localTypeRefinement
property of RawInstListBuilder
and ExprMapper
utility class.
(2) There is only one incoming frame, and it has not been defined yet. It is a rare case, but it can happen in a situation like this:
GOTO L2
L1
FRAME FULL [ java/lang/List ]
...
GOTO L3
L2
...
GOTO L1
L3
RETURN
In this situation, we use frame info to create a new local variable with the defined type and remember to
add an assignment that will initialize a new variable with a correct value in the predecessor. It is performed in the
end by using laterAssignments
and laterStackAssignments
maps as well as the buildRequiredAssignments
function.
The process is similar for both stack variables and local variables.
(3) There are several predecessor frames, and all of them are defined (e.g. when merging after if...else
block). In
that case, we create a new local variable with the defined type in the current frame and add necessary assignments
into the predecessor blocks.
(4) There are several predecessor frames, and not all of them are defined (e.g. when merging in the loop header).In that case, we create a new local variable with the defined type in the current frame, add necessary assignments into the predecessor blocks, and remember to add the required assignments to undefined predecessor blocks in the end.
RawInstListBuilder
also simplifies the resulting instruction list. It is necessary because the list constructing
process naturally introduces a lot of redundancy.
The main simplification stages are:
- Deleting the repeated assignments inside a basic block.
- Deleting declarations of unused variables.
- Deleting declarations of mutually dependent unused variables (e.g.
a = b
andb = a
). - Simple unit propagation.
- Type normalization using
JcClasspath
.
There is a visitor API for traversing and modifying JcRawInstList
. Visitors have a standard
interface — they can be invoked using an accept
method on instructions and expressions:
val a = jcRawInst.accept(MyInstVisitor())
val b = jcRawExpr.accept(MyExprVisitor())
We also provide "functional"-like extensions for applying visitors to JcRawInstList
:
filter(visitor: JcRawInstVisitor<Boolean>): JcRawInstList
filterNot(visitor: JcRawInstVisitor<Boolean>): JcRawInstList
map(visitor: JcRawInstVisitor<JcRawInst>): JcRawInstList
mapNotNull(visitor: JcRawInstVisitor<JcRawInst?>): JcRawInstList
flatMap(visitor: JcRawInstVisitor<Collection<JcRawInst>>): JcRawInstList
apply(visitor: JcRawInstVisitor<Unit>): JcRawInstList
applyAndGet(visitor: T, getter: (T) -> R): R
collect(visitor: JcRawInstVisitor<T>): Collection<T>
jcdb-core
has a number of utility visitors for working with the instruction list:
-
ExprMapper(val mapping: Map<JcRawExpr, JcRawExpr>)
— Traverses an instruction list and replaces all expression occurrences frommapping
to the corresponding property. -
FullExprSetCollector()
— Collects all the expressions that occur in a given object (instruction list, single instruction or an expression). -
InstructionFilter(val predicate: (JcRawInst) -> Boolean)
— Filters the instructions by a given predicate.
JcRawInstList
can be converted back to the ASM MethodNode
using MethodNodeBuilder.build()
method. The conversion process is pretty straightforward and does not require any additional
comments.