Bytes hash and functions hash are too often the same hash in ARM #143
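Hello,

Let me elaborate more on this. It's good to have this here for reference purposes :)

In diaphora_ida.py one can see the following:

decoded_size, ins = diaphora_decode(x)
if ins.Operands[0].type in [o_mem, o_imm, o_far, o_near, o_displ]:
    decoded_size -= ins.Operands[0].offb
if ins.Operands[1].type in [o_mem, o_imm, o_far, o_near, o_displ]:
    decoded_size -= ins.Operands[1].offb
if decoded_size <= 0:
    decoded_size = 1
...
curr_bytes = GetManyBytes(x, decoded_size, False)

What happens here is that you remove operand bytes from the instructions and only use the opcode and prefixes to compute a signature, which you name function_hash. Another type of signature, named bytes_hash, takes into account all instruction bytes. So, normally, function_hash and bytes_hash should be different. This works fine for x86, but I've noticed that, on ARM, offb is always 0 (which makes sense, as operand encoding is interleaved with opcode encoding). In this case bytes_hash and function_hash are, most of the time, equal!

Let's have a look at two examples. The following shows information exported from an ARM binary:

sqlite> SELECT COUNT(*) FROM functions WHERE bytes_hash != function_hash;
3845
sqlite> SELECT COUNT(*) FROM functions;
18424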
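While the following is from an IA-32 binary:

sqlite> SELECT COUNT(*) FROM functions WHERE bytes_hash != function_hash;
20877
sqlite> SELECT COUNT(*) FROM functions;
21034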
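The two counts for each database can be folded into a single ratio. The following is only a sketch: it assumes a `functions` table with `bytes_hash` and `function_hash` columns, as in the queries in this report, and it fills a tiny in-memory table with made-up hashes so that it runs outside IDA. `differing_hash_ratio` and `make_demo_db` are hypothetical helper names, not part of Diaphora.

```python
import sqlite3


def differing_hash_ratio(conn):
    """Return (differing, total, ratio) for the functions table."""
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*) FROM functions WHERE bytes_hash != function_hash")
    differing = cur.fetchone()[0]
    cur.execute("SELECT COUNT(*) FROM functions")
    total = cur.fetchone()[0]
    ratio = float(differing) / total if total else 0.0
    return differing, total, ratio


def make_demo_db():
    """Build an in-memory table with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE functions "
        "(name TEXT, bytes_hash TEXT, function_hash TEXT)")
    rows = [
        ("f1", "aa", "aa"),
        ("f2", "bb", "bb"),
        ("f3", "cc", "cc"),
        ("f4", "dd", "ee"),  # the only function whose hashes differ
    ]
    conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", rows)
    return conn


demo = make_demo_db()
differing, total, ratio = differing_hash_ratio(demo)
print("%d of %d functions differ (%.1f%%)" % (differing, total, ratio * 100))
```

With the figures reported here, the ratio is 3845/18424 (roughly 21%) for the ARM binary, against 20877/21034 (roughly 99%) for the IA-32 one.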
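So in my ARM binary's Diaphora database, only 3845 functions have a bytes_hash which is different from function_hash, as opposed to the IA-32 binary where most of the functions have different bytes_hash and function_hash values. After some investigation, it turned out that all of the 3845 functions have data elements (e.g. constants, jump tables etc.) interleaved with their instructions! I believe it's the following "fallback" code that eventually reads a single byte from data heads interleaved with standard function instruction heads, but I haven't verified this:

if decoded_size <= 0:
    decoded_size = 1

This tiny bug was verified using a simple IDAPython script like the following.

import idc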
import idaapi
import idautils

# Operand types whose offb is subtracted in diaphora_ida.py.
TYPES = [
    idaapi.o_mem,
    idaapi.o_imm,
    idaapi.o_far,
    idaapi.o_near,
    idaapi.o_displ
]

for segment in idautils.Segments():
    functions = idautils.Functions(idc.SegStart(segment), idc.SegEnd(segment))
    for function in functions:
        function = idaapi.get_func(function)
        for head in idautils.Heads(function.startEA, function.endEA):
            size = idaapi.decode_insn(head)
            if size == 0:
                print 'No instruction %#x' % head
                # Nothing was decoded here, so idaapi.cmd would be stale.
                continue
            # Report every operand of interest that has a non-zero offb.
            if idaapi.cmd.Operands[0].type in TYPES:
                if idaapi.cmd.Operands[0].offb != 0:
                    print '%#x 0 %#x' % (idaapi.cmd.ea, idaapi.cmd.Operands[0].offb)
            if idaapi.cmd.Operands[1].type in TYPES:
                if idaapi.cmd.Operands[1].offb != 0:
                    print '%#x 1 %#x' % (idaapi.cmd.ea, idaapi.cmd.Operands[1].offb)

Here's a quick solution that can give similar results. Instead of relying on the instruction bytes, you can directly use information provided by the DecodeInstruction() API.

insn = idautils.DecodeInstruction(head)
itype = insn.itype
for i in xrange(6):
    op_type = getattr(insn, 'Op%d' % (i + 1)).type
    itype <<= 8
    itype |= op_type
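To make the difference concrete, here is a small sketch that runs outside IDA. It models a decoded instruction as a plain tuple. `FakeInsn`, `truncated_bytes` and `type_signature` are hypothetical names, and the itype and operand type numbers are arbitrary placeholders rather than real IDA constants. It only mimics the offb subtraction quoted in this report, to show that nothing is removed when offb is 0, and then packs the instruction type and six operand types in the way proposed here.

```python
from collections import namedtuple

# Hypothetical stand-in for a decoded instruction: raw bytes, an instruction
# type code, and (operand_type, offb) pairs.
FakeInsn = namedtuple("FakeInsn", ["raw", "itype", "operands"])

O_VOID, O_REG, O_IMM = 0, 1, 5  # placeholder values, not IDA's constants
STRIPPED_TYPES = (O_IMM,)


def truncated_bytes(insn):
    """Mimic the offb subtraction on the first two operands."""
    size = len(insn.raw)
    for op_type, offb in insn.operands[:2]:
        if op_type in STRIPPED_TYPES:
            size -= offb
    if size <= 0:
        size = 1
    return insn.raw[:size]


def type_signature(insn):
    """Pack itype and six operand types into one integer, 8 bits each."""
    sig = insn.itype
    padded = list(insn.operands) + [(O_VOID, 0)] * 6
    for op_type, _ in padded[:6]:
        sig = (sig << 8) | op_type
    return sig


# x86-like: mov eax, 0x12345678; the immediate starts at byte offset 1.
x86_mov = FakeInsn(b"\xb8\x78\x56\x34\x12", 1, [(O_REG, 0), (O_IMM, 1)])
# ARM-like: mov r0, #1 and mov r0, #2; offb is 0 for every operand.
arm_mov1 = FakeInsn(b"\x01\x00\xa0\xe3", 2, [(O_REG, 0), (O_IMM, 0)])
arm_mov2 = FakeInsn(b"\x02\x00\xa0\xe3", 2, [(O_REG, 0), (O_IMM, 0)])

print(truncated_bytes(x86_mov) != x86_mov.raw)    # bytes were dropped
print(truncated_bytes(arm_mov1) == arm_mov1.raw)  # nothing was dropped
print(type_signature(arm_mov1) == type_signature(arm_mov2))
```

The two ARM-like instructions differ only in their immediate, so their raw bytes differ while the packed value is the same. That is the property the byte-based approach loses when offb is always 0.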
Had similar issues with PPC and TriCore. On my branch I added specific
opcode masking. Not scalable, but it worked, and it is the only solution
that I can think of.
D
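For readers wondering what per-architecture opcode masking could look like, here is a minimal sketch. It is not the code from the branch mentioned above. `MASKS`, `mask_instruction` and `masked_bytes` are hypothetical, and the single rule shown only covers the 32-bit ARM (A32) B/BL encoding, where bits 27..25 are 0b101 and the low 24 bits hold the branch offset.

```python
import struct

# Hypothetical rule table: (match_mask, match_value, keep_mask).
# A32 B/BL: bits 27..25 are 0b101; bits 23..0 are the signed branch offset,
# so only the top byte (condition and opcode bits) is kept.
MASKS = [
    (0x0E000000, 0x0A000000, 0xFF000000),
]


def mask_instruction(word):
    """Clear the operand bits of a 32-bit word if a rule matches it."""
    for match_mask, match_value, keep_mask in MASKS:
        if (word & match_mask) == match_value:
            return word & keep_mask
    return word  # no rule for this encoding: leave it untouched


def masked_bytes(code):
    """Mask each little-endian 32-bit word of `code` and repack it."""
    count = len(code) // 4
    words = struct.unpack("<%dI" % count, code[:count * 4])
    return struct.pack("<%dI" % count, *[mask_instruction(w) for w in words])


# Two BL instructions with different targets mask to the same word.
print(hex(mask_instruction(0xEB000010)))
print(hex(mask_instruction(0xEB0000FF)))
```

A table like this has to be written by hand for every architecture and every encoding of interest, which is why the approach does not scale.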
On Fri, Jan 11, 2019, 7:42 AM Chariton Karamitas ***@***.*** wrote:
Hello,
Let me elaborate more on this. It's good to have this here for reference
purposes :)
In *diaphora_ida.py* one can see the following:
decoded_size, ins = diaphora_decode(x)
if ins.Operands[0].type in [o_mem, o_imm, o_far, o_near, o_displ]:
    decoded_size -= ins.Operands[0].offb
if ins.Operands[1].type in [o_mem, o_imm, o_far, o_near, o_displ]:
    decoded_size -= ins.Operands[1].offb
if decoded_size <= 0:
    decoded_size = 1
...
curr_bytes = GetManyBytes(x, decoded_size, False)
What happens here is that you remove operand bytes from the instructions
and only use the opcode and prefixes to compute a signature, which you name
function_hash. Another type of signature, named bytes_hash, takes into
account all instruction bytes. So, normally, function_hash and bytes_hash
should be different. This works fine for X86, but I've noticed that, on
ARM, offb is always 0 (which makes sense, as operand encoding is
interleaved with opcode encoding). In this case bytes_hash and
function_hash are, most of the time, equal!
Let's have a look at two examples.
The following shows information exported from an ARM binary:
sqlite> SELECT COUNT(*) FROM functions WHERE bytes_hash != function_hash;
3845
sqlite> SELECT COUNT(*) FROM functions;
18424
While the following from an IA-32 binary.
sqlite> SELECT COUNT(*) FROM functions WHERE bytes_hash != function_hash;
20877
sqlite> SELECT COUNT(*) FROM functions;
21034
So in my ARM binary's Diaphora database, only 3845 functions have a
bytes_hash which is different from function_hash, as opposed to the IA-32
binary where most of the functions have different bytes_hash and
function_hash values. After some investigation, it turned out that all of
the 3845 functions have data elements (e.g. constants, jump tables etc.)
interleaved with their instructions! I believe it's the following
"fallback" code that eventually reads a single byte from data heads
interleaved with standard function instruction heads, but I haven't
verified this:
if decoded_size <= 0:
    decoded_size = 1
This tiny bug was verified using a simple IDA Python script like the
following.
import idc
import idaapi
import idautils

# Operand types whose offb is subtracted in diaphora_ida.py.
TYPES = [
    idaapi.o_mem,
    idaapi.o_imm,
    idaapi.o_far,
    idaapi.o_near,
    idaapi.o_displ
]

for segment in idautils.Segments():
    functions = idautils.Functions(idc.SegStart(segment), idc.SegEnd(segment))
    for function in functions:
        function = idaapi.get_func(function)
        for head in idautils.Heads(function.startEA, function.endEA):
            size = idaapi.decode_insn(head)
            if size == 0:
                print 'No instruction %#x' % head
                # Nothing was decoded here, so idaapi.cmd would be stale.
                continue
            # Report every operand of interest that has a non-zero offb.
            if idaapi.cmd.Operands[0].type in TYPES:
                if idaapi.cmd.Operands[0].offb != 0:
                    print '%#x 0 %#x' % (idaapi.cmd.ea, idaapi.cmd.Operands[0].offb)
            if idaapi.cmd.Operands[1].type in TYPES:
                if idaapi.cmd.Operands[1].offb != 0:
                    print '%#x 1 %#x' % (idaapi.cmd.ea, idaapi.cmd.Operands[1].offb)
Here's a quick solution that can give similar results. Instead of relying
on the instruction bytes, you can directly use information provided by the
DecodeInstruction() API.
insn = idautils.DecodeInstruction(head)
itype = insn.itype
for i in xrange(6):
    op_type = getattr(insn, 'Op%d' % (i + 1)).type
    itype <<= 8
    itype |= op_type
Reported by Huku.