This week, I'm at r2con 2018, meeting some friends, making new ones, attending great talks, … But I'm also spending some time looking at a particular "real world" binary that shall remain unnamed for now. This binary contains encrypted strings, and since I have some free time, I wrote not one, but two scripts to decrypt them: one using r2pipe, the other using IDAPython.
The decryption function is always called like this:
[0x000e1944]> pd 2 @ 0x000e193d
0x000e193d c7042470ea10. mov dword [esp], 0x10ea70
0x000e1944 e8976bf5ff call fcn.000384e0
[0x000e1944]>
An offset is placed on top of the stack, and the decryption function (fcn.000384e0) is called. The function's graph looks like this:
[0x000313b0]> s 0x384e0
[0x000384e0]> af
[0x000384e0]> VV
┌────────────────────┐
│ 0x384e0 │
└────────────────────┘
│ │
│ └──────────────┐
┌─────────────────────────┐ │
│ 0x38596 │ │
│ 0x00038596 call 0x38380 │ │
└─────────────────────────┘ │
│ ┌──────────────┘
│ │
┌────────────────────┐
│ 0x384f9 │
└────────────────────┘
│ │
┌────────────────┘ └───────┐ ┌───────────────────────┐
│ │ │ │
┌────────────────────┐ ┌────────────────────┐ │
│ 0x38517 │ │ 0x38527 │ │
└────────────────────┘ └────────────────────┘ │
│ │ │ │
│ ┌──────────┘ └──────────┐ │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ 0x38520 │ │ 0x3852b │ │
│ └────────────────────┘ └────────────────────┘ │
│ │ │ │
│ ┌────────────┘ └───────────────────────────────────┘
│ │
┌───────────────────────────────────────┐
│ [0x38539] │
│ 0x00038543 call dword [reloc.malloc] │
│ 0x00038553 call dword [reloc.malloc] │
│ 0x00038569 call 0x37d30 │
│ 0x00038574 call 0x37cc0 │
└───────────────────────────────────────┘
The interesting part is 0x37cc0, because it really looks like an XOR-decryption loop:
[0x000e1944]> pdf @ 0x37cc0
┌ (fcn) fcn.00037cc0 43
│ fcn.00037cc0 (int arg_ch);
│ ; arg int arg_ch @ esp+0xc
│ ; CALL XREF from fcn.000384e0 (0x38574)
│ 0x00037cc0 56 push esi
│ 0x00037cc1 31d2 xor edx, edx
│ 0x00037cc3 53 push ebx
│ 0x00037cc4 8b4c240c mov ecx, dword [arg_ch]
│ 0x00037cc8 0fb601 movzx eax, byte [ecx]
│ 0x00037ccb 89c6 mov esi, eax
│ 0x00037ccd 8d5801 lea ebx, [eax + 1]
│ ┌─> 0x00037cd0 8d0432 lea eax, [edx + esi]
│ │ 0x00037cd3 83e00f and eax, 0xf
│ │ 0x00037cd6 0fb680204010. movzx eax, byte [eax + 0x104020]
│ │ 0x00037cdd 30440a01 xor byte [edx + ecx + 1], al
│ │ 0x00037ce1 83c201 add edx, 1
│ │ 0x00037ce4 39da cmp edx, ebx
│ └─< 0x00037ce6 75e8 jne 0x37cd0
│ 0x00037ce8 5b pop ebx
│ 0x00037ce9 5e pop esi
└ 0x00037cea c3 ret
This is what it looks like via r2dec:
[0x000e1944]> s 0x00037cc0
[0x00037cc0]> pdd
void fcn_00037cc0 () {
edx = 0;
ecx = *(arg_ch);
eax = ecx;
esi = eax;
ebx = eax + 1;
do {
eax = edx + esi;
eax &= 0xf;
eax = eax + 0x104020;
*(edx + ecx + 1) ^= al;
edx += 1;
} while (edx == ebx);
}
[0x00037cc0]>
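Stripped of register names, the loop in the disassembly above amounts to a few lines of Python. The following is a minimal sketch in Python 3, not taken from either script: `decrypt_string`, `blob` and `table` are names chosen here for illustration, with `blob` standing for the bytes at the offset passed to fcn.000384e0 and `table` for the 16 bytes at 0x104020. The table and plaintext in the round trip are made up.

```python
def decrypt_string(blob, table):
    """XOR-decrypt one length-prefixed string.

    blob[0] is both the string length and the starting index into the
    16-byte table; the encrypted bytes start at blob[1].
    """
    key = blob[0]
    return bytes(blob[i + 1] ^ table[(i + key) & 0xF] for i in range(key))


# Round trip with a made-up table and plaintext (XOR is its own inverse).
table = bytes(range(0x10, 0x20))
plain = b"hello"
blob = bytes([len(plain)]) + bytes(
    c ^ table[(i + len(plain)) & 0xF] for i, c in enumerate(plain)
)
print(decrypt_string(blob, table))
```

Note that the native loop compares edx against key + 1, so it XORs one byte more than this sketch does, presumably a terminator; that last point is a guess.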
So the plan is to:
- Find every callsite for the function fcn.000384e0
- Get its argument pushed on the stack
- Emulate the decryption routine in Python
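To make the first two steps concrete, here is a small Python 3 sketch that decodes the call site shown at the top directly from its raw bytes. It is not what either script below does; it assumes the call site is always a 7-byte `mov dword [esp], imm32` (c7 04 24) immediately followed by a 5-byte `call rel32` (e8), and `decode_call_site` is a hypothetical helper name.

```python
import struct


def decode_call_site(code, va):
    """Decode `mov dword [esp], imm32` + `call rel32` located at address `va`.

    Returns (argument, call_target), or None if the bytes don't match
    the assumed pattern.
    """
    if len(code) < 12 or code[:3] != b"\xc7\x04\x24" or code[7] != 0xE8:
        return None
    (imm,) = struct.unpack_from("<I", code, 3)  # immediate stored at [esp]
    (rel,) = struct.unpack_from("<i", code, 8)  # signed displacement of the call
    # The displacement is relative to the end of the call instruction (va + 12).
    target = (va + 12 + rel) & 0xFFFFFFFF
    return imm, target


# Bytes of the two instructions at 0x000e193d from the listing above.
site = bytes.fromhex("c7042470ea1000" "e8976bf5ff")
arg, target = decode_call_site(site, 0x000E193D)
print(hex(arg), hex(target))
```

On the bytes from the listing, this yields the argument 0x10ea70 and the call target 0x384e0, matching what radare2 displays.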
IDAPython
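import idautils
import idc
import idaapi

# XOR table used by the decryption loop (only the first 16 bytes are indexed)
table = idaapi.get_many_bytes(0x00104020, 255)
# Look up the decryption function by name
# (assumes it was named "decrypt_string" in the database)
decrypt_str_addr = idc.get_name_ea_simple("decrypt_string")

for addr in idautils.CodeRefsTo(decrypt_str_addr, 0):
    # Addresses of the instructions that set up the call's arguments
    arg_addr = idaapi.get_arg_addrs(addr)
    if arg_addr is None:
        continue
    print hex(addr), idc.generate_disasm_line(addr, 0), hex(arg_addr[0])
    ea = idaapi.get_fileregion_ea(arg_addr[0])
    data_addr = idc.GetOperandValue(ea, 1)  # the offset moved to [esp]
    key = idaapi.get_byte(data_addr)  # first byte: length and table index
    b = idaapi.get_many_bytes(data_addr, 256)
    out = ""
    for i in range(key):
        ret = ord(table[(i + key) & 0xf])
        out += chr(ret ^ ord(b[i + 1]))
    print(out)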
r2pipe
import r2pipe


def get_previous_mov_esp(r, offset):
    """ Since x86 instructions aren't aligned
    and radare2's analysis is often "suboptimal",
    we're simply bruteforcing the offset until
    we find a good looking™ instruction.
    """
    for i in range(1, 20):
        opcodes = r.cmdj("pdj -%d @%s" % (i, offset))
        for opcode in reversed(opcodes):
            # .get(): be tolerant of entries without an 'opcode' field
            if opcode.get('opcode', '').startswith("mov dword [esp], 0x"):
                return opcode
    print("Error at %s" % offset)


def main():
    r = r2pipe.open('my_bin.so')
    table = r.cmdj('pxj 256 @ 0x00104020')  # read the decryption table

    # Those two commands are only cosmetic
    r.cmd('s 0x000384e0')  # seek to the decryption function
    r.cmd('af')  # create a function at 0x000384e0

    # The `/r` command is to search (`/` like in vim) for _r_eferences
    # The `$$` variable contains the current offset
    # `~[1]` is a filter to get the second column of the output
    for ref in r.cmd('/r $$~[1]').split('\n'):  # `/r` doesn't support json yet™
        if not ref.strip():
            continue  # skip empty lines in the command output
        argument = get_previous_mov_esp(r, ref)
        if argument is None:
            continue
        offset = argument['val']
        print("Offset %s for call at %s" % (hex(offset), ref))
        data = r.cmdj('pxj 256 @ %s' % offset)  # read what's at the offset
        out = ""
        for i in range(data[0]):
            ret = table[(i + data[0]) & 0xf]
            out += chr(ret ^ data[i + 1])
        print(out)


main()
Comparison
IDAPython is a Python 2.7 wrapper on top of IDA script.
While its API is known to be awkward (juggling between CamelCase and snake_case
for everything, its "sometimes you need to pass a context but
sometimes you don't" approach, the "epydoc
with most of the functions without description is enough" motto),
there are countless examples floating around on the internet showing how to use
it for everything.
r2pipe is a magical pipe where you throw r2 commands, and results come out.
It's well known that radare2 commands might be "a bit" daunting, but since they
are all recursively self-documented with ?*, it's just a matter of
bruteforcing keywords, like ?*~references to find the right commands.
Worst case, if you don't want to learn some r2-fu, you can always use
r2pipe-api,
which has a more conventional programming interface, with things like r.at('sym.imp.setenv').disasm(16).
The hardest part, in my opinion, was dealing with the absence of thorough analysis, like not being able to ask radare2 for the value of a function's first argument.
The two scripts both run instantly, and took the same time to write. The radare2 one finds 1330 decrypted strings, while the IDA one finds 1360. The difference is likely due to IDA's ability to propagate values through the control flow for constructs like this one, which my ghetto-wannabe-analysis-by-bruteforce can't handle:
[0x000d6e6a]> pd 3 @ 0x000d6e62
0x000d6e62 a180c81000 mov eax, dword [0x10c880]
0x000d6e67 890424 mov dword [esp], eax
0x000d6e6a e87116f6ff call 0x384e0
[0x000d6e6a]>
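Handling this pattern in the r2pipe script would mean following the register back to the instruction that loaded it. Here is a rough Python 3 sketch of that idea, working only on disassembly strings such as those found in the `opcode` field of `pdj` output. `resolve_argument` and the regular expressions are hypothetical, and the backward search is deliberately naive: it ignores control flow and any intermediate writes to the register.

```python
import re

# mov dword [esp], 0x10ea70
MOV_IMM = re.compile(r"^mov dword \[esp\], (0x[0-9a-f]+)$")
# mov dword [esp], eax
MOV_REG = re.compile(r"^mov dword \[esp\], (e[a-d]x|e[sd]i|ebp)$")


def resolve_argument(opcodes):
    """Guess the call's first argument from the preceding instructions.

    `opcodes` lists the disassembly strings before the call, oldest first.
    Returns ("imm", value) for a direct store, ("ptr", address) when the
    value is loaded from a fixed memory address, or None.
    """
    wanted = None  # register whose origin we are looking for
    for text in reversed(opcodes):
        if wanted is None:
            m = MOV_IMM.match(text)
            if m:
                return ("imm", int(m.group(1), 16))
            m = MOV_REG.match(text)
            if m:
                wanted = m.group(1)
        else:
            m = re.match(r"^mov %s, dword \[(0x[0-9a-f]+)\]$" % wanted, text)
            if m:
                return ("ptr", int(m.group(1), 16))
    return None


print(resolve_argument(["mov eax, dword [0x10c880]", "mov dword [esp], eax"]))
```

In the "ptr" case, the actual string offset is the 32-bit little-endian value stored at the returned address, which could be read with `pxj 4`, assuming that pointer is initialized in the file rather than at runtime.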