Title: IDAPython vs. r2pipe
Date: 2018-09-08 12:30

This week, I'm at [r2con 2018](https://rada.re/con/2018/),
meeting some friends, making new ones, attending great talks, …
But I'm also spending some time looking at a *particular* "real world" binary that
shall remain unnamed for now. This binary contains encrypted strings, and since I
have some *free time*, I wrote not one, but two scripts to decrypt them: one using
[r2pipe](https://github.com/radare/radare2-r2pipe),
the second using [IDAPython](https://github.com/idapython/src).

The decryption function is always called like this:

```nasm
[0x000e1944]> pd 2 @ 0x000e193d
  0x000e193d      c7042470ea10.  mov dword [esp], 0x10ea70
  0x000e1944      e8976bf5ff     call fcn.000384e0
[0x000e1944]>
```

An offset is written to the top of the stack, and the decryption function (`fcn.000384e0`)
is called. The function's graph looks like this:

```nasm
[0x000313b0]> s 0x384e0
[0x000384e0]> af
[0x000384e0]> VV

              ┌────────────────────┐
              │  0x384e0           │
              └────────────────────┘
                     │ │
                     │ └──────────────┐
          ┌─────────────────────────┐ │
          │  0x38596                │ │
          │ 0x00038596 call 0x38380 │ │
          └─────────────────────────┘ │
                     │ ┌──────────────┘
                     │ │
             ┌────────────────────┐
             │  0x384f9           │
             └────────────────────┘
                     │ │
    ┌────────────────┘ └───────┐ ┌───────────────────────┐
    │                          │ │                       │
┌────────────────────┐  ┌────────────────────┐           │
│  0x38517           │  │  0x38527           │           │
└────────────────────┘  └────────────────────┘           │
    │                          │ │                       │
    │               ┌──────────┘ └──────────┐            │
    │     ┌────────────────────┐  ┌────────────────────┐ │
    │     │  0x38520           │  │  0x3852b           │ │
    │     └────────────────────┘  └────────────────────┘ │
    │              │ │                                   │
    │ ┌────────────┘ └───────────────────────────────────┘
    │ │
┌───────────────────────────────────────┐
│ [0x38539]                             │
│ 0x00038543 call dword [reloc.malloc]  │
│ 0x00038553 call dword [reloc.malloc]  │
│ 0x00038569 call 0x37d30               │
│ 0x00038574 call 0x37cc0               │
└───────────────────────────────────────┘
```

The interesting part is `0x37cc0`, because it really looks like an XOR decryption loop:

```nasm
 [0x000e1944]> pdf @ 0x37cc0
┌ (fcn) fcn.00037cc0 43
│   fcn.00037cc0 (int arg_ch);
│           ; arg int arg_ch @ esp+0xc
│           ; CALL XREF from fcn.000384e0 (0x38574)
│           0x00037cc0      56             push esi
│           0x00037cc1      31d2           xor edx, edx
│           0x00037cc3      53             push ebx
│           0x00037cc4      8b4c240c       mov ecx, dword [arg_ch]
│           0x00037cc8      0fb601         movzx eax, byte [ecx]
│           0x00037ccb      89c6           mov esi, eax
│           0x00037ccd      8d5801         lea ebx, [eax + 1]
│       ┌─> 0x00037cd0      8d0432         lea eax, [edx + esi]
│       │   0x00037cd3      83e00f         and eax, 0xf
│       │   0x00037cd6      0fb680204010.  movzx eax, byte [eax + 0x104020]
│       │   0x00037cdd      30440a01       xor byte [edx + ecx + 1], al
│       │   0x00037ce1      83c201         add edx, 1
│       │   0x00037ce4      39da           cmp edx, ebx
│       └─< 0x00037ce6      75e8           jne 0x37cd0
│           0x00037ce8      5b             pop ebx
│           0x00037ce9      5e             pop esi
└           0x00037cea      c3             ret
```

This is what it looks like via [r2dec](https://github.com/wargio/r2dec-js):

```c
[0x000e1944]> s 0x00037cc0
[0x00037cc0]> pdd
void fcn_00037cc0 () {
    edx = 0;
    ecx = *(arg_ch);
    eax = ecx;
    esi = eax;
    ebx = eax + 1;
    do {
        eax = edx + esi;
        eax &= 0xf;
        eax = eax + 0x104020;
        *(edx + ecx + 1) ^= al;
        edx += 1;
    } while (edx == ebx);
}
[0x00037cc0]>
```
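
Read literally, the assembly does a little more than the decompiled output suggests: the table byte is loaded from memory (`movzx eax, byte [eax + 0x104020]`), and the loop keeps going while `edx != ebx` (`jne`). Here is a minimal Python model of the loop, assuming the buffer and the table have already been read out of the binary. `decrypt_in_place` and the demo data are made up for this sketch.

```python
def decrypt_in_place(buf, table):
    """Model of fcn.00037cc0: XOR the bytes after buf[0] with a rolling table.

    buf   -- bytearray; buf[0] is the key/length byte, the payload follows
    table -- the bytes at 0x104020 (only indexes 0..15 are ever used)
    """
    key = buf[0]                # movzx eax, byte [ecx]
    end = key + 1               # lea ebx, [eax + 1]
    i = 0                       # xor edx, edx
    while True:                 # do { ... } while (edx != ebx)
        buf[i + 1] ^= table[(i + key) & 0xF]
        i += 1
        if i == end:
            break
    return buf


# Made-up table and payload, only to exercise the function.
demo_table = bytes(range(0x10, 0x20))
demo = bytearray([5]) + bytearray(b"hello\x00")
decrypt_in_place(demo, demo_table)   # XOR is its own inverse: this "encrypts"
decrypt_in_place(demo, demo_table)   # ...and this restores the original
print(demo[1:6].decode())
```

Note that the loop body runs `key + 1` times, since `ebx` is `eax + 1`. The extra byte is presumably a terminator, but that's a guess.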

So the plan is to:

1. Find every callsite for the function `fcn.000384e0`
2. Get the argument stored on the stack for each call
3. Emulate the decryption routine in Python
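
Under the hood, steps 1 and 2 boil down to decoding two x86 encodings: `E8 rel32` for the call, and `C7 04 24 imm32` for `mov dword [esp], imm32`. Both tools do this for you, so the following is only a rough sketch of what they compute, checked against the listing at the top of the post. The helper names are hypothetical, and the `mov` bytes are rebuilt from the operand `0x10ea70` because the listing truncates them.

```python
import struct


def call_target(addr, insn):
    """Return the destination of a 5-byte `call rel32` located at addr, else None."""
    if len(insn) != 5 or insn[0] != 0xE8:
        return None
    (rel,) = struct.unpack("<i", insn[1:5])   # signed 32-bit displacement
    return (addr + 5 + rel) & 0xFFFFFFFF       # relative to the next instruction


def stored_immediate(insn):
    """Return imm32 from `mov dword [esp], imm32` (C7 04 24 imm32), else None."""
    if len(insn) != 7 or insn[:3] != b"\xc7\x04\x24":
        return None
    return struct.unpack("<I", insn[3:7])[0]


# `mov dword [esp], 0x10ea70`, rebuilt from its operand (assumption: plain imm32)
mov_bytes = b"\xc7\x04\x24" + struct.pack("<I", 0x10EA70)
# `call fcn.000384e0` at 0x000e1944, bytes copied from the listing
call_bytes = bytes.fromhex("e8976bf5ff")

print(hex(stored_immediate(mov_bytes)))
print(hex(call_target(0x000E1944, call_bytes)))
```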

# IDAPython

```python
import idautils
import idc
import idaapi

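# The XOR table lives at 0x104020; the index is masked with 0xf,
# so only its first 16 bytes are ever used.
table = idaapi.get_many_bytes(0x00104020, 255)
# The decryption function, named "decrypt_string" in the IDA database
decrypt_str_addr = idc.get_name_ea_simple("decrypt_string")

for addr in idautils.CodeRefsTo(decrypt_str_addr, 0):
    # Addresses of the instructions that set up the call's arguments
    arg_addr = idaapi.get_arg_addrs(addr)
    if arg_addr is None:
        continue

    print("%s %s %s" % (hex(addr), idc.generate_disasm_line(addr, 0), hex(arg_addr[0])))

    ea = idaapi.get_fileregion_ea(arg_addr[0])
    data_addr = idc.GetOperandValue(ea, 1)  # second operand of the `mov`

    key = idaapi.get_byte(data_addr)  # first byte: length and table offset
    b = idaapi.get_many_bytes(data_addr, 256)

    out = ""
    for i in range(key):
        ret = ord(table[(i + key) & 0xf])
        out += chr(ret ^ ord(b[i + 1]))
    print(out)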
```

# r2pipe

```python
import r2pipe

def get_previous_mov_esp(r, offset):
    """ Since x86 instructions aren't aligned
    and radare2's analysis is often "suboptimal",
    we simply bruteforce the offset until
    we find a good looking™ instruction.
    """
    for i in range(1, 20):
        opcodes = r.cmdj("pdj -%d @%s" % (i, offset))
        for opcode in reversed(opcodes):
            if opcode['opcode'].startswith("mov dword [esp], 0x"):
                return opcode
    print("Error at %s" % offset)


def main():
    r = r2pipe.open('my_bin.so')
    table = r.cmdj('pxj 256 @ 0x00104020')  # read the decryption table

    # The seek matters, since `$$` is used below; `af` is only cosmetic
    r.cmd('s 0x000384e0')  # seek to the decryption function
    r.cmd('af')  # create a function at 0x000384e0

    # The `/r` command is to search (`/` like in vim) for _r_eferences
    # The `$$` variable contains the current offset
    # `~[1]` is a filter to get the second column of the output
    for ref in r.cmd('/r $$~[1]').split('\n'):  # `/r` doesn't support json yet™
        if not ref:  # skip empty lines in the output
            continue
        argument = get_previous_mov_esp(r, ref)
        if argument is None:
            continue
        offset = argument['val']
        print("Offset %s for call at %s" % (hex(offset), ref))
        data = r.cmdj('pxj 256 @ %s' % offset)  # read what's at the offset

        out = ""
        for i in range(data[0]):
            ret = table[(i + data[0]) & 0xf]
            out += chr(ret ^ data[i + 1])
        print(out)

main()
```

# Comparison

`IDAPython` is a Python 2.7 wrapper on top of [IDA script](https://www.hex-rays.com/products/ida/support/tutorials/idc/index.shtml).
While its API is known to be awkward (juggling between *CamelCase* and *snake_case*
for everything, its "sometimes you need to pass a context but
sometimes you don't" approach, the "[epydoc](https://www.hex-rays.com/products/ida/support/idapython_docs/)
with most of the functions without description is enough" motto),
there are countless examples floating around the internet on how to use
it for everything.

`r2pipe` is a magical pipe: you throw r2 commands in, and results come out.
It's well known that radare2 commands *might* be "a bit" daunting, but since they
are all recursively self-documented with `?*`, it's just a matter of
bruteforcing keywords, like `?*~references`, to find the right commands.
Worst case, if you don't want to learn some r2-fu, you can always use
[r2pipe-api](https://github.com/radare/radare2-r2pipe-api/tree/master/python),
which has a more conventional programming interface, with things like `r.at('sym.imp.setenv').disasm(16)`.

The hardest part, in my opinion, was dealing with the absence of thorough analysis
in radare2, such as not being able to ask it for the value of a function's first
argument.

Both scripts run instantly, and took the same time to write.
The radare2 one finds 1330 decrypted strings,
while the IDA one finds 1360. The difference is likely due to IDA's ability
to propagate values through the control flow for constructs like this one,
which my ghetto-wannabe-analysis-by-bruteforce doesn't handle:

```nasm
[0x000d6e6a]> pd 3 @ 0x000d6e62
            0x000d6e62      a180c81000     mov eax, dword [0x10c880]
            0x000d6e67      890424         mov dword [esp], eax
            0x000d6e6a      e87116f6ff     call 0x384e0
[0x000d6e6a]> 
```
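
One way to close part of that gap would be to also recognise this second pattern at the byte level. Below is a rough sketch, assuming the argument is always set up by the instructions immediately preceding the call. `classify_argument` is a hypothetical helper, not something either script contains.

```python
import struct

MOV_ESP_IMM = b"\xc7\x04\x24"   # mov dword [esp], imm32
MOV_EAX_MEM = 0xA1              # mov eax, dword [moffs32]
MOV_ESP_EAX = b"\x89\x04\x24"   # mov dword [esp], eax


def classify_argument(code):
    """Classify the bytes immediately preceding a call.

    Returns ("imm", value) when the argument is an immediate,
    ("ptr", address) when it is loaded from memory through eax,
    or None when neither pattern matches.
    """
    if len(code) >= 7 and code[-7:-4] == MOV_ESP_IMM:
        return ("imm", struct.unpack("<I", code[-4:])[0])
    if len(code) >= 8 and code[-3:] == MOV_ESP_EAX and code[-8] == MOV_EAX_MEM:
        return ("ptr", struct.unpack("<I", code[-7:-3])[0])
    return None


# Bytes of the two instructions before the call at 0x000d6e6a
print(classify_argument(bytes.fromhex("a180c81000890424")))
```

For a `"ptr"` result, the address of the encrypted string would be the 32-bit little-endian value stored at the returned location (readable with `pxj 4 @ <address>`). That only works if the pointer is initialised in the file rather than at runtime or by a relocation, which I haven't checked.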
