|| Date: 18-06-10 ||
|| Tag: write-up ||

ASM Primer - Part 1

The goal of this article is to collect important information about assembly and basic concepts of reversing. The points are scattered, but I'll try to keep a cohesive structure.

An important note: I'll be using Intel syntax, where the destination operand comes first. So MOV RBX, RAX copies the value of RAX into RBX.

Purposes of Registers

32-bit x86 has 8 general-purpose registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP.

On x86-64, these registers have been extended to 64 bits as the R registers (RAX, RBX, RSP and so on).

In addition, there are eight extra general-purpose registers, R8 through R15, which can also be accessed as (for example) R8D, R8W and R8B (the lower 32-bit double-word, 16-bit word and 8-bit byte respectively).

The high byte of the legacy 16-bit registers AX, BX, CX and DX is still accessible as AH, BH, CH and DH (except in instructions that need a REX prefix), but R8 through R15 have no such high-byte form.
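To make the naming scheme concrete, here is a small C sketch that models the sub-register names as masks over a 64-bit value. It only models the naming, not the hardware, and the struct and function names are made up for illustration. It also encodes one rule worth remembering: on x86-64, writing a 32-bit register zero-extends into the full 64-bit register, while writing a 16-bit or 8-bit part leaves the other bits untouched.

```c
#include <stdint.h>

/* Views of one 64-bit register (think RAX) under its narrower names. */
typedef struct {
    uint64_t r64; /* RAX */
    uint32_t r32; /* EAX: low 32 bits */
    uint16_t r16; /* AX:  low 16 bits */
    uint8_t  r8l; /* AL:  low 8 bits */
    uint8_t  r8h; /* AH:  bits 8..15 (legacy registers only) */
} reg_views;

static reg_views split_register(uint64_t value) {
    reg_views v;
    v.r64 = value;
    v.r32 = (uint32_t)(value & 0xFFFFFFFFu);
    v.r16 = (uint16_t)(value & 0xFFFFu);
    v.r8l = (uint8_t)(value & 0xFFu);
    v.r8h = (uint8_t)((value >> 8) & 0xFFu);
    return v;
}

/* MOV EAX, v: a 32-bit write clears the upper 32 bits of RAX. */
static uint64_t write_low32(uint64_t old, uint32_t v) {
    (void)old; /* the old upper half is discarded */
    return (uint64_t)v;
}

/* MOV AX, v: a 16-bit write keeps the other 48 bits of RAX. */
static uint64_t write_low16(uint64_t old, uint16_t v) {
    return (old & ~(uint64_t)0xFFFFu) | v;
}
```

The asymmetry between write_low32 and write_low16 is why compilers happily emit MOV EAX, ... to set all of RAX to a small value.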

Segment Registers

Most applications on modern operating systems (such as FreeBSD, Linux or Microsoft Windows) use a flat memory model in which nearly all segment registers point to the same place, effectively disabling segmentation. FS and GS are the usual exception: they are used to point at thread-specific data.

Fun fact: on 32-bit Windows, FS:[0] holds a pointer to the head (root node) of the structured exception handling (SEH) linked list.

Also, on 32-bit Windows a pointer to the PEB (Process Environment Block) is stored at FS:[0x30]. Code that dereferences it could be an anti-debugging measure, since the PEB contains information about the current process, including the BeingDebugged flag. Here's a good reference to what the PEB structure contains.

Context

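Context is very important when looking at assembly code. Take a look at these three lines:

PUSH 0x6F6C6C65
"hello"
448378203247

All of them can be the exact same five bytes: 68 65 6C 6C 6F. As code, 0x68 is the opcode for PUSH imm32, followed by the immediate 0x6F6C6C65 stored little-endian; as ASCII, the bytes spell "hello"; and read as a big-endian integer they are 0x68656C6C6F = 448378203247. Whether information is code or data is determined by context.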
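A quick way to convince yourself is to read the same five bytes in different ways. The C sketch below (array and function names are illustrative) treats them as ASCII text, as a big-endian integer, and, skipping the first byte (0x68, the PUSH imm32 opcode), as the little-endian 32-bit immediate that PUSH would use.

```c
#include <stdint.h>
#include <stddef.h>

/* The five bytes discussed above: 'h' 'e' 'l' 'l' 'o'. */
static const uint8_t hello_bytes[5] = {0x68, 0x65, 0x6C, 0x6C, 0x6F};

/* Interpret n bytes as a big-endian unsigned integer. */
static uint64_t read_big_endian(const uint8_t *p, size_t n) {
    uint64_t value = 0;
    for (size_t i = 0; i < n; i++) {
        value = (value << 8) | p[i];
    }
    return value;
}

/* Interpret 4 bytes as a little-endian 32-bit value, which is how
 * x86 stores the imm32 operand that follows the PUSH opcode. */
static uint32_t read_little_endian32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

read_big_endian(hello_bytes, 5) yields 448378203247, and read_little_endian32(hello_bytes + 1) yields 0x6F6C6C65, the operand a disassembler would show.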

Indirect Memory Access

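Compilers usually use EBP/RBP to reference memory indirectly: a function's arguments and local variables are accessed as offsets from the frame pointer, e.g. [EBP+0x8] for the first argument and [EBP-0x4] for the first local variable in a 32-bit function.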

Branching and Equality

The CMP instruction is an implied SUB: CMP EAX, 8 computes EAX - 8 and sets the flags from the result (for example, ZF is set if EAX equals 8). Unlike SUB EAX, 8, it then discards the result, so EAX is left unmodified.

TEST is an implied AND: it sets the flags from the bitwise AND of its operands and discards the result. TEST EAX, EAX is used a lot to check whether an operation returned zero, since ZF is set only when EAX is 0. A common pattern looks like this:
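Here is a minimal C sketch of the flags CMP a, b derives from the subtraction a - b. It assumes 32-bit operands and only models ZF, CF, SF and OF (the real instruction also sets AF and PF); the type and function names are made up. Because the operands are passed by value, the caller's variables stay unchanged, just like EAX after a CMP.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool zf; /* zero flag: a == b */
    bool cf; /* carry flag: unsigned borrow, i.e. a < b as unsigned */
    bool sf; /* sign flag: top bit of a - b */
    bool of; /* overflow flag: a - b overflowed as a signed operation */
} flags32;

/* CMP a, b: compute a - b, derive the flags, throw the result away. */
static flags32 cmp32(uint32_t a, uint32_t b) {
    uint32_t result = a - b; /* wraps modulo 2^32, like the CPU */
    flags32 f;
    f.zf = (result == 0);
    f.cf = (a < b);
    f.sf = (result >> 31) & 1u;
    /* Signed overflow: a and b differ in sign, and the result's sign differs from a. */
    f.of = (((a ^ b) & (a ^ result)) >> 31) & 1u;
    return f;
}
```

For example, cmp32(8, 8).zf is true (what JE/JZ would test), and cmp32(7, 8).cf is true (what JB would test).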

call        ds:HttpSendRequestA ; Returns 0 (FALSE) on failure
test        eax, eax
jz   short  loc_xxxx   
; ----------------------
; Success path
; ----------------------
; ...
; ...
; ...
; loc_xxxx:
; ----------------------
; fail path
; ----------------------
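The same decision can be sketched in C. The helper names are hypothetical; takes_fail_path mirrors test eax, eax followed by jz, where jz jumps when ZF is set, i.e. when EAX is 0.

```c
#include <stdint.h>
#include <stdbool.h>

/* TEST a, b: compute a & b, set ZF if it is zero, discard the result. */
static bool test_zf(uint32_t a, uint32_t b) {
    return (a & b) == 0;
}

/* test eax, eax / jz loc_xxxx: true when the fail path would be taken. */
static bool takes_fail_path(uint32_t eax) {
    return test_zf(eax, eax); /* x & x == x, so ZF is set only for 0 */
}
```

Since any value ANDed with itself is unchanged, TEST EAX, EAX is just a compact way of asking "is EAX zero?" without modifying it.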

JA (jump if above) and JB (jump if below) are what's called 'unsigned checks'; their signed counterparts are JG (jump if greater) and JL (jump if less). Below is an example:

mov rax, -1 ; rax = 0xFFFFFFFFFFFFFFFF
cmp rax, rbx
ja loc

The above code will always jump (unless RBX is also -1), since ja performs an unsigned comparison, so -1 is interpreted as the highest possible unsigned number. ja checks whether the first operand of cmp (RAX) is above the second operand (RBX), and if so, it jumps.
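In C terms, the difference between JA and JG is just whether the compared values are treated as unsigned or signed. A minimal sketch, with made-up helper names, assuming the flags come from CMP a, b:

```c
#include <stdint.h>
#include <stdbool.h>

/* JA after CMP a, b: taken if a > b as unsigned (CF == 0 and ZF == 0). */
static bool ja_taken(uint64_t a, uint64_t b) {
    return a > b;
}

/* JG after CMP a, b: taken if a > b as signed (ZF == 0 and SF == OF). */
static bool jg_taken(int64_t a, int64_t b) {
    return a > b;
}
```

With RAX = -1, ja_taken((uint64_t)-1, 5) is true because -1 becomes 0xFFFFFFFFFFFFFFFF, while jg_taken(-1, 5) is false because -1 is less than 5 as a signed number.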

A Look at the Stack

; Stack grows downwards

|OOOO| 0x00000000 LOW MEMORY
|OOOO| 
|OOOO| 
|OOOO| 
|----| <- ESP 
|loc3| <- EBP - 0x10
|----|
|loc2| <- EBP - 0xC
|----| 
|loc1| <- EBP - 0x8
|----| 
|loc0| <- EBP - 0x4
|----| <- EBP 
|SFP | 
|----|
|RET | 
|----| 
|arg0| <- EBP + 0x8
|----| 
|arg1| <- EBP + 0xC
|----| 
|arg2| <- EBP + 0x10
|----| 
|OOOO| 
|OOOO| 
|OOOO| 
|OOOO| 
|OOOO| 
|OOOO| 0xFFFFFFFF HIGH MEMORY

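evil_func:
    PUSH EBP          ; Save the caller's frame pointer (SFP)
    MOV EBP, ESP      ; Set up this function's frame pointer
    SUB ESP, 0x10     ; Reserve 0x10 bytes for loc0..loc3
    ; < The diagram above shows the stack at this point >
    PUSH ESI ; Save a register this function will use
    PUSH EDI ; Save a register this function will use
    PUSH EBX ; Save a register this function will use
    PUSH ECX ; Save a register this function will use
    ...
    ...
    ...
    POP ECX  ; Restore the saved registers in reverse order
    POP EBX
    POP EDI
    POP ESI
    MOV ESP, EBP      ; Discard the local variables
    POP EBP           ; Restore the caller's frame pointer
    ret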

main_func:
    ...
    ...
    PUSH arg2
    PUSH arg1
    PUSH arg0
    CALL evil_func
    ADD ESP, 0xC ; cdecl cleanup: 3 dword arguments
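To tie the diagram to the code, here is a toy C model of the call above under cdecl. Memory is a small array of dwords, ESP and EBP are plain byte addresses, and 0xDEADBEEF stands in for the return address CALL would push; all of these names and values are assumptions for illustration. The three pushed arguments take 0xC bytes, so that is what the caller adds back to ESP.

```c
#include <stdint.h>

/* Toy 32-bit stack: memory is an array of dwords, ESP/EBP hold byte addresses. */
#define STACK_BASE 0x1000u
static uint32_t mem[STACK_BASE / 4];
static uint32_t esp = STACK_BASE;
static uint32_t ebp = 0;

static void push(uint32_t v) { esp -= 4; mem[esp / 4] = v; }
static uint32_t pop(void) { uint32_t v = mem[esp / 4]; esp += 4; return v; }

/* main_func calls evil_func(arg0, arg1, arg2). Returns the value evil_func
 * finds at [EBP+0xC] and stores ESP after the caller's cleanup. */
static uint32_t simulate_cdecl_call(uint32_t arg0, uint32_t arg1, uint32_t arg2,
                                    uint32_t *esp_after) {
    push(arg2);                  /* arguments go right to left */
    push(arg1);
    push(arg0);
    push(0xDEADBEEFu);           /* CALL pushes the (fake) return address */

    push(ebp);                   /* PUSH EBP: this slot is the SFP */
    ebp = esp;                   /* MOV EBP, ESP */
    esp -= 0x10;                 /* SUB ESP, 0x10: room for loc0..loc3 */
    mem[(ebp - 0x4) / 4] = 42u;  /* loc0 lives at [EBP-0x4] */
    uint32_t seen = mem[(ebp + 0xC) / 4]; /* arg1 lives at [EBP+0xC] */

    esp = ebp;                   /* MOV ESP, EBP: discard the locals */
    ebp = pop();                 /* POP EBP: restore the caller's frame */
    (void)pop();                 /* RET pops the return address */

    esp += 0xC;                  /* ADD ESP, 0xC: cdecl caller cleanup */
    *esp_after = esp;
    return seen;
}
```

After the call, ESP is back at STACK_BASE and EBP holds its old value, which is exactly the balanced state the caller expects.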

Function Calling Conventions

There are a lot of function calling conventions, and each one handles things like argument passing and stack cleanup differently. Below is a very non-exhaustive list:

cdecl

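In cdecl, arguments are pushed right to left, the return value comes back in EAX, and the caller cleans the arguments off the stack (hence the ADD ESP right after the CALL). An example of this is here: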

evil_func:
    PUSH EBP
    MOV EBP, ESP
    SUB ESP, 0x10
    PUSH ESI ; Save a register this function will use
    PUSH EDI ; Save a register this function will use
    PUSH EBX ; Save a register this function will use
    PUSH ECX ; Save a register this function will use
    ...
    ...
    ...
    POP ECX  ; Restore the saved registers in reverse order
    POP EBX
    POP EDI
    POP ESI
    MOV ESP, EBP ; Discard the local variables
    POP EBP      ; Restore the caller's frame pointer
    ret

main_func:
    ...
    ...
    PUSH arg2
    PUSH arg1
    PUSH arg0
    CALL evil_func
    ADD ESP, 0xC ; cdecl cleanup: 3 dword arguments

There's a concept in reversing called stack neutralization. It refers to the fact that the stack has to be returned to a 'neutral' state before control is transferred to another location (e.g. with a JMP). Packed malware is a good example: before the wrapper (the unpacking stub) can transfer control to the unpacked binary, it has to 'neutralize the stack', i.e. remove any local variables and arguments it allocated and restore the base pointer to its original value. Knowing the calling conventions lets you trace the state of the stack, which helps you find the point where control passes to the unpacked malware.

There's an important note to make here. Compilers often shorten the epilogue by replacing the two instructions MOV ESP, EBP and POP EBP with a single LEAVE instruction, which performs both steps.
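Tracing the stack by hand mostly comes down to bookkeeping on ESP. The C sketch below (the enum, struct and function names are hypothetical) sums the ESP changes of a straight-line sequence of PUSH, POP, SUB ESP and ADD ESP instructions in 32-bit code; a net change of zero means the sequence left the stack neutral. It deliberately ignores instructions such as MOV ESP, EBP or LEAVE, which reset ESP from EBP rather than adjusting it by a constant.

```c
#include <stddef.h>
#include <stdbool.h>

typedef enum { OP_PUSH, OP_POP, OP_SUB_ESP, OP_ADD_ESP } stack_op_kind;

typedef struct {
    stack_op_kind kind;
    int imm; /* byte count for SUB/ADD ESP; ignored for PUSH/POP */
} stack_op;

/* Net change to ESP in bytes (32-bit: PUSH/POP move ESP by 4). */
static int esp_delta(const stack_op *ops, size_t n) {
    int delta = 0;
    for (size_t i = 0; i < n; i++) {
        switch (ops[i].kind) {
        case OP_PUSH:    delta -= 4;          break;
        case OP_POP:     delta += 4;          break;
        case OP_SUB_ESP: delta -= ops[i].imm; break;
        case OP_ADD_ESP: delta += ops[i].imm; break;
        }
    }
    return delta;
}

static bool is_neutral(const stack_op *ops, size_t n) {
    return esp_delta(ops, n) == 0;
}
```

For instance, three PUSHes followed by ADD ESP, 0xC (the cdecl pattern) come out neutral, while SUB ESP, 0x10 plus one PUSH leaves ESP 0x14 bytes lower.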

On Function Prologues & Epilogues

Take a look at the function below

.text:0040611C ; int __cdecl sub_40611C(LPWSTR lpCommandLine, int, int, int, int, int)
.text:0040611C sub_40611C      proc near               
.text:0040611C                                        
.text:0040611C
.text:0040611C StartupInfo     = _STARTUPINFOW ptr -54h
.text:0040611C ProcessInformation= _PROCESS_INFORMATION ptr -10h
.text:0040611C lpCommandLine   = dword ptr  8
.text:0040611C arg_14          = dword ptr  1Ch
.text:0040611C
.text:0040611C                 push    ebp
.text:0040611D                 mov     ebp, esp
.text:0040611F                 sub     esp, 54h
.text:00406122                 push    ebx
.text:00406123                 push    esi
.text:00406124                 push    edi
.text:00406125                 push    40h
.text:00406127                 xor     ebx, ebx
.text:00406129                 lea     eax, [ebp+StartupInfo.lpReserved]
.text:0040612C                 push    ebx
.text:0040612D                 push    eax
.text:0040612E                 mov     [ebp+StartupInfo.cb], 44h
.text:00406135                 call    sub_40B160
.text:0040613A                 xor     eax, eax
.text:0040613C                 xor     edi, edi
.text:0040613E                 inc     edi
.text:0040613F                 add     esp, 0Ch
.text:00406142                 cmp     [ebp+arg_14], 8
.text:00406146                 mov     [ebp+StartupInfo.wShowWindow], ax
.text:0040614A                 mov     eax, [ebp+lpCommandLine]
.text:0040614D                 mov     [ebp+StartupInfo.dwFlags], edi
.text:00406150                 jnb     short loc_406155
.text:00406152                 lea     eax, [ebp+lpCommandLine]
.text:00406155
.text:00406155 loc_406155:                             ; CODE XREF: sub_40611C+34↑j
.text:00406155                 lea     ecx, [ebp+ProcessInformation]
.text:00406158                 push    ecx             ; lpProcessInformation
.text:00406159                 lea     ecx, [ebp+StartupInfo]
.text:0040615C                 push    ecx             ; lpStartupInfo
.text:0040615D                 push    ebx             ; lpCurrentDirectory
.text:0040615E                 push    ebx             ; lpEnvironment
.text:0040615F                 push    50h             ; dwCreationFlags
.text:00406161                 push    ebx             ; bInheritHandles
.text:00406162                 push    ebx             ; lpThreadAttributes
.text:00406163                 push    ebx             ; lpProcessAttributes
.text:00406164                 push    eax             ; lpCommandLine
.text:00406165                 push    ebx             ; lpApplicationName
.text:00406166                 call    ds:CreateProcessW
.text:0040616C                 test    eax, eax
.text:0040616E                 jnz     short loc_406182
.text:00406170
.text:00406170 loc_406170:                             ; CODE XREF: sub_40611C+78↓j
.text:00406170                 push    edi
.text:00406171                 xor     edi, edi
.text:00406173                 lea     esi, [ebp+lpCommandLine]
.text:00406176                 call    sub_402D33
.text:0040617B                 pop     edi
.text:0040617C                 pop     esi
.text:0040617D                 mov     al, bl
.text:0040617F                 pop     ebx
.text:00406180                 leave
.text:00406181                 retn

How many arguments does it have? IDA is inconsistent here: its stack-variable list names only two arguments (lpCommandLine and arg_14, the ones the function actually references), while the function declaration lists six. To find out, we have to look at the call sites and the stack cleanup. One call site looks like this:

.text:004047B2                 mov     [ebp-30h], esp
.text:004047B5                 push    offset aVssadminExeDel ; "vssadmin.exe Delete Shadows /All /Quiet"
.text:004047BA                 call    sub_402E75
.text:004047BF                 call    sub_40611C
.text:004047C4                 add     esp, 1Ch

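There's only one PUSH instruction, plus a MOV that saves the stack pointer to [ebp-30h], yet the cleanup adds 1Ch to the stack pointer. 1Ch also happens to be the offset (not the size) that IDA reports for arg_14, the last argument, at .text:0040611C. Since arguments start at EBP+8, an argument at EBP+1Ch means they occupy six dword slots (18h bytes), which matches the six parameters in IDA's declaration. A single PUSH accounts for only 4 of the 1Ch bytes, so the rest were presumably placed on the stack by code just before this snippet.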
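The offset arithmetic can be written down as a tiny helper. In an EBP-based 32-bit frame, [EBP] holds the saved EBP, [EBP+4] the return address, and arguments start at [EBP+8]; IDA's arg_N names count from that point, so arg_14 sits at 8 + 14h = 1Ch. Assuming every argument is a dword (which IDA's declaration suggests, but which a caller passing a larger object by value would break), the highest offset gives the slot count. The function name is made up for illustration.

```c
/* Number of dword argument slots from [EBP+8] up to and including
 * the highest argument offset seen in the function. */
static unsigned dword_arg_slots(unsigned highest_arg_offset) {
    return (highest_arg_offset - 8u) / 4u + 1u;
}
```

dword_arg_slots(0x1C) gives 6, i.e. 0x18 bytes of arguments, and dword_arg_slots(0x8) gives 1 for a function that only touches its first argument.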

So what's the function prologue? It starts at .text:0040611C and ends at .text:00406124. A keen reader will notice that this range includes three PUSHes that save register values (EBX, ESI and EDI). Technically, these are NOT local variables; the compiler is simply preserving callee-saved registers that the function is going to use.

Same goes for the epilogue, which ends at .text:00406181. The POP instructions at the end simply restore the saved registers, and the LEAVE instruction is the one responsible for releasing the two local variables we have (StartupInfo and ProcessInformation, 44h + 10h = 54h bytes, matching the SUB ESP, 54h in the prologue) and restoring EBP.

stdcall

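In stdcall (used by most Win32 API functions), arguments are also pushed right to left and the return value comes back in EAX, but the callee cleans the arguments off the stack, so the caller needs no ADD ESP after the call. Example: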

evil_func:
    PUSH EBP
    MOV EBP, ESP
    SUB ESP, 0x10
    PUSH ESI ; Save a register this function will use
    PUSH EDI ; Save a register this function will use
    PUSH EBX ; Save a register this function will use
    PUSH ECX ; Save a register this function will use
    ...
    ...
    ...
    POP ECX  ; Restore the saved registers in reverse order
    POP EBX
    POP EDI
    POP ESI
    MOV ESP, EBP
    POP EBP
    RET 0xC  ; stdcall cleanup: return, then release the 3 argument dwords

main_func:
    ...
    ...
    PUSH arg2
    PUSH arg1
    PUSH arg0
    CALL evil_func
    MOV dword ptr [c], EAX ; Move the return value of evil_func to local variable [c]

fastcall

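In Microsoft's fastcall, the first two DWORD-sized arguments are passed in ECX and EDX; any remaining arguments are pushed right to left and cleaned up by the callee. Example: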

evil_func:
    ; function prolog
    push        ebp  
    mov         ebp,esp 
    sub         esp,0D8h
    push        ebx  
    push        esi  
    push        edi  
    push        ecx  
    lea         edi,[ebp-0D8h] 
    mov         ecx,36h 
    mov         eax,0CCCCCCCCh 
    rep stos    dword ptr [edi] 
    pop         ecx  
    mov         dword ptr [ebp-14h],edx 
    mov         dword ptr [ebp-8],ecx 

    ; return a + b;
    mov         eax,dword ptr [a] 
    add         eax,dword ptr [b] 

    ; function epilog  
    pop         edi  
    pop         esi  
    pop         ebx  
    mov         esp,ebp 
    pop         ebp  
    ret                  ; a and b arrived in ECX/EDX: no stack arguments to clean up

main_func:
    ; put the arguments in the registers EDX and ECX
    mov EDX,3
    mov ECX,2

    ; call the function
    call evil_func

    ; copy the return value from EAX to a local variable (int c)
    mov dword ptr [c],EAX

You might notice that there's no ADD ESP, 0x8 (or similar) after the call in main_func. In fastcall, the first two arguments travel in ECX and EDX, so nothing was pushed and there is nothing to clean up; that's why evil_func ends with a plain ret. When arguments do get pushed, fastcall, stdcall and thiscall all leave the cleanup to the callee through RET n: ret 0x8, for example, returns and then releases 0x8 bytes of arguments from the stack.
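Callee cleanup via RET n can be modelled in a few lines of C. In this sketch (the cpu32 type and helper names are made up), RET n pops the return address into EIP and then releases n more bytes; the usage mirrors the thiscall example below, where the caller pushes two dword arguments and the callee returns with ret 0x8.

```c
#include <stdint.h>

/* Toy 32-bit machine: a small dword stack addressed by byte offset. */
typedef struct {
    uint32_t mem[64];
    uint32_t esp;
    uint32_t eip;
} cpu32;

static void push32(cpu32 *c, uint32_t v) {
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

/* RET n: pop the return address into EIP, then add n to ESP. */
static void ret_n(cpu32 *c, uint16_t n) {
    c->eip = c->mem[c->esp / 4];
    c->esp += 4u + n;
}
```

If the caller pushes 3 and 2 and CALL pushes the return address, ret_n(&c, 8) leaves EIP at that return address and ESP exactly where it was before the first push, so the caller has nothing left to clean.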

thiscall

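In Microsoft's thiscall (used for C++ member functions), the this pointer is passed in ECX, the other arguments are pushed right to left, and the callee cleans up the stack, hence the ret 0x8 for two pushed arguments.

; Compiled with a Microsoft compiler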

main_func:
    push        3
    push        2
    lea         ecx,[sumObj]
    call        evil_func@bunnyFooFoo ; bunnyFooFoo::sum
    mov         dword ptr [s4],eax

evil_func:
    ; function prologue
    push        ebp
    mov         ebp,esp
    sub         esp,0CCh
    push        ebx
    push        esi
    push        edi
    push        ecx
    lea         edi,[ebp-0CCh]
    mov         ecx,33h
    mov         eax,0CCCCCCCCh
    rep stos    dword ptr [edi]
    pop         ecx
    mov         dword ptr [ebp-8],ecx

    ; return a + b
    mov         eax,dword ptr [a]
    add         eax,dword ptr [b]

    ; function epilogue
    pop         edi
    pop         esi
    pop         ebx
    mov         esp,ebp
    pop         ebp
    ret         0x8

The next article will cover control flow statements.