Malicious PDF analysis

information-security malware pdf
2021-08-28 03:09:35

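I have been analyzing PDFs that I suspect contain malicious content. For the most part I have trusted automated tools to decide whether a PDF is safe to open. However, my eyes have been opened to all the encryption and obfuscation techniques in the wild today, so I have started going through my PDFs by hand, using those tools together with PDFStreamDumper. I have also looked at the PDF specification located here.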

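Nowhere I have looked does anyone seem to explain the purpose of these /<Abbreviation> directives. Take, for example, the excerpt found in the header.

I can't find any reference to what /JT is, or to why /GoTo doesn't specify a location.

The second object specifies /Cn and /V, but I can't find those either.

The third object has /Dt and /JTM, neither of which is referenced in the PDF specification. Can someone give me some direction? I'm willing to do the research, but I'm not sure what I'm looking at beyond the abbreviated commands contained in the objects. Is there a cheat sheet that lists these directives and what they do?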

Header

<<

    /JT 2 0 R
    /OpenAction 
    <<

        /D [ 9 0 R /Fit ]
        /S /GoTo

    >>

    /Outlines 8683 0 R
    /PageLabels 8875 0 R
    /PageLayout /SinglePage
    /PageMode /UseOutlines
    /Pages 5437 0 R
    /Type /Catalog
>>

Second object

<<

    /A [ 3 0 R ]
    /Cn [ 4 0 R ]
    /V 1.1
>>

Third object

<<

    /Dt (D:20101223094432)
    /JTM (Distiller)
>>

Note: I did run the file through Virus Total, and some red flags came up. The PDF conforms to the 1.7 specification.

1 Answer

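This link is for the developer tools that support PDF: http://www.adobe.com/devnet/pdf.html. Specifically, the 1.7 reference that DarkLighting mentioned is: http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf

Section 3.2.4 of that document seems to address your question: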

    3.2.4 Name Objects

    A name object is an atomic symbol uniquely defined by a sequence of characters.
    Uniquely defined means that any two name objects made up of the same sequence of
    characters are identically the same object. Atomic means that a name has no internal
    structure; although it is defined by a sequence of characters, those characters are
    not considered elements of the name.

    A slash character (/) introduces a name. The slash is not part of the name but is
    a prefix indicating that the following sequence of characters constitutes a name.
    There can be no white-space characters between the slash and the first character in
    the name. The name may include any regular characters, but not delimiter or white-space
    characters (see Section 3.1, “Lexical Conventions”). Uppercase and lowercase letters are
    considered distinct: /A and /a are different names. The following examples are valid
    literal names:

        /Name1
        /ASomewhatLongerName
        /A;Name_With-Various***Characters?
        /1.2
        /$$
        /@pattern
        /.notdef

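So /JT, /Cn, /V and the rest appear to be name objects used as keys inside PDF dictionary objects (which are delimited by double angle brackets, << ... >>). In your example, all of these "unrecognized" elements sit inside dictionary objects. See Section 3.2.6 for a fuller description of dictionaries.

It is also conceivable that these are part of the PDF extensibility options described in Section 2.2.8: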
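To make the lexical rule concrete, here is a minimal Python sketch that pulls every name token out of a raw dictionary dump. It assumes only the rule quoted above (a slash followed by regular characters, ending at the first delimiter or white-space character). The function name find_names is made up for illustration, and the sketch does not skip strings or streams or decode #xx escapes, so treat it as a first-pass triage aid rather than a parser.

```python
import re

# A name is "/" followed by any run of characters that are neither
# white space (NUL, TAB, LF, FF, CR, SPACE) nor delimiters ( ) < > [ ] { } / %.
NAME_RE = re.compile(rb"/([^\x00\t\n\x0c\r ()<>\[\]{}/%]*)")

def find_names(raw):
    """Return every name token found in raw bytes, without the leading slash."""
    return [m.group(1).decode("latin-1") for m in NAME_RE.finditer(raw)]

# Abbreviated copy of the catalog from the question.
catalog = b"<< /JT 2 0 R /OpenAction << /D [ 9 0 R /Fit ] /S /GoTo >> /Type /Catalog >>"
print(find_names(catalog))
# ['JT', 'OpenAction', 'D', 'Fit', 'S', 'GoTo', 'Type', 'Catalog']
```

Note that keys (/JT, /S) and values (/GoTo, /Catalog) come back side by side: the lexer alone cannot tell them apart, only the enclosing dictionary structure can.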

Additionally, PDF provides means for applications to store their own private 
information in a PDF file. This information can be recovered when the file is 
imported by the same application, but it is ignored by other applications. 
Therefore, PDF can serve as an application’s native file format while its 
documents can be viewed and printed by other applications. Application-specific 
data can be stored either as marked content annotating the graphics objects in 
a PDF content stream or as entirely separate objects unconnected with the PDF content.

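Basically, without going through them one by one and decoding them (either with a home-grown automated tool or by hand), it is hard to say what all the various non-standard objects are defined to mean.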
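As a rough idea of what such a home-grown tool could look like, the Python sketch below parses a dictionary dump and lists the top-level keys that are not in a known set. Everything here is an assumption made for illustration: the helper names (parse, nonstandard_keys), the tiny grammar (dictionaries, arrays, names, numbers, simple literal strings, indirect references), and the key set, which I transcribed from the catalog dictionary table in the 1.7 reference and which should be checked against the specification before being relied on.

```python
import re

# Tokens: << >> [ ] (simple literal string) /Name, or any other regular run.
TOKEN_RE = re.compile(r"<<|>>|\[|\]|\([^()]*\)|/[^\s()<>\[\]{}/%]*|[^\s()<>\[\]{}/%]+")

# Assumed list of standard catalog keys; verify against the specification.
STANDARD_CATALOG_KEYS = {
    "/Type", "/Version", "/Pages", "/PageLabels", "/Names", "/Dests",
    "/ViewerPreferences", "/PageLayout", "/PageMode", "/Outlines", "/Threads",
    "/OpenAction", "/AA", "/URI", "/AcroForm", "/Metadata", "/StructTreeRoot",
    "/MarkInfo", "/Lang", "/SpiderInfo", "/OutputIntents", "/PieceInfo",
    "/OCProperties", "/Perms", "/Legal", "/Requirements", "/Collection",
    "/NeedsRendering",
}

def parse(tokens, pos=0):
    """Parse one object starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok == "<<":
        result, pos = {}, pos + 1
        while tokens[pos] != ">>":
            key = tokens[pos]                      # dictionary keys are names
            result[key], pos = parse(tokens, pos + 1)
        return result, pos + 1
    if tok == "[":
        items, pos = [], pos + 1
        while tokens[pos] != "]":
            item, pos = parse(tokens, pos)
            items.append(item)
        return items, pos + 1
    # Indirect reference of the form "<object number> <generation> R".
    if tokens[pos + 2:pos + 3] == ["R"] and tok.isdigit() and tokens[pos + 1].isdigit():
        return f"{tok} {tokens[pos + 1]} R", pos + 3
    return tok, pos + 1                            # name, number or string

def nonstandard_keys(dictionary_text):
    """List top-level keys that are missing from STANDARD_CATALOG_KEYS."""
    obj, _ = parse(TOKEN_RE.findall(dictionary_text))
    return sorted(key for key in obj if key not in STANDARD_CATALOG_KEYS)

catalog = """<< /JT 2 0 R /OpenAction << /D [ 9 0 R /Fit ] /S /GoTo >>
/Outlines 8683 0 R /PageLabels 8875 0 R /PageLayout /SinglePage
/PageMode /UseOutlines /Pages 5437 0 R /Type /Catalog >>"""
print(nonstandard_keys(catalog))  # ['/JT']
```

A key that shows up here is not necessarily malicious. It only marks an entry, and the object it references, as worth following and decoding by hand.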

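As for the /GoTo comment, I disagree with DarkLighting. A PDF renderer reads the entire contents of a dictionary before taking any action. The PDF specification does not say that the order of entries matters; it only requires that "/S /GoTo" and "/D [some kind of destination]" both be present. In your example, /D [ 9 0 R /Fit ] is the location: it says to go to the page stored as object 9 (generation 0) and fit that page in the window.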
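The order-independence point can be checked with a small Python sketch. It assumes the destination is an explicit array whose first element is an indirect reference (object number, generation number, then R) followed by a fit name, as in the question. The function name describe_goto is hypothetical, and other destination forms (named destinations, extra coordinates) are not handled.

```python
import re

def describe_goto(action):
    """Pull /S and /D out of a go-to action, whatever order they appear in."""
    kind = re.search(r"/S\s*/(\w+)", action)
    dest = re.search(r"/D\s*\[\s*(\d+)\s+(\d+)\s+R\s*/(\w+)", action)
    return {
        "action": kind.group(1),
        "page_object": int(dest.group(1)),   # object number of the target page
        "generation": int(dest.group(2)),    # generation number, not a position
        "fit": dest.group(3),                # how the page is fitted in the window
    }

as_written = "<< /D [ 9 0 R /Fit ] /S /GoTo >>"
reordered = "<< /S /GoTo /D [ 9 0 R /Fit ] >>"

# Both orderings describe the same action.
assert describe_goto(as_written) == describe_goto(reordered)
print(describe_goto(as_written))
# {'action': 'GoTo', 'page_object': 9, 'generation': 0, 'fit': 'Fit'}
```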