Reverse engineering Perl Compatible Regular Expressions

reverse-engineering malware binary-analysis memory struct
2021-06-15 09:48:08

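I'm dealing with a piece of malware that makes extensive use of PCRE (Perl Compatible Regular Expressions). Normally I can read regular expressions, but these appear to be stored in some kind of binary format (compiled, perhaps?). They all start with ERCP (see the hex dump below). FWIW, I strongly suspect the language that generated this code is C++.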

00000150  00 00 00 00 11 00 5e 00  00 00 01 00 00 00 45 52  |......^.......ER|
00000160  43 50 56 00 00 00 00 00  80 00 04 00 00 00 01 00  |CPV.............|
00000170  00 00 00 00 74 00 28 00  00 00 00 00 00 00 00 00  |....t.(.........|
00000180  00 00 00 00 00 00 5e 00  2a 5f 00 06 00 01 1a 54  |......^.*_.....T|
00000190  00 05 1c 2e 55 00 0b 1c  61 1c 61 1c 61 1c 61 1c  |....U...a.a.a.a.|
000001a0  61 1c 61 1c 61 1c 61 1c  2e 1c 6e 1c 65 1c 74 1b  |a.a.a.a...n.e.t.|
000001b0  55 00 2a 00 00 00 00 00  8d ff a5 95 0a 2d 2d 2d  |U.*..........---|

In this example, the regular expression appears to match a string related to the Internet domain aaaaaaaa.net.

My question is: given a binary blob like this, is it possible to get back to a "human-readable" (decompiled?) PCRE (e.g. ^aaaaaa\.net$)? If so, how should I go about it?

2 Answers

Googling 0x50435245 (the bytes ERCP read as a little-endian 32-bit value) gives several hits, e.g. here:

/* Magic number to provide a small check against being handed junk. Also used
to detect whether a pattern was compiled on a host of different endianness. */

#define MAGIC_NUMBER  0x50435245UL   /* 'PCRE' */

<...snip...>

/* The real format of the start of the pcre block; the index of names and the
code vector run on as long as necessary after the end. We store an explicit
offset to the name table so that if a regex is compiled on one host, saved, and
then run on another where the size of pointers is different, all might still
be well. For the case of compiled-on-4 and run-on-8, we include an extra
pointer that is always NULL. For future-proofing, a few dummy fields were
originally included - even though you can never get this planning right - but
there is only one left now.

NOTE NOTE NOTE:
Because people can now save and re-use compiled patterns, any additions to this
structure should be made at the end, and something earlier (e.g. a new
flag in the options or one of the dummy fields) should indicate that the new
fields are present. Currently PCRE always sets the dummy fields to zero.
NOTE NOTE NOTE:
*/

typedef struct real_pcre {
  pcre_uint32 magic_number;
  pcre_uint32 size;               /* Total that was malloced */
  pcre_uint32 options;
  pcre_uint32 dummy1;             /* For future use, maybe */

  pcre_uint16 top_bracket;
  pcre_uint16 top_backref;
  pcre_uint16 first_byte;
  pcre_uint16 req_byte;
  pcre_uint16 name_table_offset;  /* Offset to name table that follows */
  pcre_uint16 name_entry_size;    /* Size of any name items */
  pcre_uint16 name_count;         /* Number of name items */
  pcre_uint16 ref_count;          /* Reference count */

  const unsigned char *tables;    /* Pointer to tables or NULL for std */
  const unsigned char *nullpad;   /* NULL padding */
} real_pcre;

Here is how it looks when applied to your dump:

  dd 'PCRE'     ; magic_number
  dd 56h        ; size
  dd 800000h    ; options
  dd 4          ; dummy1
  dw 1          ; top_bracket
  dw 0          ; top_backref
  dw 0          ; first_byte
  dw 74h        ; req_byte
  dw 28h        ; name_table_offset
  dw 0          ; name_entry_size
  dw 0          ; name_count
  dw 0          ; ref_count
  dd 0          ; tables
  dd 0          ; nullpad
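
The same decoding can be reproduced with a short script. This is only a sketch under a few assumptions: the blob is little-endian (which the on-disk byte order ERCP suggests), the two pointer fields are 32 bits wide as in the listing above, and the names HEADER_FORMAT, FIELDS and parse_real_pcre are made up for this example. The header bytes are copied from offsets 0x15e to 0x185 of the dump in the question.

```python
import struct

# Layout of real_pcre as quoted above: four uint32, eight uint16, two pointers.
# Assumption: little-endian with 32-bit pointers, giving a 40-byte (0x28) header.
HEADER_FORMAT = "<4I8H2I"
FIELDS = (
    "magic_number", "size", "options", "dummy1",
    "top_bracket", "top_backref", "first_byte", "req_byte",
    "name_table_offset", "name_entry_size", "name_count", "ref_count",
    "tables", "nullpad",
)

# Bytes 0x15e..0x185 of the hex dump in the question.
HEADER = bytes.fromhex(
    "45524350 56000000 00008000 04000000"
    "0100 0000 0000 7400 2800 0000 0000 0000"
    "00000000 00000000"
)


def parse_real_pcre(data):
    """Unpack the fixed-size header into a field-name -> value mapping."""
    values = struct.unpack_from(HEADER_FORMAT, data)
    return dict(zip(FIELDS, values))


header = parse_real_pcre(HEADER)
for name, value in header.items():
    print(f"{name:18} {value:#x}")
```

If the magic number came out as 0x45524350 instead of 0x50435245, that would suggest the blob was compiled on a host of the opposite endianness, and the format string would need ">" rather than "<".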

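To decode the rest, you will probably need to read the library source and/or compile a few regular expressions with it and compare the output.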
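
As a quick first pass before digging into the source, the literal characters can be pulled out of the code vector heuristically. The sketch below assumes that the byte 0x1c introduces a single literal character, which is only an inference from the dump (every readable character there is preceded by 1c); opcode numbering differs between PCRE versions, so the value has to be confirmed against the matching source. The code vector is taken to start at name_table_offset (0x28) because name_count is 0, and to end at size (0x56). The names CODE, LITERAL_MARKER and extract_literals are hypothetical.

```python
# Bytes 0x186..0x1b3 of the dump in the question: everything after the
# 0x28-byte header, up to the total size of 0x56.
CODE = bytes.fromhex(
    "5e002a5f000600011a540005"
    "1c2e55000b"
    "1c611c611c611c611c611c611c611c61"
    "1c2e1c6e1c651c74"
    "1b55002a00"
)

# Assumption: in this PCRE build, 0x1c is the opcode for "one literal char".
LITERAL_MARKER = 0x1C


def extract_literals(code, marker=LITERAL_MARKER):
    """Collect every byte that directly follows the assumed literal opcode."""
    out = bytearray()
    i = 0
    while i < len(code) - 1:
        if code[i] == marker:
            out.append(code[i + 1])
            i += 2  # skip the opcode and its character
        else:
            i += 1  # unknown opcode or operand byte: just move on
    return out.decode("latin-1")


print(extract_literals(CODE))  # prints ".aaaaaaaa.net"
```

This does not recover anchors, groups or alternation, and it can misfire if 0x1c happens to appear inside a length or offset field, so it is a sanity check rather than a decompiler. The leading "." in the output hints that the pattern contains more than a plain ^aaaaaaaa\.net$.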

This looks like a real_pcre structure, whose format is defined in many places online, such as here.