逆向工程 - 了解 Linux 下最新的堆实现 - 吾爱随笔录

了解 Linux 下最新的堆实现

逆向工程 linux 开发缓冲区溢出软件安全

2021-06-27 04:20:59

几天前，我想知道如何自学基于堆的溢出利用。

所以我搜索了文档，随后练习了我阅读的内容，以便更好地了解堆在 Linux 下的工作原理。

我们被告知 malloc() / free() 函数围绕Doug Lea 的内存分配器工作，但是，尽管链接给出了很好的解释，但我在调试程序时无法弄清楚。

鉴于这个例子：

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int n = 5;

int main(int argc, char** argv) {

        char* p;
        char* q;

        p = malloc(1024);
        q = malloc(1024);

        printf("real size = %d\n",*(((int*)p)-1) & 0xFFFFFFF8);

        if(argc >= 2) {
                strcpy(p, argv[1]);
        }

        free(q);
        printf("n = 0x%08X\n", n);
        free(p);

        return EXIT_SUCCESS;
}

我想将此结构转储到内存中：

struct chunk {
        int prev_size;
        int size;
        struct chunk *fd;
        struct chunk *bk;
};

这是我的工作流程：

geo@lilith:~/c/vuln_malloc$ gcc -o vuln vuln.c -m32 -ggdb
geo@lilith:~/c/vuln_malloc$ gdb ./vuln
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/geo/c/vuln_malloc/vuln...done.
(gdb) b 21
Breakpoint 1 at 0x804850f: file vuln.c, line 21.
(gdb) r `perl -e 'print "A" x 1024'`
Starting program: /home/geo/c/vuln_malloc/vuln `perl -e 'print "A" x 1024'`
real size = 1032

Breakpoint 1, main (argc=2, argv=0xffffd414) at vuln.c:21
21              free(q);
(gdb) x/10x q-4
0x804a40c:      0x00000409      0x00000000      0x00000000      0x00000000
0x804a41c:      0x00000000      0x00000000      0x00000000      0x00000000
0x804a42c:      0x00000000      0x00000000
(gdb)

在这里我可以看到 size-field 的值，即 0x409。我可以很容易地猜测我的块的实际大小是 0x409 & 0xFFFFFF8 = 0x408 = 1032，正如文档所解释的（三个最不重要的实际上定义了一些标志）。然后我运行直到 free() 函数被处理。

(gdb) b 22
Breakpoint 2 at 0x804851b: file vuln.c, line 22.
(gdb) c
Continuing.

Breakpoint 2, main (argc=2, argv=0xffffd414) at vuln.c:22
22              printf("n = 0x%08X\n", n);
(gdb) x/10x q-4
0x804a40c:      0x00020bf9      0x00000000      0x00000000      0x00000000
0x804a41c:      0x00000000      0x00000000      0x00000000      0x00000000
0x804a42c:      0x00000000      0x00000000

首先，我根本不理解新值 - 0x20bf9 - 其次，我不明白为什么 fd 和 bk 指针也没有任何相关值。

所有这些东西对我来说都没有多大意义，这就是为什么我想知道你是否可以给我一些关于所有这些的线索。Doug Lea 的实现是否仍然存在于最近的 glibc 版本中，或者......？

1个回答

首先，我有一个坏消息要告诉你！Doug Lea 的 malloc 几乎不再用于任何 C 库实现中（即使理解dlmalloc对理解新库有很大帮助）。

使用最广泛的新实现是ptmalloc2了解它的最佳方法是……阅读代码……因此，如果您使用的是 Debian（类似）发行版，就像我一样，您只需要像这样获取 libc 的源代码：

$> apt-get source libc6

请注意，glibc已不复存在，并已包含在eglibc 项目中。Debian 发行版不久前切换到 eglibc（参见这里和那里，甚至在glibc 源包页面）。因此，的实现malloc发生了很大变化（并且自以来添加了一些安全性dlmalloc）。

但是，让我们看看实现的样子malloc：

$> cd eglibc-2.17/malloc/
$> less malloc.c

...
/*
 This is a version (aka ptmalloc2) of malloc/free/realloc written by
 Doug Lea and adapted to multiple threads/arenas by Wolfram Gloger.

 There have been substantial changesmade after the integration into
 glibc in all parts of the code.  Do not look for much commonality
 with the ptmalloc2 version.
...

正如我所说，这里使用的算法是ptmalloc2(POSIX Threads Malloc)，但有重大修改。因此，您最好阅读代码以更好地了解它。

但是，总结一下，堆内存是通过内存块管理的，这些内存块以收集这些信息的元数据为前缀（我只是引用malloc.c源文件中的注释，更多信息请参考整个文件）：

/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast. The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:


    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated            | |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    Chunks always begin on even word boundries, so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned.

    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk                            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The two exceptions to all this are

     1. The special chunk `top' doesn't bother using the
        trailing size field since there is no next contiguous chunk
        that would have to index off it. After initialization, `top'
        is forced to always exist.  If it would become less than
        MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
        bit M (IS_MMAPPED) set in their size fields.  Because they are
        allocated one-by-one, each must contain its own trailing size field.
*/

您还可以参考“基于堆的利用”，这是 Scott Hand 关于堆管理和溢出利用的演讲。

不过，要了解所有内容，您还有很多工作要做，我希望有更多时间来解释。但是，我希望它可以帮助您找到更深入的方法（下载源代码是这里的关键）。

其它你可能感兴趣的问题