Reverse Engineering - How do I find the size of an array in a binary? - 吾爱随笔录

How do I find the size of an array in a binary?

reverse engineering, C, dump, binary

2021-06-25 02:40:17

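I am currently learning RE and trying to understand some basic C programs.

I have mostly worked out a few concepts, but I don't know how to find the size of an array when using objdump or gdb.

For example: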

#include <stdio.h>

int main(int argc, char **argv)
{
  char buffer[64];               // <= Where am I supposed to find the array size?
  gets(buffer);                  // no bounds checking; gets() was removed in C11
  printf("Buffer : %s",buffer);
  return 0;
}

Can anyone explain how this can be done?

1 Answer

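There is no easy way to do this. C has no notion of array size at run time, so the size is not stored anywhere. You have to read the assembly code and (try to) understand it.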
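To make the "no size at run time" point concrete, here is a minimal sketch. The function names size_in_scope and size_in_callee are made up for illustration and are not part of the question or the answer. It shows that sizeof is resolved by the compiler from the declared type, and that the information is gone as soon as the array is passed around as a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* The array type carries its size only at compile time. */
static_assert(sizeof(char[64]) == 64, "sizeof is evaluated by the compiler");

/* Inside the declaring scope the compiler still knows the array type. */
size_t size_in_scope(void)
{
    char buffer[64];
    return sizeof buffer;   /* folded to the constant 64 at compile time */
}

/* Once passed to a function, the array has decayed to a pointer, so
 * sizeof yields the size of the pointer, not the size of the array. */
size_t size_in_callee(char *buf)
{
    return sizeof buf;
}
```

Calling size_in_callee with a 64-byte array returns the pointer size of the platform, which is why a function such as gets cannot know how large its destination is.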

Take the following program:

extern void *malloc(int);
extern char *strcpy(char *dst, char *src);

char firstname[80];
char lastname[80];

int main(void) {
    int some_variable=1;
    char buffer[64];
    int some_other_variable=2;
    char *otherbuffer=malloc(100);
    gets(buffer);
    strcpy(firstname, "John");
    strcpy(lastname, "Doe");
}

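Compile it with cc -fno-builtin -O0 -o arraysize arraysize.c. (I had to disable builtins to keep gcc from short-circuiting malloc and strcpy, and for the same reason I declared them myself instead of using the headers. Also, without -O0, gcc would discard anything that is never used.)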

Then run objdump -d arraysize and look at the main function:

0000000000400554 <main>:
  400554:   55                      push   %rbp
  400555:   48 89 e5                mov    %rsp,%rbp

// This instruction tells you that the function needs 80 (0x50) bytes on the
// stack. This happens to be the same as the size of all local variables
// here, but might be higher as well if the function needs stack space for
// function arguments and the like.
  400558:   48 83 ec 50             sub    $0x50,%rsp


// This puts 1 and 2 into the integer variables. Note we now know they're
// located at -0x10(%rbp) and -0xc(%rbp) on the stack.
  40055c:   c7 45 f0 01 00 00 00    movl   $0x1,-0x10(%rbp)
  400563:   c7 45 f4 02 00 00 00    movl   $0x2,-0xc(%rbp)

// This calls malloc(100) and puts the result into -0x8(%rbp). We now know
// the array pointed to has 100 bytes, because that's what was malloc'ed.
// Note that you have no other way of finding out the size afterwards
// (except if you know how exactly malloc is implemented and where malloc
// keeps its internal housekeeping structures)
  40056a:   bf 64 00 00 00          mov    $0x64,%edi
  40056f:   e8 b4 fe ff ff          callq  400428 <malloc@plt>
  400574:   48 89 45 f8             mov    %rax,-0x8(%rbp)

// Now we call gets, feeding it -0x50(%rbp) as its parameter.
// As the next variable that's used on the stack is at -0x10(%rbp), we can
// assume that the array has 0x40=64 bytes. This does not have to be true;
// for example, if the function declared 2 arrays of 32 bytes each, they'd
// be at -0x50(%rbp) and -0x30(%rbp), and if the function never used the
// one at -0x30(%rbp), there'd be no way for us to tell the difference.
  400578:   48 8d 45 b0             lea    -0x50(%rbp),%rax
  40057c:   48 89 c7                mov    %rax,%rdi
  40057f:   b8 00 00 00 00          mov    $0x0,%eax
  400584:   e8 bf fe ff ff          callq  400448 <gets@plt>

// This is the strcpy to firstname. The address of firstname is at 0x6009e0.
// We don't know how large it is, as we haven't seen a variable behind it yet.
  400589:   be a8 06 40 00          mov    $0x4006a8,%esi
  40058e:   bf e0 09 60 00          mov    $0x6009e0,%edi
  400593:   e8 c0 fe ff ff          callq  400458 <strcpy@plt>

// And this is the second strcpy, to lastname at 0x600980. Since we've
// seen the other strcpy to 0x6009e0, we assume that there are no more than
// 0x6009e0-0x600980 = 0x60 = 96 bytes in that buffer (the declared 80 bytes
// plus, presumably, alignment padding), but see below.
  400598:   be ad 06 40 00          mov    $0x4006ad,%esi
  40059d:   bf 80 09 60 00          mov    $0x600980,%edi
  4005a2:   e8 b1 fe ff ff          callq  400458 <strcpy@plt>

// end of function
  4005a7:   c9                      leaveq 
  4005a8:   c3                      retq
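The stack reasoning in the comments above comes down to simple offset arithmetic, which can be sketched as follows. The helper name max_buffer_size and its parameters are hypothetical. It assumes an %rbp-based frame like the one in this listing, and it only gives an upper bound: an unused neighbouring array would be silently counted as part of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the %rbp-relative offset of a buffer and the
 * offsets of every other stack slot seen in the listing, return the largest
 * size the buffer could have without overlapping the next slot above it. */
long max_buffer_size(long buf_off, const long *slot_offs, size_t n)
{
    assert(buf_off < 0);        /* locals live below the saved %rbp */

    long next = 0;              /* fall back to the saved %rbp at offset 0 */
    for (size_t i = 0; i < n; i++) {
        /* keep the closest slot that lies above the buffer */
        if (slot_offs[i] > buf_off && slot_offs[i] < next)
            next = slot_offs[i];
    }
    return next - buf_off;
}
```

With the offsets from this listing (buffer at -0x50, other slots at -0x10, -0xc and -0x8) the result is 0x40 = 64. If a reference to -0x30(%rbp) had also been seen, the bound would drop to 32.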

The C source code says:

char firstname[80];
char lastname[80];
strcpy(firstname, "John");
strcpy(lastname, "Doe");

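Judging by the difference between the addresses passed to the two strcpy calls (0x6009e0 - 0x600980 = 0x60 = 96 bytes), we would guess that lastname is at most 96 bytes long, which fits the declared 80 bytes plus, presumably, alignment padding. But note that the same pattern of instructions, two strcpy calls to two constant addresses, would be generated in this case: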

char name[160];
strcpy(name+80, "John");
strcpy(name, "Doe");

So if you don't have debug symbols, all you get is assumptions.
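The same kind of guess for global buffers can be sketched as below. The helper name max_global_size is hypothetical, and the addresses in the usage note are simply the immediates from the two strcpy calls in the listing. The result is only an upper bound on the distance to the next referenced address, not a proof of the declared size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: distance from `target` to the nearest higher address
 * that the code is seen to reference. Returns 0 when nothing above `target`
 * has been seen, meaning the size cannot be bounded this way. */
uint64_t max_global_size(uint64_t target, const uint64_t *refs, size_t n)
{
    assert(refs != NULL || n == 0);

    uint64_t best = 0;
    for (size_t i = 0; i < n; i++) {
        if (refs[i] > target) {
            uint64_t gap = refs[i] - target;
            if (best == 0 || gap < best)
                best = gap;     /* keep the closest reference above target */
        }
    }
    return best;
}
```

For the references 0x600980 and 0x6009e0, the bound for the buffer at 0x600980 is 0x60 = 96 bytes, while for the buffer at 0x6009e0 the function returns 0 because no variable behind it has been seen.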
