Is there a way to debug an ELF file that runs without problems despite having corrupted headers?

Reverse Engineering: debugging, linux, elf

2021-06-26 10:50:01

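My question is general, but to give a concrete example, let's take one from the Whirlwind Tutorial (on creating really teensy ELF executables for Linux).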

; tiny.asm
BITS 32
          org     0x00010000
          db      0x7F, "ELF"             ; e_ident
          dd      1                                       ; p_type
          dd      0                                       ; p_offset
          dd      $$                                      ; p_vaddr 
          dw      2                       ; e_type        ; p_paddr
          dw      3                       ; e_machine
          dd      _start                  ; e_version     ; p_filesz
          dd      _start                  ; e_entry       ; p_memsz
          dd      4                       ; e_phoff       ; p_flags
_start:
          mov     bl, 42                  ; e_shoff       ; p_align
          xor     eax, eax
          inc     eax                     ; e_flags
          int     0x80
          db      0
          dw      0x34                    ; e_ehsize
          dw      0x20                    ; e_phentsize
          db      1                       ; e_phnum
                                          ; e_shentsize
                                          ; e_shnum
                                          ; e_shstrndx

filesize      equ     $ - $$

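To build it, use nasm -f bin -o tiny tiny.asm; chmod +x tiny. The executable itself is a bit of a monster. It is smaller than an ELF header, yet it contains an ELF header, a program header, and the program code, and Linux (at least on my 64-bit Debian) still runs it.

I would like to be able to debug this type of file, where the ELF headers are corrupted or incorrect (intentionally or not). Is there a tool for fixing ELF headers? Is there a debugger that can run this file?

What I tried first was to get the entry point with readelf -h tiny, but readelf refuses to even look at the file: readelf: Error: tiny: Failed to read file header. objdump does no better.

Running rabin2 -e tiny gives the entry address (with some warnings):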

[Entrypoints]
vaddr=0x00010020 paddr=0x00010020 baddr=0x00000000 laddr=0x00000000 haddr=0x00000018 type=program

I managed to get some disassembly using radare2 tiny and the pd command:

 [0x00010020]> pd
        ;-- entry0:
        0x00010020      b32a           mov bl, 0x2a                ; '*' ; 42
        0x00010022      31c0           xor eax, eax
        0x00010024      40             inc eax
        0x00010025      cd80           int 0x80
        0x00010027      003400         add byte [eax + eax], dh
        0x0001002a      2000           and byte [eax], al
        ;-- section_end.uphdr:
        0x0001002c  ~   01ff           add edi, edi

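Next I tried gdb tiny and lldb tiny, but neither worked. The free version of IDA (5.0) hangs in an infinite loop.

So is there a way to fix the ELF automatically or semi-automatically? Or are there other tricks for debugging this (or a similar) binary? One idea I had was to patch the entry point with a looping instruction and then attach gdb. Would that work?

If there is no tool for fixing ELF files, which files in the kernel source contain the code responsible for loading binaries?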

1 Answer

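There are several options for analyzing ELF binaries with corrupted or malformed headers. These include, but are not limited to:

Using a ptrace-based debugger such as Radare2 (but definitely not GDB)
Emulation, e.g. via the Unicorn emulation framework
Repairing the header, which may involve rebuilding the binary

This particular binary is a challenge for standard tools for several reasons:

The program header table overlaps the ELF header instead of being located after it.
There are no sections, and the section-related fields are overwritten by the program header table, so from the point of view of tools that parse section information they contain meaningless values. BFD-based tools (e.g. objdump and GDB) rely on section information being present and correct, so they fail even if all the other fields are correct.
The entry point lies inside the ELF header, which means there is executable code inside the header.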
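
To make the overlap concrete, here is a minimal Python sketch (not part of the original answer's tooling) that decodes the same 45 bytes twice: once as an ELF32 file header, and once as the program header that e_phoff places at offset 4. The byte string was assembled by hand from the listing in the question, and padding the file with zeros to 52 bytes is an assumption made only so that the fixed-size header can be unpacked.

```python
import struct

# The 45 bytes of "tiny", derived by hand from tiny.asm in the question.
TINY = bytes.fromhex(
    "7f454c46" "01000000" "00000000" "00000100"
    "02000300" "20000100" "20000100" "04000000"
    "b32a31c0" "40cd8000" "34002000" "01"
)

# Elf32_Ehdr fields that follow the 16-byte e_ident array.
EHDR_FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff",
               "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
               "e_shentsize", "e_shnum", "e_shstrndx")
# Elf32_Phdr fields, eight 32-bit words.
PHDR_FIELDS = ("p_type", "p_offset", "p_vaddr", "p_paddr",
               "p_filesz", "p_memsz", "p_flags", "p_align")


def parse(data):
    """Decode the file header and the program header found at e_phoff."""
    padded = data.ljust(52, b"\0")  # assumption: missing bytes read as zero
    ehdr = dict(zip(EHDR_FIELDS,
                    struct.unpack_from("<HHIIIIIHHHHHH", padded, 16)))
    phdr = dict(zip(PHDR_FIELDS,
                    struct.unpack_from("<8I", padded, ehdr["e_phoff"])))
    return ehdr, phdr


if __name__ == "__main__":
    ehdr, phdr = parse(TINY)
    for name, value in list(ehdr.items()) + list(phdr.items()):
        print("%-12s 0x%08x" % (name, value))
```

Both e_shoff and p_align decode to 0xc0312ab3, which is simply the machine code of mov bl, 42 and xor eax, eax read as an integer. The same value appears in the Align column of the readelf -l output for the repaired file further down.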

Using a ptrace-based debugger

Radare2 is able to attach to the process:

$ r2 -d tiny-i386 
Process with PID 6756 started...
= attach 6756 6756
bin.baddr 0x00010000
Using 0x10000
Warning: Cannot initialize program headers
Warning: Cannot initialize section headers
Warning: Cannot initialize strings table
Warning: Cannot initialize dynamic strings
Warning: Cannot initialize dynamic section
Warning: read (init_offset)
asm.bits 32
[0x00010020]> pd 5
            ;-- eip:
            0x00010020      b32a           mov bl, 0x2a                ; '*' ; 42
            0x00010022      31c0           xor eax, eax
            0x00010024      40             inc eax
            0x00010025      cd80           int 0x80
            0x00010027      003400         add byte [eax + eax], dh
[0x00010020]>

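For such a small program, something like r2 seems rather heavyweight. There are only 7 bytes of instructions.

It is also possible to roll your own ptrace-based debugger. A good guide for this can be found in How debuggers work: Part 1 - Basics.

Emulation

In this case emulation is easy because the program is so simple. Emulation is a good solution to this kind of challenge because no information is needed other than the offsets of the first and last instructions. These can be retrieved manually from a hex dump, without parsing the header at all.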
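
As a rough illustration of reading those two offsets off the raw bytes, the following Python sketch prints a hex dump and derives the code range. The byte string is hand-assembled from the listing in the question, and hexdump and code_range are hypothetical helper names. The sketch assumes the load address sits at file offset 12 (true for this file, where the program header starts at offset 4) and that the code ends right after the first int 0x80 (bytes cd 80).

```python
import struct

# The 45 bytes of "tiny", derived by hand from tiny.asm in the question.
TINY = bytes.fromhex(
    "7f454c46" "01000000" "00000000" "00000100"
    "02000300" "20000100" "20000100" "04000000"
    "b32a31c0" "40cd8000" "34002000" "01"
)


def hexdump(data, width=16):
    """Return a plain offset/bytes hex dump, one row per `width` bytes."""
    rows = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        rows.append("%08x  %s" % (off, " ".join("%02x" % b for b in chunk)))
    return "\n".join(rows)


def code_range(data):
    """Return (start, end) file offsets of the code, end being exclusive."""
    (e_entry,) = struct.unpack_from("<I", data, 24)  # entry point address
    (p_vaddr,) = struct.unpack_from("<I", data, 12)  # load address (this file only)
    start = e_entry - p_vaddr
    end = data.index(b"\xcd\x80", start) + 2         # just past int 0x80
    return start, end


if __name__ == "__main__":
    print(hexdump(TINY))
    start, end = code_range(TINY)
    print("first instruction at 0x%x, code ends at 0x%x" % (start, end))
```

For this file the result is 0x20 and 0x27, which are the offsets added to the base address in the emu_start call of the emulation script.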

Here is a script for emulating the binary from the question:

#!/usr/bin/python3

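from unicorn import *
from unicorn.x86_const import *
from capstone import *


# Emulation load address. It differs from the file's org (0x00010000),
# which is fine here because the code is position-independent.
BASE = 0x100000
STACK_ADDR = 0x0
STACK_SIZE = 1024 * 1024

def read(name):
    with open(name, 'rb') as f:
        return f.read()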

#https://github.com/unicorn-engine/unicorn/blob/master/bindings/python/shellcode.py
# callback for tracing instructions
def hook_code(uc, address, size, user_data):
    instruction = uc.mem_read(address, size)    # read this instruction code from memory
    md = user_data
    for i in md.disasm(instruction, address):
        print(">>> Tracing instruction at 0x%x, instruction size = 0x%x, disassembly:\t%s\t%s" %(i.address, i.size, i.mnemonic, i.op_str))


# callback for tracing Linux interrupt
def hook_intr(uc, intno, user_data):
    # only handle syscall
    if intno != 0x80:
        print("got interrupt %x ???" %intno)
        uc.emu_stop()
        return

    eax = uc.reg_read(UC_X86_REG_EAX)
    eip = uc.reg_read(UC_X86_REG_EIP)

    print(">>> 0x%x: INTERRUPT: 0x%x, EAX = 0x%x" %(eip, intno, eax))

    uc.emu_stop()



def main():

    mu = Uc(UC_ARCH_X86, UC_MODE_32)    # initialize emulation engine class
    mu.mem_map(BASE, STACK_SIZE)    # allocate space at base address
    mu.mem_map(STACK_ADDR, STACK_SIZE)  # allocate space for stack

    mu.mem_write(BASE, read("./tiny_binaries/tiny-i386"))   # write file to memory
    mu.reg_write(UC_X86_REG_ESP, STACK_ADDR + STACK_SIZE - 1)   # initialize stack

    md = Cs(CS_ARCH_X86, CS_MODE_32)    # initialize disassembler engine class

    # add hooks
    mu.hook_add(UC_HOOK_CODE, hook_code, md)    # pass disassembler engine to hook
    mu.hook_add(UC_HOOK_INTR, hook_intr)

    mu.emu_start(BASE + 0x20, BASE + 0x27)

    print(">>> Emulation Complete.")

if __name__ == "__main__":
    main()

Emulating the binary produces the following output:

$ ./emulate_tiny-i386.py 
>>> Tracing instruction at 0x100020, instruction size = 0x2, disassembly:   mov bl, 0x2a
>>> Tracing instruction at 0x100022, instruction size = 0x2, disassembly:   xor eax, eax
>>> Tracing instruction at 0x100024, instruction size = 0x1, disassembly:   inc eax
>>> Tracing instruction at 0x100025, instruction size = 0x2, disassembly:   int 0x80
>>> 0x100025: INTERRUPT: 0x80, EAX = 0x1
>>> Emulation Complete.

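The full write-up can be found here: Analyzing ELF Binaries with Malformed Headers Part 1 - Emulating Tiny Programs. Full disclosure: I am the author of that article.

Repairing the header

Since the entire program is contained in the header, repairing it means rebuilding the binary. The program header table must be separated from the ELF header, the code must then be appended after the end of the program header table, and finally the entry point must be recalculated to point to the new offset of the first instruction in the binary. In this particular case, this can be done relatively simply with a tool called lepton (I am the developer). Here is a script that rebuilds the binary: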

#!/usr/bin/python3

from lepton import *

def main():
    # create new headers
    with open("tiny-i386", "rb") as f:
        elf_file = ELFFile(f, new_header=True)

    # recompose binary
    with open("repaired_tiny-i386", "wb") as f:
        f.write(elf_file.recompose_binary())    # this moves the program header out of the file
                                                # header and recalculates the entry point
    print("\n\tRepaired header field values:\n")
    elf_file.ELF_header.print_fields()          # call once entry point has been recalculated


if __name__=="__main__":
    main()

After rebuilding, readelf can parse the new binary successfully:

$ readelf -h repaired_tiny-i386 
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x10054
  Start of program headers:          52 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         1
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

$ readelf -l repaired_tiny-i386 

Elf file type is EXEC (Executable file)
Entry point 0x10054
There is 1 program header, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00010000 0x00030002 0x10020 0x10020 R   0xc0312ab3

The runtime behavior of the new file is the same as that of the original:

$ strace ./repaired_tiny-i386 
execve("./repaired_tiny-i386", ["./repaired_tiny-i386"], 0x7ffd19a0f1b0 /* 52 vars */) = 0
strace: [ Process PID=5822 runs in 32 bit mode. ]
exit(42)                                = ?
+++ exited with 42 +++

More details, information and examples can be found in the description of the lepton repository.
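
For readers who want to see the three rebuild steps without any tooling, here is a hand-rolled Python sketch using only struct. It is not how lepton works internally. The function name rebuild is hypothetical, and the clean values written into p_paddr, p_filesz, p_memsz, p_flags and p_align are my own choices for illustration. The lepton output above keeps the original values in those fields.

```python
import struct

LOAD_ADDR = 0x00010000
# The 7 bytes of code from the question:
# mov bl, 42; xor eax, eax; inc eax; int 0x80
CODE = bytes.fromhex("b32a31c040cd80")
EHDR_SIZE = 52   # sizeof(Elf32_Ehdr)
PHDR_SIZE = 32   # sizeof(Elf32_Phdr)


def rebuild(code=CODE, load_addr=LOAD_ADDR):
    """Lay out ELF header, then one program header, then the code."""
    # Step 3: the entry point moves to the first byte after both headers.
    entry = load_addr + EHDR_SIZE + PHDR_SIZE
    total = EHDR_SIZE + PHDR_SIZE + len(code)

    # e_ident: magic, ELFCLASS32, little endian, version 1, zero padding.
    e_ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    ehdr = e_ident + struct.pack(
        "<HHIIIIIHHHHHH",
        2, 3, 1,            # e_type=EXEC, e_machine=386, e_version
        entry,              # e_entry
        EHDR_SIZE,          # e_phoff: step 1, table starts after the header
        0, 0,               # e_shoff, e_flags
        EHDR_SIZE, PHDR_SIZE, 1,   # e_ehsize, e_phentsize, e_phnum
        0, 0, 0)            # no section headers

    # One PT_LOAD segment covering the whole file, readable and executable.
    phdr = struct.pack("<8I", 1, 0, load_addr, load_addr,
                       total, total, 5, 0x1000)

    # Step 2: the code is appended after the program header table.
    return ehdr + phdr + code


if __name__ == "__main__":
    with open("rebuilt_tiny-i386", "wb") as f:
        f.write(rebuild())
```

The resulting layout puts the entry point at 0x10054, the same address readelf reports for the lepton-repaired file, because 52 bytes of file header plus 32 bytes of program header precede the code in both cases.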

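Conclusion

In general, if a binary executes, it should be possible to attach to it with ptrace. However, GDB is very fragile and is easily rendered useless. Emulation seems to be the most robust solution, since parsing the ELF header is largely unnecessary and any executed instruction can be hooked (essentially full control).

As a final note, a detailed walkthrough of how the kernel loads ELF programs can be found in the LWN article How programs get run: ELF binaries. The discussion includes links to the relevant code in the kernel.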
