Help me decipher a marshaled Python code object. A .pyc file is almost the same thing (see: structure of a .pyc file).
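I have:
- A code object compiled from source.
- The marshaled representation of this code object.
- A recursive disassembly of the code part of this code object.
- All of its field values.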
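The main point:
I want to find out how different code objects are stored and how they refer to one another. That is, how are the links to child code objects stored? A module should reference all of its functions. A function should reference every other function that can be called from it, and so on. Does the virtual machine preserve code object ids when it stores code objects in a .pyc? I don't think so, because no ids are visible in the .pyc file.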
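A minimal sketch of one way to check this, assuming a small inline stand-in for foo.py (the SOURCE string and the helper child_code_objects are made up for illustration). It round-trips the module code object through marshal and looks at where the child code object ends up:

```python
import marshal
import types

# Hypothetical stand-in for foo.py, kept inline so the snippet is self-contained.
SOURCE = "a = 1\nb = 2\n\ndef baz(x, y):\n    return x * y\n\nm = baz(a, b)\n"

module_code = compile(SOURCE, "foo.py", "exec")
data = marshal.dumps(module_code)
restored = marshal.loads(data)


def child_code_objects(code):
    """Return the code objects stored directly in code.co_consts."""
    return [c for c in code.co_consts if isinstance(c, types.CodeType)]


original_baz = child_code_objects(module_code)[0]
restored_baz = child_code_objects(restored)[0]

# Same name and bytecode, but a distinct object created by marshal.loads().
print(original_baz.co_name, restored_baz.co_name)
print(original_baz.co_code == restored_baz.co_code)
print(original_baz is restored_baz)
```

If this behaves as expected, the link from the module to baz is positional: the child is serialized inline as one element of the parent's co_consts, and its id is simply whatever address the freshly loaded object gets.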
For example, the disassembled source contains an instruction like this:
LOAD_CONST 2 (<code object baz at 0x7f380995e5d0, file "foo.py", line 7>)
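So:
- How will the virtual machine find the baz code object? I can't see any of this information (0x7f380995e5d0, file "foo.py", line 7) in the marshaled string. Is the object id 0x7f380995e5d0 stored in the marshaled code, or is it created on every program run?
- If it is not stored, how are the connections between objects preserved in the marshaled code object (the .pyc file)?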
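A small sketch that may help with the first question. It assumes the integer after LOAD_CONST is an index into co_consts, and that the "<code object baz at 0x...>" text is only the repr() that dis prints for that constant. The SOURCE string is a made-up stand-in for foo.py:

```python
import dis
import types

# Hypothetical stand-in for foo.py.
SOURCE = "def baz(x, y):\n    return x * y\n"
module_code = compile(SOURCE, "foo.py", "exec")

# Instructions whose constant operand is a code object.
code_loads = [
    ins for ins in dis.get_instructions(module_code)
    if isinstance(ins.argval, types.CodeType)
]

for ins in code_loads:
    # ins.arg is a plain integer; the constant it names lives in co_consts.
    same_object = module_code.co_consts[ins.arg] is ins.argval
    print(ins.opname, ins.arg, ins.argval.co_name, same_object)
```

Under that reading the bytecode never carries an address, only the small index, so nothing like 0x7f380995e5d0 needs to be stored.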
I suppose I will use gdb to investigate further, but maybe this approach (deciphering the .pyc file) will also do the job.
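Current results:
I used all this information to create the following file: the first column is the binary representation of the marshaled code object, and the second column is the meaning of each byte sequence that I have identified so far.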
b'
\xe3 <don't know>
\x00\x00\x00\x00 <foo.py: co_argcount: 0>
\x00\x00\x00\x00 <foo.py: co_kwonlyargcount: 0>
\x00\x00\x00\x00 <foo.py: co_nlocals: 0>
\x03\x00\x00\x00 <foo.py: co_stacksize: 3>
@\x00\x00\x00 <foo.py: co_flags = '@' = 0x40 = 64>
s.\x00\x00\x00 <foo.py: number of bytes for module instructions = '.' = 46>
d\x00 <foo.py: co_code: 0 LOAD_CONST 0 (1)
Z\x00 <foo.py: co_code: 2 STORE_NAME 0 (a)
d\x01 <foo.py: co_code: 4 LOAD_CONST 1 (2)
Z\x01 <foo.py: co_code: 6 STORE_NAME 1 (b)
e\x00 <foo.py: co_code: 8 LOAD_NAME 0 (a)
e\x01 <foo.py: co_code: 10 LOAD_NAME 1 (b)
\x17\x00 <foo.py: co_code: 12 BINARY_ADD
Z\x02 <foo.py: co_code: 14 STORE_NAME 2 (c)
d\x02 <foo.py: co_code: 16 LOAD_CONST 2 (<code object baz at 0x7f380995e5d0, file "foo.py", line 7>)
d\x03 <foo.py: co_code: 18 LOAD_CONST 3 ('baz')
\x84\x00 <foo.py: co_code: 20 MAKE_FUNCTION 0
Z\x03 <foo.py: co_code: 22 STORE_NAME 3 (baz)
e\x03 <foo.py: co_code: 24 LOAD_NAME 3 (baz)
e\x00 <foo.py: co_code: 26 LOAD_NAME 0 (a)
e\x01 <foo.py: co_code: 28 LOAD_NAME 1 (b)
\x83\x02 <foo.py: co_code: 30 CALL_FUNCTION 2
Z\x04 <foo.py: co_code: 32 STORE_NAME 4 (multiplication)
e\x04 <foo.py: co_code: 34 LOAD_NAME 4 (multiplication)
d\x01 <foo.py: co_code: 36 LOAD_CONST 1 (2)
\x13\x00 <foo.py: co_code: 38 BINARY_POWER
Z\x05 <foo.py: co_code: 40 STORE_NAME 5 (square)
d\x04 <foo.py: co_code: 42 LOAD_CONST 4 (None)
S\x00 <foo.py: co_code: 44 RETURN_VALUE
)\x05 <foo.py: co_const: size>
\xe9\x01\x00\x00\x00 <foo.py: co_const[0]: 1>
\xe9\x02\x00\x00\x00 <foo.py: co_const[1]: 2>
c <TYPE_CODE>
\x02\x00\x00\x00 <baz: co_argcount: 2>
\x00\x00\x00\x00 <baz: co_kwonlyargcount: 0>
\x02\x00\x00\x00 <baz: co_nlocals: 2>
\x02\x00\x00\x00 <baz: co_stacksize: 2>
C\x00\x00\x00 <baz: co_flags = 'C' = 0x43 = 67>
s\x08\x00\x00\x00 <baz: co_code: size = 8 bytes>
|\x00 <baz: co_code: 0 LOAD_FAST 0 (x)
|\x01 <baz: co_code: 2 LOAD_FAST 1 (y)
\x14\x00 <baz: co_code: 4 BINARY_MULTIPLY
S\x00 <baz: co_code: 6 RETURN_VALUE
)\x01 <baz: co_const: size>
N <baz: co_const[0]: None>
\xa9\x00 <don't know>
)\x02 <baz: co_varnames: size>
\xda\x01 <baz: number of characters of next item>
x <baz: co_varnames[0]: x>
\xda\x01 <baz: number of characters of next item>
y <baz: co_varnames[1]: y>
r\x03\x00\x00\x00 <baz: don't know. But the 'r' = 'TYPE_REF'>
r\x03\x00\x00\x00 <baz: don't know. But the 'r' = 'TYPE_REF'>
\xfa\x06 <baz: next item length>
foo.py <baz: co_filename>
\xda\x03 <baz: number of characters of next item>
baz <baz: co_name: 'baz'>
\x07\x00\x00\x00 <baz: co_firstlineno: 7>
s\x02\x00\x00\x00 <baz: co_lnotab: size = 2 >
\x00\x01 <baz: co_lnotab>
r\x07\x00\x00\x00 <foo.py: co_const[3]: reference to baz>
N <foo.py: co_const[4]: None>
)\x06 <foo.py: co_names: size>
\xda\x01 <foo.py: number of characters of next item>
a <foo.py: co_names[0]: a>
\xda\x01 <foo.py: number of characters of next item>
b <foo.py: co_names[1]: b>
\xda\x01 <foo.py: number of characters of next item>
c <foo.py: co_names[2]: c>
r\x07\x00\x00\x00 <foo.py: co_names[3]: reference to baz>
Z\x0e <foo.py: number of characters of next item>
multiplication <foo.py: co_names[4]: multiplication>
Z\x06 <foo.py: number of characters of next item>
square <foo.py: co_names[5]: square>
r\x03\x00\x00\x00 <foo.py: don't know>
r\x03\x00\x00\x00 <foo.py: don't know>
r\x03\x00\x00\x00 <foo.py: don't know>
r\x06\x00\x00\x00 <foo.py: don't know>
\xda\x08 <foo.py: number of characters of next item>
<module> <foo.py: co_name>
\x03\x00\x00\x00 <foo.py: co_firstlineno>
s\n\x00\x00\x00 <foo.py: co_lnotab: size = '\n' = 0A>
\x04\x01 <foo.py: co_lnotab>
\x04\x01 <foo.py: co_lnotab>
\x08\x02 <foo.py: co_lnotab>
\x08\x07 <foo.py: co_lnotab>
\n\x01' <foo.py: co_lnotab>
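A possible reading of the bytes marked <don't know> above, assuming the dump comes from CPython's marshal format version 3 or later (Python 3.4+). In that format the high bit 0x80 of a type byte is a flag (FLAG_REF in marshal.c) that tells the loader to remember the object in a list of references, and r followed by a 4-byte index (TYPE_REF) points back into that list. Under that assumption \xe3 is 'c' (TYPE_CODE) with the flag set, \xa9\x00 is a flagged empty small tuple, and r\x03\x00\x00\x00 appears to reuse that same empty tuple. The helper name split_type_byte is made up for this sketch:

```python
import marshal

FLAG_REF = 0x80  # high bit of a type byte, as defined in CPython's marshal.c


def split_type_byte(byte):
    """Split a marshal type byte into (type character, FLAG_REF set?)."""
    return chr(byte & 0x7F), bool(byte & FLAG_REF)


# Type bytes taken from the annotated dump above.
for b in (0xE3, 0xE9, 0xDA, 0xFA, 0xA9, ord("r"), ord("Z")):
    print(hex(b), split_type_byte(b))

# The first byte of any marshaled code object should decode to 'c'.
first = marshal.dumps(compile("a = 1", "foo.py", "exec"))[0]
print(split_type_byte(first)[0])
```

Read this way, r\x07\x00\x00\x00 would be a back-reference to the eighth remembered object, which in this dump looks like the string 'baz' rather than the baz code object itself.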
Code snippets needed to reproduce:
1) Source code: foo.py
a = 1
b = 2
c = a + b
def baz(x, y):
    return x * y
multiplication = baz(a,b)
square = multiplication ** 2
2) The marshaled representation of foo.py.
import marshal

source_py = "foo.py"
with open(source_py) as f_source:
    source_code = f_source.read()
# Compile the module source into a code object, then serialize it.
code_obj_compile = compile(source_code, source_py, "exec")
data = marshal.dumps(code_obj_compile)
print(data)
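For comparison with an actual .pyc file, here is a rough sketch that compiles a throwaway file with py_compile and unmarshals everything after the header. It assumes CPython 3.7 or later, where the header is 16 bytes (earlier 3.x releases used 12). The file contents and the HEADER_SIZE name are illustrative only:

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile
import types

HEADER_SIZE = 16  # assumption: CPython 3.7+; earlier 3.x releases used 12 bytes

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "foo.py")
    pyc_path = os.path.join(tmp, "foo.pyc")
    with open(source_path, "w") as f:
        f.write("def baz(x, y):\n    return x * y\n")
    py_compile.compile(source_path, cfile=pyc_path, doraise=True)
    with open(pyc_path, "rb") as f:
        raw = f.read()

# The first 4 bytes are the magic number; the payload after the header
# is a marshaled module code object.
magic, payload = raw[:4], raw[HEADER_SIZE:]
module_code = marshal.loads(payload)
print(magic == importlib.util.MAGIC_NUMBER)
print(isinstance(module_code, types.CodeType), module_code.co_name)
```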
3) The full (recursive) disassembly of the code object.
import dis
import types

dis.dis(code_obj_compile)
# Disassemble the code objects stored directly in the module's co_consts.
for x in code_obj_compile.co_consts:
    if isinstance(x, types.CodeType):
        sub_byte_code = x
        func_name = sub_byte_code.co_name
        print('\nDisassembly of %s:' % func_name)
        dis.dis(sub_byte_code)
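The loop above only descends one level. A sketch of a fully recursive walk, in case functions are nested more deeply (the generator name iter_code_objects and the SOURCE string are made up for this example):

```python
import types


def iter_code_objects(code):
    """Yield code and every code object nested in its co_consts, depth-first."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


# Hypothetical source with two levels of nesting.
SOURCE = "def outer():\n    def inner():\n        return 1\n    return inner\n"
module_code = compile(SOURCE, "foo.py", "exec")

names = [c.co_name for c in iter_code_objects(module_code)]
print(names)  # ['<module>', 'outer', 'inner']
```

Each yielded object can then be passed to dis.dis() or to a field-printing helper.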
4) The field values of all code objects.
def print_co_obj_fields(code_obj):
    # Iterate through all instance attributes
    # and print those that have the 'co_' prefix
    for name in dir(code_obj):
        if name.startswith('co_'):
            co_field = getattr(code_obj, name)
            print(f'{name:<20} = {co_field}')

print_co_obj_fields(code_obj_compile)