I tested the perf tool on Ubuntu 18.04 with a simple benchmark of our compiler (parsing some files). I ran perf several times; here is the slowest run:
$ sudo perf stat ./parse
Construct
Parse
Parsing: 93ms
Counting: 14ms
Total: 107ms
Count: 450009
String size (bytes): 2250042
Allocator usage (bytes): 48400928
Performance counter stats for './parse':
112.329506 task-clock (msec) # 0.999 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
14,560 page-faults # 0.130 M/sec
418,991,316 cycles # 3.730 GHz
843,385,199 instructions # 2.01 insn per cycle
139,033,655 branches # 1237.730 M/sec
1,033,015 branch-misses # 0.74% of all branches
0.112438670 seconds time elapsed
And the fastest run:
$ sudo perf stat ./parse
Construct
Parse
Parsing: 80ms
Counting: 12ms
Total: 93ms
Count: 450009
String size (bytes): 2250042
Allocator usage (bytes): 48400928
Performance counter stats for './parse':
97.922823 task-clock (msec) # 0.998 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
14,559 page-faults # 0.149 M/sec
376,416,206 cycles # 3.844 GHz
843,259,218 instructions # 2.24 insn per cycle
138,973,382 branches # 1419.213 M/sec
1,031,451 branch-misses # 0.74% of all branches
0.098073332 seconds time elapsed
Comparing the various statistics, here are the percentage improvements from the slowest run to the fastest run:
task-clock 12.83% faster
cycles 10.16% fewer
frequency 3.06% faster
instructions 0.015% fewer
insn per cycle 11.44% more
branches 0.043% fewer
branch-misses 0.15% fewer
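As a sanity check, the percentages above can be recomputed from the two perf outputs. This is only a sketch: the dictionary and function names are illustrative and not part of perf, and for the two derived metrics (frequency and insn per cycle) it assumes the rounded values that perf printed rather than recomputing them from the raw counters.

```python
# Values copied from the two `perf stat` runs above.
# "frequency" (GHz) and "insn per cycle" are the rounded figures perf printed.
slowest = {
    "task-clock": 112.329506,
    "cycles": 418_991_316,
    "frequency": 3.730,
    "instructions": 843_385_199,
    "insn per cycle": 2.01,
    "branches": 139_033_655,
    "branch-misses": 1_033_015,
}
fastest = {
    "task-clock": 97.922823,
    "cycles": 376_416_206,
    "frequency": 3.844,
    "instructions": 843_259_218,
    "insn per cycle": 2.24,
    "branches": 138_973_382,
    "branch-misses": 1_031_451,
}

# Metrics where a larger value is the improvement; for all others, smaller is better.
HIGHER_IS_BETTER = {"frequency", "insn per cycle"}


def improvement_pct(name, slow_value, fast_value):
    """Percentage change from the slowest run to the fastest run,
    always measured relative to the slowest run's value."""
    if name in HIGHER_IS_BETTER:
        return (fast_value - slow_value) / slow_value * 100
    return (slow_value - fast_value) / slow_value * 100


changes = {
    name: improvement_pct(name, slowest[name], fastest[name])
    for name in slowest
}

for name, pct in changes.items():
    print(f"{name:15s} {pct:.3f}%")
```

Recomputing insn per cycle from the raw instructions and cycles counters, instead of the rounded printed values, gives a slightly different percentage, so the last digits of that row depend on which inputs are used.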
Why is the instructions count about 3 orders of magnitude more stable than the total time (task-clock) or cycles?
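One explanation could be that my CPU (Intel(R) Xeon(R) CPU E3-1505M v6 @ 3.00GHz) executes roughly the same number of instructions on every run, but depending on various factors (temperature, past history, other programs running at the same time, etc.) it can sometimes execute ~10% more instructions per cycle by hiding latencies or executing instructions in parallel. That would mean it needs ~10% fewer cycles to execute all those instructions, and given that the frequency is roughly the same (within ~3%), it would mean ~10% less time. Is this a correct explanation?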
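The arithmetic behind this hypothesis can be checked with the relation time = cycles / frequency. The sketch below uses only numbers from the perf output above; the variable and function names are illustrative, and the frequency is assumed to be the rounded GHz figure perf printed.

```python
# Numbers copied from the perf stat output above.
runs = {
    "slowest": {"cycles": 418_991_316, "ghz": 3.730, "task_clock_ms": 112.329506},
    "fastest": {"cycles": 376_416_206, "ghz": 3.844, "task_clock_ms": 97.922823},
}


def estimated_ms(cycles, ghz):
    # time = cycles / frequency; 1 GHz is 1e6 cycles per millisecond.
    return cycles / (ghz * 1e6)


estimates = {
    name: estimated_ms(run["cycles"], run["ghz"]) for name, run in runs.items()
}

for name, run in runs.items():
    print(f"{name}: estimated {estimates[name]:.2f} ms, "
          f"measured {run['task_clock_ms']:.2f} ms")

# Split the speedup into its two factors: fewer cycles and a higher clock.
cycles_ratio = runs["fastest"]["cycles"] / runs["slowest"]["cycles"]
freq_ratio = runs["fastest"]["ghz"] / runs["slowest"]["ghz"]
time_ratio = cycles_ratio / freq_ratio
time_reduction_pct = (1 - time_ratio) * 100

print(f"cycles ratio:    {cycles_ratio:.4f}")
print(f"frequency ratio: {freq_ratio:.4f}")
print(f"time reduction:  {time_reduction_pct:.2f}%")
```

With these inputs, the cycle reduction and the frequency increase together come out close to the measured task-clock improvement, so the time saved is somewhat more than the ~10% that the cycle count alone would suggest.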