Why is "instructions retired" more stable than "cycles" or "task-clock" when using perf?

Computational Science, performance
2021-12-10 16:26:09

I tried out the perf tool on Ubuntu 18.04 with a simple benchmark from our compiler (parsing some files). I ran perf several times; here is the slowest run:

$ sudo perf stat ./parse
Construct
Parse
Parsing: 93ms
Counting: 14ms
Total: 107ms
Count: 450009
String size (bytes):      2250042
Allocator usage (bytes): 48400928

 Performance counter stats for './parse':

        112.329506      task-clock (msec)         #    0.999 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
            14,560      page-faults               #    0.130 M/sec                  
       418,991,316      cycles                    #    3.730 GHz                    
       843,385,199      instructions              #    2.01  insn per cycle         
       139,033,655      branches                  # 1237.730 M/sec                  
         1,033,015      branch-misses             #    0.74% of all branches        

       0.112438670 seconds time elapsed

And the fastest run:

$ sudo perf stat ./parse
Construct
Parse
Parsing: 80ms
Counting: 12ms
Total: 93ms
Count: 450009
String size (bytes):      2250042
Allocator usage (bytes): 48400928

 Performance counter stats for './parse':

         97.922823      task-clock (msec)         #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
            14,559      page-faults               #    0.149 M/sec                  
       376,416,206      cycles                    #    3.844 GHz                    
       843,259,218      instructions              #    2.24  insn per cycle         
       138,973,382      branches                  # 1419.213 M/sec                  
         1,031,451      branch-misses             #    0.74% of all branches        

       0.098073332 seconds time elapsed

Comparing the statistics, here are the percentage improvements from the slowest run to the fastest run:

task-clock      12.83% faster
cycles          10.16% less
frequency        3.06% faster                   
instructions     0.015% less
insn per cycle  11.44% more 
branches         0.043% less              
branch-misses    0.15% less           
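
For anyone who wants to check the table, here is a minimal sketch that recomputes the percentages from the raw counters in the two runs above. The dictionary and function names are my own, not anything perf provides. One caveat: the "insn per cycle" row in the table seems to have been computed from the rounded values 2.01 and 2.24 that perf prints, so the unrounded ratio comes out slightly lower (a bit over 11%).

```python
# Counter values copied from the two perf stat runs above.
slowest = {
    "task-clock (msec)": 112.329506,
    "cycles": 418_991_316,
    "instructions": 843_385_199,
    "branches": 139_033_655,
    "branch-misses": 1_033_015,
}
fastest = {
    "task-clock (msec)": 97.922823,
    "cycles": 376_416_206,
    "instructions": 843_259_218,
    "branches": 138_973_382,
    "branch-misses": 1_031_451,
}


def percent_decrease(slow, fast):
    """Relative drop from the slowest run to the fastest run, in percent."""
    return (slow - fast) / slow * 100.0


def percent_increase(slow, fast):
    """Relative gain from the slowest run to the fastest run, in percent."""
    return (fast - slow) / slow * 100.0


def derived(run):
    """Frequency (GHz) and instructions per cycle, derived from raw counters."""
    seconds = run["task-clock (msec)"] / 1000.0
    return {
        "frequency (GHz)": run["cycles"] / seconds / 1e9,
        "insn per cycle": run["instructions"] / run["cycles"],
    }


decreases = {name: percent_decrease(slowest[name], fastest[name]) for name in slowest}
slow_derived = derived(slowest)
fast_derived = derived(fastest)
increases = {
    name: percent_increase(slow_derived[name], fast_derived[name])
    for name in slow_derived
}

for name, value in decreases.items():
    print(f"{name:20s} {value:8.3f}% lower")
for name, value in increases.items():
    print(f"{name:20s} {value:8.3f}% higher")
```
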

Why is instructions three orders of magnitude more stable than total time (task-clock) or cycles?

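One explanation might be that my CPU (Intel(R) Xeon(R) CPU E3-1505M v6 @ 3.00GHz) executes roughly the same number of instructions on every run, but depending on various factors (temperature, past history, other programs running at the same time, etc.) it can sometimes execute about 10% more instructions per cycle by hiding latency or executing instructions in parallel. That means it needs about 10% fewer cycles to execute all of those instructions, and given that the frequency is roughly the same (within about 3%), the run time drops by about 10%. Is this a correct explanation?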
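
To make the arithmetic behind this guess concrete: run time can be written as instructions / (IPC × frequency), so the logarithm of the speedup splits exactly into an instruction-count term, an IPC term, and a frequency term. The sketch below applies that split to the two runs above; the variable and function names are hypothetical. It only restates the numbers and says nothing about why IPC differs between runs.

```python
import math

# (task-clock in seconds, cycles, instructions) copied from the perf stat output.
slow_run = (0.112329506, 418_991_316, 843_385_199)
fast_run = (0.097922823, 376_416_206, 843_259_218)


def log_speedup_terms(slow, fast):
    """Split log(t_slow / t_fast) using t = instructions / (IPC * frequency)."""
    t_s, c_s, i_s = slow
    t_f, c_f, i_f = fast
    return {
        "fewer instructions": math.log(i_s / i_f),
        "higher IPC": math.log((i_f / c_f) / (i_s / c_s)),
        "higher frequency": math.log((c_f / t_f) / (c_s / t_s)),
    }


terms = log_speedup_terms(slow_run, fast_run)
total = math.log(slow_run[0] / fast_run[0])

# Each share is the fraction of the (log) speedup attributable to that factor.
shares = {name: value / total for name, value in terms.items()}
for name, share in shares.items():
    print(f"{name:20s} {share:7.2%} of the speedup")
```

On these two runs the IPC term accounts for most of the speedup, the frequency term for most of the remainder, and the instruction-count term for almost none of it, which is consistent with the reasoning above.
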

0 answers
No replies yet.