Look at this directory: arch/x86/kernel/fpu (it handles everything that is FPU-specific on x86).
The comment in the following file provides the answer:
http://lxr.free-electrons.com/source/arch/x86/kernel/fpu/init.c
/*
* FPU context switching strategies:
*
* Against popular belief, we don't do lazy FPU saves, due to the
* task migration complications it brings on SMP - we only do
* lazy FPU restores.
*
* 'lazy' is the traditional strategy, which is based on setting
* CR0::TS to 1 during context-switch (instead of doing a full
* restore of the FPU state), which causes the first FPU instruction
* after the context switch (whenever it is executed) to fault - at
* which point we lazily restore the FPU state into FPU registers.
*
* Tasks are of course under no obligation to execute FPU instructions,
* so it can easily happen that another context-switch occurs without
* a single FPU instruction being executed. If we eventually switch
* back to the original task (that still owns the FPU) then we have
* not only saved the restores along the way, but we also have the
* FPU ready to be used for the original task.
*
* 'lazy' is deprecated because it's almost never a performance win
* and it's much more complicated than 'eager'.
*
* 'eager' switching is by default on all CPUs, there we switch the FPU
* state during every context switch, regardless of whether the task
* has used FPU instructions in that time slice or not. This is done
* because modern FPU context saving instructions are able to optimize
* state saving and restoration in hardware: they can detect both
* unused and untouched FPU state and optimize accordingly.
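So, to restate what the comment explains:
a. LAZY mode: only the restore is lazy. The FPU state is not restored on every context switch, only when the FPU is actually used. During the context switch the kernel sets the CR0.TS flag; the first FPU instruction after that faults, and the fault handler restores the state and clears the flag again, so the kernel never has to keep checking whether the FPU registers are in use. This mode is not the default, because the time saved is rarely significant and the logic becomes much more complicated than the eager approach.
b. EAGER mode: this is the default. The FPU state is saved and restored on every context switch. Here too the hardware helps: the optimized save instructions can detect which parts of the (large) FPU register state are unused or unmodified and skip them, so only the state that is actually in use gets written out, which keeps the eager approach cheap.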
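To make the difference between (a) and (b) concrete, here is a minimal user-space sketch. It is a toy model and not kernel code: struct toy_cpu and the toy_* functions are names I made up for illustration, and the model only counts restores (saves are not modelled, in line with the comment's "we only do lazy FPU restores").

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: every name here is hypothetical. */
struct toy_cpu {
    bool ts;        /* models CR0.TS: when set, the next FPU instruction faults */
    int  fpu_owner; /* task whose state currently sits in the registers, -1 if none */
    int  restores;  /* number of FPU state restores performed so far */
};

/* Lazy switch: leave the registers alone and only arm the trap,
 * unless the incoming task still owns the register contents. */
static void toy_switch_lazy(struct toy_cpu *cpu, int next)
{
    cpu->ts = (cpu->fpu_owner != next);
}

/* Eager switch: restore the incoming task's state unconditionally. */
static void toy_switch_eager(struct toy_cpu *cpu, int next)
{
    cpu->ts = false;
    cpu->fpu_owner = next;
    cpu->restores++;
}

/* Task `task` executes one FPU instruction. With TS set this models the
 * fault path: restore the task's state, then clear TS. */
static void toy_fpu_insn(struct toy_cpu *cpu, int task)
{
    if (cpu->ts) {
        cpu->fpu_owner = task;
        cpu->restores++;
        cpu->ts = false;
    }
    assert(cpu->fpu_owner == task); /* registers must now belong to `task` */
}
```

If task 0 uses the FPU, task 1 runs without touching it, and task 0 then runs again, the lazy model does a single restore while the eager model does one per switch. That is exactly the saving the comment describes, and also why it disappears once most tasks touch the FPU in every time slice.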
Getting this FPU switching code right was no small task: it took a series of 208 patches, written in 2015:
https://lwn.net/Articles/643235/
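The instructions that save the FPU state (XMM, MMX, SSE, SSE2, etc.) are called FXSAVE, FNSAVE and FSAVE. FSAVE/FNSAVE cover only the x87/MMX state, while FXSAVE also covers the XMM registers: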
http://x86.renejeschke.de/html/file_module_x86_id_128.html
The overhead in the Linux kernel has been benchmarked at 87 cycles:
https://lwn.net/Articles/643235/
These optimized ways of saving state are also described in the following kernel comment:
* When executing XSAVEOPT (or other optimized XSAVE instructions), if
* a processor implementation detects that an FPU state component is still
* (or is again) in its initialized state, it may clear the corresponding
* bit in the header.xfeatures field, and can skip the writeout of registers
* to the corresponding memory layout.
*
* This means that when the bit is zero, the state component might still contain
* some previous - non-initialized register state.
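The consequence of that comment can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: struct toy_xsave_area and the toy_* names are hypothetical, each state component is reduced to a single word, and zero stands in for the "init state" (the real init values are not all zero). Only the bit numbers (0 = x87, 1 = SSE, 2 = AVX/YMM) come from the XSAVE architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* State-component bit numbers in the XSAVE header bitmap. */
enum { TOY_XFEATURE_FP = 0, TOY_XFEATURE_SSE = 1, TOY_XFEATURE_YMM = 2 };

#define TOY_NCOMP 3

/* Hypothetical, heavily simplified stand-in for an XSAVE buffer. */
struct toy_xsave_area {
    uint64_t xfeatures;       /* models header.xfeatures */
    uint32_t comp[TOY_NCOMP]; /* one word standing in for each component */
};

/* True if the header says this component was written out by the save. */
static bool toy_component_saved(uint64_t xfeatures, unsigned bit)
{
    assert(bit < 64);
    return (xfeatures >> bit) & 1u;
}

/* A cleared bit means "component is in its init state", but the memory
 * may still hold stale data from an earlier save. Overwrite such slots
 * with the (toy) init value before anyone else reads the buffer. */
static void toy_sanitize(struct toy_xsave_area *area)
{
    for (unsigned bit = 0; bit < TOY_NCOMP; bit++) {
        if (!toy_component_saved(area->xfeatures, bit))
            area->comp[bit] = 0; /* toy init value, an assumption */
    }
}
```

The point is that a reader of the buffer must trust the bitmap, not the bytes: a component whose bit is clear has to be treated as "init state" even if the memory behind it still looks like real register contents.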
To detect when the kernel's FPU handling is triggered, we can set a breakpoint on fpstate_sanitize_xstate in KGDB; the kernel stack trace then looks like this:
Thread 441 hit Breakpoint 1, fpstate_sanitize_xstate (fpu=0xffff8801e7a2ea80) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/fpu/xstate.c:111
111 {
#0 fpstate_sanitize_xstate (fpu=0xffff8801e7a2ea80) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/fpu/xstate.c:111
#1 0xffffffff8103b183 in copy_fpstate_to_sigframe (buf=0xffff8801e7a2ea80, buf_fx=0x7f73ad4fe3c0, size=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/fpu/signal.c:178
#2 0xffffffff8102e207 in get_sigframe (frame_size=440, fpstate=0xffff880034dcbe10, regs=<optimized out>, ka=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:247
#3 0xffffffff8102e703 in __setup_rt_frame (regs=<optimized out>, set=<optimized out>, ksig=<optimized out>, sig=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:413
#4 setup_rt_frame (regs=<optimized out>, ksig=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:627
#5 handle_signal (regs=<optimized out>, ksig=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:671
#6 do_signal (regs=0xffff880034dcbf58) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/kernel/signal.c:714
#7 0xffffffff8100320c in exit_to_usermode_loop (regs=0xffff880034dcbf58, cached_flags=4) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/entry/common.c:248
#8 0xffffffff81003c6e in prepare_exit_to_usermode (regs=<optimized out>) at /build/linux-FvcHlK/linux-4.4.0/arch/x86/entry/common.c:283
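Using "info thread 441" (see above), you will find that "Xorg" is the originator of the stack trace above; apart from that, most processes do not use the FPU.
Going by the stack trace, "get_sigframe()" appears to be the first function that checks for FPU usage: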
	if (fpu->fpstate_active) {
		unsigned long fx_aligned, math_size;

		sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
		*fpstate = (struct _fpstate_32 __user *) sp;
		if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
					    math_size) < 0)
			return (void __user *) -1L;
	}
So basically what happens here is that the FPU state is copied onto the user-space stack (the location pointed to by "sp").
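The stack arithmetic behind that step can be sketched as follows. This is a simplified user-space illustration, not the kernel's fpu__alloc_mathframe(): the function name, its parameters and the example values are my own assumptions. The only hard fact used is that the save area has to be aligned (XSAVE needs 64 bytes, FXSAVE needs 16).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reserve `size` bytes below the stack pointer `sp`
 * (the stack grows downwards) and round the result down to `align`,
 * which must be a power of two. Returns the new, lower stack pointer. */
static uintptr_t toy_alloc_mathframe(uintptr_t sp, uintptr_t size,
                                     uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    assert(sp >= size);                               /* no underflow */
    return (sp - size) & ~(align - 1);
}
```

For example, toy_alloc_mathframe(0x7ffd1010, 512, 64) yields 0x7ffd0e00: 512 bytes are reserved below the old stack pointer and the start of the area is pulled down to the next 64-byte boundary. The FPU state is then copied into that area, and the signal frame itself is placed below it.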
So in summary, this part of the FPU save/copy/restore logic is triggered only when the FPU is actually used.