I am trying to solve a large system with the help of PETSc. Because of the size of the problem I am using a matrix-free approach, in which the operator is just a shell. I also supply my own preconditioner matrix (not a shell), and I use an ILU(2) factorization of that preconditioner.
The problem: the setup phase of the solver (see the relevant code block below) takes a very long time. I suspect the main culprit is the ILU of the preconditioner. I know ILU is expected to take time, but what worries me is this: when I solve the same problem with a direct solver (computing the LU factorization with MKL LAPACKE and then inverting, outside of PETSc), the LU is 10 times faster. I would have expected PETSc's ILU to take a time comparable to a full LU factorization, so it seems strange that it is 10 times slower. (You might ask why I bother with an iterative solver at all if I can use LU; this example is not as large as the problems I actually want to run, for which a direct solver will no longer be an option.)
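For reference, the direct-solver comparison above is done roughly along these lines; this is only a sketch of the MKL LAPACKE calls involved (the function direct_solve_sketch and its arguments are placeholders, not my actual code):
#include <stdlib.h>
#include <mkl_lapacke.h>
/* Sketch: dense LU with MKL LAPACKE, followed by an explicit inverse
   computed from the factors. n and a (column-major, n*n, complex double)
   come from the calling code. */
void direct_solve_sketch(lapack_int n, lapack_complex_double *a)
{
    lapack_int *ipiv = malloc((size_t)n * sizeof(lapack_int));
    lapack_int  info;
    info = LAPACKE_zgetrf(LAPACK_COL_MAJOR, n, n, a, n, ipiv);   /* LU factorization */
    if (info == 0)
        info = LAPACKE_zgetri(LAPACK_COL_MAJOR, n, a, n, ipiv);  /* invert using the LU factors */
    free(ipiv);
}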
Here is the code snippet relevant to the problem:
MatCreateShell(comm, Nu, Nu, Nu, Nu, ctx, &A_shell);
MatShellSetOperation(A_shell, MATOP_MULT, (void(*)(void))usermult);
KSPCreate(comm, &solver);
KSPSetOperators(solver, A_shell, PreconditionerMatrix);
KSPSetInitialGuessNonzero(solver, PETSC_TRUE);
KSPSetNormType(solver, KSP_NORM_UNPRECONDITIONED);
KSPSetFromOptions(solver);
KSPSetUp(solver);
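For completeness, usermult above is the matrix-free multiply callback. It follows the standard shell-matrix multiply signature, roughly like this (the actual application of the operator is problem-specific and omitted):
/* Sketch of the shell multiply callback registered via MatShellSetOperation.
   The real operator application lives in my own code and uses ctx. */
PetscErrorCode usermult(Mat A_shell, Vec x, Vec y)
{
  void           *ctx;
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = MatShellGetContext(A_shell, &ctx); CHKERRQ(ierr);
  /* ... apply the operator to x using the data in ctx, writing the result into y ... */
  PetscFunctionReturn(0);
}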
Things I know / have tried:
- The condition number of the matrix can get large, but I don't think my problem has anything to do with that because, again, the time sink is in the setup. If the condition number were the issue it would also show up when I do the full LU factorization, and it doesn't.
- I know I have to provide a large enough guess for the fill factor of the factored matrix, and I have set the option -pc_factor_fill to 3 (the same thing can be set from code with PCFactorSetFill; see the sketch after this list). Running the code with -info, I confirmed that this is enough to prevent any memory reallocation. Interesting side note: when I run with -info, it reports the required fill factor quite quickly. Does that mean it actually performs the ILU as quickly as expected, but then gets stuck somewhere else? Am I barking up the wrong tree? Here is what it reports:
[0] PetscCommDuplicate(): Using internal PETSc communicator 7412512 20851120
[0] PetscCommDuplicate(): Using internal PETSc communicator 7412512 20851120
[0] PCSetUp(): Setting up PC for first time
[0] PetscCommDuplicate(): Using internal PETSc communicator 7412512 20851120
[0] PetscCommDuplicate(): Using internal PETSc communicator 7412512 20851120
[0] PetscCommDuplicate(): Using internal PETSc communicator 7412512 20851120
[0] PetscCommDuplicate(): Using internal PETSc communicator 7412512 20851120
[0] MatILUFactorSymbolic_SeqAIJ(): Reallocs 0 Fill ratio:given 3. needed 1.7385
[0] MatILUFactorSymbolic_SeqAIJ(): Run with -[sub_]pc_factor_fill 1.7385 or use
[0] MatILUFactorSymbolic_SeqAIJ(): PCFactorSetFill([sub]pc,1.7385);
[0] MatILUFactorSymbolic_SeqAIJ(): for best performance.
[0] MatSeqAIJCheckInode_FactorLU(): Found 2030 nodes of 6096. Limit used: 5. Using Inode routines
After that it hangs for a long time... so maybe I set the fill factor too large? I tried the same thing with a fill factor of 2, but it made no difference.
- I have made sure I am not using a debugging install of PETSc; whenever I time the code I definitely use --with-debugging=0 in the PETSc configure step.
- I am not using any parallelization.
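As noted above, the factorization options can also be set programmatically instead of through the option table. A minimal sketch, assuming solver is the KSP created in the code block above:
/* Sketch: programmatic equivalents of -pc_type ilu, -pc_factor_levels 2
   and -pc_factor_fill 3 for the solver created earlier. */
PC pc;
KSPGetPC(solver, &pc);
PCSetType(pc, PCILU);
PCFactorSetLevels(pc, 2);   /* ILU(2) */
PCFactorSetFill(pc, 3.0);   /* expected ratio of nonzeros in the factors to nonzeros in the original matrix */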
Here is the output generated with -log_view:
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 4.057e+02 1.00000 4.057e+02
Objects: 7.050e+02 1.00000 7.050e+02
Flops: 2.161e+11 1.00000 2.161e+11 2.161e+11
Flops/sec: 5.327e+08 1.00000 5.327e+08 5.327e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.0567e+02 100.0% 2.1612e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 624 1.0 3.2336e+01 1.0 4.23e+10 1.0 0.0e+00 0.0e+00 0.0e+00 8 20 0 0 0 8 20 0 0 0 1307
MatMultAdd 2480 1.0 5.4122e-02 1.0 5.00e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 923
MatMultTranspose 3100 1.0 2.8153e+00 1.0 9.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 3215
MatSolve 608 1.0 4.6020e+01 1.0 7.41e+10 1.0 0.0e+00 0.0e+00 0.0e+00 11 34 0 0 0 11 34 0 0 0 1611
MatLUFactorNum 1 1.0 4.4004e+01 1.0 9.31e+10 1.0 0.0e+00 0.0e+00 0.0e+00 11 43 0 0 0 11 43 0 0 0 2115
MatILUFactorSym 1 1.0 3.4659e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0
MatAssemblyBegin 27 1.0 1.5497e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 27 1.0 6.8570e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 10650098 1.0 6.6575e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRowIJ 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.3018e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 39 1.0 1.5883e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 2 1.0 3.5977e+00 1.0 5.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 1660
MatMatMultSym 2 1.0 7.2353e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMultNum 2 1.0 2.8741e+00 1.0 5.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2078
VecMDot 302 1.0 4.2068e-02 1.0 2.06e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4907
VecNorm 621 1.0 1.4746e-02 1.0 3.03e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2054
VecScale 314 1.0 2.0843e-03 1.0 7.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3674
VecCopy 636 1.0 1.0492e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 4096 1.0 4.1316e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 5278 1.0 9.9347e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1977
VecAYPX 306 1.0 4.3933e-03 1.0 7.46e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1698
VecMAXPY 608 1.0 5.9476e-02 1.0 4.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6993
VecAssemblyBegin 2493 1.0 1.5733e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2493 1.0 1.5340e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 314 1.0 1.0375e-02 1.0 2.30e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2214
KSPGMRESOrthog 302 1.0 7.3104e-02 1.0 4.13e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5647
KSPSetUp 1 1.0 1.2398e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 4 1.0 7.8504e+01 1.0 1.17e+11 1.0 0.0e+00 0.0e+00 0.0e+00 19 54 0 0 0 19 54 0 0 0 1491
PCSetUp 1 1.0 7.8663e+01 1.0 9.31e+10 1.0 0.0e+00 0.0e+00 0.0e+00 19 43 0 0 0 19 43 0 0 0 1183
PCApply 608 1.0 4.6022e+01 1.0 7.41e+10 1.0 0.0e+00 0.0e+00 0.0e+00 11 34 0 0 0 11 34 0 0 0 1611
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 33 30 1306648980 0.
Vector 663 663 65247472 0.
Krylov Solver 1 1 35264 0.
Preconditioner 1 1 1008 0.
Viewer 2 0 0 0.
Index Set 5 5 87624 0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
#PETSc Option Table entries:
-ksp_atol 1e-8
-ksp_converged_reason
-ksp_monitor
-ksp_monitor_true_residual
-ksp_rtol 1e-8
-log_view
-pc_factor_fill 3
-pc_factor_levels 2
-pc_type ilu
#End of PETSc Option Table entries
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4
Configure options: PETSC_ARCH=arch-linux2-cxx-nodebug --with-scalar-type=complex --with-fortran-kernels=1 --with-clanguage=c++ --with-debugging=0 --with-cxx=g++ CXXOPTFLAGS=-O3 COPTFLAGS=O3 FOPTFLAGS=-O3 --download-openmpi --with-blaslapack-dir=/opt/intel/mkl
Questions:
- I realize there may be no simple/obvious solution to this, but at the very least I would like to understand why it happens and what PETSc is doing internally that takes so long.
- If there is no clear fix, what steps could I take to mitigate this or to investigate it further? (One thing I am considering is shown in the sketch after this list.)
- Or is this expected/normal, and should I stop worrying about it and just suck it up?
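One investigative step I am considering (not yet tried) is putting the setup into its own logging stage so that -log_view reports it separately from the solves. A minimal sketch; the stage name is just a placeholder:
/* Sketch: wrap the solver/preconditioner setup in its own logging stage so
   its cost shows up as a separate stage in -log_view. */
PetscLogStage setup_stage;
PetscLogStageRegister("SolverSetup", &setup_stage);
PetscLogStagePush(setup_stage);
KSPSetUp(solver);
PetscLogStagePop();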
Sorry for the length of this question, and thanks for your time!