我尝试解决问题在哪里是一个大稀疏矩阵,和是稠密的矩阵(这里和)。反斜杠运算符产生正确的解x = A\B
(这显然会减慢进程。这里有什么问题?
我还尝试了 LU 分解然后求解中的每一列,但lu(A)
似乎没有被并行化(尽管我使用的是 version R2014b
)。我看到人们认为lu(A)
是并行化的,所以我错过了一些更新还是什么?
通过执行
spparms('spumoni',2)
x = A\b;
spparms
我得到以下输出
sp\: bandwidth = 60795+1+60795.
sp\: is A diagonal? no.
sp\: is band density (0.00) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with automatic reordering.
UMFPACK V5.4.0 (May 20, 2009), Control:
Matrix entry defined as: double complex
Int (generic integer) defined as: UF_long
0: print level: 2
1: dense row parameter: 0.2
"dense" rows have > max (16, (0.2)*16*sqrt(n_col) entries)
2: dense column parameter: 0.2
"dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
3: pivot tolerance: 0.1
4: block size for dense matrix kernels: 32
5: strategy: 0 (auto)
6: initial allocation ratio: 0.7
7: max iterative refinement steps: 2
12: 2-by-2 pivot tolerance: 0.01
13: Q fixed during numerical factorization: 0 (auto)
14: AMD dense row/col parameter: 10
"dense" rows/columns have > max (16, (10)*sqrt(n)) entries
Only used if the AMD ordering is used.
15: diagonal pivot tolerance: 0.001
Only used if diagonal pivoting is attempted.
16: scaling: 1 (divide each row by sum of abs. values in each row)
17: frontal matrix allocation ratio: 0.5
18: drop tolerance: 0
19: AMD and COLAMD aggressive absorption: 1 (yes)
The following options can only be changed at compile-time:
8: BLAS library used: Fortran BLAS. size of BLAS integer: 8
9: compiled for MATLAB
10: CPU timer is POSIX times ( ) routine.
11: compiled for normal operation (debugging disabled)
computer/operating system: Linux
size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 16 (in bytes)
sp\: UMFPACK's factorization was successful.
sp\: UMFPACK's solve was successful.
UMFPACK V5.4.0 (May 20, 2009), Info:
matrix entry defined as: double complex
Int (generic integer) defined as: UF_long
BLAS library used: Fortran BLAS. size of BLAS integer: 8
MATLAB: yes.
CPU timer: POSIX times ( ) routine.
number of rows in matrix A: 754850
number of columns in matrix A: 754850
entries in matrix A: 86456682
memory usage reported in: 16-byte Units
size of int: 4 bytes
size of UF_long: 8 bytes
size of pointer: 8 bytes
size of numerical entry: 16 bytes
strategy used: symmetric
ordering used: amd on A+A'
modify Q during factorization: no
prefer diagonal pivoting: yes
pivots with zero Markowitz cost: 0
submatrix S after removing zero-cost pivots:
number of "dense" rows: 0
number of "dense" columns: 0
number of empty rows: 0
number of empty columns 0
submatrix S square and diagonal preserved
pattern of square submatrix S:
number rows and columns 754850
symmetry of nonzero pattern: 1.000000
nz in S+S' (excl. diagonal): 85701832
nz on diagonal of matrix S: 754850
fraction of nz on diagonal: 1.000000
AMD statistics, for strict diagonal pivoting:
est. flops for LU factorization: 4.16517e+14
est. nz in L+U (incl. diagonal): 7435413184
est. largest front (# entries): 896703025
est. max nz in any column of L: 29945
number of "dense" rows/columns in S+S': 0
symbolic factorization defragmentations: 0
symbolic memory usage (Units): 199337590
symbolic memory usage (MBytes): 3041.7
Symbolic size (Units): 1961813
Symbolic size (MBytes): 30
symbolic factorization CPU time (sec): 15.57
symbolic factorization wallclock time(sec): 15.45
matrix scaled: yes (divided each row by sum of abs values in each row)
minimum sum (abs (rows of A)): 3.25924e-02
maximum sum (abs (rows of A)): 6.95838e+01
symbolic/numeric factorization: upper bound actual %
variable-sized part of Numeric object:
initial size (Units) 226394213 225639362 100%
peak size (Units) 101780904057 8946021598 9%
final size (Units) 98119112401 7543694062 8%
Numeric final size (Units) 98124018962 7548223198 8%
Numeric final size (MBytes) 1497253.7 115176.7 8%
peak memory usage (Units) 101792135841 8957253382 9%
peak memory usage (MBytes) 1553224.7 136676.8 9%
numeric factorization flops 2.95433e+16 4.20989e+14 1%
nz in L (incl diagonal) 36831301536 3820083550 10%
nz in U (incl diagonal) 58996420369 3718115473 6%
nz in L+U (incl diagonal) 95826967055 7537444173 8%
largest front (# entries) 13417023050 896703025 7%
largest # rows in front 93703 29945 32%
largest # columns in front 144673 29945 21%
initial allocation ratio used: 0.0942
# of forced updates due to frontal growth: 0
number of off-diagonal pivots: 38
nz in L (incl diagonal), if none dropped 3820083550
nz in U (incl diagonal), if none dropped 3718115473
number of small entries dropped 0
nonzeros on diagonal of U: 754850
min abs. value on diagonal of U: 8.14e-09
max abs. value on diagonal of U: 1.42e+02
estimate of reciprocal of condition number: 5.75e-11
indices in compressed pattern: 13860039
numerical values stored in Numeric object: 7537500101
numeric factorization defragmentations: 3
numeric factorization reallocations: 0
costly numeric factorization reallocations: 0
numeric factorization CPU time (sec): 106790.54
numeric factorization wallclock time (sec): 7368.85
numeric factorization mflops (CPU time): 3942.19
numeric factorization mflops (wallclock): 57130.86
symbolic + numeric CPU time (sec): 106806.11
symbolic + numeric mflops (CPU time): 3941.62
symbolic + numeric wall clock time (sec): 7384.30
symbolic + numeric mflops (wall clock): 57011.33
solve flops: 1.86839e+11
iterative refinement steps taken: 1
iterative refinement steps attempted: 2
sparse backward error omega1: 4.93e-16
sparse backward error omega2: 0.00e+00
solve CPU time (sec): 248.79
solve wall clock time (sec): 246.02
solve mflops (CPU time): 750.99
solve mflops (wall clock time): 759.45
total symbolic + numeric + solve flops: 4.21176e+14
total symbolic + numeric + solve CPU time: 107054.90
total symbolic + numeric + solve mflops (CPU): 3934.20
total symbolic+numeric+solve wall clock time: 7633.09
total symbolic+numeric+solve mflops(wallclock) 55177.60
SParse MONItor output level 2.
mmd: threshold = 1.1 * mindegree + 1,
using approximate degrees in A'*A,
supernode amalgamation every 3 stages,
row reduction every 3 stages,
withhold rows at least 50% dense in colmmd.
Minimum degree orderings used with v4 chol, lu, and qr in \ and /.
Approximate minimum degree orderings used with CHOLMOD and UMFPACK in \ and /.
Pivot tolerance of 0.1 used by UMFPACK in \ and /.
Backslash uses band solver if band density is > 0.5
UMFPACK used for lu in \ and /.
Symmetric pivot tolerance of 0.001 used by UMFPACK in \ and /.
Pivot tolerance of 0.01 used by MA57 in \ and /.