Successive over-relaxation does not converge (when not done in place)

Tags: computational-science, python, approximation, numpy
2021-12-23 05:37:09

I am trying to find the potential given some boundary conditions, using the successive over-relaxation (SOR) method.

I have 2 implementations:

- One iterates over all elements and applies the formula field[y,x] = (1-alpha)*field[y,x] + (field[max(y-1,0),x] + field[min(y+1,field.shape[0]-1),x] + field[y,max(x-1,0)] + field[y,min(x+1,field.shape[1]-1)]) * alpha/4 in place. This is slow because it does not access memory in an efficient way.

- In the other, I create 4 matrices, shifted by 1 in the 4 directions. I apply the same formula and then add the matrices up. However, this does not account for modifications made during the current iteration. This one is much faster than the previous one.

With alpha = 1.9 the first algorithm converges, while the second does not. With alpha = 1.0 both converge, but very slowly.

Can anyone tell me what I am doing wrong, and how to fix the fast version?

Full code:

#! python3

import numpy
import math
import time

def solve_laplace(boundary, mask, file = None, alpha = 1.0, threshold = 0.0001):
    """
    We are using the successive over-relaxation method. We iterate until our solution changes less than some threshold value.

    Vm+1(x,y,...) = alpha * (Vm(x-1,y,...) + Vm(x+1,y,...) + Vm(x,y-1,...) + Vm(x,y+1,...) + ...) / (2*nr dimensions) + (1-alpha)*Vm(x,y,...)
    """

    dim = boundary.ndim

    field = numpy.zeros_like(boundary)
    numpy.copyto(field, boundary, casting = "safe", where = mask)
    last_diff = float("infinity")

    for iter_nr in range(10000):#max number of iterations
        prev = field.copy() #make a copy of the field at the start of the iteration (python always stores pointers unless you explicitly copy something)

        for d in range(dim): #can be scaled to arbitrary dimensions, using 2D for testing

            #these 2 blocks are hard to follow but they work, read the comments
            front = prev[tuple(0 if i==d else slice(None) for i in range(dim))] #select front face of cube/whatever
            front = front[tuple(numpy.newaxis if i==d else slice(None) for i in range(dim))] #prepare it for next step
            front = numpy.concatenate((front,prev),d) #prepend it to the previous iteration's result
            front = front[tuple(slice(-1) if i==d else slice(None) for i in range(dim))] #remove the back side of the previous iteration's result
            #we now have the volume shifted right by 1 pixel, x now corresponds to the x-1 term

            back = prev[tuple(-1 if i==d else slice(None) for i in range(dim))] #select back face of cube/whatever
            back = back[tuple(numpy.newaxis if i==d else slice(None) for i in range(dim))] #prepare it for next step
            back = numpy.concatenate((prev,back),d) #append it to the previous iteration's result
            back = back[tuple(slice(1,None) if i==d else slice(None) for i in range(dim))] #remove the front side of the previous iteration's result
            #we now have the volume shifted left by 1 pixel, x now corresponds to the x+1 term

            field += (front + back) * alpha/(2*dim) #this part of the formula: alpha*( ((Vm(x-1,y,...) + Vm(x+1,y,...) + Vm(x,y-1,...) + Vm(x,y+1,...))/(2*nr dimensions)
            #numpy.copyto(field, boundary, casting = "safe", where = mask)

        field -= alpha*prev #this part of the formula: (1-alpha)*Vm(x,y,...)
        #reset values at boundaries
        numpy.copyto(field, boundary, casting = "safe", where = mask) 

        #check if the difference is less than threshold
        average = math.sqrt(numpy.average(field**2)) #RMS of the field, just so I get a positive number
        diff = math.sqrt(numpy.average((field-prev)**2)) #RMS of the change between iterations

        if last_diff < diff/average:
            print("Solution is diverging.")
            break

        if diff/average < threshold:
            print("Found solution after", iter_nr, "iterations.")
            break

        last_diff = diff/average

    if file is not None:
        numpy.save(file,field)
    return field



def solve_laplace_slow_2D(boundary, mask, file = None, alpha = 1.9,threshold = 0.0001):
    """
    We are using the successive over-relaxation method. We iterate until our solution changes less than some threshold value.

    Vm+1(x,y,...) = alpha * (Vm(x-1,y,...) + Vm(x+1,y,...) + Vm(x,y-1,...) + Vm(x,y+1,...) + ...) / (2*nr dimensions) + (1-alpha)*Vm(x,y,...)
    """

    assert boundary.ndim == 2

    field = numpy.zeros_like(boundary)
    numpy.copyto(field, boundary, casting = "safe", where = mask) 
    last_diff = float("infinity")
    start_time = time.time()

    for iter_nr in range(10000):#max number of iterations
        prev = field.copy()
        for y in range(field.shape[0]):
            for x in range(field.shape[1]):
                if not mask[y,x]:
                    field[y,x] = (1-alpha)*field[y,x] + (field[max(y-1,0),x] + field[min(y+1,field.shape[0]-1),x] + field[y,max(x-1,0)] + field[y,min(x+1,field.shape[1]-1)]) * alpha/4

        #check if the difference is less than threshold
        average = math.sqrt(numpy.average(field**2)) #RMS of the field, just so I get a positive number
        diff = math.sqrt(numpy.average((field-prev)**2)) #RMS of the change between iterations

        if last_diff < diff/average:
            print("Solution is diverging.")
            break

        if diff/average < threshold:
            print("Found solution after", iter_nr, "iterations.")
            break

        if time.time() - start_time > 3600:
            print("Stopped after one hour at iteration:", iter_nr)
            break

        last_diff = diff/average

        #print(time.time() - start_time, iter_nr, last_diff)

    if file is not None:
        numpy.save(file,field)
    return field

def test():
    boundary = numpy.zeros((51,51))
    boundary[25,25] = 1
    for i in range(51):
        boundary[0,i] = -1
        boundary[50,i] = -1
        boundary[i,0] = -1
        boundary[i,50] = -1
    mask = (boundary != 0)

    print("Trying fast method:")
    solve_laplace(boundary,mask,alpha = 1.5) #diverges
    print("Trying slow method:")
    solve_laplace_slow_2D(boundary,mask,alpha = 1.5) #converges but is very slow
2 Answers

The "fast" version is (damped/over-relaxed) Jacobi, not SOR, which is a multiplicative method. Jacobi is not guaranteed to converge even with a damping factor of 1.0, as you can see by applying it to a 1D problem or by adding extreme anisotropy to a multi-dimensional one.
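To see this concretely, here is a small sketch (not the poster's code; the grid size n is arbitrary) that computes the spectral radius of the damped-Jacobi iteration matrix for the 1D Laplacian. A spectral radius above 1 means the iteration diverges:

```python
import numpy

# Damped Jacobi applied to the 1D Laplacian A (tridiagonal [-1, 2, -1]).
# The iteration matrix is G = I - omega * D^-1 A, where D = diag(A);
# the method converges iff the spectral radius of G is below 1.
n = 50  # arbitrary grid size for illustration
A = 2*numpy.eye(n) - numpy.eye(n, k=1) - numpy.eye(n, k=-1)
D_inv = numpy.diag(1.0 / numpy.diag(A))

def jacobi_spectral_radius(omega):
    G = numpy.eye(n) - omega * (D_inv @ A)
    return max(abs(numpy.linalg.eigvals(G)))

for omega in (2/3, 1.0, 1.9):
    print(f"omega = {omega:.2f}: rho = {jacobi_spectral_radius(omega):.3f}")
```

With omega = 1.9 the spectral radius is far above 1 (roughly 2.8 here), so every sweep amplifies the error, matching the observed blow-up; omega = 2/3 and 1.0 stay (barely) below 1.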

Your SOR ("slow") implementation is slow because it is written in pure Python, not because of memory access. In fact, the vectorized code makes more passes over memory, so it is actually worse in terms of memory bandwidth, but the interpreter is so much slower that it hides the bandwidth effect.

It is an unfortunate fact that writing numerical kernels in a dynamic language without a JIT costs performance, with variability that may have little to do with machine characteristics. You should be able to speed up the loop-based code using Cython or Numba, or just write it in a compiled language.
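For instance, the in-place sweep can be JIT-compiled with Numba essentially unchanged (a sketch; the try/except fallback just makes it run, slowly, when Numba is not installed):

```python
import numpy

try:
    from numba import njit          # JIT-compile the sweep if Numba is available
except ImportError:
    njit = lambda f: f              # fall back to plain Python otherwise

@njit
def sor_sweep(field, mask, alpha):
    # One in-place SOR sweep, same clamped-index update as the slow version.
    ny, nx = field.shape
    for y in range(ny):
        for x in range(nx):
            if not mask[y, x]:
                field[y, x] = (1 - alpha) * field[y, x] + alpha / 4 * (
                    field[max(y - 1, 0), x] + field[min(y + 1, ny - 1), x]
                    + field[y, max(x - 1, 0)] + field[y, min(x + 1, nx - 1)])
```

The outer convergence loop can stay in ordinary Python; only the hot double loop is compiled, which typically recovers a large factor of the interpreter overhead.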

Now it is true that a different kind of local vectorization can be applied to Jacobi but not to SOR. Jacobi's weaker convergence can be partially remedied by polynomial acceleration, e.g., Chebyshev (for known spectral bounds), or by using a Krylov method (spectrally adaptive), especially for SPD problems. Alternatively, you can use multicolor relaxation (e.g., "red-black Gauss-Seidel") to obtain a method with relatively fast guaranteed convergence that is still vectorizable. Folding either into a multigrid algorithm will also greatly accelerate convergence (to a small constant number of iterations independent of problem size).
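A minimal sketch of the red-black idea applied to the poster's setup (same clamped-edge stencil, implemented here with numpy.pad; the function name and return convention are my own):

```python
import numpy

def redblack_sor(field, mask, alpha=1.9, threshold=1e-4, max_iter=10000):
    # Vectorized red-black SOR.  Cells are split into two checkerboard
    # colors; each color's 4-point stencil reads only the other color,
    # so a whole color can be updated at once with array operations
    # while still using the freshest values, like true SOR.
    yy, xx = numpy.indices(field.shape)
    colors = [((yy + xx) % 2 == c) & ~mask for c in (0, 1)]
    for it in range(max_iter):
        prev = field.copy()
        for color in colors:
            p = numpy.pad(field, 1, mode="edge")   # clamped edges, as in the question
            neigh = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            update = (1 - alpha) * field + alpha / 4 * neigh
            field[color] = update[color]
        diff = numpy.sqrt(numpy.average((field - prev) ** 2))
        avg = numpy.sqrt(numpy.average(field ** 2))
        if diff / avg < threshold:
            return field, it
    return field, max_iter
```

On the 51x51 test problem from the question this converges quickly with alpha = 1.9, where the shifted-matrix (Jacobi) version diverges, because masked cells are never written and each color sweep already sees the other color's updated values.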

The second algorithm does indeed look like a Jacobi iteration. In that case, you should try a relaxation factor smaller than 1 (2/3 is the usual prescription, according to the Wikipedia entry on the topic).

If you want to stick with Jacobi iterations, scheduled relaxation may help. (Yang, Xiang; Mittal, Rajat (27 June 2014). "Acceleration of the Jacobi iterative method by factors exceeding 100 using scheduled relaxation". Journal of Computational Physics. doi:10.1016/j.jcp.2014.06.010.) Essentially, it involves cycling through several damping factors, each with a corresponding prescribed number of iterations, to form one larger cycle. The paper provides several out-of-the-box relaxation schedules, so it should be easy to get running quickly with minimal overhead... Personally, though, I would go with conjugate gradients; it seems faster to me.
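As an illustration of the conjugate-gradient route, a sketch using SciPy's sparse CG on a small 2D Laplacian (the grid size and right-hand side are arbitrary choices for the example):

```python
import numpy
from scipy.sparse import diags, eye, kron
from scipy.sparse.linalg import cg

n = 49                                   # interior points per dimension (arbitrary)
T = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = kron(eye(n), T) + kron(T, eye(n))    # 5-point 2D Laplacian, Dirichlet BCs
b = numpy.zeros(n * n)
b[n * n // 2] = 1.0                      # unit source in the middle of the grid

x, info = cg(A, b, atol=1e-10)           # info == 0 means CG converged
```

Because the matrix is SPD, CG converges even without a preconditioner; an incomplete-Cholesky or multigrid preconditioner reduces the iteration count further.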

I have tested the SRJ method (with a clunky csc_matrix-based implementation) for solving finite-difference discretizations of the Poisson equation with periodic boundaries, on 512x512 (2D) and 64x64x64 (3D) systems. SRJ seemed to converge in about 1/5 of the total iterations required with the usually recommended single relaxation factor of 2/3...

Conjugate gradients, however, has even better convergence properties, even unpreconditioned.

Geometric multigrid and incomplete-Cholesky-preconditioned CG are also definitely good options, especially for higher-dimensional problems, and seem to offer the best performance of the iterative linear solvers I have seen so far (though I am still working on comparisons and combinations with algebraic multigrid).

If you are just solving the Laplace/Poisson equation with spatially constant coefficients, Fourier-transform-based methods (via FFTW) are very fast.
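For the periodic case this fits in a few lines even with plain numpy.fft (a sketch; poisson_periodic_fft is a name of my choosing, and the right-hand side must have zero mean for a periodic solution to exist):

```python
import numpy

def poisson_periodic_fft(rhs, h=1.0):
    # Solve the 5-point discrete Poisson equation  lap(u) = rhs  with
    # periodic boundaries by dividing by the stencil's eigenvalues in
    # Fourier space.  The k = 0 mode (the additive constant) is left at 0.
    ny, nx = rhs.shape
    ky = numpy.fft.fftfreq(ny)[:, None]
    kx = numpy.fft.fftfreq(nx)[None, :]
    # eigenvalues of the periodic 5-point Laplacian stencil
    lam = (2 * numpy.cos(2 * numpy.pi * ky)
           + 2 * numpy.cos(2 * numpy.pi * kx) - 4) / h**2
    rhs_hat = numpy.fft.fft2(rhs)
    u_hat = numpy.zeros_like(rhs_hat)
    nonzero = lam != 0
    u_hat[nonzero] = rhs_hat[nonzero] / lam[nonzero]
    return numpy.fft.ifft2(u_hat).real
```

This is a direct solve (one forward and one inverse FFT), so it costs O(N log N) regardless of conditioning, but it only applies when the coefficients are constant and the boundaries suit the transform.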