This was originally asked on Stack Overflow, but it was suggested that I repost it here and delete it there rather than have it migrated. I have tried to clarify things based on the comments over there.
I am new to MPI, but I am trying to use it to speed up some code I have. A minimal version of the code is as follows:
program main
  !! takes a list, then for each element randomly generates an index and adds
  !! the element to that location
  !! while this program is useless the basic features are the same as a Monte
  !! Carlo program I am writing
  implicit none
  integer, parameter :: N=5
  integer, parameter :: niter=10
  integer :: arr(N), arr_new(N) ! will want dp real
  ! dummy
  integer :: i, j, step

  do i=1,N
    arr(i) = i
  enddo

  do step=1,niter
    arr_new = 0 ! initialise to zero
    do i=1,N
      j = randint_exc(1,N,i)
      arr_new(j) = arr_new(j) + i
    enddo
    arr = arr + arr_new
    print*, "newarr", arr_new
    print*, "uptarr", arr
  enddo

contains

  function randint_exc(a, b, exclude) result(retval)
    !! get random integer between a and b, but exclude arg exclude
    implicit none
    integer, parameter :: dp = kind(1.d0)
    integer, intent(in) :: a, b
    integer, intent(in) :: exclude
    integer :: retval
    real(dp) :: u
    call random_number(u)
    retval = a + floor((b-a)*u) ! randint between a and b-1
    if (retval >= exclude) then
      retval = retval + 1
    endif
  end function randint_exc

end program main
(FWIW I am parallelising my own implementation of FCIQMC, just for fun; I know there are good programs out there already. I figured I would strip it down here so you don't have to worry about the details.)
Basically, I have an array of values (whose initial values I know), and for each element of this array I want to randomly pick another element and add the current element onto it. I then do this for some fixed number of iterations. As you can see, the way I do it is to initialise a new array to zero, add the values into it, and then add that new array onto the original one. Rinse and repeat.
My attempt at parallelising this with MPI is to have each process hold its own chunk of the array, but I am stuck on the part where a process can generate an index outside its own chunk. I think I have to work out which rank the index j belongs to, then send the index together with the value to that process and receive it there (with an arbitrary count). I have been struggling to use MPI_Send and MPI_Recv for this (my attempt does not even compile yet). How would I do this, and is there a more elegant/easier way? (Also, about splitting the array into blocks: is there a built-in MPI function for that?) Here is my attempt; I am stuck at the ! TODO ??? comment... Otherwise, is there a way to have all MPI processes share memory for the new array, so that I can add to an arbitrary index at any time? (I have put untested sketches of the two alternatives I could come up with after my attempt below.)
program main
  use mpi
  implicit none
  ! MPI variables
  integer :: ierr, nproc, rank
  integer :: N=5
  integer, parameter :: niter=10
  ! variables introduced because I'm trying to move to MPI
  integer :: Nlocal, r
  integer, allocatable :: arrlocal(:), arr_newlocal(:)
  ! dummy
  integer :: i,j,step

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  Nlocal = N/nproc
  if (rank == nproc-1) then
    ! add remaining elements to last processor's list
    r = modulo(N,nproc)
  else
    r = 0
  endif

  allocate(arrlocal(Nlocal+r), arr_newlocal(Nlocal+r))
  do i=1,Nlocal+r
    arrlocal(i) = Nlocal*rank+i
  enddo
  print*, rank, Nlocal, "array", arrlocal

  do step=1,niter
    ! NOTE you can only start the next step when all the other processes are
    ! done (I think), since it will depend on the new full array
    ! so force all the processes to reach this point
    call MPI_Barrier(MPI_COMM_WORLD, ierr)
    arr_newlocal = 0 ! initialise to zero
    do i=1,N
      ! this is the part I am most confused about parallelising
      j = randint_exc(1,N,i) ! NOTE N, *not* Nlocal
      ! TODO ???
      ! j might be outside the scope of this process
      ! arr_newlocal(j) = arr_newlocal(j) + i
    enddo
    arrlocal = arrlocal + arr_newlocal
    ! print*, step, rank, "newarr", arr_newlocal
    ! print*, step, rank, "uptarr", arrlocal
  enddo

  call MPI_Finalize(ierr)

contains

  function randint_exc(a, b, exclude) result(retval)
    !! get random integer between a and b, but exclude arg exclude
    implicit none
    integer, parameter :: dp = kind(1.d0)
    integer, intent(in) :: a, b
    integer, intent(in) :: exclude
    integer :: retval
    real(dp) :: u
    call random_number(u)
    retval = a + floor((b-a)*u) ! randint between a and b-1
    if (retval >= exclude) then
      retval = retval + 1
    endif
  end function randint_exc

end program main
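For completeness, the only alternative I have managed to work out myself is to give up on distributing the array and instead let every rank keep a full-length contribution buffer, summing the buffers with MPI_Allreduce at the end of each step. Below is an untested sketch of what I mean; the lo/hi split of the loop over elements is just my guess at a simple partition, and I am ignoring how random_number should be seeded on the different ranks:

program allreduce_sketch
  !! untested sketch: every rank keeps a full-length contribution buffer and the
  !! buffers are summed with MPI_Allreduce, so no point-to-point sends are needed
  use mpi
  implicit none
  integer, parameter :: N=5
  integer, parameter :: niter=10
  integer :: ierr, nproc, rank
  integer :: arr(N), contrib(N)  ! full length on every rank (no memory saving)
  integer :: i, j, step, lo, hi

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  do i=1,N
    arr(i) = i
  enddo

  ! my guess at splitting the loop over elements between the ranks
  lo = rank*N/nproc + 1
  hi = (rank+1)*N/nproc

  do step=1,niter
    contrib = 0
    do i=lo,hi
      j = randint_exc(1,N,i)   ! j may lie outside [lo,hi]; fine, contrib is full length
      contrib(j) = contrib(j) + i
    enddo
    ! sum everyone's contributions so that all ranks see the same updated array
    call MPI_Allreduce(MPI_IN_PLACE, contrib, N, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)
    arr = arr + contrib
  enddo

  if (rank == 0) print*, "final", arr
  call MPI_Finalize(ierr)

contains

  function randint_exc(a, b, exclude) result(retval)
    !! same helper as above
    integer, intent(in) :: a, b, exclude
    integer :: retval
    real(kind(1.d0)) :: u
    call random_number(u)
    retval = a + floor((b-a)*u)
    if (retval >= exclude) retval = retval + 1
  end function randint_exc

end program allreduce_sketch

This sidesteps the send/receive problem entirely, but every rank holds a complete copy of the array, which is exactly the memory duplication I want to avoid for large N.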
Some of the comments made it sound like this is harder to parallelise than I had realised, so I would be happy with an answer that is basically just a reference along the lines of "use pattern X and method Y". I would also be interested in intermediate steps. I want to go to very large arrays eventually (so this is memory-intensive), and this is more of a programming exercise than anything else, so I want to parallelise the algorithm itself rather than run the same serial algorithm many times in parallel and gather statistics from the independent runs.
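On the memory point, and regarding the shared-memory idea above: the closest thing I have found so far is MPI one-sided communication (RMA), where each rank exposes its arr_newlocal in a window (MPI_Win_create) and any rank can add into it with MPI_Accumulate, so no matching receives have to be posted. Below is my rough, untested understanding of what that might look like for this problem; the owner/offset arithmetic is my own guess at the same block distribution as in my attempt (last rank takes the remainder, and I assume nproc <= N), the disp_unit of 4 assumes 4-byte default integers, and I am not confident the fence synchronisation is right:

program rma_sketch
  !! untested sketch: keep the array block-distributed and let each rank add its
  !! contributions directly into the owning rank's buffer with one-sided
  !! MPI_Accumulate, so no matching receives are needed
  use mpi
  implicit none
  integer, parameter :: N=5, niter=10
  integer :: ierr, nproc, rank, win
  integer :: Nlocal, r, target_rank
  integer :: i, j, gi, step
  integer, allocatable :: arrlocal(:), arr_newlocal(:)
  integer(kind=MPI_ADDRESS_KIND) :: winsize, disp

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! same block distribution as in my attempt: last rank takes the remainder
  Nlocal = N/nproc
  r = 0
  if (rank == nproc-1) r = modulo(N,nproc)

  allocate(arrlocal(Nlocal+r), arr_newlocal(Nlocal+r))
  do i=1,Nlocal+r
    arrlocal(i) = Nlocal*rank + i
  enddo

  ! expose arr_newlocal in an RMA window; disp_unit = 4 assumes 4-byte default integer
  winsize = 4_MPI_ADDRESS_KIND * (Nlocal+r)
  call MPI_Win_create(arr_newlocal, winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

  do step=1,niter
    arr_newlocal = 0
    call MPI_Win_fence(0, win, ierr)            ! open an access/exposure epoch everywhere
    do i=1,Nlocal+r
      gi = Nlocal*rank + i                      ! global index of my i-th element
      j = randint_exc(1, N, gi)                 ! global target index, possibly on another rank
      target_rank = min((j-1)/Nlocal, nproc-1)  ! my guess at the owner of global index j
      disp = j - 1 - target_rank*Nlocal         ! 0-based offset within the owner's buffer
      call MPI_Accumulate(gi, 1, MPI_INTEGER, target_rank, disp, 1, MPI_INTEGER, &
                          MPI_SUM, win, ierr)
    enddo
    call MPI_Win_fence(0, win, ierr)            ! close the epoch: all accumulates now visible
    arrlocal = arrlocal + arr_newlocal
  enddo

  call MPI_Win_free(win, ierr)
  print*, rank, "final block", arrlocal
  call MPI_Finalize(ierr)

contains

  function randint_exc(a, b, exclude) result(retval)
    !! same helper as above
    integer, intent(in) :: a, b, exclude
    integer :: retval
    real(kind(1.d0)) :: u
    call random_number(u)
    retval = a + floor((b-a)*u)
    if (retval >= exclude) retval = retval + 1
  end function randint_exc

end program rma_sketch

If either of these two sketches is (close to) the standard pattern for this kind of scatter-add, then just a pointer to the right name for it would already be a great answer.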