Discussion:
MPI_Alltoall
Patrick Begou
2011-07-08 13:55:15 UTC
I think I want to do a very classical thing, but I've already spent a couple of days
on this problem and I'm still unable to solve it in the most general case. My
idea is that a user-defined datatype and a collective communication with
MPI_Alltoall or MPI_Alltoallv should do the job but...

The attached image shows what I'm trying to build:
In this scheme with 6 processes, the data are stored in large "pencils" along the X
axis (left part of the figure). I want to redistribute them along the Y axis (right
part of the figure).
I've created two communicators, one for processes [0,1,2] and one for processes
[3,4,5], because I want to try to do this with collective communications.

However, the processes do not all have the same amount of data, and this is my
problem!

Suppose the grid (nx,ny,nz) is nx=7, ny=5, nz=4:
when "along X":
proc 0 has (1:7,1:2,1:2) = 28 items
proc 1 has (1:7,3:4,1:2) = 28 items
proc 2 has (1:7,5:5,1:2) = 14 items

And when "along Y", procs of this communicator have:
proc0 will have (1:3,1:5,1:2) = 30 items
proc1 will have (4:5,1:5,1:2) = 20 items
proc2 will have (6:7,1:5,1:2) = 20 items

I'm sure this is a very classical problem, but I'm unable to find any algorithm
or starting documentation, and I do not understand how to do this with collective
communications. My application is written in Fortran.
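
For concreteness, the varying counts above map onto MPI_Alltoallv arguments
roughly as in the minimal sketch below. All names (ylo, xhi, scnt, sendbuf, ...)
are illustrative, MPI_COMM_WORLD stands in for one 3-rank z-slab communicator
(run with exactly 3 processes), and the send buffer is packed by destination
rank before the call:

program pencil_transpose
  use mpi
  implicit none
  integer, parameter :: nx = 7, ny = 5, nzl = 2      ! local z-slab depth
  ! y-range owned in the "along X" layout, x-range owned in "along Y"
  integer, parameter :: ylo(0:2) = (/ 1, 3, 5 /), yhi(0:2) = (/ 2, 4, 5 /)
  integer, parameter :: xlo(0:2) = (/ 1, 4, 6 /), xhi(0:2) = (/ 3, 5, 7 /)
  integer :: scnt(0:2), sdsp(0:2), rcnt(0:2), rdsp(0:2)
  double precision, allocatable :: a(:,:,:), sendbuf(:), recvbuf(:)
  integer :: rank, ierr, r, i, j, k, p

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  allocate(a(nx, ylo(rank):yhi(rank), nzl))
  a = dble(rank)                                     ! dummy data

  ! Counts: the block sent to rank r is the intersection of my x-pencil
  ! with r's y-pencil; the received block is the converse intersection.
  do r = 0, 2
     scnt(r) = (xhi(r) - xlo(r) + 1) * (yhi(rank) - ylo(rank) + 1) * nzl
     rcnt(r) = (xhi(rank) - xlo(rank) + 1) * (yhi(r) - ylo(r) + 1) * nzl
  end do
  sdsp(0) = 0; rdsp(0) = 0
  do r = 1, 2
     sdsp(r) = sdsp(r-1) + scnt(r-1)
     rdsp(r) = rdsp(r-1) + rcnt(r-1)
  end do

  allocate(sendbuf(sum(scnt)), recvbuf(sum(rcnt)))

  ! Pack: for each destination rank, copy the sub-block of my x-pencil
  ! that falls inside that rank's x-range, in (x,y,z) order.
  p = 0
  do r = 0, 2
     do k = 1, nzl
        do j = ylo(rank), yhi(rank)
           do i = xlo(r), xhi(r)
              p = p + 1
              sendbuf(p) = a(i, j, k)
           end do
        end do
     end do
  end do

  call MPI_Alltoallv(sendbuf, scnt, sdsp, MPI_DOUBLE_PRECISION, &
                     recvbuf, rcnt, rdsp, MPI_DOUBLE_PRECISION, &
                     MPI_COMM_WORLD, ierr)

  ! recvbuf now holds this rank's y-pencil as one contiguous block per
  ! source rank; unpacking into b(xlo(rank):xhi(rank), 1:ny, 1:nzl)
  ! mirrors the packing loop above.
  call MPI_Finalize(ierr)
end program pencil_transpose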

Thanks for your advice.

Patrick
Omar Awile
2011-07-27 21:28:37 UTC
Hi Patrick,

(This comes a bit late, but maybe it'll still be useful for you.)
How about determining the intersection of the x-pencils and the y-pencils? This gives a cuboid decomposition of your array (overlay the left part of your image with the right one). Every processor computes this map, and it tells each processor which part of its data must go to which other processor. I would then do several rounds of point-to-point communication between the processors.
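
In index terms, that map could be computed roughly as below (a rough sketch with made-up names; bounds are 1-based global indices, ylo/yhi describe the x-pencils and xlo/xhi the y-pencils):

subroutine build_send_map(nprocs, nx, ny, me, xlo, xhi, ylo, yhi, &
                          sx, ex, sy, ey)
  implicit none
  integer, intent(in)  :: nprocs, nx, ny, me
  integer, intent(in)  :: xlo(0:nprocs-1), xhi(0:nprocs-1)
  integer, intent(in)  :: ylo(0:nprocs-1), yhi(0:nprocs-1)
  integer, intent(out) :: sx(0:nprocs-1), ex(0:nprocs-1)
  integer, intent(out) :: sy(0:nprocs-1), ey(0:nprocs-1)
  integer :: r

  ! Rank "me" owns the x-pencil (1:nx, ylo(me):yhi(me)); rank r will
  ! own the y-pencil (xlo(r):xhi(r), 1:ny).  Their intersection is the
  ! block that me must send to r (z runs over the whole local slab).
  do r = 0, nprocs - 1
     sx(r) = max(1, xlo(r))
     ex(r) = min(nx, xhi(r))
     sy(r) = max(ylo(me), 1)
     ey(r) = min(yhi(me), ny)
     ! an empty overlap shows up as ex(r) < sx(r) or ey(r) < sy(r)
  end do
end subroutine build_send_map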

best,

Omar
Patrick Bégou
2011-08-05 11:47:58 UTC
Post by Omar Awile
Hi Patrick,
(This comes a bit late, but maybe it'll still be useful for you.)
How about determining the intersection of the x-pencils and the y-pencils? This gives a cuboid decomposition of your array (overlay the left part of your image with the right one). Every processor computes this map, and it tells each processor which part of its data must go to which other processor. I would then do several rounds of point-to-point communication between the processors.
best,
Omar
Thanks for your answer, Omar.
I was looking for a solution with MPI_Alltoall because I thought that this
routine would be optimized for the running architecture, but I was unable
to come up with a correct algorithm for the most generic cases.

I've built a working solution based on MPI_Sendrecv and communicators.
This data layout modification (previously the code used "slices" of the
grid) has large side effects in the code and, while I have validated the
communication mechanism for various grid sizes and numbers of CPUs, I
have not yet tested the code on very large configurations (from 512 up
to several thousand processes). I just hope my solution will be
reasonably scalable...
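
A common way to structure such a Sendrecv-based exchange (a generic
sketch, not the code described above) pairs each rank with partner
(rank + s) mod nprocs in round s, so every rank is busy in every round.
It assumes the same nprocs, rank, and per-rank count/displacement arrays
(scnt, sdsp, rcnt, rdsp) as the MPI_Alltoallv sketch earlier in the
thread:

integer :: s, dst, src
do s = 0, nprocs - 1
   dst = mod(rank + s, nprocs)            ! who I send to this round
   src = mod(rank - s + nprocs, nprocs)   ! who I receive from this round
   ! Sendrecv avoids deadlock; s = 0 is a valid self-exchange.
   call MPI_Sendrecv(sendbuf(sdsp(dst)+1), scnt(dst), MPI_DOUBLE_PRECISION, dst, 0, &
                     recvbuf(rdsp(src)+1), rcnt(src), MPI_DOUBLE_PRECISION, src, 0, &
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
end do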

Patrick
