barrier with a tag?

j***@gmail.com

2008-10-02 09:40:42 UTC

Hi!

I am debugging an MPI application which makes use of MPI_Barrier()
in a manner similar to this:

do {
    MPI_Barrier(MPI_COMM_WORLD);
    int result = f();
    MPI_Barrier(MPI_COMM_WORLD); // ***
} while(...);

Unfortunately, the function f(), depending on complex conditions,
sometimes enters code branches that call MPI_Barrier() themselves.
Sometimes I wind up with a situation in which all the processors wait
on the barrier marked *** above, while one processor, still in the
computation of f(), hits a barrier that is absolutely unrelated to the
barrier marked ***. Then all sorts of weird things happen, as the
other processors go past the *** barrier, where they should wait for
that other processor. Instead, I would like a deadlock to happen, so
that I could debug the code.

Is there any way I can simulate a "barrier with a tag", so that
different barriers would not pair-up?

Please don't tell me the code design is flawed -- I know that, and
that's why I'm trying to debug it. It would just be much easier if I
could equip barriers with a tag so that different barriers would not
pair up.

TIA,
- J.

Georg Bisseling

2008-10-02 12:03:22 UTC

You can use another communicator to separate the barriers.
The simplest way to get another communicator is to call
MPI_Comm_dup on MPI_COMM_WORLD.

But if the problem is that only a subgroup of your processes
goes into that conditional barrier that you mentioned, then
this simple solution would not work.

In this more complicated scenario you would need a communicator
that contains exactly the processes that went into the branch
containing the conditional barrier.

If you agree with me up to this point then it might be enough for
you to look up the documentation for MPI_Comm_split to get on track.

Keep in mind that the call to MPI_Comm_split is collective, so ALL
processes have to take part in creating the required subcommunicator.

Please get back to us if you need more assistance.

--
This signature intentionally left almost blank.
http://www.this-page-intentionally-left-blank.org/

Michael Hofmann

2008-10-06 08:33:05 UTC

Post by j***@gmail.com
Hi!
I am debugging an MPI application which makes use of MPI_Barrier()
in a manner similar to this:
do {
    MPI_Barrier(MPI_COMM_WORLD);
    int result = f();
    MPI_Barrier(MPI_COMM_WORLD); // ***
} while(...);
Unfortunately, the function f(), depending on complex conditions,
sometimes enters code branches that call MPI_Barrier() themselves.
Sometimes I wind up with a situation in which all the processors wait
on the barrier marked *** above, while one processor, still in the
computation of f(), hits a barrier that is absolutely unrelated to the
barrier marked ***. Then all sorts of weird things happen, as the
other processors go past the *** barrier, where they should wait for
that other processor. Instead, I would like a deadlock to happen, so
that I could debug the code.

You can replace the *** MPI_Barrier with a different collective call that
acts like a barrier (MPI_Allreduce, MPI_Allgather, MPI_Alltoall, ...; try one
that is not used in "f()"). This kind of "extraordinary" barrier doesn't
pair up with the ordinary MPI_Barrier in "f()".

Post by j***@gmail.com
Is there any way I can simulate a "barrier with a tag", so that
different barriers would not pair-up?

You can build your own "barrier with a tag" using point-to-point
communication. A simple barrier algorithm (centralized + synchronous
sends) should be sufficient, since efficiency doesn't matter for
debugging (?).

Michael