Discussion:
Failed to read barrier entry token from rank 0
Andreas Ernst
2007-01-25 09:49:57 UTC
Hi, all:

I obtain the following error message with my parallel code on a Cray
XD1 system:

[0] Fatal Error: message truncated. ask 304 got 380
mpiexec: Error: read_rai_startup_ports: Failed to read barrier entry
token from rank 0 process on c414n5.
at line 863 in file /tmp/igorodet/rpm/BUILD/mpich-1.2.6/mpid/rai/raifma.c
***@c414n5:~/num/Nbody6GC/Runtid4> Process 1 lost connection: exiting
Process 3 lost connection: exiting
Process 5 lost connection: exiting
Process 4 lost connection: exiting
Process 2 lost connection: exiting

The first error message ("message truncated") means that in an MPI_SENDRECV
call on rank x, the "sendcount" to rank y is larger than the "recvcount"
from rank x in the corresponding MPI_SENDRECV call on rank y (here the
receiver asked for 304 and got 380).
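MPI treats the two directions of a count mismatch differently. A receive may be posted with room for more than actually arrives, but a message longer than the posted receive is an error. The sketch below is a toy model of that rule in plain Python, not real MPI. The names `deliver` and `TruncationError` are hypothetical, and the error text only mimics the log line above.

```python
class TruncationError(Exception):
    """Hypothetical error: the incoming message does not fit the posted receive."""


def deliver(message, recvcount):
    """Toy model of MPI receive matching (not real MPI).

    A receive posted with room for MORE items than arrive is fine;
    only a message LONGER than the posted count is an error.
    """
    if len(message) > recvcount:
        raise TruncationError(
            f"message truncated. ask {recvcount} got {len(message)}"
        )
    return list(message)


# Receiver posts room for 304 items but the sender ships 380.
try:
    deliver([0] * 380, recvcount=304)
except TruncationError as err:
    print(err)  # message truncated. ask 304 got 380

# The reverse mismatch is legal: the receive simply gets fewer items.
print(len(deliver([0] * 304, recvcount=380)))  # 304
```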

The problem is that, although I have used an MPI_BARRIER, rank 0 is no
longer synchronized with the other ranks. It seems as if the MPI_BARRIER
works on every rank except rank 0. Rank 0 does something on its own! The
second error message also points to this.
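One possibility worth ruling out is this: a barrier only makes every rank wait until all have arrived. It does not check or align the counts each rank then passes to the next call, so ranks can be fully in step and still hit a truncation. The toy sketch below uses Python threads as stand-ins for two ranks. `threading.Barrier` plays the role of MPI_BARRIER and queues play the role of the message exchange. It is not MPI, and the names `rank`, `mailbox` and `results` are hypothetical. In real MPI, a receiver that cannot know the incoming size in advance can call MPI_Probe and MPI_Get_count before posting the receive.

```python
import queue
import threading

barrier = threading.Barrier(2)                  # stand-in for MPI_BARRIER
mailbox = {0: queue.Queue(), 1: queue.Queue()}  # one inbox per "rank"
results = {}


def rank(me, other, sendcount, recvcount):
    """Hypothetical 'rank': barrier, then a send/receive exchange."""
    barrier.wait()                          # both ranks are now in step...
    mailbox[other].put([me] * sendcount)    # ...but each sends its OWN count
    incoming = mailbox[me].get()
    if len(incoming) > recvcount:           # same rule as MPI truncation
        results[me] = f"truncated: ask {recvcount} got {len(incoming)}"
    else:
        results[me] = f"ok: got {len(incoming)}"


# Rank 1 sends 380 items, but rank 0 only posts room for 304.
threads = [
    threading.Thread(target=rank, args=(0, 1, 304, 304)),
    threading.Thread(target=rank, args=(1, 0, 380, 304)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results[0])  # truncated: ask 304 got 380
print(results[1])  # ok: got 304
```

Both threads pass the barrier together, yet rank 0 still fails, because the mismatch lies in the counts and not in the synchronization.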

How can I fix such an error, or where could I get help? I have found no
documentation of this error on the web.

Cheers,

Andreas
Greg Lindahl
2007-01-25 20:03:14 UTC
Post by Andreas Ernst
I obtain the following error message with my parallel code on a Cray
Did you try tech support at Cray?

-- greg
