Andreas Ernst
2007-01-25 09:49:57 UTC
Hi, all:
I obtain the following error message with my parallel code on a Cray
XD1 system:
[0] Fatal Error: message truncated. ask 304 got 380
mpiexec: Error: read_rai_startup_ports: Failed to read barrier entry
token from rank 0 process on c414n5.
at line 863 in file /tmp/igorodet/rpm/BUILD/mpich-1.2.6/mpid/rai/raifma.c
***@c414n5:~/num/Nbody6GC/Runtid4> Process 1 lost connection: exiting
Process 3 lost connection: exiting
Process 5 lost connection: exiting
Process 4 lost connection: exiting
Process 2 lost connection: exiting
The first error message ("message truncated") means that, in an MPI_SENDRECV
call on rank x, the "sendcount" used to send to rank y does not match the
"recvcount" that the corresponding MPI_SENDRECV call on rank y uses to
receive from rank x.
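For illustration only (this is not the poster's actual code, and the counts are taken from the error text purely as an example), a minimal C program that reproduces this kind of mismatch might look like the following. Here rank 0 sends 380 ints while rank 1 posts a receive for only 304, which an MPICH-based implementation reports as a truncation error:

```c
/* Hypothetical sketch of a count mismatch in MPI_Sendrecv.
 * Rank 0 sends 380 elements; rank 1 only expects 304, so the
 * incoming message is larger than the posted receive buffer
 * and the library aborts with "message truncated". */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    int buf_out[380] = {0};
    int buf_in[380];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* sendcount = 380 towards rank 1 ... */
        MPI_Sendrecv(buf_out, 380, MPI_INT, 1, 0,
                     buf_in,  380, MPI_INT, 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        /* ... but recvcount = 304 from rank 0: mismatch. */
        MPI_Sendrecv(buf_out, 304, MPI_INT, 0, 0,
                     buf_in,  304, MPI_INT, 0, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```

Note that MPI allows the receive count to be *larger* than the incoming message, but never smaller; only the "recvcount too small" direction produces the truncation error.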
The problem is that although I have used an MPI_BARRIER, rank 0 is no
longer synchronized with the other ranks. It seems as if the MPI_BARRIER
does not work on rank 0 only; it works on all the other ranks. Rank 0 does
something on its own! This is also indicated by the second error message.
How can I fix such an error? Or where could I get help? I have found no
documentation of such an error on the web.
Cheers,
Andreas