Helen
2008-02-07 16:45:52 UTC
I'm experimenting with how MPI responds to a sudden loss of the network
connection (with both MPICH2 from Argonne Lab and Intel's MPI). Even during a
"sleep(20000)", when I unplug the network cable, MPI detects it and
aborts fatally (mpiexec.exe reports an abort). I cannot catch any
exceptions. The error messages that appear in my command window are:
op_read error on left context: generic socket failure, error stack:
MPIDU_Sock_wait(2571): The specified network name is no longer available. (errno 64)
unable to read the cmd header on the left context, generic socket failure, error stack:
MPIDU_Sock_wait(2571): The specified network name is no longer available. (errno 64).
I have spent hours searching the web and haven't found any related
information yet. It seems that error handling is not very well defined in MPI.
Even the error handler seems to handle only errors from MPI_XXX calls,
not those from MPIDU_xxxxx.
Does anybody know anything about this?
Thanks a lot!
Helen