Discussion:
Graceful exceptions in MPI
c***@my-deja.com
2009-02-13 00:47:15 UTC
I have a C++ code that is supposed to throw exceptions on error and
give back control to the caller along with an error object. However,
if the code is run in parallel with MPI, and if one process happens to
throw such an exception, the others hang or worse in some collective
call down the line. What I want to be able to do is make the other
processes give control back to the user as well. It does not matter
even if they simply throw a simple object that says "Process of rank x
had an error", and then the rank x, which originally had the
exception, has the actual error object. (It is not a concern that the
program hasn't cleaned up properly etc., e.g. MPI_Finalize hasn't been
called). Does MPI have support for this kind of thing?

Note that this is different from MPI functions' error handling. These
errors are my own, not MPI functions'. The error objects thrown are my
own as well. Hopefully someone else has run into a similar problem and
can help. Thanks,
chengiz
Michael Hofmann
2009-02-16 09:34:02 UTC
Post by c***@my-deja.com
I have a C++ code that is supposed to throw exceptions on error and
give back control to the caller along with an error object. However,
if the code is run in parallel with MPI, and if one process happens to
throw such an exception, the others hang or worse in some collective
call down the line. What I want to be able to do is make the other
processes give control back to the user as well. It does not matter
even if they simply throw a simple object that says "Process of rank x
had an error", and then the rank x, which originally had the
exception, has the actual error object. (It is not a concern that the
program hasn't cleaned up properly etc., e.g. MPI_Finalize hasn't been
called). Does MPI have support for this kind of thing?
No, this kind of error handling is not a part of MPI. There is no way to
cancel an already called collective operation.


Michael
c***@my-deja.com
2009-03-27 18:58:36 UTC
Post by Michael Hofmann
No, this kind of error handling is not a part of MPI. There is no way to
cancel an already called collective operation.
Michael,
Sorry for the delay. It isn't really related to an already called
collective operation, I was looking for a way for the process with the
error to indicate to the other processes that they too should return
to the caller (rather than hang or crash down the line). This would
have to be asynchronous, something like what MPI_Abort does, except
instead of abort, you're telling the other processes to throw an
exception.
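
As an illustration only: MPI offers no asynchronous "make the other ranks throw" call, but a commonly used workaround is to have all ranks agree on an error flag at fixed points, before they enter the next collective. The sketch below is not asynchronous and does not cancel anything; it assumes the guarded step is purely local work (no collective inside it). The names RemoteError, AllReduceMax, and run_checked are hypothetical. The reduction is passed in as a callable so that the sketch compiles without MPI; in a real program it would presumably wrap MPI_Allreduce with MPI_MAX, as noted in the comment.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical exception thrown on ranks that did not fail themselves.
struct RemoteError : std::runtime_error {
    int failed_rank;
    explicit RemoteError(int rank)
        : std::runtime_error("Process of rank " + std::to_string(rank) +
                             " had an error"),
          failed_rank(rank) {}
};

// Stand-in for a "maximum over all ranks" reduction. With MPI, an
// implementation could look like this (assumption, not tested here):
//   int global;
//   MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_MAX, comm);
//   return global;
using AllReduceMax = std::function<int(int)>;

// Runs `step` on this rank, then lets all ranks agree on whether any
// rank failed. Every rank must call this at the same point in the program.
void run_checked(int my_rank,
                 const std::function<void()>& step,
                 const AllReduceMax& all_reduce_max) {
    std::exception_ptr local_error;
    try {
        step();  // local work only; must not contain a collective call
    } catch (...) {
        local_error = std::current_exception();
    }

    // A failing rank contributes its own rank number, a healthy one -1.
    // The maximum is therefore >= 0 exactly when some rank failed.
    const int flag = local_error ? my_rank : -1;
    const int failed = all_reduce_max(flag);

    // The failing rank keeps its original error object.
    if (local_error) std::rethrow_exception(local_error);
    // All other ranks throw a simple "rank x had an error" object.
    if (failed >= 0) throw RemoteError(failed);
}
```

If several ranks fail at the same step, the healthy ranks report only the highest failing rank. The cost is one extra reduction per guarded step, and a rank that throws outside run_checked can still leave the others hanging in a later collective.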
