Program runs fine but MPI_Finalize crashes: possible general causes?
k***@gmail.com
2008-03-25 10:02:04 UTC
Dear All,

I'm developing a Fortran program that uses Open MPI 1.2.5. As far as
I can tell, the program runs fine, except that MPI_Finalize crashes
in about 50% of the runs.

A typical output is as follows:

|  Initialising MPI
| Calculations started
| Calculations ended
| Finalizing MPI
| [...] *** Process received signal ***
| [...] Signal: Segmentation fault (11)
| [...] Signal code: Address not mapped (1)
| [...] Failing at address: 0x47452104
| [ 1] [0xbfffe748, 0x47452104] (-P-)
| [ 2] (ompi_osc_pt2pt_component_finalize + 0xa5) [0xbfffe788, 0x00771c75]
| [ 3] (ompi_osc_base_finalize + 0x54) [0xbfffe7a8, 0x00256834]
| [ 4] (ompi_mpi_finalize + 0x17c) [0xbfffe8e8, 0x001f811c]
| [ 5] (mpi_finalize_ + 0xb) [0xbfffe8f8, 0x001800ab]
| ...
| [...] *** End of error message ***

I've been thinking about what might cause such an error and I came up
with the following ideas:

- an asynchronous send/receive that hasn't completed

- memory used as a buffer for a non-completed asynchronous send/receive
that is deallocated before the operation has completed

Are there any other things one should watch out for to avoid this
kind of finalisation error, or ways to debug this kind of behaviour?

Kind Greetings

Koen Poppe
Greg Lindahl
2008-03-28 01:48:15 UTC
Post by k***@gmail.com
I've been thinking about what might cause such an error and I came up

Another possibility: it's crashing in a free() or deallocate because
you have written beyond the beginning or end of a malloc()ed or
allocated array.

-- greg
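
To illustrate the point above, here is a minimal sketch in plain C (no MPI) of why an out-of-bounds write can go unnoticed during the calculation and only surface later, when memory is released. It is not taken from the original program: the names guarded_array, guarded_alloc, guarded_check and run_demo are hypothetical, and the guard slots stand in for the allocator's own bookkeeping, which a real overrun would damage instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xAB

/* Hypothetical guarded array: raw[0] and raw[n + 1] are guard slots,
 * raw[1] .. raw[n] hold the usable data. */
typedef struct {
    double *raw;
    size_t  n;
} guarded_array;

/* Allocate n usable doubles plus one guard slot on each side. */
static int guarded_alloc(guarded_array *a, size_t n) {
    assert(a != NULL);
    a->raw = malloc((n + 2) * sizeof(double));
    if (a->raw == NULL) return -1;
    a->n = n;
    memset(&a->raw[0], GUARD_BYTE, sizeof(double));
    memset(&a->raw[n + 1], GUARD_BYTE, sizeof(double));
    return 0;
}

/* Pointer to the first usable element. */
static double *guarded_data(guarded_array *a) {
    return a->raw + 1;
}

/* Return 1 if every byte of the slot still holds the guard pattern. */
static int slot_intact(const double *slot) {
    const unsigned char *p = (const unsigned char *)slot;
    for (size_t i = 0; i < sizeof(double); i++)
        if (p[i] != GUARD_BYTE) return 0;
    return 1;
}

/* 0 = intact, bit 0 set = front guard damaged, bit 1 set = back guard damaged. */
static int guarded_check(const guarded_array *a) {
    int status = 0;
    if (!slot_intact(&a->raw[0]))        status |= 1;
    if (!slot_intact(&a->raw[a->n + 1])) status |= 2;
    return status;
}

static void guarded_free(guarded_array *a) {
    free(a->raw);
    a->raw = NULL;
    a->n = 0;
}

/* Deliberate off-by-one loop: writes one element past the usable range.
 * The write lands in the back guard slot, so nothing crashes here. */
int run_demo(void) {
    guarded_array a;
    if (guarded_alloc(&a, 4) != 0) return -1;
    double *x = guarded_data(&a);
    for (size_t i = 0; i <= 4; i++)   /* bug: should be i < 4 */
        x[i] = 1.0;
    int status = guarded_check(&a);   /* damage is only visible on inspection */
    guarded_free(&a);
    return status;
}
```

Calling run_demo() reports damage to the back guard only, even though the loop itself ran without any visible error. In a real program there are no guard slots, so the stray write hits whatever lies next to the array, and the failure may show up much later, for example inside a library's cleanup code.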
k***@gmail.com
2008-03-28 13:21:21 UTC
Correct, apparently I had miscalculated some indexes in the
calculation (even though bounds-checking was on...). Fixing this
solved the problem.

Kind Greetings

Koen Poppe

s***@gmail.com
2008-03-28 09:20:05 UTC
Hi.

Could you send the actual code? I would like to check it.

BR