k***@gmail.com
2008-03-25 10:02:04 UTC
Dear All,
I'm developing a Fortran program that uses OpenMPI 1.2.5. As far as
I can tell, the program runs fine, except for MPI_Finalize, which
crashes in roughly 50% of the runs.
A typical output is as follows:
| Initialising MPI
| Calculations started
| Calculations ended
| Finalizing MPI
| [...] *** Process received signal ***
| [...] Signal: Segmentation fault (11)
| [...] Signal code: Address not mapped (1)
| [...] Failing at address: 0x47452104
| [ 1] [0xbfffe748, 0x47452104] (-P-)
| [ 2] (ompi_osc_pt2pt_component_finalize + 0xa5) [0xbfffe788, 0x00771c75]
| [ 3] (ompi_osc_base_finalize + 0x54) [0xbfffe7a8, 0x00256834]
| [ 4] (ompi_mpi_finalize + 0x17c) [0xbfffe8e8, 0x001f811c]
| [ 5] (mpi_finalize_ + 0xb) [0xbfffe8f8, 0x001800ab]
| ...
| [...] *** End of error message ***
I've been thinking about what might cause such an error and I came up
with the following ideas:
- an asynchronous send/receive that has not completed
- memory used as a buffer for a pending asynchronous send/receive
being deallocated before the operation has completed
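To make the second idea concrete, here is a minimal sketch of the safe pattern. It is not taken from the actual program: the program name, the buffer names and the ring exchange are all hypothetical. It assumes that every non-blocking request is completed with MPI_Wait before its buffer is deallocated and before MPI_Finalize is called.

```fortran
program wait_before_deallocate
  implicit none
  include 'mpif.h'

  ! Hypothetical message size and tag, chosen only for illustration
  integer, parameter :: n = 1000, tag = 0
  integer :: ierr, rank, nprocs, dest, source, request
  integer :: status(MPI_STATUS_SIZE)
  double precision, allocatable :: sendbuf(:), recvbuf(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  allocate(sendbuf(n), recvbuf(n))
  sendbuf = dble(rank)

  ! Ring exchange: send to the next rank, receive from the previous one
  dest   = mod(rank + 1, nprocs)
  source = mod(rank - 1 + nprocs, nprocs)

  ! Start the non-blocking send; sendbuf must stay allocated and
  ! unmodified until the request has completed
  call MPI_Isend(sendbuf, n, MPI_DOUBLE_PRECISION, dest, tag, &
                 MPI_COMM_WORLD, request, ierr)
  call MPI_Recv(recvbuf, n, MPI_DOUBLE_PRECISION, source, tag, &
                MPI_COMM_WORLD, status, ierr)

  ! Complete the send before releasing its buffer
  call MPI_Wait(request, status, ierr)

  ! Only now is it safe to deallocate and finalize
  deallocate(sendbuf, recvbuf)
  call MPI_Finalize(ierr)
end program wait_before_deallocate
```

Moving the deallocate above the MPI_Wait call would reproduce the situation described in the second idea.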
Are there any other things one should watch out for to avoid this
kind of finalisation error, or ways to debug this kind of behaviour?
Kind regards,
Koen Poppe