Problem with MPI_Bcast
John
2010-01-26 12:31:00 UTC
I am new to MPI and am trying to parallelize a code written in
Fortran 90, which I compile with mpif90. I call MPI_Bcast inside
many do loops, as follows:

integer :: my_rank,new,jim,ierr
logical, dimension(bead_max) :: who_are_you

bead_max=400000

do k=1,jim
IF(my_rank.eq.0)THEN
call GETNAME(new);
call MPI_Bcast(new,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
call MPI_Bcast(who_are_you,bead_max,MPI_LOGICAL, &
0,MPI_COMM_WORLD,ierr)
ELSE IF(my_rank.gt.0)THEN
call MPI_Bcast(new,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
call MPI_Bcast(who_are_you,bead_max,MPI_LOGICAL, &
0,MPI_COMM_WORLD,ierr)
ENDIF;
enddo

When I run the program with mpirun, I get the messages shown below.
Can anybody please help me? I would also like to know whether
MPI_Bcast can be used inside do loops as above. Any help is greatly
appreciated. Here are the messages I get:

libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host mgt was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
--------------------------------------------------------------------------
[0,1,1]: OpenIB on host mgt was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
--------------------------------------------------------------------------
[0,1,2]: OpenIB on host mgt was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
--------------------------------------------------------------------------
[0,1,3]: OpenIB on host mgt was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

The program then runs for a little while, but mpirun eventually kills
it with the error messages below.

[mgt:14192] *** An error occured in MPI_Bcast
[mgt:14192] *** On communicator MPI_COMM_WORLD
[mgt:14192] *** MPI_ERR_TRUNCATE: message truncated
[mgt:14192] *** MPI_ERRORS_ARE_FATAL (goodbye)

and then

[mgt:14188] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[mgt:14188] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[mgt:14188] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90

mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
b***@myrealbox.com
2010-01-27 12:36:34 UTC
Post by John
I am a newbie to MPI and I am trying to parallelize a code which has
been written in Fortran 90. I am using mpif90 to compile the code. I
integer :: my_rank,new,jim,ierr
logical, dimension(bead_max) :: who_are_you
bead_max=400000
do k=1,jim
IF(my_rank.eq.0)THEN
call GETNAME(new);
call MPI_Bcast(new,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
call MPI_Bcast(who_are_you,bead_max,MPI_LOGICAL, &
0,MPI_COMM_WORLD,ierr)
ELSE IF(my_rank.gt.0)THEN
call MPI_Bcast(new,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
call MPI_Bcast(who_are_you,bead_max,MPI_LOGICAL, &
0,MPI_COMM_WORLD,ierr)
ENDIF;
enddo
I know C better than Fortran, but that aside, I'm not spotting
anything obviously wrong with the above. (It does seem to me like
you could simplify the code a bit -- aren't you doing the same thing
when rank is 0 and when it's not, with the exception of the call to
GETNAME?)

I made a few changes to get something that would compile (complete
program below), and the result works on a Fedora Linux system with
Open MPI. To me that suggests the code itself may be okay and that
something is not right about how things are installed on your system.
Post by John
When I execute the program using mpirun, I get the following error
messages.
Can anybody please help me? My question would also be whether it is
possible to use MPI_Bcast in do loops as above? Any help is greatly
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host mgt was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
Have you researched these error/warning messages? I can't tell you
off the top of my head what either one means, but they look like
something one would want to investigate. Do simpler programs work?

[ snip -- looks like all processes produce these errors/warnings? ]
Post by John
The program runs then a little bit, but then mpirun kills the code
with the
error messages below.
[mgt:14192] *** An error occured in MPI_Bcast
[mgt:14192] *** On communicator MPI_COMM_WORLD
[mgt:14192] *** MPI_ERR_TRUNCATE: message truncated
Could the problem be related to the amount of data you're trying
to broadcast? The second call to MPI_Bcast seems to be using an
array of logicals that's dimensioned 400000. Have you tried the
program with a smaller array?
Post by John
[mgt:14192] *** MPI_ERRORS_ARE_FATAL (goodbye)
and then
[mgt:14188] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[mgt:14188] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[mgt:14188] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
Other than that, no suggestions.

My test program:

program temp
implicit none
include "mpif.h"
integer, parameter :: bead_max=400000
integer :: my_rank,new,jim,ierr,k,Nprocs
logical, dimension(bead_max) :: who_are_you

call MPI_INIT( ierr )
call MPI_COMM_RANK ( MPI_COMM_WORLD, my_rank, ierr)
call MPI_COMM_SIZE ( MPI_COMM_WORLD, Nprocs, ierr )

jim=10
! give the broadcast buffer a defined value on every rank
who_are_you=.false.

do k=1,jim
! only the root computes the value to be broadcast
IF(my_rank.eq.0)THEN
! call GETNAME(new)
new=k*10
print*, new
ENDIF
! every rank, root included, makes the same two calls in the same order
call MPI_Bcast(new,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
call MPI_Bcast(who_are_you,bead_max,MPI_LOGICAL,0,MPI_COMM_WORLD,ierr)
enddo
print*, "final value of new in ", my_rank, " is ", new

call MPI_FINALIZE( ierr )
end program temp
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
Harald Anlauf
2010-02-18 20:25:23 UTC
Post by John
When I execute the program using mpirun, I get the following error
messages.
Can anybody please help me? My question would also be whether it is
possible to use MPI_Bcast in do loops as above? Any help is greatly
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
You may need to increase the size of locked memory.
Try "ulimit -l" (for bash/ksh) and choose a larger value.
If the hard limit ("ulimit -Hl") severely restricts you,
you may need to contact your system administrator.
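As a rough sketch of what that looks like in bash or ksh (the
variable names soft and hard are just for illustration, and raising
the soft limit to the hard limit is only one possible choice of
"larger value"):

```shell
# Show the current soft and hard limits on locked memory
# (reported in kB, or as "unlimited").
soft=$(ulimit -S -l)
hard=$(ulimit -H -l)
echo "soft memlock limit: $soft"
echo "hard memlock limit: $hard"

# Raise the soft limit up to the hard limit for this shell and its
# children. This needs no special privileges; going beyond the hard
# limit does.
if [ "$soft" != "$hard" ]; then
    ulimit -S -l "$hard" && echo "soft limit raised to $(ulimit -S -l)"
fi
```

Note that a limit raised this way applies only to the current shell
and the processes it starts. Processes that mpirun launches on other
nodes through rsh/ssh may pick up their limits elsewhere, so the
setting may also be needed on those nodes.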
Post by John
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host mgt was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
I am not sure about this one.
Maybe the wrong driver, or lacking permissions to access
the hardware?
Post by John
The program runs then a little bit, but then mpirun kills the code
with the
error messages below.
[mgt:14192] *** An error occured in MPI_Bcast
[mgt:14192] *** On communicator MPI_COMM_WORLD
[mgt:14192] *** MPI_ERR_TRUNCATE: message truncated
[mgt:14192] *** MPI_ERRORS_ARE_FATAL (goodbye)
and then
[mgt:14188] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[mgt:14188] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[mgt:14188] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
Don't know about this one. Maybe another resource problem?
Does your code run on a single node with only shared memory?
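One way to try that with Open MPI is to restrict the transports
(BTLs) on the mpirun command line. The sketch below makes
assumptions about your setup: ./a.out and the process count of 4 are
placeholders, and the shared-memory BTL is named "sm" as far as I
know for Open MPI releases of this period, but the name may differ
in other versions.

```shell
# Placeholders: adjust the executable name and process count.
prog=./a.out
np=4

# Use only the "self" (loopback) and "sm" (shared memory) transports,
# so neither OpenIB nor TCP is involved.
cmd="mpirun -np $np --mca btl self,sm $prog"
echo "$cmd"

# Run it only if mpirun and the executable are actually present.
if command -v mpirun >/dev/null 2>&1 && [ -x "$prog" ]; then
    $cmd
fi
```

If the truncation error disappears in this configuration, that would
point to the interconnect setup rather than to the Fortran code; if
it persists, the calls themselves deserve a closer look.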