Discussion:
Yet again, but different: MPI program stalls
Ovidiu Gheorghies
2008-06-17 10:48:24 UTC
Hello all,

I have installed the latest OpenMPI from source on an Intel Quad-Core
running 64-bit Fedora 7, and I have run into a new problem with another
master/slave program (listed below).

The "master" sends repeatedly a 13-byte string ('\0' included) to a
"slave" (a MPI_Send/MPI_Recv pair is used), and the slave performs
some idle logic on the received string (i.e strlen).

At ~30K messages the programs stall, as follows:
- the master enters MPI_Finalize but never completes it
- the slave apparently hangs during message processing

Running `top' shows two processes running at first; then only one
remains, keeping its core at 100%.

When the "[PRINT]" line is commented out in the do_slave function, the
following is printed:

test_mpi, last compiled: Jun 17 2008 13:30:21
Send/receive 30000 messages...
Delta time: 0
--- FINALIZING rank 0...
[---> program hangs, only one process remains active]

When the "[PRINT]" line is active, the last printed message from
do_slave (in the format iteration/string-length) is around iteration
26000 (but varies between runs), e.g.

[---> more data here ]
26381/12 26382/12 26383/12 26384/12 26385/12 26386/12 26387/12
26388/12 26389/12 26390/12 26391/12 26392/12
[---> program hangs, same behavior as above ]

The problem is apparent when I compile and run with:
$ mpicc test_mpi.c -o test_mpi; mpirun -c 2 ./test_mpi

However, when I compile the program with -O3 (mpicc test_mpi.c -o
test_mpi -O3), everything works fine:
test_mpi, last compiled: Jun 17 2008 13:40:24
Send/receive 3000000 messages...
--- FINALIZING rank 1...
Delta time: 3
--- FINALIZING rank 0...
--- FINALIZED rank 0.
--- FINALIZED rank 1.

What could I do to diagnose this problem? I'm not sure that -O3 fixes
the problem in all cases, since the issue might depend on how long the
"processing" of a message takes on the slave.
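
For what it's worth, one way to test that timing hypothesis would be to
make the slave's per-message work configurable; a sketch of what I mean
(the nanosleep-based delay is just a stand-in for real processing, and
the interval is an arbitrary illustrative value):

/* hypothetical replacement for the strlen busy-loop in do_slave:
   sleep for a fixed interval, so the per-message processing time
   is known and independent of the optimization level */
struct timespec ts;
ts.tv_sec  = 0;
ts.tv_nsec = 100000;    /* 0.1 ms per message, illustrative */
nanosleep(&ts, NULL);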

Thanks in advance,
Ovidiu

------------ CODE FOLLOWS -----------------

#include "mpi.h"

#include <time.h>
#include <stdio.h>
#include <string.h>

#define TOTAL_COUNT 30000

void do_master(int rank);
void do_slave(int rank);

int main(int argc, char**argv)
{
int numtasks, rank;

MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {
fprintf(stderr, "test_mpi, last compiled: %s %s\n", __DATE__,
__TIME__);
fprintf(stderr, "Send/receive %d messages...\n", TOTAL_COUNT);
do_master(rank);
}
else if (rank == 1) {
do_slave(rank);
}

fprintf(stderr, "--- FINALIZING rank %d...\n", rank);
MPI_Finalize();
fprintf(stderr, "--- FINALIZED rank %d.\n", rank);

return 1;
}

void do_master(int rank)
{
int tag = 1;
char* buffer="ABCxyzABCxyz";
int size = strlen(buffer) + 1;

int count = 0;

time_t t1 = time(0);
while (count++ < TOTAL_COUNT) {
MPI_Send(buffer, size, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
}
time_t t2 = time(0);

printf("Delta time: %d\n", (int)(t2-t1));
}

void do_slave(int rank)
{
MPI_Status Stat;
int tag = 1;
char buffer[255];

int count = 0;

while (count++ < TOTAL_COUNT) {
MPI_Recv(buffer, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &Stat);
int s, i;
for(i=0; i<100; i++) {
s = strlen(buffer);
}
// printf("%d/%d ", count, s); fflush(stdout); // [PRINT]
}
}
Ovidiu Gheorghies
2008-06-17 11:46:52 UTC
To support the claim that -O3 does not always help: the following
program works well when TOTAL_COUNT is 3, 30, etc., but *sometimes*
stalls when TOTAL_COUNT is 300K.

I compile and run the program in a loop as follows:
$ for ((i=0; i<100; i++)); do echo "************* RUN $i ***********"; \
      mpicc test_mpi.c -o test_mpi -O3; mpirun -c 4 ./test_mpi; done

And the output is:

[---> more runs here, completed successfully]
************* RUN 4 ***********
Numtasks: 4
test_mpi, last compiled: Jun 17 2008 14:42:44
Send/receive 300000 messages...
Numtasks: 4
Numtasks: 4
Numtasks: 4
Slave 1 got 1200000 char, in 100000 msg-s
Delta time: 0
--- FINALIZING rank 0...
--- FINALIZING rank 1...
Slave 3 got 1200000 char, in 100000 msg-s
--- FINALIZING rank 3...
Slave 2 got 1200000 char, in 100000 msg-s
--- FINALIZING rank 2...
--- FINALIZED rank 0.
--- FINALIZED rank 2.
--- FINALIZED rank 1.
--- FINALIZED rank 3.
************* RUN 5 ***********
Numtasks: 4
Numtasks: 4
test_mpi, last compiled: Jun 17 2008 14:42:45
Send/receive 300000 messages...
Numtasks: 4
Numtasks: 4
Slave 3 got 1200000 char, in 100000 msg-s
Delta time: 1
--- FINALIZING rank 0...
--- FINALIZING rank 3...
Slave 1 got 1200000 char, in 100000 msg-s
--- FINALIZING rank 1...
[----> program stalls here ]

The C code of the program is given below:

#include "mpi.h"

#include <time.h>
#include <stdio.h>
#include <string.h>

#define TOTAL_COUNT 300000

void do_master(int rank, int slaves);
void do_slave(int rank, int slaves);

int main(int argc, char**argv)
{
int numtasks, rank;

MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

printf("Numtasks: %d\n", numtasks);

if (rank == 0) {
fprintf(stderr, "test_mpi, last compiled: %s %s\n", __DATE__,
__TIME__);
fprintf(stderr, "Send/receive %d messages...\n", TOTAL_COUNT);
do_master(rank, numtasks - 1);
}
else if (rank >= 1) {
do_slave(rank, numtasks - 1);
}

fprintf(stderr, "--- FINALIZING rank %d...\n", rank);
MPI_Finalize();
fprintf(stderr, "--- FINALIZED rank %d.\n", rank);

return 1;
}

void do_master(int rank, int slaves)
{
int tag = 1;
char* buffer="ABCxyzABCxyz";
int size = strlen(buffer) + 1;

int count = 0;

time_t t1 = time(0);
while (count < TOTAL_COUNT) {
int slave = 1 + count % slaves;
//printf("%d/%d: Sending to slave %d\n", (int)time(0), count,
slave);
MPI_Send(buffer, size, MPI_CHAR, slave, tag, MPI_COMM_WORLD);

count++;
}
time_t t2 = time(0);

printf("Delta time: %d\n", (int)(t2-t1));
fflush(stdout);
}

void do_slave(int rank, int slaves)
{
MPI_Status Stat;
int tag = 1;
char buffer[255];

int count = 0;

int s = 0, n = 0;
while (count < TOTAL_COUNT/slaves) {
MPI_Recv(buffer, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &Stat);
s += strlen(buffer);
count++;
}
printf("Slave %d got %d char, in %d msg-s\n", rank, s, count);
}

Regards,
Ovidiu
Ovidiu Gheorghies
2008-06-17 12:50:34 UTC
Just a quick note: these problems appear when using OpenMPI, but on
MPICH2 the programs appear to work well.
Michael Hofmann
2008-06-19 12:51:13 UTC
Post by Ovidiu Gheorghies
Just a quick note: these problems appear when using OpenMPI, but on
MPICH2 the programs appear to work well.
I can confirm this behavior with OpenMPI (1.2.5); it is probably a bug. It seems that sometimes a message gets lost and the corresponding slave waits forever in MPI_Recv.

Using MPI_Ssend (instead of MPI_Send) seems to work, but the explicit handshake can cause some loss of performance.
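
The change is a one-liner in the master's send loop (a sketch against
the second listing; nothing else needs to change):

/* MPI_Ssend completes only after the matching MPI_Recv has started,
   so the master can no longer run arbitrarily far ahead of a slave */
MPI_Ssend(buffer, size, MPI_CHAR, slave, tag, MPI_COMM_WORLD);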

Additionally, I think it is kind of "ugly" to distribute a huge (?) amount of work using thousands of tiny messages. Maybe you can send bigger chunks at once; a rough sketch follows.
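
For illustration only (BATCH, MSG_LEN, send_chunk, and recv_chunk are
my names, not from the original code), packing many strings into one
buffer per send could look like this:

#define BATCH   100    /* strings per message, illustrative value */
#define MSG_LEN 13     /* 12 chars + '\0', as in the original */

/* master side: pack BATCH copies into one buffer, one MPI_Send */
void send_chunk(const char *str, int slave, int tag)
{
    char chunk[BATCH * MSG_LEN];
    int b;
    for (b = 0; b < BATCH; b++)
        memcpy(chunk + b * MSG_LEN, str, MSG_LEN);
    MPI_Send(chunk, BATCH * MSG_LEN, MPI_CHAR, slave, tag, MPI_COMM_WORLD);
}

/* slave side: one MPI_Recv, then process the strings in place */
int recv_chunk(int tag)
{
    char chunk[BATCH * MSG_LEN];
    MPI_Status stat;
    int b, s = 0;
    MPI_Recv(chunk, BATCH * MSG_LEN, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &stat);
    for (b = 0; b < BATCH; b++)
        s += strlen(chunk + b * MSG_LEN);
    return s;
}

This trades a little packing work for far fewer trips through the MPI
stack.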


Michael

Georg Bisseling
2008-06-19 12:38:39 UTC
Permalink
On Tue, 17 Jun 2008 12:48:24 +0200, Ovidiu Gheorghies <***@gmail.com> wrote:

Your two programs differ in how they respond to compiler optimization.
Post by Ovidiu Gheorghies
int s, i;
for(i=0; i<100; i++) {
s = strlen(buffer);
}
With -O3 the compiler can remove this loop entirely, since s is never
used afterwards. The loop in the second program, by contrast, sums into
s, and s is printed later on; the compiler cannot optimize that away.

So the second program will receive messages slower than the first
with -O3.
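
For illustration, one common way to keep the first program's loop alive
under -O3 is to declare s volatile:

/* every store to a volatile must be performed, so the loop cannot be
   removed (the compiler may still hoist the strlen() call, though) */
volatile int s;
int i;
for (i = 0; i < 100; i++) {
    s = strlen(buffer);
}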

Still this looks like an OpenMPI bug to me.
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/