Strange behaviour of Isend -- help appreciated


m***@gmail.com

2008-06-05 11:44:00 UTC

Hi,

I've a strange bug in one of my MPI programs and reduced it to this
snippet:
--- snip ---
#include <time.h>
#include <math.h>

#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;

#undef SEEK_CUR
#undef SEEK_SET
#undef SEEK_END
#include <mpi.h>
using namespace MPI;

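void io(int rank, const char *s) {
    FILE *f;
    char buf[64];
    time_t t;

    // One log file per rank, e.g. "00.log"
    sprintf(buf, "%02d.log", rank);
    f = fopen(buf, "a");
    if (!f)
        return;
    t = time(0);

    // ctime() appends its own newline, so the message starts on the next line
    fprintf(f, "%s: %s\n\n\n", ctime(&t), s);
    fclose(f);
}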

int main(void) {
    int value;
    int rank;
    Request r;

    Init();
    rank = COMM_WORLD.Get_rank();

    if (rank == 0) {
        io(rank, "Sending");
        value = 42;

        r = COMM_WORLD.Isend(&value, 1, INT, 1, 0);

        io(rank, "Computing something");
        unsigned int tick = 0;
        while (++tick < 200000) {
            printf("%g\r", sqrt(static_cast<double>(tick)));
        }

        io(rank, "Finished.");
        r.Wait();
    } else {
        io(rank, "Started");
        io(rank, "Waiting for input");

        // Poll for input: non-blocking probe, then a blocking receive
        int tick = 0;
        while (true) {
            if (COMM_WORLD.Iprobe(ANY_SOURCE, ANY_TAG)) {
                COMM_WORLD.Recv(&value, 1, INT, ANY_SOURCE, ANY_TAG);
                break;
            }
            if (++tick % 1000000 == 0)
                io(rank, "tick");
        }

        io(rank, "Received");
    }

    Finalize();
    return EXIT_SUCCESS;
}
--- snap ---

If I understood non-blocking messages correctly, process 1 should
receive the message *before* the computation in process 0 starts. But,
as my logs show

----------
(for process 0)
Thu Jun 5 13:31:30 2008
: Sending

Thu Jun 5 13:31:30 2008
: Computing something

Thu Jun 5 13:31:33 2008
: Finished.

---------
(for process 1)
Thu Jun 5 13:31:30 2008
: Started

Thu Jun 5 13:31:30 2008
: Waiting for input

Thu Jun 5 13:31:31 2008
: tick

Thu Jun 5 13:31:31 2008
: tick

Thu Jun 5 13:31:32 2008
: tick

Thu Jun 5 13:31:33 2008
: Received
-----

it (process 1) receives the message from process 0 *after* process 0 has
finished its calculation. I'm a bit clueless about where I could find
the error, because it looks right ;-)

Any help (hints for documentation, more correct code, etc.) is
really appreciated,

Regards,
Michael Lesniak

m***@gmail.com

2008-06-05 12:05:08 UTC

I forgot:

I've written the receiving in this way to check that process 1 is

Post by m***@gmail.com
int tick = 0;
while (true) {
    if (COMM_WORLD.Iprobe(ANY_SOURCE, ANY_TAG)) {
        COMM_WORLD.Recv(&value, 1, INT, ANY_SOURCE, ANY_TAG);
        break;
    }
    if (++tick % 1000000 == 0)
        io(rank, "tick");
}

by printing the tick-message in the log.

Regards,
Michael

m***@gmail.com

2008-06-05 20:09:15 UTC

Another strange observation: if I compile it with logging support
enabled

-mpe=mpilog

it works! The timestamp shows that the message is received at the
right time directly after the send from process 0 and *before* the
computation finishes. Quite strange!

Open and thankful for any ideas...

Regards,
Michael

d***@csclub.uwaterloo.ca.invalid

2008-06-05 21:39:24 UTC

Post by m***@gmail.com
Hi,
I've a strange bug in one of my MPI programs and reduced it to this

[process 0 calls Isend and then does stuff, process 1 calls Recv]

Post by m***@gmail.com
If I understood non-blocking messages correctly, process 1 should
receive the message *before* the computation in process 0 starts. But,
as my logs show

[...]

Post by m***@gmail.com
it (process 1) receives the message from process 0 *after* process 0 has
finished its calculation. I'm a bit clueless about where I could find
the error, because it looks right ;-)
Any help (hints for documentation, more correct code, etc.) is
really appreciated,

Isend returns without blocking and promises that the send will be
completed at some point in the future.
The intention is to let the send run in the background while the
process does something else if it's possible to arrange that, but that
isn't required, and it looks to me like it's not happening here. (With
neither a separate "MPI transport" thread nor hardware assistance, it's
perfectly reasonable for it to not make any progress while the program
isn't actually running MPI library code.)

So my guess is that the send only makes progress while your program is
running MPI library code, so since it doesn't complete before Isend
returns, nothing further happens until the next MPI call you make (the
Wait).

If you can't find a way to persuade your MPI library to work in the
background while your code is running and switching to an
implementation that can do that isn't an option, a possible workaround
is to call Test inside your work loop:
--------
while(more work to do)
{
    if("send completed" flag not set && r.Test() says send completed)
    {
        log "send completed" message
        set send complete flag
    }

    do chunk of work
}

if(send not completed)
{
    call r.Wait() to complete send
    log "send completed" message
}
--------
I would expect that even if running in the background isn't feasible, a
reasonable quality implementation would attempt to make some progress
(ideally, as much as it can without blocking) on the send operation
when you test it for completion.

(Depending on what you're Really Trying To Do, using a blocking Send
instead of trying to get Isend to play nicely may be the Right Thing to
do here.)

dave
(been reading the MPI spec, now I need to find something to do with it.)

--
Dave Vandervies dj3vande at eskimo dot com
But make the expression only a little more involved ... and I'll turn right
back into a shameless parenthesis-slinger, practically indistinguishable from
a Lisp maniac. --Eric Sosman in comp.lang.c

m***@gmail.com

2008-06-06 06:03:47 UTC

Hi Dave,

thanks for your detailed answer.

Post by d***@csclub.uwaterloo.ca.invalid
Isend returns without blocking and promises that the send will be
completed at some point in the future.
The intention is to let the send run in the background while the
process does something else if it's possible to arrange that, but that
isn't required, and it looks to me like it's not happening here. (With
neither a separate "MPI transport" thread nor hardware assistance, it's
perfectly reasonable for it to not make any progress while the program
isn't actually running MPI library code.)

I understand. This also explains why the message is delivered
correctly when logging/profiling support is enabled.

Post by d***@csclub.uwaterloo.ca.invalid
So my guess is that the send only makes progress while your program is
running MPI library code, so since it doesn't complete before Isend
returns, nothing further happens until the next MPI call you make (the
Wait).

Correct.

Post by d***@csclub.uwaterloo.ca.invalid
If you can't find a way to persuade your MPI library to work in the
background while your code is running and switching to an
implementation that can do that isn't an option, a possible workaround
[Test loop]

Sounds reasonable. I think I have to write a bit of timing code to see
how much slower (if at all) such an extra check would be.

Post by d***@csclub.uwaterloo.ca.invalid
I would expect that even if running in the background isn't feasible, a
reasonable quality implementation would attempt to make some progress
(ideally, as much as it can without blocking) on the send operation
when you test it for completion.

Yes, this was my implicit assumption, too. I'm using MPICH2 and will
now take a closer look at how it sends MPI messages in the
background.

Thanks again for your kind help,
Michael Lesniak

d***@csclub.uwaterloo.ca.invalid

2008-06-08 22:24:33 UTC

Post by m***@gmail.com

Sounds reasonable. I think I have to write a bit of timing code to see
how much slower (if at all) such an extra check would be.

Depending on what you're Really Trying To Do, it might also make sense to
wait for the send (along with any other communication you have to do -
I assume for a real application it's not just a single send) to
complete before you start on the non-communication work.
I.e., do an Isend or Irecv for every message you need to handle, and then
do a Waitall; after the Waitall finishes, you know that you don't
need to deal with any more communication and can go ahead and start
your processing without having to worry about checking for completion
inside your processing loop.

dave

--
Dave Vandervies dj3vande at eskimo dot com
IIRC, our sun does not have enough mass to supernova. If you can move your
network beyond the danger zone during the Red Giant phase, you may be able to
pull this off. --David Rubin in comp.lang.c