Lars
2009-06-04 01:31:51 UTC
Hi,
I'm trying to solve a problem of passing serializable, arbitrarily
sized objects around using MPI and non-blocking communication. The
problem I'm facing is what to do at the receiving end when expecting
an object of unknown size, without blocking while waiting for it.
When using blocking message passing, I have simply solved the problem
by first sending a small, fixed-size header containing the size of the
rest of the data, which is then sent in a second MPI message. With
non-blocking message passing this doesn't seem to be such a good idea,
since we can't post the receive for the main data until we have
received the header... That seems to take away most of the advantages
of non-blocking I/O in the first place.
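For reference, the blocking scheme looks roughly like this (a minimal
sketch in C; TAG_HDR, TAG_DATA, dest, src and the already-serialized
data buffer are placeholders of my own, not part of any real API):

    /* Sender: a fixed-size header carrying the payload size,
       then the payload itself in a second message. */
    int len = (int)serialized_size;
    MPI_Send(&len, 1, MPI_INT, dest, TAG_HDR, MPI_COMM_WORLD);
    MPI_Send(data, len, MPI_BYTE, dest, TAG_DATA, MPI_COMM_WORLD);

    /* Receiver: read the header, allocate, then receive the payload. */
    int len;
    MPI_Recv(&len, 1, MPI_INT, src, TAG_HDR, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    char *data = malloc(len);
    MPI_Recv(data, len, MPI_BYTE, src, TAG_DATA, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);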
I've been thinking about solving this using MPI_Probe / MPI_Iprobe,
but I'm worried about performance.
Question 1:
Will MPI_Probe, or the underlying MPI implementation, actually receive
the full message data (assuming a reasonably sized message, say less
than 10 MB) before MPI_Probe returns? Or will there be a significant
data-transfer delay (for large messages) when calling MPI_Recv after a
successful MPI_Probe?
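Concretely, the probe-based receive I have in mind looks like this
(sketch; assumes the message was sent as MPI_BYTE):

    /* Block in MPI_Probe until a message is available, read its
       size from the status, then receive exactly that much. */
    MPI_Status st;
    int len;
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
    MPI_Get_count(&st, MPI_BYTE, &len);
    char *buf = malloc(len);
    MPI_Recv(buf, len, MPI_BYTE, st.MPI_SOURCE, st.MPI_TAG,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);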
What I want is something like this:
1) Post one or several non-blocking, variable-sized message receives.
2) Do other, non-MPI work, while any incoming messages are fully
received into buffers on the local machine.
3) Complete the receives posted in 1). I don't want to wait here
unnecessarily for data transfers that could have taken place
during 2).
Problems:
I can't post non-blocking MPI_Irecv() calls in step 1, because I don't
know the sizes of the incoming messages.
If I simply do nothing in step 1 and call MPI_Probe in step 3, I'm
worried that I won't get good compute/transfer overlap, because the
messages won't actually be received locally until I post a Probe or
Recv in step 3.
Question 2:
How can I achieve the communication sequence described in steps 1-3
above, with data transfer overlapping the local computation in step 2?
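The closest I've come up with is to poll MPI_Iprobe between chunks of
work in step 2 and post receives as messages show up, roughly like
this (sketch; work_remaining(), do_work_chunk() and handle_message()
are stand-ins of my own):

    int flag;
    MPI_Status st;
    while (work_remaining()) {
        do_work_chunk();                 /* part of step 2 */
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                   &flag, &st);
        if (flag) {
            int len;
            MPI_Get_count(&st, MPI_BYTE, &len);
            char *buf = malloc(len);
            MPI_Recv(buf, len, MPI_BYTE, st.MPI_SOURCE, st.MPI_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            handle_message(buf, len);
        }
    }

But that still ties receive progress to how often I poll, which is
what I was hoping to avoid.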
Question 3:
A temporary kludge might be to allocate a receive buffer of some
arbitrary, constant maximum size BUFSIZE for each non-blocking receive
operation, make sure messages sent are never larger than BUFSIZE, and
post MPI_Irecv(buffer, BUFSIZE, ...) calls in step 1. I haven't been
able to figure out whether it's actually correct and portable to
receive less data than specified in the count argument to MPI_Irecv.
What if the message sent on the other end is 10 bytes, and
BUFSIZE = count = 20? Would that be OK?
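In code, the kludge would look roughly like this (sketch; src and tag
stand for the actual source and tag):

    #define BUFSIZE 20
    char buf[BUFSIZE];
    MPI_Request req;
    MPI_Status st;
    int received;
    /* Post the receive with the maximum size... */
    MPI_Irecv(buf, BUFSIZE, MPI_BYTE, src, tag, MPI_COMM_WORLD, &req);
    /* ... do other work (step 2) ... */
    MPI_Wait(&req, &st);
    /* ... then ask how many bytes actually arrived (10 in the
       example above, even though count was 20). */
    MPI_Get_count(&st, MPI_BYTE, &received);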
If anyone can shed any light on this, I'd be grateful. FYI, we're
using a cluster of 2-8 core x86-64 machines running Linux, connected
by ordinary 1 Gbit Ethernet.
Best regards,
Lars Andersson