floydfan
2008-06-30 10:35:45 UTC
Hi,
I'm working at the KISTI Supercomputing Center in Korea as a member of
the parallel computing support group.
Recently, we built a new supercomputer system (SUN x6420 * 192 nodes) with
an InfiniBand network connecting the nodes and installed several MPI
libraries: MVAPICH, MVAPICH2 and OpenMPI.
While supporting application scientists, I found that some codes work
well with OpenMPI but don't work with MVAPICH. More precisely, these
codes do work with MVAPICH when I use several processors within a
single node, but they fail as soon as two or more nodes cooperate.
(For example, a run with 16 processors on one node works fine, but a
run with 16 processors across 2 nodes hangs in an MPI communication
routine.)
Out of curiosity, I tested a very simple code (just a few MPI routines
with small messages) to check whether the installation procedure was
wrong, but the simple code has no trouble with inter-node
communication. On the other hand, some codes that use a huge amount of
memory show this behavior even though they transfer only 4 bytes of
data (one integer).
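For reference, my simple test looked roughly like the sketch below (a
minimal version written for this post, not the actual application
code): every rank sends a single integer to rank 0, the same kind of
4-byte transfer that hangs in the large-memory codes.

/* Minimal sketch (written for this post, not the real application):
 * every rank sends one integer to rank 0, i.e. the same 4-byte
 * transfer that hangs in the large-memory codes. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, value, i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Rank 0 collects one integer from every other rank. */
        for (i = 1; i < size; i++) {
            MPI_Recv(&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
            printf("rank 0 received %d from rank %d\n", value, i);
        }
    } else {
        /* Every other rank sends its own rank number (4 bytes). */
        value = rank;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

This kind of test runs cleanly across nodes on our system, which is
why I suspect the installation itself is not the problem.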
As an application scientist myself, I can't understand this situation
at all, so I'd like to know the reason and the conditions under which
it happens. Why do some codes hang in MPI routines when I use multiple
nodes? Is there any remedy? And what characteristics of a code bring
about this problem?
Please let me know why this happens.
Thank you in advance.
Jeff
p.s. What is the main difference between MVAPICH and OpenMPI?