Discussion:
Running "MPI" program on "Cluster"
Meenal Chougule
2013-09-26 06:17:19 UTC
Hello everyone,

I have a program with a master/slave structure, and I want to run it on a cluster.

The cluster has one master and two slaves. The cluster master decomposes the work and the slaves execute it.

I know the IPs of both slaves, but I want to know the command by which I can execute the or options in mpirun.


Thanku,
M D C
blmblm@myrealbox.com
2013-09-26 16:45:50 UTC
Post by Meenal Chougule
Hello everyone,
I have a program with a master/slave structure, and I want to run it on a cluster.
The cluster has one master and two slaves. The cluster master decomposes the work and the slaves execute it.
I know the IPs of both slaves, but I want to know the command by which I can execute the or options in mpirun.
I'm not sure what "the or options in mpirun" means here (maybe a
typo?), and I'm not sure I understand your situation, but some
comments/questions that might clarify:

Traditionally (MPI 1.x) MPI programs were strictly SPMD ("single
program, multiple data"), and in this model you would have a
single executable, compiled from code that includes processing for
both master and slave, and you would launch three copies of this
executable with "mpirun", and each copy would know (based on the
output of MPI_Comm_rank) whether it should behave as the master or
a slave.
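
As a rough illustration of that rank-based split, here is a small Python sketch. It is not MPI code: the three ranks are simply passed in by hand, whereas a real program would obtain its rank from MPI_Comm_rank. The function names and the toy workload are made up for the example.

```python
# Sketch of the SPMD pattern: every process runs the same code and picks
# its role from its rank. In a real MPI program the rank would come from
# MPI_Comm_rank; here the ranks are supplied by hand to simulate that.

def run_master(num_slaves):
    # Hypothetical master: split six work items into one chunk per slave.
    work = list(range(6))
    return [work[i::num_slaves] for i in range(num_slaves)]

def run_slave(chunk):
    # Hypothetical slave: "process" its chunk by summing it.
    return sum(chunk)

def main(rank, size, chunks=None):
    # Rank 0 acts as the master; every other rank acts as a slave.
    if rank == 0:
        return ("master", run_master(size - 1))
    return ("slave", run_slave(chunks[rank - 1]))

if __name__ == "__main__":
    size = 3  # one master plus two slaves, as in the question
    role, chunks = main(0, size)
    print(role, chunks)
    for rank in range(1, size):
        print(*main(rank, size, chunks))
```

The point is only that a single executable contains both code paths and that the branch is taken on the rank, so one launch of three copies is enough.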

MPI 2.x adds other options -- processes can spawn other processes, and
mpirun can launch more than one executable.
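
For the "more than one executable" case, the launcher syntax suggested by the MPI standard separates the per-executable parts with a colon. The Python sketch below only assembles and prints such a command line; the helper name and the executable names ./master and ./slave are placeholders, and the exact options accepted should be checked against the local mpiexec documentation.

```python
import shlex

def mpmd_command(launcher, segments):
    """Join (nprocs, argv) segments with ':' into one launcher command."""
    argv = [launcher]
    for i, (nprocs, prog_argv) in enumerate(segments):
        if i > 0:
            argv.append(":")  # separator between executables
        argv += ["-n", str(nprocs)] + list(prog_argv)
    return argv

# One copy of a hypothetical ./master and two copies of a hypothetical ./slave.
cmd = mpmd_command("mpiexec", [(1, ["./master"]), (2, ["./slave"])])
print(shlex.join(cmd))  # prints: mpiexec -n 1 ./master : -n 2 ./slave
```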

Does your program fit the MPI 1.x model, or does it use some of the
MPI 2.x features?
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
Meenal Chougule
2013-09-28 06:03:27 UTC
It's 3.0.
Meenal Chougule
2013-09-28 06:04:20 UTC
These are all the specifications:

mpirun --version
HYDRA build details:
Version: 3.0rc1
Release Date: Mon Nov 12 10:31:40 CST 2012
CC: gcc
CXX: c++
F77: gfortran
F90: gfortran
Configure options: '--disable-option-checking' '--prefix=NONE' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= ' 'LIBS=-lrt -lpthread ' 'CPPFLAGS= -I/home/minal/Desktop/MPI/mpich-3.0rc1/src/mpl/include -I/home/minal/Desktop/MPI/mpich-3.0rc1/src/mpl/include -I/home/minal/Desktop/MPI/mpich-3.0rc1/src/openpa/src -I/home/minal/Desktop/MPI/mpich-3.0rc1/src/openpa/src -I/home/minal/Desktop/MPI/mpich-3.0rc1/src/mpi/romio/include'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc
Resource management kernels available: user slurm ll lsf sge pbs
Checkpointing libraries available:
Demux engines available: poll select
Meenal Chougule
2013-09-28 06:06:04 UTC
The command I used is:

mpirun -np ./manager <cnf file as a input> ./worker "no of worker"


The error I got on abnormal termination is this:

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:1:***@minal] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:1:***@minal] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:1:***@minal] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[***@minal] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[***@minal] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[***@minal] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[***@minal] main (./ui/mpich/mpiexec.c:325): process manager error waiting for completion
blmblm@myrealbox.com
2013-09-28 17:33:42 UTC
Post by Meenal Chougule
Post by Meenal Chougule
Hello everyone,
I have a program with a master/slave structure, and I want to run it on a cluster.
The cluster has one master and two slaves. The cluster master decomposes the work and the slaves execute it.
I know the IPs of both slaves, but I want to know the command by which I can execute the or options in mpirun.
I still don't know what you mean by "the or options" here.
Post by Meenal Chougule
Post by Meenal Chougule
Thanku,
M D C
The command i used is
mpirun -np ./manager <cnf file as a input> ./worker "no of worker"
Is this the actual command? I ask because I thought "-np" needed to
be followed by a number of processes. But I notice from one of your
other responses [*] that you're using MPICH, and my recent experience
has been with OpenMPI, and I suppose the arguments could be different.

But if this actually launches one copy of a "master" program and
two copies of a "slave" program, well, you've already solved the problem
I thought you were having (how to accomplish that).

[*] Is there a reason you didn't put all the information in one reply
rather than spreading it out? Well, probably not that important.
Post by Meenal Chougule
The error i got by unusual termination is this,
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Without knowing anything about your code I can't say much! *Maybe*
the mention of "waiting for completion" means that process B was
waiting for process A to send it something, but process A ended
without sending it. But that's at best a guess.
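
One concrete clue in the banner is "EXIT CODE: 139". Assuming the launcher reports exit status the way POSIX shells do (128 plus the signal number when a process is killed by a signal), 139 would correspond to signal 11, a segmentation fault inside one of the application processes. A small Python sketch of that decoding, with a hypothetical helper name:

```python
import signal

def describe_exit_code(code):
    """Interpret an exit code using the shell convention 128 + signal number."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"

print(describe_exit_code(139))
```

If that reading is right, running the manager and worker under a debugger, or checking their argument handling, is likely to be more productive than changing mpirun options.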

Have you successfully run other MPI programs (even simple "hello world"
ones) on this cluster?
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm@myrealbox.com
2013-10-01 15:29:04 UTC
Post by ***@myrealbox.com
[ snip ]
Have you successfully run other MPI programs (even simple "hello world"
ones) on this cluster?
I'm mildly curious about why no response to these questions. If you've
sorted out the problem yourself I'd be interested in hearing what it
turned out to be.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
Meenal Chougule
2013-09-28 06:06:27 UTC
plz suggest me sumthing
blmblm@myrealbox.com
2013-09-28 17:34:24 UTC
Post by Meenal Chougule
Post by Meenal Chougule
Hello everyone,
[ snip ]
Post by Meenal Chougule
plz suggest me sumthing
I guess you don't know yet that Usenet is not a 24/7 helpdesk. Now
you do. Some groups get enough traffic that your odds of getting a
fast response are good. comp.parallel.mpi is not one of those. I'll
help if I can but if you need near-instant replies you should try
elsewhere.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
Meenal Chougule
2013-10-02 07:45:24 UTC
Its not that i want quick response. Its just my bad i could not club my queries in a single question.sorry for that.


U r right in front of -np there should be no. of processes, in my case its 1 but i forgot to put that here. I executed this command properly while running it.

While running a program on cluster I came to know that u need to give machinefile as an input to find hosts/slaves in your cluster network. But I could not get how exactly I can implement it.
Keith Thompson
2013-10-02 18:52:36 UTC
Meenal Chougule <***@gmail.com> writes:
[...]
Post by Meenal Chougule
Its not that i want quick response. Its just my bad i could not club
my queries in a single question.sorry for that.
U r right in front of -np there should be no. of processes, in my
case its 1 but i forgot to put that here. I executed this command
properly while running it.
While running a program on cluster I came to know that u need to give
machinefile as an input to find hosts/slaves in your cluster
network. But I could not get how exactly I can implement it.
I understand that English likely isn't your first language, but spelling
out words fully will make what you write much more readable, both to
native English speakers and to others.

U should be You, r should be are, i should be I, no. should be number.

I know that abbreviations like that are common in text messages, but
they're not a good idea here.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
blmblm@myrealbox.com
2013-10-03 21:43:24 UTC
Post by Keith Thompson
[...]
I understand that English likely isn't your first language, but spelling
out words fully will make what you write much more readable, both to
native English speakers and to others.
U should be You, r should be are, i should be I, no. should be number.
I know that abbreviations like that are common in text messages, but
they're not a good idea here.
I'll second this recommendation -- I wasn't going to say anything
since I'm already commenting about other not-about-content things --
but I also find the text-message style distracting. (Maybe it's age.
<shrug> )

(But I hope we didn't scare the OP away -- I'm kind of curious
about what the problem is!)
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
Meenal Chougule
2013-10-06 16:10:52 UTC
Sorry for that sir. I completely agree with your opinion.
Thank you.
blmblm@myrealbox.com
2013-10-18 18:57:15 UTC
[ snip ]
Post by ***@myrealbox.com
(But I hope we didn't scare the OP away -- I'm kind of curious
about what the problem is!)
Since you replied to my message, you must(?) have read the above,
but ....

*Did* you find a solution to your original problem/question? If so,
it's kind of customary to post something about it, for the benefit
of current readers and anyone searching the archives later.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
Meenal Chougule
2013-10-22 06:48:31 UTC
Post by ***@myrealbox.com
*Did* you find a solution to your original problem/question? If so,
it's kind of customary to post something about it, for the benefit
of current readers and anyone searching the archives later.
Yes, I found my solution, Sir. Since I am running my application on a cluster, I have to pass a machinefile that lists the slave IPs, or I can write the IPs directly in the mpirun command.
Thank you for continuous support!!
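
For anyone finding this in the archives, the two variants could look roughly like the sketch below. It assumes the Hydra mpirun/mpiexec that ships with MPICH 3.x, where -f names a host file and -hosts takes a comma-separated host list; the file names, the worker count and the 192.0.2.x addresses are placeholders, not the real values from this thread. Python is used only to assemble and print the commands, and the options should be confirmed with "mpiexec -help" on the actual installation.

```python
import shlex

# Hypothetical program arguments: manager, its input file, the worker
# executable and the number of workers, mirroring the command quoted above.
PROGRAM = ["./manager", "input.cnf", "./worker", "2"]

def with_machinefile(path):
    # Variant 1: hosts are read from a file (assumed Hydra option -f).
    return ["mpirun", "-f", path, "-np", "1"] + PROGRAM

def with_host_list(hosts):
    # Variant 2: hosts are given on the command line (assumed Hydra option -hosts).
    return ["mpirun", "-hosts", ",".join(hosts), "-np", "1"] + PROGRAM

print(shlex.join(with_machinefile("machinefile")))
print(shlex.join(with_host_list(["192.0.2.11", "192.0.2.12"])))
```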
blmblm@myrealbox.com
2013-10-22 16:53:50 UTC
Post by Meenal Chougule
Yes, I found my solution, Sir. Since I am running my application on a cluster, I have to pass a machinefile that lists the slave IPs, or I can write the IPs directly in the mpirun command.
Thank you for continuous support!!
You are most welcome. I'm glad you got it sorted out.

(Just for the record, I am not a "Sir" but a "Ma'am". Of course
you couldn't know that, but I'm a little militant about the apparent
default assumption that everyone in this field is male and so like
to make the point when the occasion arises. We female CS types
are a minority, but we do exist!)
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.
blmblm@myrealbox.com
2013-10-02 19:10:45 UTC
Post by Meenal Chougule
Post by Meenal Chougule
Hello everyone,
[ snip ]


Is there some reason you continue to reply to your own posts rather
than to my responses? It seems to me that it would be easier to
have a discussion if you quoted the questions and comments (from
me) you're replying to. I notice that you seem to be posting
using Google's interface. I'm not, so I don't really know what
the options are, but if you can reply to one of your own posts,
it seems like you should be able to reply to someone else's.
Post by Meenal Chougule
Post by Meenal Chougule
Thanku,
M D C
Its not that i want quick response. Its just my bad i could not club my queries in a single question.sorry for that.
But why would you want to ("club your queries" -- I'm guessing you mean
"group them together"? not criticizing your English if it's not your
first language, and probably not that important to be clear about this
anyway)?
Post by Meenal Chougule
U r right in front of -np there should be no. of processes, in my case its 1 but i forgot to put that here. I executed this command properly while running it.
It's often a good idea, if you're asking about a command, to copy and paste
exactly what you enter rather than retyping it, so you don't make this kind
of mistake.
Post by Meenal Chougule
While running a program on cluster I came to know that u need to give machinefile as an input to find hosts/slaves in your cluster network. But I could not get how exactly I can implement it.
Doesn't your installed version of MPICH include documentation, such as
man pages? I can't just quote from the ones on the systems I use because
what we have is OpenMPI, and some of the details of arguments to mpirun
could be different.

Anyway, if you don't have local documentation or can't make sense of it,
here's the documentation page at the official MPICH site:

http://www.mpich.org/documentation/guides/

and in particular

http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4-README.txt

has a simple example of using the -machinefile option to say which
computers you want to run the program on. (Notice also that there
seem to be other options for specifying this.)
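
To make the file format concrete: in the MPICH README a machinefile is a plain text file with one host per line, optionally followed by ":N" for the number of processes to place on that host. The Python sketch below writes such a file into a temporary directory; the helper name and the 192.0.2.x addresses are placeholders chosen for illustration.

```python
from pathlib import Path
import tempfile

def write_machinefile(path, hosts):
    """Write one 'host' or 'host:slots' entry per line and return the entries."""
    lines = []
    for host, slots in hosts:
        lines.append(host if slots is None else f"{host}:{slots}")
    Path(path).write_text("\n".join(lines) + "\n")
    return lines

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "machinefile"
    # Two hypothetical slave machines; the second may run two processes.
    write_machinefile(target, [("192.0.2.11", None), ("192.0.2.12", 2)])
    print(target.read_text(), end="")
```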

I can't quite tell from the documentation whether you first have to
start some kind of background server process with the command "mpd".

Once again I ask: Have you been able to run any MPI programs on
this system? Among other things, that would tell you whether you
need that "mpd" command first.
--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.