Discussion:
Multiple external MPI jobs and collection of output in a serial main program?
Nam
2008-06-15 18:50:32 UTC
Hello,

Please help me with an urgent problem concerning my MPI program on a
Linux cluster (4 CPUs per node, i.e. two dual-core processors per node).

I have a main Fortran program for a micro genetic algorithm
optimization. This main program iteratively calls an external code
(possibly a commercial or binary-only code without source) with
multiple data sets. I use an MPI version of this main program which
uses 4 CPUs per generation, and the external code is invoked with "call
system( *** )" from the main Fortran program.

This was fine while the external code was small and ran in serial mode,
but now I have to run the external code in MPI mode, which may need
several computing nodes for a speed-up. Here I run into a problem: I
don't know how to write the main program so that it calls the external
MPI code 4 times at once within a time step. Of course, at the end of
each time step (generation) I need to collect the output from these 4
MPI jobs and evaluate a fitness function.

I am trying to write the main program in serial mode, so that it calls
the external MPI program 4 times (these may need to be background
jobs), waits until all 4 MPI jobs are finished, does some math, and
moves on to the next time step or generation. This could probably be
done with a script file, but I just don't know how.

Thanks for your suggestions,
Nam
Terence
2008-06-15 22:45:23 UTC
I've no idea if this will help you, but it is a concept I want to
explain. If you have a way of starting multiple parallel jobs
(applications) that need to interact, the best way I know is to use
shared direct-access files.

I did this years ago with Fortran programs in a hotel system, where
every computer in the hotel could update the shared files, provided the
user had the appropriate assigned access level. In that case all the
applications were DOS programs running menu-selectable, Fortran-compiled
daughter programs.

I know Linux is very different from Windows or DOS, but if you cannot
do something one way, there may be another...
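
To illustrate the idea in your setting, here is a minimal Fortran
sketch (the file name 'results.da', the record length, and the
one-record-per-worker layout are all just assumptions for
illustration): each worker writes its single result into its own
record of a shared direct-access file, and the master reads the
records back once the workers are done.

  ! worker side (inside worker i, after it computes its result):
  !   open(10, file='results.da', access='direct', &
  !        form='unformatted', recl=4)
  !   write(10, rec=i) my_result
  !   close(10)

  ! master side, reading all records after the workers finish:
  program collect_results
    implicit none
    integer :: i
    real    :: fitness(4)

    ! note: recl units (bytes vs. 4-byte words) are compiler
    ! dependent; recl=4 here assumes byte units for a default real
    open(unit=10, file='results.da', access='direct', &
         form='unformatted', recl=4, status='old')
    do i = 1, 4
       read(10, rec=i) fitness(i)
    end do
    close(10)

    print *, 'collected fitness values:', fitness
  end program collect_results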
j***@gmail.com
2008-06-16 01:23:29 UTC
Post by Nam
I am trying to write the main program in serial mode, so that it calls
the external MPI program 4 times (these may need to be background
jobs), waits until all 4 MPI jobs are finished, does some math, and
moves on to the next time step or generation. This could probably be
done with a script file, but I just don't know how.
Instead of calling the external program directly, you could call a
script that submits a parallel job. Of course this may mean waiting
when your system is busy. As the other respondent mentioned, I also
frequently use files to communicate between codes that are run more or
less independently.
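
For example (everything here is hypothetical: a wrapper script
run_case.sh is assumed to mpirun the external code on data set N and
to create a marker file done.N when that run finishes), the serial
main program could launch the four jobs in the background and poll
for the marker files before moving on:

  program drive_generation
    implicit none
    integer :: i
    logical :: done(4)
    character(len=80) :: cmd, marker

    ! launch the four MPI jobs in the background; the trailing '&'
    ! makes system() return immediately
    do i = 1, 4
       write(cmd, '(a,i0,a)') './run_case.sh ', i, ' &'
       call system(trim(cmd))
    end do

    ! poll until every wrapper has created its done.N marker file
    done = .false.
    do while (.not. all(done))
       call system('sleep 10')
       do i = 1, 4
          write(marker, '(a,i0)') 'done.', i
          inquire(file=trim(marker), exist=done(i))
       end do
    end do

    ! ... read the four output files, evaluate the fitness function,
    ! clean up the marker files, and go to the next generation ...
  end program drive_generation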

Actually, I find this sort of thing is generally easier with the whole
procedure wrapped up in a shell script which runs the ensemble of
models, then runs the analysis/optimiser, and repeats... but Fortran is
quite possible too. The choice may depend on the queueing system and
resources available.

James
Craig Powers
2008-06-16 18:41:58 UTC
Post by Nam
I am trying to write the main program in serial mode, so that it calls
the external MPI program 4 times (these may need to be background
jobs), waits until all 4 MPI jobs are finished, does some math, and
moves on to the next time step or generation. This could probably be
done with a script file, but I just don't know how.
Do I read this to mean that you are spawning four different instances of
the child program? If so, then I would think that you could parallelize
your main program and have each instance spawn one child program. Then
each would only have to wait on one child to finish (should be
relatively straightforward) and you could use MPI gathering calls to
pull the results together (may or may not be relatively straightforward
depending on what you're trying to do).
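
As a rough sketch of that pattern (the script name run_case.sh and the
output file names are made up, and launching an MPI child from inside
an MPI rank may need extra care with your launcher and scheduler),
each rank runs one instance of the external code on its own data set,
reads back a single fitness value, and rank 0 gathers the results:

  program parallel_driver
    implicit none
    include 'mpif.h'
    integer :: ierr, rank, nprocs
    real    :: my_fitness
    real, allocatable :: fitness(:)
    character(len=80) :: cmd, outfile

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

    ! each rank launches and waits for its own child run
    write(cmd, '(a,i0)') './run_case.sh ', rank + 1
    call system(trim(cmd))

    ! each rank reads the fitness value its child produced
    write(outfile, '(a,i0,a)') 'case', rank + 1, '.out'
    open(20, file=trim(outfile), status='old')
    read(20, *) my_fitness
    close(20)

    ! collect all fitness values on rank 0
    allocate(fitness(nprocs))
    call MPI_GATHER(my_fitness, 1, MPI_REAL, fitness, 1, MPI_REAL, &
                    0, MPI_COMM_WORLD, ierr)
    if (rank == 0) print *, 'fitness values:', fitness

    call MPI_FINALIZE(ierr)
  end program parallel_driver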

Since you're talking about this in terms of "nodes", I assume it's in a
cluster environment where there's some sort of scheduler keeping track
of node usage (e.g. Sun Grid Engine). In that environment, I would
imagine that it could be problematic to run your main program in serial
mode while at the same time having it be aware of the processor slots
that are reserved for it for parallel execution.
