Discussion:
Combining threads and MPI
l***@yahoo.com
2008-09-02 10:00:04 UTC
Hi all,
I'm trying to write a program that combines threads and MPI. I'm using
gcc 4.1.2, OpenMP and OpenMPI 1.2.6.

However, when I printed out the value of "provided" in
MPI_Init_thread(&argc, &argv, 5, &provided), I got 0. Does that mean I
can't use threads in my program?

I have tried writing a pure OpenMP program, and it works, so I know my
compiler supports OpenMP. But how do I know that my MPI program is
really running different threads? I noticed that when I remove the
OpenMP #pragmas, the program still works correctly (the #pragmas
probably didn't do anything, since I was not allocated any threads).

I also have a couple of questions about running MPI on multi-core
systems:

Say I have a quad-core computer. If I run an MPI program locally using
"-np 4", will the 4 instances of the program automatically run on each
of the 4 cores, i.e. one instance per core? Or should I have "slots =
4" in the hostfile and run the program with "-np 1"? Or do I not have
any control over whether all 4 cores will be utilized or not?

Also, a question that is somewhat related to the above: if using a
quad-core computer leads to the one-instance-per-core scenario, will
running an MPI program on a quad-core computer generally be faster than
running the same program on 4 single-core computers, since the physical
distance between cores/memory is much shorter in the quad-core system
than in the single-core network, which might reduce communication costs?

Thank you.

Regards,
Rayne
Georg Bisseling
2008-09-02 14:54:43 UTC
Post by l***@yahoo.com
However, when I printed out the value of "provided" in
MPI_Init_thread(&argc, &argv, 5, &provided), I got 0. Does that mean I
can't use threads in my program?
Garbage in, garbage out! I mean that you have to use the proper
constants, like MPI_THREAD_FUNNELED etc., in the call, and that you
have to compare the value returned in "provided" against those
constants. Do you expect us to look into the OpenMPI 1.2.6 headers to
learn what 5 and 0 mean?
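For illustration, a correct call could look like this (a minimal
sketch; requesting MPI_THREAD_MULTIPLE is just one possible choice):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;

    /* Request full thread support; the library reports in "provided"
       what it can actually give, which may be less than requested. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_FUNNELED)
        printf("insufficient thread support, provided = %d\n", provided);

    MPI_Finalize();
    return 0;
}
```

Comparing with "<" works because the standard defines the levels in
increasing order: MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED <
MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE.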

BTW, there might be compile-time options to configure OpenMPI with
different levels of thread support. Maybe two (or more) versions of
the library are already installed, since thread support tends to cost
performance.
--
This signature intentionally left almost blank.
http://www.this-page-intentionally-left-blank.org/
Michael Hofmann
2008-09-02 15:45:26 UTC
Post by l***@yahoo.com
I'm trying to write a program that combines threads and MPI. I'm using
gcc 4.1.2, OpenMP and OpenMPI 1.2.6.
However, when I printed out the value of "provided" in
MPI_Init_thread(&argc, &argv, 5, &provided), I got 0. Does that mean I
can't use threads in my program?
In Open MPI, thread support is disabled by default. You have to use
"--enable-mpi-threads" during "./configure" to enable thread support.
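A rebuild from the unpacked source tree could look like this (a
sketch only; the install prefix is just an example, pick one that
suits your system):

```shell
cd openmpi-1.2.6                             # unpacked source directory
./configure --enable-mpi-threads --prefix=$HOME/opt/openmpi-mt
make all
make install
export PATH=$HOME/opt/openmpi-mt/bin:$PATH   # use this build's mpicc/mpirun
```

Then recompile your program with this build's mpicc and check
"ompi_info | grep Thread" again.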
Post by l***@yahoo.com
I have tried writing a pure OpenMP program, and it works, so I know my
compiler supports OpenMP. But how do I know that my MPI program is
really running different threads? I noticed that when I remove the
OpenMP #pragmas, the program still works correctly (the #pragmas
probably didn't do anything, since I was not allocated any threads).
"omp_get_num_threads" can be used to determine the number of threads used
in a parallel (OpenMP) region.

You can have a look at the utilized CPU time/load of your program while it
is running using a suitable process monitoring tool. Linux "top" shows
about 400% CPU usage for a process with 4 _working_ threads on a quad-core
system. However, this indicates only a lower bound on the number of
working threads (e.g. >100% means you have more than 1 working thread,
etc.). It is also possible to have a lot more threads and only 50% CPU
usage (because the threads spend the whole time waiting for something).
Post by l***@yahoo.com
I also have a couple of questions about running MPI on multi-core
Say I have a quad-core computer. If I run an MPI program locally using
"-np 4", will the 4 instances of the program automatically run on each
of the 4 cores, i.e. one instance per core?
"-np 4" creates 4 MPI processes. If you create all 4 processes on one
node, the OS is responsible for distributing the processes to the cores.
A recent multi-core capable OS will most likely run each instance on a
separate core.
Post by l***@yahoo.com
Or should I have "slots =
4" in the hostfile and run the program with "-np 1"?
If you want 4 processes, you have to use "-np 4". The "slots" entry in
the hostfile specifies how many processes are scheduled "at once" to a
certain node (see
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling).
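For example, with a hostfile along these lines (the hostname is a
placeholder), the two settings work together rather than replacing
each other:

```shell
# hostfile: one quad-core node that may hold up to 4 processes
cat > myhosts <<'EOF'
quadnode slots=4
EOF

# start 4 MPI processes; all 4 are scheduled onto quadnode
mpirun --hostfile myhosts -np 4 ./myprogram
```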
Post by l***@yahoo.com
Or do I not have
any control over whether all 4 cores will be utilized or not?
With MPI you have no control over CPU core utilization. You can only
control the (computing) node utilization. CPU core utilization is
controlled by the OS scheduler.
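(Strictly speaking, Open MPI can at least hint the OS: if I remember
correctly, the 1.2 series has an MCA parameter for processor affinity
that pins each process to a processor. Verify the parameter name on
your installation with "ompi_info --param mpi all".)

```shell
# ask Open MPI to bind each process to its own processor
mpirun --mca mpi_paffinity_alone 1 -np 4 ./myprogram
```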
Post by l***@yahoo.com
Also, a question that is somewhat related to the above: if using a
quad-core computer leads to the one-instance-per-core scenario, will
running an MPI program on a quad-core computer generally be faster than
running the same program on 4 single-core computers, since the physical
distance between cores/memory is much shorter in the quad-core system
than in the single-core network, which might reduce communication costs?
Not generally. The 4 CPUs of the single-core network have separate
access to their local memory, while the 4 cores of one CPU share access
to the local memory. Especially memory-intensive applications or
algorithms may suffer more from the "memory bottleneck" than they profit
from the reduced communication costs.

Going from single-core networks to multi-core CPUs also divides a lot
of (more or less) important resources among the cores (caches, memory
access links, network access links, file I/O access links, ...)


Michael
l***@yahoo.com
2008-09-03 03:54:48 UTC
Post by Michael Hofmann
Post by l***@yahoo.com
I'm trying to write a program that combines threads and MPI. I'm using
gcc 4.1.2, OpenMP and OpenMPI 1.2.6.
However, when I printed out the value of "provided" in
MPI_Init_thread(&argc, &argv, 5, &provided), I got 0. Does that mean I
can't use threads in my program?
In Open MPI, thread support is disabled by default. You have to use
"--enable-mpi-threads" during "./configure" to enable thread support.
Post by l***@yahoo.com
I have tried writing a pure OpenMP program, and it works, so I know my
compiler supports OpenMP. But how do I know that my MPI program is
really running different threads? I noticed that when I remove the
OpenMP #pragmas, the program still works correctly (the #pragmas
probably didn't do anything, since I was not allocated any threads).
"omp_get_num_threads" can be used to determine the number of threads used
in a parallel (OpenMP) region.
You can have a look at the utilized CPU time/load of your program while it
is running using a suitable process monitoring tool. Linux "top" shows
about 400% CPU usage for a process with 4 _working_ threads on a quad-core
system. However, this indicates only a lower bound on the number of
working threads (e.g. >100% means you have more than 1 working thread,
etc.). It is also possible to have a lot more threads and only 50% CPU
usage (because the threads spend the whole time waiting for something).
I have the following code to test if I am using threads (using OpenMP)
correctly in OpenMPI:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank, i, tag1 = 1, tag2 = 2, state, state2 = 5, check, check2 = 5, size, id;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    state = rank;
    check = rank * 2;

    if (rank == 0)
    {
        #pragma omp parallel sections private(id, i)
        {
            id = omp_get_thread_num();
            printf("%d TID: %d\n", rank, id);
            #pragma omp section
            {
                for (i = 1; i < size; i++)
                {
                    MPI_Recv(&state2, 1, MPI_INT, MPI_ANY_SOURCE, tag1, MPI_COMM_WORLD, &status);
                    printf("%d received state: %d from %d\n", rank, state2, status.MPI_SOURCE);
                }
            }
            #pragma omp section
            {
                for (i = 1; i < size; i++)
                    MPI_Send(&state, 1, MPI_INT, i, tag1, MPI_COMM_WORLD);
            }
            #pragma omp section
            {
                MPI_Recv(&check2, 1, MPI_INT, 1, tag2, MPI_COMM_WORLD, &status);
            }
            #pragma omp section
            printf("%d get_num_threads: %d\n", rank, omp_get_num_threads());
        }
    }
    else
    {
        #pragma omp parallel sections private(id, i)
        {
            id = omp_get_thread_num();
            printf("%d TID: %d\n", rank, id);
            #pragma omp section
            {
                MPI_Send(&state, 1, MPI_INT, 0, tag1, MPI_COMM_WORLD);
            }
            #pragma omp section
            {
                MPI_Recv(&state2, 1, MPI_INT, 0, tag1, MPI_COMM_WORLD, &status);
            }
            #pragma omp section
            {
                MPI_Send(&check, 1, MPI_INT, 0, tag2, MPI_COMM_WORLD);
            }
            #pragma omp section
            printf("%d get_num_threads: %d\n", rank, omp_get_num_threads());
        }
    }

    MPI_Finalize();
    return 0;
}

Then I compiled the program using "mpicc mpithreadtest.c -fopenmp" and
ran it using 2 nodes.

I got the correct output, i.e. the sending and receiving of the
variables state2 and check2 were correct. However, at this point, I'm
still uncertain if the program ran using threads or simply ignored the
pragmas.

So I replaced all the MPI_Send/MPI_Recv with empty for loops and ran
the program again, using 2 nodes. Using System Monitor, all 4 CPUs
have a usage of at least 40% to 50%. Then I removed all the pragmas
and replaced them with empty for loops, that is, each process now does
not create/run any additional threads. I ran the program using 2 nodes
and this time, System Monitor shows that only 2 CPUs have a usage of
at least 80%. The working CPUs seem to change a lot, i.e. from CPU1 to
CPU4 to CPU3 etc., and occasionally one other CPU has a usage of about
10-15%, but there is always 1 CPU with 0% usage.

Does this mean that my MPI program is using threads correctly? But my
installation of OpenMPI still does not support threads - running
"ompi_info | grep Thread" gives me "Thread support: posix (mpi: no,
progress: no)". Or does the MPI thread support refer only to POSIX
threads, while running OpenMP under MPI simply lets the OS spawn new
threads in each process, without the actual involvement of MPI?

Another question: do I have to uninstall and reinstall OpenMPI to
enable thread support? My ./configure is located in
/Desktop/Non-Shared/Documents/Installations/openmpi-1.2.6, while my
OpenMPI installation (i.e. all the library files, mpiexec etc.) is in
~/usr/lib64/openmpi/1.2.5-gcc. I've tried simply running ./configure in
/Desktop/Non-Shared/Documents/Installations/openmpi-1.2.6 with the
"--enable-mpi-threads" option and it didn't work. If I do have to
reinstall, how should I do that?

Thank you.

Regards,
Rayne
