Post by kis
The reason I asked this question is that my program performed better
when running on 1 node with several processors with memory shared among
the processors and degrades if I distribute it across two nodes, for
instance.
Why is this astonishing? Even with a non-optimised MPI implementation
you would benefit from running on a single node; processes on different
nodes have to communicate over the internode network, which usually has
much lower bandwidth and higher latency than memory access within a node.
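If you want to see the effect yourself, a simple ping-pong timing between
two ranks placed first on the same node and then on different nodes makes
it quite visible. A rough sketch (message size and repetition count are
arbitrary choices of mine; run it with at least two processes):

/* pingpong.c - rough round-trip timing between ranks 0 and 1 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    const int nbytes = 1 << 20;          /* 1 MiB messages, arbitrary */
    char *buf = malloc(nbytes);
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {                 /* rank 0 sends, then waits for the echo */
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {          /* rank 1 echoes everything back */
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("average round trip: %g s\n", (t1 - t0) / reps);

    free(buf);
    MPI_Finalize();
    return 0;
}

Run once with both ranks pinned to one node and once with the ranks on
two nodes; the difference in the round-trip time is exactly the network
penalty you are seeing in your program.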
Post by kis
Just wondering. I quite often come across this terminology: "vector
multiprocessors" in MPI implementations. Does it imply anything about
memory? Is it a cluster with several processors which share common
memory?
You have to look at two aspects here: whether your nodes are
single-processor or SMP systems, and what kind of CPU is used. Most
high-performance computers today are built as a cluster of SMP nodes
(usually two to eight cores per node), i.e. you have a shared-memory
programming model inside each node and a distributed-memory model
between nodes.
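In practice that often means a hybrid program: MPI between the nodes and
OpenMP threads inside each SMP node. A minimal sketch (assuming an MPI
library with thread support and an OpenMP-capable compiler):

/* hybrid.c - MPI between nodes, OpenMP threads inside each SMP node */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* FUNNELED: only the master thread of each process makes MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* shared-memory parallelism inside the node */
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

Compile with something like "mpicc -fopenmp hybrid.c", start one MPI
process per node, and let OMP_NUM_THREADS control how many threads run
inside each node.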
Now onto the CPU architecture. A vector architecture is distinguished by
a rather large number of registers and high-bandwidth direct memory
access; on the other hand you have cache-based architectures with a
typically smaller register count. The first kind is called a vector
architecture because the machine can apply a single instruction to a
whole range of registers (a vector ...) at the same time (look at NEC's
SX series, for example), whereas "off-the-shelf" CPUs like AMD or Intel
have to work through every item of the data vector in a more serial
fashion. Of course, the distinction is not so clear-cut nowadays,
because newer processor types like the Itanium have a rather large
register set too, and SSE is an attempt to provide vectorising support
on commodity CPUs.
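Purely as an illustration of that difference (this is SSE in C, not an
NEC vector unit, and the function names are my own):

/* vecadd.c - scalar loop vs. SSE, which handles four floats per instruction */
#include <xmmintrin.h>   /* SSE: __m128, _mm_add_ps, ... */
#include <stdio.h>

void add_scalar(const float *a, const float *b, float *c, int n)
{
    /* a cache-based CPU without SIMD works through the elements one by one */
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

void add_sse(const float *a, const float *b, float *c, int n)
{
    int i;
    /* one add instruction operates on four floats at once - a very short "vector" */
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)       /* leftover elements */
        c[i] = a[i] + b[i];
}

int main(void)
{
    float a[8] = {1,2,3,4,5,6,7,8}, b[8] = {8,7,6,5,4,3,2,1}, c[8];
    add_sse(a, b, c, 8);
    printf("%g %g ... %g\n", c[0], c[1], c[7]);
    return 0;
}

A "real" vector CPU follows the same idea, but with much longer vectors
and much higher memory bandwidth behind them.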
To return to your question: a vector multiprocessor would most probably
mean an SMP node consisting of vector CPUs.
Hope that helps,
Sebastian