Saville
2008-03-05 01:02:31 UTC
Hi all,
I have a two-node cluster running Linux.
Each node has a full OS installed.
Both nodes have the same username.
ssh has been set up so that the master can log into the same username on the
slave without using a password.
The machine file has been edited to include the slave node (erik).
The /etc/hosts file was updated to include the IP addr/name of the slave.
Under that username on both nodes, there is a subdirectory that contains the
MPICH install, and I'm trying to run the test program called "cpi".
I have a window open on the slave and I am running "top" on it.
When I run the mpirun command, I see cpi starting on the slave (in top), and
then I get the following error message:
$ ../../bin/mpirun -np 2 cpi
rm_1400: p4_error: rm_start: net_conn_to_listener failed: 33408
p0_12586: p4_error: Child process exited while making connection to remote process on erik: 0
p0_12586: (10.472656) net_send: could not write to fd=4, errno = 32
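My reading of the first line is that the process on the slave could not connect back to a listener on the master, and I take 33408 to be the port it tried, though I have not confirmed either point. In case it helps narrow things down, here is a minimal Python sketch of the checks I could run on each node. The function names are my own and are not part of MPICH; the idea is to see whether a hostname resolves to a loopback address and whether a plain TCP connection to a given host and port succeeds.

```python
import ipaddress
import socket
from typing import List


def is_loopback(address: str) -> bool:
    """Return True if the textual IP address is a loopback address."""
    # Drop any IPv6 scope suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


def resolved_addresses(hostname: str) -> List[str]:
    """Return the distinct IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the slave I would call resolved_addresses() with the master's hostname and pass each result to is_loopback(); on the master I would do the same with socket.gethostname(). can_connect() only tells me something if a process is actually listening on the chosen port at that moment.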
Can anyone give me a pointer to some information that would help me figure
out what the problem is?
thanks