Discussion:
lamboot problem. Does anyone know what should be done?
pavrus
2006-09-04 06:36:02 UTC
lamboot: boot schema file: hostfile
lamboot: opening hostfile hostfile
lamboot: found the following hosts:
lamboot: n0 endimion001
lamboot: n1 endimion002
lamboot: n2 endimion003
lamboot: n3 endimion004
lamboot: resolved hosts:
lamboot: n0 endimion001 --> 172.22.23.111
lamboot: n1 endimion002 --> 172.22.23.112
lamboot: n2 endimion003 --> 172.22.23.113
lamboot: n3 endimion004 --> 172.22.23.114
lamboot: found 4 host node(s)
lamboot: origin node is 0 (endimion001)
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -I " -H
172.22.23.111 -P 32776 -n 0 -o 0 ""
hboot: process schema = "/home/G3/Para18Oct/lam-6.5.9/etc/lam-conf.lam"
hboot: found /home/G3/Para18Oct/lam-6.5.9/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /home/G3/Para18Oct/lam-6.5.9/bin/lamd
hboot: attempting to execute
dli_inet (sfh_sock_open_clt_inet_stm): Invalid argument
[1] 1354 lamd -H 172.22.23.111 -P 32776 -n 0 -o 0 -d
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
wipe ...

LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University

Executing tkill on n0 (endimion001)...
lamboot did NOT complete successfully
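
The "resolved hosts" step and the failing client socket open in the log above can be checked independently of LAM. The following is only a minimal sketch, not part of LAM/MPI: the helper names `parse_boot_schema`, `resolve_hosts` and `can_connect` are hypothetical, and it assumes a boot schema with one host name per line, optional attributes after the name, and `#` comments. It may help to tell a name-resolution or network problem apart from a problem inside lamd itself.

```python
import socket


def parse_boot_schema(text):
    """Return the host names listed in a boot schema, in order.

    Assumption: '#' starts a comment and the host name is the first
    whitespace-separated token on each non-empty line.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def resolve_hosts(hosts, resolver=socket.gethostbyname):
    """Map node index -> (host, address); address is None if lookup fails."""
    table = {}
    for index, host in enumerate(hosts):
        try:
            address = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            address = None
        table[index] = (host, address)
    return table


def can_connect(address, port, timeout=2.0):
    """Return None if a TCP client connection succeeds, else the OSError."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return None
    except OSError as exc:
        return exc


if __name__ == "__main__":
    # "hostfile" is the boot schema name taken from the log above.
    with open("hostfile") as handle:
        nodes = resolve_hosts(parse_boot_schema(handle.read()))
    for index, (host, address) in nodes.items():
        print(f"n{index} {host} --> {address}")
```

If every host resolves and `can_connect` succeeds against a test listener on the origin node's address, the network setup is probably not the cause, which would point back at the LAM build itself.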
Reuti
2006-09-04 23:35:16 UTC
The version you tried may be too old. Can you try the latest one, 7.1.2,
which has many improvements?

-- Reuti
pavrus
2006-09-05 08:25:39 UTC
I thought about this. I am preparing to set up LAM/MPI on Japanese
cluster computers running the provided Turbolinux (kernel 2.4.20smp,
gcc 3.2.2). Other things work fine on them and I didn't want to
change the system, but...

Thank you for the reply.
K.G.
themos
2006-09-05 14:12:13 UTC
LAM/MPI has apparently been superseded by Open MPI.

http://www.open-mpi.org/
