This is a collection of topics I have come across during the first weeks
and months of testing on CLIC.
It is something like a FAQ, but there is no guarantee of correctness
for each point.
If anyone has other or additional experience that may be of interest to other
users, please let me know (m.pester@mathematik.tu-chemnitz.de).
Note:
There may be changes (version numbers, paths, ...) due to system upgrades.
Thus, parts of this page may become obsolete some day.
The author is not responsible for the contents of other pages that
are linked here.
Note: The former command pbs_qsub (from /uni/global/bin) is being phased out and has been replaced by qsub (from /usr/local/bin).
# put the LAM-MPI bin directory at the top of the search path
if ( -d /usr/local/packages/lam-rpi-tcp-6.3.2.ssh/bin ) then
    set path=( /usr/local/packages/lam-rpi-tcp-6.3.2.ssh/bin $path )
endif
"Submitting" an interactive job by qsub -I ... will give you a shell in your current terminal. If you use xpbs to submit an interactive job, xpbs opens a window (xterm) to execute that shell. The file $PBS_NODEFILE contains the list of hostnames (nodes) assigned to your job. The interactive shell runs on the first of them.
A simple example:
#PBS -l nodes=4,walltime=1:00:00
#PBS -A MyProjectName
and submit this file (but don't forget the flag -I):
qsub -I my_i_job
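Inside the interactive shell you can verify what PBS has assigned to you; a minimal check (only standard commands are used):
hostname                # should match the first entry of the node list
cat $PBS_NODEFILE       # all nodes assigned to this job
wc -l < $PBS_NODEFILE   # number of assigned nodes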
If you are able to redirect all input and output of your program to files,
you should use the real batch mode to run it.
The job definition file has to contain some PBS-specific options, written
as comments for the shell (#PBS -option),
and all the shell commands to be executed on the first node of your
subcluster.
A simple example:
#PBS -l nodes=16,walltime=1:30:00
#PBS -A MyProjectName
#PBS -k oe
#PBS -m ae
#PBS -q @clic0a1.hrz.tu-chemnitz.de
# start my program now
cd workdir
setenv LAMBHOST ${PBS_NODEFILE}.lam.eth1
lamboot -v
set ERR=$?
if ( $ERR == 0 ) then
    mpirun -np 16 ptest
    wipe -v
else
    echo "lamboot - errcode = $ERR"
    exit 1
endif
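Assuming the script above has been saved as my_job (the file name is arbitrary), submitting and watching it might look like this sketch:
qsub my_job          # returns a job id such as 1234.clic0a1...
qstat -u $USER       # show the state of your own jobs
qdel <job_id>        # remove the job again if necessary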
In an "emergency" case (defect switch) we had access to CLIC by a special queue only. In such a case, your batch job should contain, e.g.,
#PBS -q clicDefectQ@clic0a1.hrz.tu-chemnitz.de
and use only hostfiles with the suffix eth0 instead of eth1.
For general information concerning the usage of LAM-MPI refer to this (German) document.
On CLIC, three (marginally different) versions of LAM-MPI are installed under /usr/local/packages. I decided to use the TCP version, since the others seem to have advantages only for dual-processor boards.
Notes on LAM MPI 6.5.1: Using very long messages and highly
parallel simultaneous exchange, LAM 6.3.2 managed to transfer
140 Mbit/s per node, but LAM 6.5.1 did not exceed 128 Mbit/s per node.
The most recently installed version is LAM MPI 6.5.6:
MPIHOME=/usr/local/packages/lam-rpi-tcp-6.5.6
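To put this version at the top of your search path you may proceed as for 6.3.2 above; a sketch, assuming the 6.5.6 installation provides the usual bin directory:
if ( -d /usr/local/packages/lam-rpi-tcp-6.5.6/bin ) then
    set path=( /usr/local/packages/lam-rpi-tcp-6.5.6/bin $path )
endif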
Filename | Communication via | to be used for |
${PBS_NODEFILE}.lam.eth0 | service network (eth0) | LAM MPI or PVM |
${PBS_NODEFILE}.lam.eth1 | communication network (eth1) | LAM MPI or PVM |
${PBS_NODEFILE}.mpich.eth0 | service network (eth0) | MPICH |
${PBS_NODEFILE}.mpich.eth1 | communication network (eth1) | MPICH |
Only the master node (where you have a shell to start mpirun) has full access to AFS (a "token"), because LAM-MPI does not export the AFS token to the other nodes. This node must be the first entry in $LAMBHOST (otherwise problems with stdin and stdout may appear).
clic_chk -b 'rm -r /tmp/lam-<myuserid>@*'
Another mysterious message may be that your executable is not found on one of the nodes. Check whether an instance of this program is still running and kill it:
clic_chk -b 'killall <myexecutable>'
MPICH is installed locally on CLIC under /usr/local/packages/mpich-1.2.4.ssh/. Just as described above for LAM MPI, you can do the following:
There are different ways to run a program with MPICH:
Here are the details for those who want to know what happens:
Start a daemon before mpirun, e.g.
set PORTNO=`id -u`
set LOGFILE=/tmp/p4log.$PBS_JOBID
set MFILE=${PBS_NODEFILE}.mpich.eth1
foreach H ( `cat $MFILE` )
    ssh -x $H serv_p4 -o -p $PORTNO -l $LOGFILE
end
and then run your parallel program
mpirun -np <number_of_nodes> -p4ssport $PORTNO -machinefile $MFILE <executable>
or, using environment variables,
setenv MPI_USEP4SSPORT yes
setenv MPI_P4SSPORT $PORTNO
mpirun -np <number_of_nodes> -machinefile $MFILE <executable>
For the usage of -nolocal see
above (1).
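Putting these pieces together, a complete MPICH batch job might look like the following sketch (node count, working directory and program name are placeholders; MPICH's bin directory is assumed to be on your path):
#PBS -l nodes=8,walltime=0:30:00
#PBS -A MyProjectName
cd workdir
set PORTNO=`id -u`
set LOGFILE=/tmp/p4log.$PBS_JOBID
set MFILE=${PBS_NODEFILE}.mpich.eth1
# start the secure server on every node of the subcluster
foreach H ( `cat $MFILE` )
    ssh -x $H serv_p4 -o -p $PORTNO -l $LOGFILE
end
setenv MPI_USEP4SSPORT yes
setenv MPI_P4SSPORT $PORTNO
mpirun -np 8 -machinefile $MFILE ./myexecutable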
MPIHOME=/usr/local/packages/mpich-ch_p4mpd-1.2.4.ssh
The daemons for this environment can be started by calling the script
clic_init_mpichmpd
However, there are some important hints for mpirun, because the daemons do not know anything about environment variables. Thus, you have to specify the full path of the parallel program to be started, e.g. by
mpirun -np <number_of_nodes> -machinefile $PBS_NODEFILE.mpich.eth1 `pwd`/myexecutable
If the program needs any (non-standard) shared libraries, you have to specify the corresponding environment variable explicitly:
mpirun -np <number_of_nodes> -machinefile $PBS_NODEFILE.mpich.eth1 `pwd`/myexecutable \
    -MPDENV- LD_LIBRARY_PATH=$LD_LIBRARY_PATH
PVM can be used from /afs/tucz/project/sfb393/pvm3 for the PVM
architecture LINUX. The PVM daemon is started by
$PVM_ROOT/lib/pvm [-n<master_hostname>] <hostfile>
where <hostfile> may be $PBS_NODEFILE or ${PBS_NODEFILE}.lam.eth1 as described above in the Remarks.
The flag -n<master_hostname> is important for the correct use
of the communication network (eth1). You can get <master_hostname>
as `head -1 ${PBS_NODEFILE}.lam.eth1`.
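Put together, starting PVM on the communication network might look like this sketch (assuming PVM_ROOT points to the AFS installation mentioned above):
setenv PVM_ROOT /afs/tucz/project/sfb393/pvm3
set MFILE=${PBS_NODEFILE}.lam.eth1
set MASTER=`head -1 $MFILE`     # the master must be the first host in the file
$PVM_ROOT/lib/pvm -n$MASTER $MFILE
In the PVM console that appears, quit leaves the console with the daemons still running, while halt shuts down the whole virtual machine.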
Typical problems using PVM:
cd $HOME
mkdir pvm3 pvm3/bin pvm3/bin/LINUX
cd pvm3
ln -s /afs/tucz/project/sfb393/pvm3/lib .
ln -s /afs/tucz/project/sfb393/pvm3/include .
chacl -R -dir pvm3 -acl urz:clicnodes rl
ln -s ~/workdir/my_executable ~/pvm3/bin/LINUX/my_executable
However, don't forget to set the correct access rights (urz:clicnodes rl) for this workdir, too.
conf | to see if the PVM daemon runs on all nodes. |
ps -a | to see how many processes are running on which node. |
version | to display the current PVM version (3.4.4) of the daemon which must correspond to the library version your program was linked with. |
reset | to kill hanging processes. |
For simplification there are a few shell scripts to initialize a subcluster
either for LAM-MPI or for MPICH (and for PVM, too):
clic_init_lam      [ < input_file ]
clic_init_mpich    [ < input_file ]
clic_init_mpichmpd [ < input_file ]
clic_init_pvm [-x] [ < input_file ]
By default they will select the corresponding machine file (using the
communication network), start the corresponding daemon and then (in case
of success) run a shell interactively. Before entering the subshell,
the scripts will add the bin directory of the appropriate MPI version
to the top of your search path (if it is not there yet).
The argument "eth0" may be specified to explicitly demand the service
network instead of the communication network.
If you leave the shell (exit)
the daemons are killed and temporary files are deleted.
For simplicity, each of the scripts defines an environment variable
CLIC_NUMNODES with the number of
nodes defined in $PBS_NODEFILE.
This variable is available for the subshell.
For usage in batch mode you may redirect the input from a file which
contains the mpirun command and data
for your program.
Of course, you may also write it in your batch job, e.g.
clic_init_lam <<EOF
mpirun -np 16 <myexecutable>
EOF
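Since CLIC_NUMNODES is exported to the subshell, the node count need not be hard-coded. For example, an input file run.cmd (the name is arbitrary) might contain the single line
mpirun -np $CLIC_NUMNODES <myexecutable>
and the batch job then simply calls
clic_init_lam < run.cmd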
Another helpful script is the following, which executes a specified
command via ssh on each of the nodes listed in $PBS_NODEFILE:
clic_chk [-b] [command]
The flag "-b" means to execute the ssh commands in the background instead
of one by one. If no command is specified, clic_chk
will only echo an "OK" from each node (to check whether ssh works).
As a special case of clic_chk
you may run the script
clic_chk_load
which extracts those nodes from $PBS_NODEFILE that have a load average
of more than 0.10. This will take a while for a large number of nodes;
the program pload (see below) may be better for a quick test.
If you want to check the connection via another
machine file than $PBS_NODEFILE, please use
chkhosts [-b] machine_file [command]
instead.
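Typical invocations, following the syntax described above (uptime is only an arbitrary example command):
clic_chk                                   # just check that ssh works on every node
clic_chk -b 'uptime'                       # run uptime on all nodes in parallel
chkhosts ${PBS_NODEFILE}.lam.eth1 uptime   # the same check with another machine file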
The script clic_init_pvm
is similar to those for MPI.
The flag -x
is for interactive use only, since it will open an additional xterm running the PVM console.
Hence, you need a working DISPLAY connection (xhost [+]).
The current state of CLIC may be displayed by
clic_show
The output of this script looks like this (you may also see the
current state here) :
Server Max Tot Que Run Hld Wat Trn Ext Status
---------------- --- --- --- --- --- --- --- --- ----------
clic0a1.hrz.tu-c 0 41 24 17 0 0 0 0 Scheduling
clic0a1.hrz.tu-chemnitz.de:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1075.clic0a1.hr fci clicNode thin.job 14190 16 -- -- 2000: R 189:1
1269.clic0a1.hr frank clicNode pbs_clc.sh 11888 1 -- -- 150:0 R 96:23
1270.clic0a1.hr frank clicNode pbs_clc.sh 10806 1 -- -- 150:0 R 96:15
1272.clic0a1.hr klpa clicNode STDIN 13630 111 -- -- 250:0 R 94:20
1309.clic0a1.hr tnc clicNode inter.sh 27726 1 -- 512b 250:0 R 87:35
1310.clic0a1.hr tnc clicNode inter.sh 21408 1 -- 512b 250:0 R 87:20
1333.clic0a1.hr frank clicNode pbs_clc.sh 1314 1 -- -- 150:0 R 22:40
1340.clic0a1.hr frank clicNode pbs_clc.sh 10339 1 -- -- 150:0 R 21:19
1346.clic0a1.hr ikondov clicNode set1a_2-d1 15939 48 -- -- 25:00 R 17:00
1350.clic0a1.hr mibe clicNode bdmpitest -- 238 -- -- 08:00 Q --
1351.clic0a1.hr klpa clicNode STDIN 12615 44 -- -- 250:0 R 14:39
1352.clic0a1.hr ikondov clicNode set1a_34-d 32096 48 -- -- 25:00 R 10:23
1353.clic0a1.hr ikondov clicNode set1a_34-d 20435 48 -- -- 25:00 R 10:23
1355.clic0a1.hr ikondov clicNode set1a_34-d 19408 48 -- -- 25:00 R 10:24
1356.clic0a1.hr ikondov clicNode set1a_34-d 10725 48 -- -- 25:00 R 10:22
1357.clic0a1.hr ikondov clicNode set1a_34-d 9506 48 -- -- 25:00 R 10:23
1358.clic0a1.hr ikondov clicNode set1a_34-d 19348 48 -- -- 25:00 R 08:03
1359.clic0a1.hr ikondov clicNode set1a_34-d -- 48 -- -- 25:00 Q --
1360.clic0a1.hr ikondov clicNode set1a_34-d -- 48 -- -- 25:00 Q --
1361.clic0a1.hr ikondov clicNode set1a_34-d -- 48 -- -- 25:00 Q --
1362.clic0a1.hr ikondov clicNode set1a_34-d -- 48 -- -- 25:00 Q --
1363.clic0a1.hr ikondov clicNode set1a_34-d -- 48 -- -- 25:00 Q --
..........
1379.clic0a1.hr ikondov clicNode set1a_34-d -- 48 -- -- 25:00 Q --
1380.clic0a1.hr ikondov clicNode set1a_34-d -- 48 -- -- 25:00 Q --
1381.clic0a1.hr ikondov clicNode set1a_34-d -- 48 -- -- 25:00 Q --
1382.clic0a1.hr pester clicNode STDIN -- 4 -- -- 04:00 R 01:07
522 nodes in use,
0 nodes free,
7 nodes offline.
A small program (rather than a script) may be used to find out whether another user
has left some of your nodes "unclean":
mpirun -np ... pload.CLIC.lamXXX
(for LAM-MPI, XXX=632 or 656 for the current LAM version)
mpirun -np ... pload.CLIC.mpich
(for MPICH).
This program will run for a few seconds and then show a time diagram with one row
per node. Nodes which consume much more CPU time than others should be inspected in
order to find hanging processes (please send a message to clicadmin if you find
such processes of other users, or system processes such as klogd).
Where can you find the scripts?
/afs/tucz/project/sfb393/bin/
or
/usr/local/bin/
(some of them modified by Mike Becher; more options and help)
It is very annoying to have to wait for a CLIC node to be assigned by PBS
if you only want to get your program compiled and linked.
In my tests I found no problems using locally installed versions of
LAM-MPI or MPICH to compile and link the programs on my desktop;
the executable then runs on CLIC. There is also no problem with
different Linux distributions (local: S.u.S.E., CLIC: RedHat).
The local installations (not really "local") can be used by anyone
else:
LAM-MPI 6.3.2 | /afs/tucz/project/sfb393/packages/lammpi.CLIC |
LAM-MPI 6.5.9 | /afs/tucz/project/sfb393/lammpi |
MPICH 1.1.1 | /afs/tucz/project/sfb393/mpich |
PVM 3.4 | /afs/tucz/project/sfb393/pvm3 |
NOTE for LAM-MPI:
By default mpif77 calls "f77". In our local installation,
however, f77 is not usable, so I modified the script mpif77
to use g77 as default.
You may check the command line by
mpif77 -showme
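For example, compiling and linking a small program on a local machine with the AFS installation of LAM-MPI might look like this sketch (assuming the usual bin directory below the path listed above; myprog.f is a placeholder for your own source):
set path=( /afs/tucz/project/sfb393/lammpi/bin $path )
mpif77 -showme              # check which compiler and flags will be used
mpif77 -o myprog myprog.f   # the resulting executable can then be run on CLIC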
The libraries we have been developing and using for several years are also
usable for CLIC. The library path is
/afs/tucz/project/sfb393/FEM/libs/$archi
where $archi is an environment
variable defining the architecture and/or the parallel system to use.
Here is an overview for Linux:
archi= | where to use for "make" | Message passing library | Hypercube communication library |
LINUX | any Linux computer (*.mathematik) | MPICH | libMPIcom.a |
LINUX | any Linux computer (*.mathematik) | PVM | libCubecom.a |
LINUX_lam | any Linux computer (*.mathematik) | LAM-MPI | libMPIcubecom.a or libMPIcom.a |
CLIC | CLIC nodes (clicxxxx.hrz) | LAM-MPI | libMPIcubecom.a or libMPIcom.a |
CLIC | CLIC nodes (clicxxxx.hrz) | MPICH | libMPICHcom.a |
CLIC | CLIC nodes (clicxxxx.hrz) | PVM | libCubecom.a |
LinuxPGI | Linux with access to /afs | LAM-MPI | libMPIcubecom.a or libMPIcom.a |
Intel | Linux with access to /afs (compiler needs a fistful of environment variables) | LAM-MPI | libMPIcubecom.a or libMPIcom.a |
Intel | Linux with access to /afs (compiler needs a fistful of environment variables) | MPICH | libMPICHcom.a or libMPICHcubecom.a |
What else do you need? In each case, have a look at the file /afs/tucz/project/sfb393/FEM/libs/$archi/default.mk to verify default paths and variables (possibly to override in your Makefile).
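For example, to build for CLIC with LAM-MPI you might set archi before calling make; a sketch (the variables actually provided by default.mk may differ):
setenv archi CLIC
less /afs/tucz/project/sfb393/FEM/libs/$archi/default.mk   # inspect default paths and variables
make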
Each message passing system has some particular features. I will try to split them
into advantages and disadvantages:
 | advantages | disadvantages |
LAM-MPI | | |
MPICH | | |
PVM | | |
libCubecom.a | | |
libMPIcom.a or libMPICHcom.a | | |
libMPIcubecom.a | | |
[since ∼ March 2006]
Due to a software upgrade concerning OpenSSH and the X server, some problems have
occurred in receiving graphical output from a program running in parallel on CLIC.
Reason: ssh tunneling of X11 data does not work backwards from CLIC, and a new default configuration of the X server on local machines rejects any connection other than such secure tunnels.
Workaround: You must "forward" the DISPLAY variable
that was obtained by ssh on the compute server where you logged in
from outside.
Assume "remhost" to be the hostname of this compute server,
then the value of $DISPLAY will be something like remhost:xx.0.
You can forward this variable to CLIC as an argument of qsub:
qsub -I ... -v DISPLAY=$DISPLAY
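Inside the interactive job you can then verify that the forwarded display is usable; a minimal check (xterm is only an arbitrary X client):
echo $DISPLAY     # should still read remhost:xx.0
xterm             # should open a window on your local screen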
Fakultät für Mathematik, TU Chemnitz, Matthias Pester, 12.12.2000