STEP-BY-STEP INSTRUCTIONS TO SET UP A ROCKS LINUX CLUSTER WITH DUAL-BOOT (Windows + Rocks) NODES
================================================================================================
We have built two 20-PC clusters using Rocks Linux 5.3 in the School of Physics, Universiti Sains Malaysia (USM), Penang. They are comsics.usm.my and anicca.usm.my. This document serves to record all the steps involved in building our clusters.

First, some useful IP settings we used when installing the Rocks Linux cluster. USMNET is the network of USM; the IP addresses below are those defined within USMNET.

Primary DNS: 10.202.1.27
Secondary DNS: 10.202.1.6
Gateway: 10.205.19.254
Public netmask: 255.255.255.0 (alternatively 255.255.254.0, or whatever is auto-suggested by the Rocks installation DVD)
Private IP: 10.1.1.1
Private netmask: 255.255.0.0

Note that the private IP and netmask are the default values suggested by the Rocks installation disk.

IP names and address requirement
================================
You must have an IP address for the cluster you are setting up, and register the name of the cluster with your institute's network. For example, our comsics cluster's IP address is 10.205.19.208, and its formal name, www.comsics.usm.my, is registered in USMNET so that the IP 10.205.19.208 is associated with the name www.comsics.usm.my. The comsics cluster can then be accessed either via comsics.usm.my or via the IP 10.205.19.208. We had to request USM PPKT (the network administrator of USM) to register the domain name comsics.usm.my under the IP 10.205.19.208. In USM, Server Registration is requested online at http://infodesk.usm.my/infodesk/login.php. You may have to request a DNS name and an IP address from the network administrator of your own institute.
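Before proceeding, it is worth confirming from a machine inside the campus network that the registered name really resolves to the assigned IP and that the gateway is reachable. A minimal check, using our example values (substitute your own name, IP, DNS and gateway):

host comsics.usm.my 10.202.1.27    # forward lookup against the primary DNS
nslookup 10.205.19.208             # reverse lookup should return the registered name
ping -c 3 10.205.19.254            # the gateway should respond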
Installation procedure
======================
0. Ensure that the boot sequence of ALL PCs is set so that the CD/DVD-ROM is first in the boot order.

1. The PC used as the frontend must have two network cards: one built-in and one plug-in card with a 10/100/1000 Mb/s spec. Connect a LAN wire from the internet line into the built-in LAN port; this is identified as the eth1 port. The eth0 port (the plug-in LAN card) is connected to a switch that must support 10/100/1000 Mb/s. Any other PCs used as compute nodes must also have LAN cards of 10/100/1000 Mb/s speed (usually plug-in ones). All these plug-in LAN cards must also be connected to the switch. No LAN wire shall be connected from a compute node directly to the internet network; every LAN cable on a compute node must only be connected to the switch.

2. Initially, switch on only the frontend PC and leave the compute nodes powered off. The LAN switch must be on, and the LAN cables must be connected to the frontend and the switch as described in step 1. Slot in the Rocks Cluster installation DVD and type 'build' when prompted by the screen.

3. When prompted, fill in the name and miscellaneous information of the cluster, and the IP details as mentioned above. Choose automatic partitioning if you do not wish to customise the partitions. If customised partitioning is desired, we suggest the following allocation:

SWAP: 10 GB (make it large so that it can serve as temporary life-saving RAM in critical situations during intensive calculations)
/var: 20 GB (you never know when /var is going to blow up, which can happen when perpetual warning messages are generated by the apache server or some other protocol on the server due to an unexpected error)
/boot: 100 MB
/: 24 GB (or larger if wished)
/state/partition1: maximum

The installation will take place automatically once partitioning begins.

4. Make sure to take out the DVD when it is ejected, about 20 - 30 min after Rocks is first successfully installed. Failure to retrieve the installation disk from the CD drive will cause the installation to repeat indefinitely.

5. Rocks will reboot when it finishes installing for the first time. The first screen may be a black screen with a warning if the frontend PC has no NVIDIA GPU installed. Simply press 'Enter' when prompted so that the frontend fixes the problem automatically by installing a generic display driver. A GUI will be displayed after pressing Enter a few times.

Post-installation of the Rocks Frontend (Stage 1)
=================================================
Once the frontend node is up and running, issue the following command as root in a terminal:

rocks set appliance attr compute x11 true

After the above command, the nodes will be equipped with a GUI (GNOME desktop) when they are installed via insert-ethers subsequently (see later).

At this stage, it is timely to turn to the instructions in the following webpage before proceeding with the post-installation of the frontend:

http://www2.fizik.usm.my/configrepo/howto/RocksClusters543/howto_costomise_nodes_installation

It provides detailed instructions on how to configure the frontend so that the Rocks cluster can dual-boot on all its compute nodes (i.e., both Rocks Linux and Windows sit on the same hard disk of a node). We will download two *.xml files from the www2 server. These *.xml files contain the instructions for how the hard disk of each node is going to be partitioned. We will keep them in /share/apps/configrepo and also in /export/rocks/install/site-profiles/5.3/nodes/. The directory /share/apps/configrepo will be used to keep all the files for post-installation purposes. Essentially, to partition the nodes so that we can later install Windows on them, simply do the following:

mkdir /share/apps/configrepo
cd /share/apps/configrepo
wget http://www2.fizik.usm.my/configrepo/howto/RocksClusters543/replace-partition.xml_noformat_windows
wget http://www2.fizik.usm.my/configrepo/howto/RocksClusters543/replace-partition.xml_format_all
cp /share/apps/configrepo/replace-partition.xml_format_all /export/rocks/install/site-profiles/5.3/nodes/
cp /share/apps/configrepo/replace-partition.xml_noformat_windows /export/rocks/install/site-profiles/5.3/nodes/
cp /share/apps/configrepo/replace-partition.xml_format_all /export/rocks/install/site-profiles/5.3/nodes/replace-partition.xml
cd /export/rocks/install
rocks create distro

Installing compute nodes
========================
Once 'rocks create distro' is done, you can begin to install the compute nodes. Connect the frontend and all the compute nodes to the LAN switch via their eth0 ports. Of course, make sure the power supply to the LAN switch is ON. In the frontend's terminal, issue the following command as root:

insert-ethers

When prompted, choose 'Compute'.
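While insert-ethers is waiting for nodes, it can be useful to keep a second terminal open on the frontend to watch the nodes being registered as you boot them (next step). A minimal check, assuming the standard Rocks 5.3 command-line tools:

rocks list host               # compute-0-0, compute-0-1, ... should appear one by one
rocks list host interface     # shows the MAC address and private IP assigned to each node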
Manually slot a Rocks Cluster installation DVD into each individual PC. The nodes will be detected if the LAN wires are properly connected to the LAN switch via eth0.

Warning: run insert-ethers ONLY after the command 'rocks set appliance attr compute x11 true' has been issued.

Boot up the nodes using the Rocks installation disk. These nodes will be partitioned according to the spec in replace-partition.xml. A node usually takes about 15 - 20 min (or less) to be configured. YOU MUST RETRIEVE THE INSTALLATION DISK FROM A NODE ONCE IT IS EJECTED TO PREVENT REPEATED INSTALLATION. Once a node completes its installation, it will reboot into a normal-looking CentOS log-in screen.

Post-installation of the Rocks Frontend (Stage 2)
=================================================
We will scp all the files in a private server directory, www2.fizik.usm.my:/home/tlyoon/repo/configrepo, to /share/apps/configrepo on the frontend. Among other things, these include mathematica, windowsXP.vdi, various repos and inst* files.

scp -r tlyoon@www2.fizik.usm.my:/home/tlyoon/repo/configrepo/* /share/apps/configrepo
cd /share/apps/configrepo
chmod +x *.sh *.conf *. *.run

For the ACER Veriton computers in the computer lab, School of Physics, USM, we need to first install the ATI video card driver and reboot:
sh /share/apps/configrepo/inst_ati

Populate /etc/yum.repos.d with the repos of CentOS, epel, rpmforge and adobe:
cp /share/apps/configrepo/*.repo /etc/yum.repos.d
rpm --import http://ftp.riken.jp/Linux/fedora/epel/RPM-GPG-KEY-EPEL
wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm
rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt
rpm -K rpmforge-release-0.5.2-2.el5.rf.*.rpm
rpm -i rpmforge-release-0.5.2-2.el5.rf.*.rpm

Install some necessary applications in the frontend:
yum install -y gftp.x86_64 ### CENTOS-base
yum install -y compat-libstdc++-33.x86_64 compat-libstdc++-33.i386

Copy the virtual Windows image to VBtemplate and copy one to VBlocal. This is quite a lengthy process and may take about 30 min (or less):
mkdir /share/apps/VBtemplate
mv /share/apps/configrepo/windowsXP.vdi /share/apps/VBtemplate
mkdir /state/partition1/VBlocal
cp /share/apps/VBtemplate/windowsXP.vdi /state/partition1/VBlocal
chmod 777 /state/partition1/VBlocal/windowsXP.vdi
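Before moving on to mathematica, it may be worth a quick sanity check that the extra repositories are active and that the local copy of the virtual disk completed intact. A minimal sketch, assuming the paths used above:

yum repolist enabled | grep -Ei 'epel|rpmforge'    # the newly added repos should be listed
ls -lh /state/partition1/VBlocal/windowsXP.vdi     # size should match the template in /share/apps/VBtemplate
md5sum /share/apps/VBtemplate/windowsXP.vdi /state/partition1/VBlocal/windowsXP.vdi    # checksums should agree (slow for a large .vdi)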
cd /share/apps/configrepo

Install mathematica (manual attention required):
sh /share/apps/configrepo/mathematica1.conf

Install the mathematica licence manager (manual attention required; read the instructions contained in addmathlm):
wget http://www2.fizik.usm.my/configrepo/addmathlm
sh addmathlm

To activate mathlm, you may need the following URLs:
https://user.wolfram.com/portal/requestAK/506f8a2585f11524c0d64de6d0589e4f427ba1af
https://user.wolfram.com/portal/passwordrequest.html

We add human users to our cluster:
sh /share/apps/configrepo/useradd.sh
sh /share/apps/configrepo/useraddc.sh
sh /share/apps/configrepo/useradd_human.sh

Installation of physics software in /share/apps in the frontend (in sequence)
=============================================================================
intel icc, http://www2.fizik.usm.my/configrepo/howto/intel/inst_icc_11.0.081_sa
ifftw, http://www2.fizik.usm.my/configrepo/howto/intel/inst_ifftw_sa
intel ifort, http://www2.fizik.usm.my/configrepo/howto/intel/inst_ifort_111072_sa
intel impi 4.1.0.024 (may not work without a valid license file), http://www2.fizik.usm.my/configrepo/howto/intel/inst_impi_410024_sa

lapack, blas:
mkdir /share/apps/local/lib
cp /usr/lib64/liblapack* /share/apps/local/lib
cp /usr/lib64/libblas* /share/apps/local/lib

fftw2, http://www2.fizik.usm.my/configrepo/howto/fftw/inst_fftw215_sa
fftw3, http://www2.fizik.usm.my/configrepo/howto/fftw/inst_fftw312_sa
dftb+, http://www2.fizik.usm.my/configrepo/howto/dftb+/inst_dftb+_sa
lammps, gnu version, http://www2.fizik.usm.my/configrepo/howto/mylammps/lammps_tmpl_5March_12_gnu_sa/inst_lammps_tmpl_5March_12_gnu_sa
lammps, intel version, http://www2.fizik.usm.my/configrepo/howto/mylammps/inst_lammps_tmpl_5March_12_intel_sa
wien2k, http://www2.fizik.usm.my/configrepo/howto/wien2k/ver_140611/inst_wien2k_140611_parallel_sa
gcc, http://www2.fizik.usm.my/configrepo/howto/gcc/inst_gcc_sa (note this takes a few hours to complete)

ptmbhga:
cd /share/apps
scp tlyoon@www2.fizik.usm.my:repo/ptmbhga/ptmbhga.tar.gz .
tar -zxvf ptmbhga.tar.gz

ptmbhga_lammps
ptmbhga_dftb+
ptmbhga_g09
deMon2k

Post-installation for Rocks compute nodes
=========================================
Right after all the nodes are up and running, the most important thing to do is to issue the rocks run host commands below in the frontend. They replace the rocks.conf file with one in which the default OS booted by the compute nodes is 'title Rocks (2.6.18-164.6.1.el5)' instead of 'title Rocks Reinstall'. This prevents the nodes from auto-reinstalling themselves in case of a power failure. Note that the original rocks.conf file (in which the default option is set to '0') can be found in /boot/grub of a node; setting the default to '1' instead of '0' is essentially what the rocks run host commands below do. We are using the rocks.conf file for Rocks Linux 5.3. If you are using a different version of Rocks, replace the /boot/grub/rocks.conf file with one that you have modified yourself.
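If you are running a different Rocks release and do not have a prepared rocks.conf, the same change can instead be sketched as an in-place edit. This is an untested sketch that assumes the node's /boot/grub/rocks.conf uses the stock GRUB 'default=0' syntax; verify it on one node before running it cluster-wide:

rocks run host 'sed -i.bk "s/^default *= *0/default=1/" /boot/grub/rocks.conf'    # keeps a rocks.conf.bk backup, then makes entry 1 (the installed OS) the default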
With the prepared rocks.conf for Rocks Linux 5.3 from configrepo:

rocks run host 'cp /boot/grub/rocks.conf /boot/grub/rocks.conf.bk'
rocks run host 'cp /share/apps/configrepo/rocks.conf /boot/grub/'

Each node in comsics (ACER Veriton) must be installed with the ATI display driver by manually issuing the following command as root in each node (skip this if your nodes do not need the ATI display driver; our comsics cluster needs it though):
sh /share/apps/configrepo/inst_ati

Populate the repos for each node:
rocks run host 'cp /share/apps/configrepo/*.repo /etc/yum.repos.d'
rocks run host 'rpm --import http://ftp.riken.jp/Linux/fedora/epel/RPM-GPG-KEY-EPEL'
rocks run host 'wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm'
rocks run host 'rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt'
rocks run host 'rpm -K rpmforge-release-0.5.2-2.el5.rf.*.rpm'
rocks run host 'rpm -i rpmforge-release-0.5.2-2.el5.rf.*.rpm'

Install some necessary packages:
rocks run host 'nohup yum install -y compat-libstdc++-33.x86_64 compat-libstdc++-33.i386 brasero.x86_64 gftp.x86_64 &'
rocks run host 'nohup sh /share/apps/configrepo/afternodereinstall &'
rocks run host 'mkdir /state/partition1/VBlocal/'

The following line copies the *.vdi file to each node's local hard disk. This process takes a long time, at least a few hours:
rocks run host 'nohup cp /share/apps/VBtemplate/windowsXP.vdi /state/partition1/VBlocal/ &'

Installation of mathematica has to be done locally in each node; it cannot be done via rocks run host:
sh /share/apps/configrepo/mathematica1.conf

By Yoon Tiem Leong
School of Physics
Universiti Sains Malaysia (USM)
11800 USM
Penang, Malaysia
17 Dec 2013