Preliminary matters
===================

1. Connect all PCs (the master and the nodes) to a switch via LAN cables.

2. The master should have at least two network cards. The built-in one should be
   used for the local network, while the additional one (possibly a plug-in,
   external network card) is used for connecting to the internet (use
   ping google.com to check whether the internet is reachable via that card).

3. The naming and numbering convention for the master and the working nodes is
   assumed to be of the form below. The names 'anicca' and 'anicca.usm.my' are
   optional and can be replaced by any name you prefer.

   192.168.1.100   c100   master   anicca   anicca.usm.my
   192.168.1.10    c10
   192.168.1.21    c21
   192.168.1.22    c22
   192.168.1.23    c23
   192.168.1.24    c24
   192.168.1.25    c25
   192.168.1.26    c26
   192.168.1.27    c27
   192.168.1.28    c28
   192.168.1.29    c29
   ...             ...

Setting up a master node
========================

1. Set up and install a version of Linux for the master node.

2. The very first thing to do after installing a fresh Linux on a node is to set
   its local IP with the command line tool nmtui ('Edit a connection'). Identify
   the local network card from the 'Ethernet' list, e.g., 'enp6s0'. The local
   network card should, by default, be the physical, built-in network card that
   comes with the motherboard. Use the external network card, by default, for the
   global network connection (e.g., ping google.com). Use Tab and the up/down
   arrow keys to navigate to the 'Edit' tab, then edit the entries following the
   example below:

   IPv4 CONFIGURATION
     Addresses       192.168.1.100/24
     Gateway
     DNS servers
     Search domains

     Routing  (No custom routes)
     [ ] Never use this network for default route
     [ ] Ignore automatically obtained routes
     [ ] Ignore automatically obtained DNS parameters

     [X] Require IPv4 addressing for this connection

   Note that for the master node, the local IP is set to 192.168.1.100 in the
   above example.

3. Generate security keys for root via

   ssh-keygen -t rsa

4. Edit /etc/hosts

   cat /etc/hosts
   127.0.0.1       localhost localhost.localdomain
   ::1             localhost localhost.localdomain
   192.168.1.100   c100 master anicca anicca.usm.my
   192.168.1.10    c10
   192.168.1.21    c21
   192.168.1.22    c22
   192.168.1.23    c23
   192.168.1.24    c24
   192.168.1.25    c25
   192.168.1.26    c26
   192.168.1.27    c27
   192.168.1.28    c28
   192.168.1.29    c29

5. Prepare nfs directories in the master node

   mkdir /share
   mkdir -p /export/share/c100/disk00
   mkdir -p /export/share/c100/disk1
   mkdir -p /export/share/c100/disk2
   mkdir -p /export/share/c100/disk3

6. Add additional hard disks to the master (or to nodes) by following the
   procedure below:

   a. Tools to check hard disks and partition names
   ------------------------------------------------

      lsblk
      df -hT
      ls /dev/mapper
      ls /dev/disk
      fdisk -l

      Use the above commands to check the hard disks and partitions seen by the
      PC. Identify the partition names, such as /dev/sdc,
      /dev/mapper/almalinux-home, etc. Use this information to mount a
      partition at a mount point in the PC.

   b. Create mount points
   ----------------------

      Attach each partition to one of the mount points created above by adding
      a line to /etc/fstab, such as

      /dev/mapper/almalinux-export_share_master_disk0 /export/share/c100/disk00 xfs defaults 0 0

      or

      /dev/sda /export/share/c100/disk00 xfs defaults 0 0

      If a wrong mount entry is added, it may cause the PC to halt when it is
      rebooted. In such an event, enter the rescue-mode command line and undo
      the change in /etc/fstab that caused the problem.
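      Because device names such as /dev/sda can change between reboots when
      disks are added or removed, it is often safer to reference a partition by
      its UUID. A minimal sketch, assuming the partition is /dev/sdb and has
      already been formatted (the device name, UUID and mount point below are
      only illustrative; use the values reported by lsblk and blkid on your own
      machine):

      blkid /dev/sdb                      # print the UUID of the partition (assumed device name)
      # example fstab entry using that UUID (replace <uuid-from-blkid> with the real value)
      # UUID=<uuid-from-blkid>  /export/share/c100/disk00  xfs  defaults  0 0
      findmnt /export/share/c100/disk00   # after 'mount -a', confirm the partition is mounted there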
      Once the line has been added to /etc/fstab, run

      mount -a

      to check whether the partition mounts correctly.

   c. Format a disk
   ----------------

      If you are adding a new, unformatted hard disk, mount -a may fail with an
      error. A hard disk can be formatted from the command line with the command
      below (assuming the hard disk appears in /dev/ as sdb; use lsblk to
      identify the name of the hard disk to format, such as sda, sdb or sdg):

      mkfs.xfs /dev/sdb

7. Edit /etc/fstab to add the additional hard disks in the master node. As an
   example,

   cat /etc/fstab
   /dev/mapper/almalinux-root                       /                          xfs  defaults 0 0
   UUID=b7cc6afd-c0f9-4be9-b9c3-dcbc95040850        /boot                      xfs  defaults 0 0
   /dev/mapper/almalinux-swap                       none                       swap defaults 0 0
   /dev/mapper/centos1-home                         /home                      xfs  defaults 0 0
   /dev/mapper/almalinux-export_share_master_disk0  /export/share/c100/disk00  xfs  defaults 0 0
   /dev/mapper/centos00-root                        /export/share/c100/disk1   xfs  defaults 0 0
   /dev/mapper/centos1-root                         /export/share/c100/disk2   xfs  defaults 0 0
   /dev/mapper/almalinux-home                       /export/share/c100/disk3   xfs  defaults 0 0

   mount -a

8. Prepare the following directories and create links to them.

   mkdir -p /export/share/c100/disk00/apps/configrepo
   mkdir -p /export/share/c100/disk00/bin
   mkdir -p /export/share/c100/disk00/tmp
   mkdir /share
   ln -s /export/share/c100/disk00/apps /share/apps
   ln -s /export/share/c100/disk00/bin  /share/bin
   ln -s /export/share/c100/disk00/tmp  /share/tmp

9. Edit /etc/exports in the master node to export the nfs directories, so that
   cat /etc/exports returns

   /home                      192.168.1.1/24(rw,no_root_squash)
   /share                     192.168.1.1/24(rw,no_root_squash)
   /export/share/c100/disk00  192.168.1.1/24(rw,no_root_squash)
   /export/share/c100/disk1   192.168.1.1/24(rw,no_root_squash)
   /export/share/c100/disk2   192.168.1.1/24(rw,no_root_squash)
   /export/share/c100/disk3   192.168.1.1/24(rw,no_root_squash)

10. Activate the nfs server in the master

    systemctl start nfs-server rpcbind
    systemctl enable nfs-server rpcbind
    exportfs -ra

11. Configure the firewall on the NFS server to allow the NFS clients to access
    the NFS shares. To do that, run the following commands on the NFS server.

    firewall-cmd --permanent --add-service mountd
    firewall-cmd --permanent --add-service rpc-bind
    firewall-cmd --permanent --add-service nfs
    firewall-cmd --reload

12. Create a file /share/hosts with the following content

    cat /share/hosts
    127.0.0.1       localhost localhost.localdomain
    ::1             localhost localhost.localdomain
    192.168.1.100   c100 master anicca anicca.usm.my
    192.168.1.10    c10
    192.168.1.21    c21
    192.168.1.22    c22
    192.168.1.23    c23
    192.168.1.24    c24
    192.168.1.25    c25
    192.168.1.26    c26
    192.168.1.27    c27
    192.168.1.28    c28
    192.168.1.29    c29

    Note that the file /share/hosts is located in an nfs-shared location so that
    the other nodes can read it.

13. Link the hosts file to /etc/hosts

    cp /etc/hosts /etc/hosts.orig
    ln -sf /share/hosts /etc/hosts

    (The -f flag is needed because /etc/hosts still exists after the copy.)
    Note that if the cluster contains more nodes than specified in /share/hosts,
    you must add additional entries to /share/hosts. In the default
    /share/hosts, only c10, c21, c22, ..., c29 are prepared. To add more nodes,
    append, for example,

    192.168.1.30    c30
    192.168.1.31    c31
    ...

14. Copy all the files from
    https://comsics.usm.my/configrepo/howto/almalinux/share/bin/* to /share/bin/.
    This is a collection of scripts for the maintenance of the cluster.

15. For the scripts in /share/bin to work correctly, you must make sure the
    following are executed (a quick check that pdsh works is sketched below).

    dnf install epel-release
    dnf update
    dnf install pdsh
    dnf install pdsh-rcmd-ssh
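    Once pdsh and its ssh backend are installed, a short smoke test along the
    following lines can confirm that pdsh reaches the nodes. This is only a
    sketch: the node names follow the naming convention above and must match the
    nodes you actually have, and passwordless root ssh to the nodes (set up in
    the node section below) is required.

    # run a trivial command on a few nodes via the ssh rcmd module
    pdsh -R ssh -w c21,c22,c23 uptime
    # the same idea with a hostlist range expression covering c21..c29
    pdsh -R ssh -w c[21-29] date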
16. Set the hostname (the hostname of the master is set to 'master' by default)

    hostnamectl set-hostname master
    hostnamectl

Setting up a node (use c21, with IP 192.168.1.21, as example)
=============================================================

0. Set up and install a version of Linux for the node.

1. The very first thing to do after installing a fresh Linux on a node is to set
   its local IP with the command line tool nmtui ('Edit a connection'). Identify
   the local network card from the 'Ethernet' list, e.g., 'enp6s0'. The local
   network card should, by default, be the physical, built-in network card that
   comes with the motherboard. Use the external network card, by default, for the
   global network connection (e.g., ping google.com). Use Tab and the up/down
   arrow keys to navigate to the 'Edit' tab, then edit the entries following the
   example below:

   IPv4 CONFIGURATION
     Addresses       192.168.1.21/24
     Gateway
     DNS servers
     Search domains

     Routing  (No custom routes)
     [ ] Never use this network for default route
     [ ] Ignore automatically obtained routes
     [ ] Ignore automatically obtained DNS parameters

     [X] Require IPv4 addressing for this connection

   In the above example, the node c21 is set to IP 192.168.1.21. The '/24' in
   'Addresses 192.168.1.21/24' is the network prefix (netmask 255.255.255.0),
   i.e., all addresses of the form 192.168.1.x belong to the same local network.

2. Generate security keys for root via

   ssh-keygen -t rsa

3. Prepare the following nfs directories in the node

   mkdir -p /export/share/c100
   mkdir -p /share

4. Edit /etc/fstab to add additional hard disks under /state/partition1/cxx. It
   is suggested to back up the file (cp /etc/fstab /etc/fstab.bk) before
   modifying it. An example /etc/fstab of a node looks like the following:

   /dev/mapper/almalinux-root                 /                            xfs  defaults 0 0
   UUID=d3f5c5d3-aa5d-4d26-98d5-67f42b50c703  /boot                        xfs  defaults 0 0
   /dev/mapper/almalinux-swap                 none                         swap defaults 0 0
   /dev/mapper/vg_jaws-lv_root                /state/partition1/c21/disk0  ext4 defaults 0 0
   /dev/sdb                                   /state/partition1/c21/disk1  xfs  defaults 0 0

   For the above mounts in /etc/fstab to succeed, mount points must be created
   for disk0 and disk1:

   mkdir -p /state/partition1/c21/disk0
   mkdir -p /state/partition1/c21/disk1

5. Edit /etc/fstab to add the following lines (as mount points for the nfs
   directories) in the node

   echo '192.168.1.100:/home /home nfs defaults 0 0' >> /etc/fstab
   echo '192.168.1.100:/export/share/c100 /export/share/c100 nfs defaults 0 0' >> /etc/fstab
   echo '192.168.1.100:/share /share nfs defaults 0 0' >> /etc/fstab
   mount -a

   cat /etc/fstab
   /dev/mapper/almalinux-root                 /                            xfs  defaults 0 0
   UUID=d3f5c5d3-aa5d-4d26-98d5-67f42b50c703  /boot                        xfs  defaults 0 0
   /dev/mapper/almalinux-swap                 none                         swap defaults 0 0
   /dev/mapper/vg_jaws-lv_root                /state/partition1/c21/disk0  ext4 defaults 0 0
   /dev/sdb                                   /state/partition1/c21/disk1  xfs  defaults 0 0
   192.168.1.100:/home                        /home                        nfs  defaults 0 0
   192.168.1.100:/export/share/c100           /export/share/c100           nfs  defaults 0 0
   192.168.1.100:/share                       /share                       nfs  defaults 0 0

6. Create a link for /etc/hosts in the node

   mv /etc/hosts /etc/hosts.orig
   ln -s /share/hosts /etc/hosts
   ls -la /etc/hosts
   lrwxrwxrwx. 1 root root 27 Aug 15 11:40 /etc/hosts -> /share/hosts

7. Execute the following command so that all users with their homes in /home can
   use passwordless ssh between the master and the nodes:

   setsebool -P use_nfs_home_dirs 1

8. Set the hostname (in this case, the hostname is set to c21)

   hostnamectl set-hostname c21
   hostnamectl
   Static hostname: c21
         Icon name: computer-desktop
           Chassis: desktop
   ...
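   At this point it is worth checking, from the new node, that name resolution
   and the NFS mounts work. A minimal sketch, assuming the naming and export
   conventions used above (the master's IP 192.168.1.100 and the mount points
   must match your own setup; showmount is part of nfs-utils):

   ping -c 2 master              # resolves via the shared /share/hosts linked to /etc/hosts
   showmount -e 192.168.1.100    # list the exports offered by the master
   df -hT | grep -i nfs          # /home, /share and /export/share/c100 should appear as nfs mounts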
9. Issue (preferred), as root from the master node,

   /share/bin/pdsh_copy_id

   to perform ssh-copy-id, which copies the key of root@master to the root
   account on all nodes, including the newly added node. Alternatively (not
   preferred), issue as root in the master node

   ssh-copy-id c21

10. Once a new node is added, check and assure the presence of

    /share/apps/configrepo/users_data/userpass.latest.dat

    which is the record of the existing users in the cluster. Execute as root in
    the master node the following line (preferred)

    /share/bin/pdsh_add_user_list

    Alternatively (not preferred), execute the following command as root in the
    new node

    /share/bin/add_user_list_locally

    The above will add all existing users in the cluster into the new node.
    Check whether the addition of the existing users to the new node succeeded
    by issuing the following line in the new node:

    cat /etc/passwd

11. Issue the following line as root from the master node, so that all users
    added in the previous step to the new node can use passwordless ssh between
    the master and the new node (preferred):

    /share/bin/pdsh_setsebool

    Alternatively (not preferred), execute the following as root in the new node
    (this applies the setting to that node only):

    setsebool -P use_nfs_home_dirs 1

Maintenance
===========

1. /share/bin/ keeps a collection of scripts for the maintenance of the cluster.

2. To add a new user into the cluster globally, assure the existence of

   /share/apps/configrepo/users_data/userpass.latest.dat
   /share/tmp/
   /share/bin/uid

   Then issue, as root from within the master node,

   /share/bin/pdsh_add_new_user

   to add a new user into the cluster.

3. The following maintenance scripts in /share/bin/ can be used (a sketch of the
   kind of command pdsh-template stands for is given after this list):

   pdsh_shutdown_all   : shut down all nodes, including the master node
   remove_user_locally : issue this script from within a local node as root to
                         remove a specified user on that node
   pdsh_remove_user    : remove a specified user from the cluster
   pdsh_sshreachable   : print the names of the nodes that are currently up and
                         ssh-able in the cluster
   pdsh-template       : a template for issuing a generic command line across
                         all nodes
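   The contents of pdsh-template are not reproduced here, but a generic command
   issued across all nodes typically looks like the sketch below. The node names
   follow the naming convention above and should be adjusted to your cluster;
   dshbak, which ships with pdsh, groups identical output from different nodes.

   # check that /home is NFS-mounted on every node
   pdsh -R ssh -w c10,c[21-29] 'df -hT /home'
   # run a command on all nodes and collapse identical output
   pdsh -R ssh -w c10,c[21-29] 'uname -r' | dshbak -c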