Monday, 23 April 2012

Run Linpack (HPL) on an HPC (beowulf-style) cluster using CentOS 6.2


A few weeks ago I attended a symposium on HPC and Open Source, and ever since I've been wanting to set up my own HPC cluster. So I did. Here are the instructions for setting up an HPC cluster using CentOS 6.2.

I have set up a two node cluster, but these instructions could be used for any number of nodes. Each server I've used has only a single 74 GB hard drive, a single NIC, 8 GB of RAM and 2 quad core CPUs, so the cluster has 16 cores and 16 GB of RAM in total.
  1. Install CentOS using a minimal install to ensure that the smallest number of packages gets installed.
  2. Enable the NIC by editing its config file (/etc/sysconfig/network-scripts/ifcfg-eth0). I used the text install, which seems to leave the NIC disabled, but it's quicker to navigate from the ILO interface:
    DEVICE="eth0"
    ONBOOT="yes"
    BOOTPROTO=dhcp
  3. Disable and stop the firewall (I'm assuming no internet access for your cluster, of course):
    chkconfig iptables off; service iptables stop
  4. Install the ssh clients and man. This installs the ssh client and scp, among other things, as well as man, which is always handy to have:
    yum -y install openssh-clients man
  5. Modify the ssh client configuration to allow seamless addition of hosts to the cluster. Add this line to /etc/ssh/ssh_config (note that this is a security risk if your cluster has access to the internet):
    StrictHostKeyChecking no
  6. Generate a passphrase-free key. This will make it easier to add hosts to the cluster (just press enter repeatedly after running ssh-keygen):
    ssh-keygen
  7. Install compilers and libraries (note that the development packages were obtained from here and yum was run from the directory containing them):
    yum -y install gcc gcc-c++ atlas blas lapack mpich2 make mpich2-devel atlas-devel
  8. Add the node's hostname to /etc/hosts.
  9. Create the file $HOME/hosts and add the node's hostname to it.
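The hosts file from the last step is the one later passed to mpiexec.hydra with -f. Hydra reads one hostname per line, optionally followed by :N to set how many processes to start on that host. A minimal sketch, assuming two placeholder nodes called node1 and node2 with 8 cores each (these names are not from my setup; substitute the hostnames you added to /etc/hosts):

```shell
# hostfile_lines CORES HOST...
# Prints one "hostname:cores" line per host, the format mpiexec.hydra -f accepts.
hostfile_lines() {
    cores=$1
    shift
    for host in "$@"; do
        printf '%s:%s\n' "$host" "$cores"
    done
}

# node1 and node2 are hypothetical hostnames used for illustration only.
hostfile_lines 8 node1 node2
# prints:
# node1:8
# node2:8
```

Redirecting the output, for example with `hostfile_lines 8 node1 node2 > $HOME/hosts`, writes the file itself.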
The steps above create a single node, so it would be a bit of a stretch to call it a cluster, but adding extra nodes is as simple as repeating them on each new node. A few extra steps are needed, though, to ensure smooth running:
  1. Add each extra node to the hosts file (/etc/hosts) of all nodes (a DNS server could be set up instead) and to $HOME/hosts.
  2. Copy the key generated by ssh-keygen above to all nodes (if you don't have a head node, i.e. a node that does not do any calculations, remember to add the key to the node itself too):
    ssh-copy-id hostname
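With more than a couple of nodes, running ssh-copy-id by hand gets tedious. The sketch below loops over the hostnames in a hosts file (one per line, with an optional :N suffix that is stripped). The function name copy_key_to_hosts and the DRY_RUN switch are my own inventions for illustration, not part of any tool:

```shell
# copy_key_to_hosts FILE
# Runs ssh-copy-id once for every hostname listed in FILE.
# With DRY_RUN set to a non-empty value, the commands are printed instead of run.
copy_key_to_hosts() {
    # cut drops an optional ":N" suffix; word splitting skips blank lines.
    for host in $(cut -d: -f1 "$1"); do
        ${DRY_RUN:+echo} ssh-copy-id "$host"
    done
}

# Typical use on the node that holds the key (prompts for each node's password):
#   copy_key_to_hosts "$HOME/hosts"
# Preview without touching any node:
#   DRY_RUN=1 copy_key_to_hosts "$HOME/hosts"
```

Running it with DRY_RUN=1 first is a cheap way to check that the hosts file contains the names you expect.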
I have not commented on networking because the servers I have been using only have a single NIC, as mentioned above. There are gains to be made by forcing as much intra-node communication as possible through the loopback interface, but this requires a unique /etc/hosts file for each node, and my original plan was to set up a 16 node cluster.

SELinux does not seem to have any negative effects, so I have left it on. I plan to test without it to see whether performance is improved.

At this point all that remains is to add some software that can run on the cluster, and there is nothing better than HPL (Linpack), which is widely used to measure cluster efficiency (the ratio of actual to theoretical peak performance). Do the following steps on all nodes:
  1. Download HPL from netlib.org and extract it to your home directory.
  2. Copy the Make.Linux_PII_CBLAS file from $HOME/hpl-2.0/setup/ to $HOME/hpl-2.0/
  3. Edit the Make.Linux_PII_CBLAS file. The lines that differ from the stock file are TOPdir, LAdir, CC and LINKER. Note that the MPI section is commented out (mpicc, used as CC and LINKER below, supplies the MPI include and library paths itself):
    # ----------------------------------------------------------------------
    # - HPL Directory Structure / HPL library ------------------------------
    # ----------------------------------------------------------------------
    #
    TOPdir       = $(HOME)/hpl-2.0
    INCdir       = $(TOPdir)/include
    BINdir       = $(TOPdir)/bin/$(ARCH)
    LIBdir       = $(TOPdir)/lib/$(ARCH)
    #
    HPLlib       = $(LIBdir)/libhpl.a
    #
    # ----------------------------------------------------------------------
    # - Message Passing library (MPI) --------------------------------------
    # ----------------------------------------------------------------------
    # MPinc tells the  C  compiler where to find the Message Passing library
    # header files,  MPlib  is defined  to be the name of  the library to be
    # used. The variable MPdir is only used for defining MPinc and MPlib.
    #
    #MPdir        = /usr/lib64/mpich2
    #MPinc        = -I$(MPdir)/include
    #MPlib        = $(MPdir)/lib/libmpich.a
    #
    # ----------------------------------------------------------------------
    # - Linear Algebra library (BLAS or VSIPL) -----------------------------
    # ----------------------------------------------------------------------
    # LAinc tells the  C  compiler where to find the Linear Algebra  library
    # header files,  LAlib  is defined  to be the name of  the library to be
    # used. The variable LAdir is only used for defining LAinc and LAlib.
    #
    LAdir        = /usr/lib64/atlas
    LAinc        =
    LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a
    # ----------------------------------------------------------------------
    # - Compilers / linkers - Optimization flags ---------------------------
    # ----------------------------------------------------------------------
    #
    CC           = /usr/bin/mpicc
    CCNOOPT      = $(HPL_DEFS)
    CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
    #
    # On some platforms,  it is necessary  to use the Fortran linker to find
    # the Fortran internals used in the BLAS library.
    #
    LINKER       = /usr/bin/mpicc
    LINKFLAGS    = $(CCFLAGS)
    #
    ARCHIVER     = ar
    ARFLAGS      = r
    RANLIB       = echo
    #
    # ----------------------------------------------------------------------
  4. Run make arch=Linux_PII_CBLAS from the $HOME/hpl-2.0 directory.
  5. You can now run Linpack (on a single node):
    cd bin/Linux_PII_CBLAS
    mpiexec.hydra -n 4 ./xhpl
Repeat steps 1-5 on all nodes, and then you can run Linpack on all nodes like this (from the directory $HOME/hpl-2.0/bin/Linux_PII_CBLAS/):
mpiexec.hydra -f $HOME/hosts -n x ./xhpl
where x is the number of cores in your cluster.
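The problem that xhpl solves is described by the HPL.dat file in the same directory, and the process grid in it (P x Q) must not need more processes than the x you pass to -n. As a rough starting point, a common rule of thumb is to pick a problem size N whose matrix of doubles fills about 80% of total memory, rounded down to a multiple of the block size NB, and a process grid that is as close to square as possible. The sketch below does that arithmetic; the function name hpl_sizes, the default NB of 128 and the 0.8 fraction are assumptions for illustration, not tuned values:

```shell
# hpl_sizes MEM_GIB NPROCS [NB] [MEM_FRACTION]
# Suggests starting values for the N, NB, P and Q entries of HPL.dat.
hpl_sizes() {
    awk -v gib="$1" -v np="$2" -v nb="${3:-128}" -v frac="${4:-0.8}" 'BEGIN {
        bytes = gib * 1024 * 1024 * 1024
        # An N x N matrix of 8-byte doubles; round N down to a multiple of NB.
        n = int(sqrt(frac * bytes / 8) / nb) * nb
        # P is the largest divisor of NPROCS that does not exceed its square root.
        for (p = int(sqrt(np)); p > 1; p--)
            if (np % p == 0) break
        printf "N=%d NB=%d P=%d Q=%d\n", n, nb, p, np / p
    }'
}

# The two node cluster described above: 16 GB of RAM and 16 cores in total.
hpl_sizes 16 16
# prints: N=41344 NB=128 P=4 Q=4
```

These are only starting values; NB in particular is worth varying, since the best block size depends on the BLAS library and the CPU.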

For results of running Linpack, see my next post here.
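Once xhpl reports a Gflops figure, the efficiency mentioned earlier is that measured figure (Rmax) divided by the theoretical peak (Rpeak), where Rpeak is cores x clock speed in GHz x floating-point operations per cycle per core. A small sketch of the calculation follows. The 2.5 GHz clock, the 4 operations per cycle and the 120 GFLOPS result are made-up inputs to show the arithmetic, not measurements from my cluster:

```shell
# hpl_efficiency RMAX_GFLOPS CORES CLOCK_GHZ FLOPS_PER_CYCLE
# Prints the theoretical peak and the efficiency (Rmax / Rpeak) as a percentage.
hpl_efficiency() {
    awk -v rmax="$1" -v cores="$2" -v ghz="$3" -v fpc="$4" 'BEGIN {
        rpeak = cores * ghz * fpc
        printf "Rpeak = %.1f GFLOPS, efficiency = %.1f%%\n", rpeak, 100 * rmax / rpeak
    }'
}

# Hypothetical inputs: 120 GFLOPS measured on 16 cores at 2.5 GHz, 4 flops per cycle.
hpl_efficiency 120 16 2.5 4
# prints: Rpeak = 160.0 GFLOPS, efficiency = 75.0%
```

The clock speed and operations per cycle for your own CPUs come from the processor's specifications.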

9 comments:

  1. On a CentOS 6.3 system, after completing steps 1-3 from these instructions, I get the following error on step 4:

    "Make.inc: No such file or directory"

  2. I also moved the hpl-20 directory to system root "/" and tried to run "make arch=Linux_PII_CBLAS" from /hpl-20:
    make[2]: Entering directory `/home/gvtlinux/hpl-2.0/src/auxil/Linux_PII_CBLAS'
    Makefile:47: Make.inc: No such file or directory
    make[2]: *** No rule to make target `Make.inc'. Stop.



    1. I just ran through the instructions again on a single VM and could not reproduce your issue.

      Did you check that everything in step 7 (I've modified it so it's not version specific anymore) installed?

    2. I had the same issue on one of my nodes. I removed the hpl-2.0 folder and tar file then followed the steps again.

  3. There are some issues with libmpich.a; to solve this problem, replace:
    #MPlib = $(MPdir)/lib/libmpich.a
    with:
    #MPlib = $(MPdir)/lib/libmpich.a $(MPdir)/lib/libmpl.a

    1. Did you mean?

      MPlib = $(MPdir)/lib/libmpich.a $(MPdir)/lib/libmpl.a

      otherwise it's commented out ;)

  4. Thanks a lot for sharing this; it helped me get HPL running very easily, with hardly any time or effort.
