Software Stack

Overview of Software Stack

The software stack is the suite of software that ACCRE builds and maintains for its users. ACCRE provides two types of software stack: a binary software stack and a compiler-based software stack. The binary software stack contains stand-alone software that is installed without compilation. The compiler-based software stack contains software that is compiled from source with a given compiler version (for example, GCC 10.2). Each compiler-based software stack is usually named after its compiler plus a year tag.

Our currently active compiler-based software stacks are:

  • GCC 2016b (based on GCC/5.4.0-2.26) and Intel 2016b (based on Intel/2016.3.210);
  • GCC 2017b (based on GCC/6.4.0-2.28) and Intel 2017b (based on Intel/2017.4.196);
  • GCC 2019a (based on GCC/8.2.0) and Intel 2019a (based on Intel/2019.1.144);
  • GCC 2020a (based on GCC/10.2.0);
  • GCC 2022a (still in progress, based on GCC/11.3.0)

We usually upgrade the software stack every two years. Since the 2020a software stack we only provide the Intel compiler and its corresponding libraries for users to use directly; we no longer compile software against them. If you need software built with the Intel compiler and have a good reason for the request, please open a helpdesk ticket so we can build a specific version for you.

Building software sometimes also requires additional cluster libraries such as MPI. For each compiler we therefore group a set of additional cluster libraries into a general compiler tool set. These libraries include the MPI library (OpenMPI for GCC and Intel MPI for the Intel compiler), the BLAS and LAPACK libraries (OpenBLAS and ScaLAPACK for GCC, MKL for Intel), and other math libraries such as FFTW. The tool sets based on the GCC compilers are called foss tool sets. The available foss tool sets are:

  • foss 2016b;
  • foss 2017b;
  • foss 2019a;
  • foss 2020a;
  • foss 2022a

Before foss 2022a, all foss toolchains are hidden modules, so you need to load them with a dot in front of the version:

module load foss/.2020a
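
The naming rule above can be sketched as a small shell helper. This is only an illustration: the function name foss_module is hypothetical, and the visible name foss/2022a for the newest tool set is an assumption based on the statement that only the tool sets before 2022a are hidden.

```shell
# Print the module name for a foss tool set given its year tag.
# Tags before 2022a are hidden modules (dot in front of the version).
foss_module() {
  tag="$1"
  case "$tag" in
    [0-9][0-9][0-9][0-9][ab]) ;;   # expected form, e.g. 2020a
    *) echo "unexpected tool set tag: $tag" >&2; return 1 ;;
  esac
  year="${tag%?}"                  # strip the trailing letter
  if [ "$year" -lt 2022 ]; then
    echo "foss/.$tag"
  else
    echo "foss/$tag"               # assumption: 2022a is a normal module
  fi
}

foss_module 2020a
foss_module 2022a
```

On the cluster the result could be passed to the load command, for example: module load "$(foss_module 2019a)"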

For the GPU nodes we also add CUDA to the foss tool set. Before 2022a, each foss tool set has a corresponding fosscuda tool set, which includes the fundamental compilers and libraries for GPU applications.

Software Stack on Compute Nodes

Compute Nodes Types

Compute nodes on the ACCRE cluster are heterogeneous in terms of CPU architecture (and also RAM and local disk space). Some compute nodes contain processors that are 4-5 years old, while others use processors that are less than a year old. Unless a job requests otherwise, Slurm may assign it to any of these node types. We usually use the CPU architecture name to label the compute node types.

ACCRE currently has the following types of compute nodes, from oldest to newest:

  1. westmere;
  2. sandy bridge;
  3. haswell;
  4. broadwell;
  5. skylake;
  6. icelake

All of the CPU architectures above are Intel CPUs. For AMD, we currently have the zen architecture (AMD EPYC CPUs). These are all of the CPU architectures currently used at ACCRE.

This mixture of CPU architectures presents some challenges when building software, because newer processors can use instructions that are unsupported by older processors. As a result, programs built from source on a newer compute node may not run successfully on a compute node or gateway with an older processor. “Illegal Instruction” error messages are likely to occur in this scenario. This is a typical error when running locally built R libraries.
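
One way to see which instruction sets a node supports is to look at the CPU flags that Linux reports. The sketch below is only an illustration: the helper name cpu_has_flag is hypothetical, it assumes a Linux node that exposes /proc/cpuinfo, and the flags checked (avx, avx2, avx512f) are common examples rather than a list taken from the ACCRE documentation.

```shell
# Return success if the CPU reports the given flag.
# The second argument (file to read) defaults to /proc/cpuinfo.
cpu_has_flag() {
  flag="$1"
  cpuinfo="${2:-/proc/cpuinfo}"
  # Take the first "flags" line and look for the flag as a whole word.
  grep -m1 '^flags' "$cpuinfo" | grep -qw -- "$flag"
}

# Report a few vector instruction sets on the current node, if possible.
if [ -r /proc/cpuinfo ]; then
  for f in avx avx2 avx512f; do
    if cpu_has_flag "$f"; then
      echo "$f: supported"
    else
      echo "$f: not supported"
    fi
  done
fi
```

Running this on the node where a library was built and again on the node where the job failed shows whether the two nodes differ in the instruction sets they support.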

Software Stack on Compute Nodes

Because the compiler-based software stack is compiled with CPU architecture optimizations, the compiled software is closely bound to a given CPU architecture. Each compute node type therefore has its own copy of the software stack, and for each CPU architecture we choose one node as a builder box. When we install software we compile it on all of the builder boxes, and the matching software stack is loaded according to the CPU architecture of the node.

If software is compiled on a newer CPU architecture but the compiled binary is then used on an older CPU, it may fail. This most often happens to R users. For example, an R job may fail with an error like the one below:

All output looks like this:
*** caught illegal operation ***
address 0x2b02bd5762bc, cause 'illegal operand'

This error indicates that the R library was compiled on a newer CPU but is being used on an older CPU. At run time the older CPU does not recognize the newer instructions, so the program crashes. The simplest way to resolve the issue is to restrict the job to newer CPUs; for example:

#SBATCH --constraint=haswell|skylake

This means the job will only be assigned to compute nodes with haswell or skylake CPUs, so it will not run on the sandy bridge and westmere CPUs in the cluster.
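
When more than two node types are acceptable, the constraint string can be assembled from a list. The sketch below is only an illustration: join_constraint is a hypothetical helper, and haswell and skylake are the only feature names confirmed by the example above; the feature names for the other node types are not given here and would need to be checked on the cluster.

```shell
# Join the given Slurm feature names with "|" (logical OR).
# The function body runs in a subshell so the IFS change stays local.
join_constraint() (
  IFS='|'
  printf '%s\n' "$*"
)

# Print the directive to paste into a batch script.
printf '#SBATCH --constraint=%s\n' "$(join_constraint haswell skylake)"
```

On a Slurm cluster, sinfo -o "%N %f" lists the feature names defined on each node, which is one way to find the names that are valid in a constraint.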

LMOD

To load software from the stack, ACCRE provides Lmod. Lmod is a tool for managing software modules within a shared high-performance computing environment (e.g. the ACCRE cluster). It is designed to manage, negotiate, and enforce the complex dependencies between software and libraries in an HPC environment. This should lead to better usability of administrator-installed software modules and prevent conflicts between lower-level libraries that many modules depend on.

For instructions on basic Lmod usage, please start with the excellent and concise User Guide for Lmod.


Searching for available Modules, Normal modules and Hidden Modules

In Lmod you can use the module spider command to search for modules. You can provide part of the module name, and the search is case insensitive:



[root@gw344 ~]# module spider R

-----------------------------------------------------------------------------------------------------------------------------------------
  R:
-----------------------------------------------------------------------------------------------------------------------------------------
    Description:
      R is a free software environment for statistical computing and graphics.

     Versions:
        R/3.3.3
        R/3.4.3
        R/3.6.0
        R/4.0.5
        R/4.2.1
     Other possible modules matches:
        APR  APR-util  Amber  Archive-Zip  Armadillo  Arrow  Aspera-CLI  BerkeleyGW  Bio-SearchIO-hmmer  BioPerl  Brotli  CNVnator  ...

-----------------------------------------------------------------------------------------------------------------------------------------
  To find other possible module matches execute:

      $ module -r spider '.*R.*'

-----------------------------------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "R" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider R/4.2.1
-----------------------------------------------------------------------------------------------------------------------------------------


Modules that you can find directly with the module spider command are normal modules: you can search for them and load them directly. However, we choose to hide some modules during installation because they mostly serve as dependency libraries for the normal modules. For example, the XZ library is used by many software packages such as R:


module --show-hidden spider XZ

-----------------------------------------------------------------------------------------------------------------------------------------
  XZ:
-----------------------------------------------------------------------------------------------------------------------------------------
    Description:
      xz: XZ utilities

     Versions:
        XZ/.5.2.2
        XZ/.5.2.3
        XZ/.5.2.4
        XZ/.5.2.5

A hidden module has a . in front of its version number. To search for hidden modules, you need to add the --show-hidden option to the command:

module --show-hidden spider

Without this option the search for a hidden module fails with an error:


module spider XZ
Lmod has detected the following error:  Unable to find: "XZ".

If you don't understand the warning or error, contact the helpdesk at www.vanderbilt.edu/accre/support/helpdesk 


This does not mean that XZ is missing from the software stack; you just need the extra option for a successful search. This is a common source of confusion with hidden modules.
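
The dot convention can also be checked in a script. The sketch below is only an illustration: is_hidden_module is a hypothetical helper that looks at the module name alone, assuming the name/.version form shown above; it does not query Lmod.

```shell
# Return success if the version part of a module name starts with a dot.
is_hidden_module() {
  case "$1" in
    */.*) return 0 ;;   # e.g. XZ/.5.2.5
    *)    return 1 ;;   # e.g. R/4.2.1, or a name without a version
  esac
}

for m in XZ/.5.2.5 R/4.2.1 foss/.2020a; do
  if is_hidden_module "$m"; then
    echo "$m: hidden"
  else
    echo "$m: normal"
  fi
done
```

A check like this can decide whether a search for that module needs the --show-hidden option.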

List Modules, Unload Modules

Use the module list command to see which modules are currently loaded in your environment:

$ module list
No modules loaded

$ module load GCC
$ module list

Currently Loaded Modules:
  1) GCCcore/.5.4.0   3) GCC/5.4.0-2.26
  2) binutils/.2.26
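
In a script it can be handy to turn this listing into one module name per line, for example to record the environment in a job log. The sketch below is only an illustration: loaded_modules is a hypothetical helper, and it assumes the numbered "N) name" layout shown above.

```shell
# Read `module list` output on stdin and print one module name per line,
# ordered by the index shown in front of each name.
loaded_modules() {
  grep -oE '[0-9]+\) [^[:space:]]+' | sort -n | sed -E 's/^[0-9]+\) //'
}

# Demo on the sample output shown above (Lmod is not needed for this part).
printf '%s\n' \
  'Currently Loaded Modules:' \
  '  1) GCCcore/.5.4.0   3) GCC/5.4.0-2.26' \
  '  2) binutils/.2.26' | loaded_modules
```

On the cluster the input would come from Lmod itself, for example: module list 2>&1 | loaded_modules (Lmod normally prints its listing to standard error, hence the redirection).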


Use the module unload command to remove a module from your environment. For example:

$ module load GCC/5.4.0-2.26 
$ module list

Currently Loaded Modules:
  1) GCCcore/.5.4.0   2) binutils/.2.26   3) GCC/5.4.0-2.26

$ module unload GCC/5.4.0-2.26 
$ module list
No modules loaded

With the exception of compilers, a module’s dependencies will not be removed from your environment after unloading the module via module unload. However, Lmod will remove those dependencies if they cause conflicts in a subsequent module load command.

Use the module purge command to remove all modules from your environment. For example:

$ module load GCC/5.4.0-2.26  
$ module load OpenMPI/1.10.3 
$ module list

Currently Loaded Modules:
  1) GCCcore/.5.4.0   3) GCC/5.4.0-2.26    5) hwloc/.1.11.3
  2) binutils/.2.26   4) numactl/.2.0.11   6) OpenMPI/1.10.3

$ module purge
$ module list
No modules loaded

Module collections

Lmod provides a simple way to store the currently loaded modules and restore them later through named collections. This method is especially useful to ensure that the correct environment is set when using user-built software, and it is a better alternative to loading modules in your shell initialization file (e.g. ~/.bashrc).

To create a new collection, use module save followed by the name you want to assign to the collection as shown in the following example:


$ module load GCC/5.4.0-2.26  
$ module load OpenMPI/1.10.3
$ module load FFTW/3.3.4
$ module load OpenBLAS/0.2.18-LAPACK-3.6.1
$ module load ScaLAPACK/2.0.2-OpenBLAS-0.2.18-LAPACK-3.6.1
$ module list

Currently Loaded Modules:
  1) GCCcore/.5.4.0  (H)   6) OpenMPI/1.10.3
  2) binutils/.2.26  (H)   7) OpenBLAS/0.2.18-LAPACK-3.6.1
  3) GCC/5.4.0-2.26        8) FFTW/3.3.4
  4) numactl/.2.0.11 (H)   9) ScaLAPACK/2.0.2-OpenBLAS-0.2.18-LAPACK-3.6.1
  5) hwloc/.1.11.3   (H)

$ module save testnc

This will create a new file ~/.lmod.d/testnc that contains the list of the loaded modules now associated with the collection.

IMPORTANT: If you have loaded the default module for a software package (i.e. without specifying the version), the named collection will contain the default module, not the specific version that was the default when the collection was created. If a new module is later set as default, Lmod will show an error when restoring the collection and you will have to re-create it by following the displayed instructions.

To get the list of the saved collections, use the module savelist command:


$ module savelist
Named collection list :
  1) testnc

You can use module describe to get the list of the modules saved in a specific named collection as follows:

$ module describe testnc
Collection "testnc" contains: 
  1) GCCcore/.5.4.0  (H)   6) OpenMPI/1.10.3
  2) binutils/.2.26  (H)   7) OpenBLAS/0.2.18-LAPACK-3.6.1
  3) GCC/5.4.0-2.26        8) FFTW/3.3.4
  4) numactl/.2.0.11 (H)   9) ScaLAPACK/2.0.2-OpenBLAS-0.2.18-LAPACK-3.6.1
  5) hwloc/.1.11.3   (H)

The modules saved in a named collection can be restored by issuing the module restore command:

$ module purge
$ module list
No modules loaded

$ module restore testnc
Restoring modules from user's testnc

$ module list

Currently Loaded Modules:
  1) GCCcore/.5.4.0  (H)   6) OpenMPI/1.10.3
  2) binutils/.2.26  (H)   7) OpenBLAS/0.2.18-LAPACK-3.6.1
  3) GCC/5.4.0-2.26        8) FFTW/3.3.4
  4) numactl/.2.0.11 (H)   9) ScaLAPACK/2.0.2-OpenBLAS-0.2.18-LAPACK-3.6.1
  5) hwloc/.1.11.3   (H)

To delete a named collection, simply remove the corresponding file in ~/.lmod.d
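
The removal step can be wrapped in a small helper that refuses to delete anything that is not an existing collection file. This is only a sketch: delete_collection is a hypothetical function, and the default directory ~/.lmod.d is taken from the description of module save above.

```shell
# Remove a named collection file; the directory defaults to ~/.lmod.d.
delete_collection() {
  name="$1"
  dir="${2:-$HOME/.lmod.d}"
  if [ -f "$dir/$name" ]; then
    rm -- "$dir/$name" && echo "removed collection '$name'"
  else
    echo "no collection named '$name' in $dir" >&2
    return 1
  fi
}

# Demo in a throwaway directory so that no real collection is touched.
demo_dir="$(mktemp -d)"
touch "$demo_dir/testnc"
delete_collection testnc "$demo_dir"
rm -rf "$demo_dir"
```

To delete a real collection, call the function with only the collection name, for example: delete_collection testnc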

FAQ

Install Perl Module in Home Dir

1) First check which Perl versions are available in the software stack and load the version you want, for example:

ml GCC/11.3.0 Perl/5.34.1

After that, cpan is available for use.

2) Launch cpan. In cpan, set up the installation directory for the Perl modules:


[liuf8@gw344 ~]$ cpan

cpan shell -- CPAN exploration and modules installation (v2.28)
Enter 'h' for help.

cpan[1]> o conf makepl_arg "PREFIX=$HOME/perl5libs" 
    makepl_arg         [PREFIX=$HOME/perl5libs]
Please use 'o conf commit' to make the config permanent!

cpan[3]> o conf commit
commit: wrote '/home/liuf8/.cpan/CPAN/MyConfig.pm'

cpan[4]> 


The command

o conf makepl_arg "PREFIX=$HOME/perl5libs"

sets the installation directory for locally installed Perl modules. Then

o conf commit

writes this setting into your cpan configuration, so you do not need to set it again in later sessions.

3) Install the modules:

cpan[5]> install Math::Combinatorics

After the installation it shows:


Installing /home/liuf8/perl5libs/lib/perl5/site_perl/5.26.0/Math/Combinatorics.pm
Installing /home/liuf8/perl5libs/man/man3/Math::Combinatorics.3

So the installed modules go to your $HOME/perl5libs folder.

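4) To use these modules, define the environment variable PERL5LIB in your .bashrc so that it points to the directory where the modules were installed. In the installation log above the modules went into $HOME/perl5libs/lib/perl5/site_perl/5.26.0. Perl also searches the version-specific subdirectory of every PERL5LIB entry, so for example:

export PERL5LIB=$HOME/perl5libs/lib/perl5/site_perl:$PERL5LIB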

Then you should be able to use the modules in your Perl scripts.
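
If .bashrc is sourced more than once, a plain export adds the same directory again each time, and it leaves a trailing colon when PERL5LIB was empty. The sketch below is one possible way to avoid both. It is only an illustration: prepend_perl5lib is a hypothetical helper, and the directory in the example call is taken from the installation log above, so adjust it to wherever your modules were installed.

```shell
# Put a directory at the front of PERL5LIB unless it is already listed.
prepend_perl5lib() {
  dir="$1"
  case ":${PERL5LIB:-}:" in
    *":$dir:"*) ;;                                   # already present
    *) PERL5LIB="$dir${PERL5LIB:+:$PERL5LIB}" ;;     # no trailing colon
  esac
  export PERL5LIB
}

# Example call; the path follows the installation log shown earlier.
prepend_perl5lib "$HOME/perl5libs/lib/perl5/site_perl"
echo "PERL5LIB=$PERL5LIB"
```

After this, a quick check such as perl -MMath::Combinatorics -e 1 should exit without an error if Perl can find the module.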