Batch Clusters¶
High Performance Computing (HPC) involves the execution of a CPU-intensive application with a particular set of input parameters and input data sets. Because of the large CPU requirements, these applications normally use the Message Passing Interface (MPI) to create a high-performance platform from a sizable number of discrete machines.
These types of applications are naturally job- or task-based and historically have been run on batch systems such as Slurm or HTCondor. These systems can be run within cloud infrastructures, although doing so generally introduces a significant amount of incidental complexity and service management overhead.
An example SlipStream application for a Slurm cluster is provided. This example deploys a fully functioning Slurm cluster with the following characteristics:
- One master node and multiple workers (two by default).
- The /home area exported by the master to all of the workers.
- A /scratch area with local scratch space on all nodes.
- A single, default SLURM partition containing all nodes.
- Common software packages (e.g. g++, OpenMPI) are installed.
- The root account on the master can access all workers via SSH.
- Parallel SSH has been installed to facilitate cluster management.
The cluster can be used from the tuser account and managed through the root account on the master node.
Starting the Cluster¶
To deploy the cluster, navigate to the “slurm-cluster” application
within Nuvla and click the Deploy...
action. You can choose which
cloud infrastructure to use, the number of workers, and their size
from the deployment dialog.
You can also deploy this application from the command line using the
SlipStream client. Before using any of the SlipStream commands,
you will need to authenticate with Nuvla using the ss-login
command:
$ ss-login --username=<username> --password=<password>
On success, this will exit with a return code of 0 and store an
authentication token locally for the SlipStream client commands. You
can use the ss-logout
command to delete this authentication token.
You can now start a SLURM cluster with the ss-execute
command:
$ ss-execute \
--parameters="worker:multiplicity=4,
worker:instance.type=Huge" \
--keep-running=on-success \
--wait=20 \
--final-states=Done,Cancelled,Aborted,Ready \
apps/BatchClusters/slurm/slurm-cluster
::: Waiting 20 min for Run https://nuv.la/run/1a90f7df-a8db-4fa8-b2d2-463afa296c5a to reach Done,Cancelled,Aborted,Ready
[13:19:59 UTC] State: Initializing
[13:20:20 UTC] State: Initializing
[13:20:50 UTC] State: Initializing
[13:21:21 UTC] State: Initializing
[13:21:51 UTC] State: Provisioning
[13:22:21 UTC] State: Provisioning
[13:22:52 UTC] State: Provisioning
[13:23:22 UTC] State: Provisioning
[13:23:53 UTC] State: Provisioning
[13:24:23 UTC] State: Executing
[13:24:54 UTC] State: Executing
[13:25:24 UTC] State: Executing
[13:25:55 UTC] State: Executing
[13:26:25 UTC] State: Executing
[13:26:56 UTC] State: Executing
[13:27:26 UTC] State: Executing
[13:27:57 UTC] State: Executing
[13:28:27 UTC] State: Executing
[13:28:57 UTC] State: Executing
[13:29:28 UTC] State: Executing
[13:29:58 UTC] State: SendingReports
[13:30:29 UTC] State: SendingReports
OK - State: Ready. Run: https://nuv.la/run/1a90f7df-a8db-4fa8-b2d2-463afa296c5a
With the given options, the SLURM cluster will contain 4 workers and 1 master. Each of the workers will be of the “Huge” flavor. The command will wait until the cluster reaches one of the given final states. It will also provide you with the deployment (“run”) identifier (the UUID in the “https://nuv.la/run/…” URL) that can be used to terminate the cluster.
The example shows how to change the number of worker nodes in the cluster with the worker:multiplicity parameter. You can also specify the flavor (instance type) of the machine with the worker:instance.type parameter. Supported values are: Micro, Tiny, Small, Medium, Large, Extra-large, Huge, Mega, Titan, GPU-small, and GPU-large. Access to the GPU and larger machines must be requested through support. You can also specify the disk size with worker:disk and/or master:disk.
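For example, the following sketch combines these parameters to deploy two “Large” workers with resized disks; the disk values are illustrative and assumed to be in GB, so check the application description for the exact units and limits:

$ ss-execute \
    --parameters="worker:multiplicity=2,worker:instance.type=Large,worker:disk=100,master:disk=50" \
    --keep-running=on-success \
    apps/BatchClusters/slurm/slurm-cluster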
Use the --help option to find out how to set other options for the ss-execute command, or consult the SLURM application description for other deployment parameters.
Accessing the Cluster¶
Once the deployment is in the “Ready” state, you can log into the master node to use the cluster. You can find the IP address of the master node on the deployment details page in Nuvla, or retrieve it from the command line once the deployment is ready:
$ ss-get --run=1a90f7df-a8db-4fa8-b2d2-463afa296c5a master.1:hostname
159.100.244.254
replacing the run ID with the one for your deployment. The SSH key from your user profile will have been added to the root and tuser accounts.
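As a sketch, assuming the IP address returned in the example above (substitute the address of your own master node), you can then log in with the SSH key from your profile:

$ ssh tuser@159.100.244.254     # day-to-day use of the cluster
$ ssh root@159.100.244.254      # administrative access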
Managing the Cluster¶
The SLURM cluster will have been deployed with common software packages and a batch queue ready to run jobs. Nonetheless, you may want to adjust the node or SLURM configurations. You might want to consult the SLURM Documentation or Administrator Quick Start for managing SLURM.
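For a quick check of the cluster state, the standard SLURM client commands can be used from the master node, for example:

$ sinfo                  # partitions and node states
$ squeue                 # jobs currently queued or running
$ scontrol show nodes    # detailed per-node information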
The root account on the master node can be used to manage the cluster. To facilitate this, parallel SSH has been installed and the root account can access all workers via SSH. Two files have been created in /root that list all hosts in the cluster (hosts-cluster) and all workers (hosts-workers).
From the root account on the master, you can, for example, install the package “bc” on all nodes with the command:
$ parallel-ssh --hosts=hosts-cluster apt install -y bc
[1] 13:58:40 [SUCCESS] worker-1
[2] 13:58:40 [SUCCESS] worker-2
[3] 13:58:40 [SUCCESS] master
The command also allows you to see or capture the output from each node. There is also a parallel-scp command for distributing files around the cluster.
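As an illustrative sketch (the file name here is hypothetical), you could distribute a file to all workers and then check that it arrived:

$ parallel-scp --hosts=hosts-workers myconfig.conf /tmp/myconfig.conf
$ parallel-ssh --hosts=hosts-workers --print "ls -l /tmp/myconfig.conf"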
Running Jobs¶
Generally, you will want to run your jobs from a non-privileged account. The account tuser has been preconfigured for this. You might want to consult the SLURM Documentation or User Quick Start for information on running your applications with SLURM.
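As a minimal sketch, you could save a batch script like the following to a file (here called hello.sbatch; the name and resource requests are illustrative) in your home area:

#!/bin/bash
#SBATCH --job-name=hello         # name shown by squeue
#SBATCH --ntasks=4               # number of tasks (e.g. MPI ranks)
#SBATCH --output=hello-%j.out    # output file; %j expands to the job ID

srun hostname                    # runs once per allocated task

and then submit it and watch the queue from the tuser account:

$ sbatch hello.sbatch
$ squeue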
The entire /home area is exported via NFS to all workers.
Consequently, all user accounts have a shared NFS home area across the
cluster. Data and/or executables uploaded to the master node will be
visible across the cluster.
There is also a /scratch area on each node that provides local scratch disk space. Keep in mind that this scratch space:
- Is fully accessible to all users. Create subdirectories with more restricted permissions if you want to isolate your files from other users.
- Resides on the root partition. Be careful not to fill this space completely as it could have negative consequences for the operating system.
Unlike the standard /tmp directory, the /scratch area is not subject to the operating system’s periodic cleanup. You must manually remove files to free disk space.
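A common pattern, sketched below with an illustrative layout, is to create a private per-user subdirectory, run I/O-heavy work there, and remove it when finished:

$ mkdir -p /scratch/$USER && chmod 700 /scratch/$USER    # private working area
$ cd /scratch/$USER                                      # run I/O-heavy jobs here
$ rm -rf /scratch/$USER                                  # free the space when done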
Stopping the Cluster¶
When your calculations have completed, you can release the resources assigned to the cluster by either clicking the Terminate action from the deployment detail page in the web application or using the command line:
$ ss-terminate 98f42dca-98e8-4265-875e-90ddf81d6fca
The command will wait for the full termination of the run.
Warning
All the resources, including local storage, will be released. Be sure to copy your results off the master node to your preferred persistent storage.
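For example (assuming the master node’s IP address from the example above and an illustrative results directory), you could copy your output back to your own machine before terminating the deployment:

$ scp -r tuser@159.100.244.254:results/ ./slurm-results/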