This guide makes the following assumptions:
- You have basic familiarity with how to navigate (cd, ls, pwd), edit files (vim, nano), change permissions (chmod), and execute software (./) within Unix/Linux environments. If you are not familiar with bash scripting, this resource might be helpful.
- You understand how to access a remote machine via ssh (Terminal on Mac, or PowerShell on Windows), how to perform port forwarding, and how to make and use ssh keys. If you are not familiar with ssh keys, this resource might be helpful. If you are not familiar with SSH port forwarding, please see this resource and the brief sketch after this list.
- You have an account with the MSU High Performance Computing Center; if you do not, please contact me and I will create an account for you.
- You have functional familiarity with virtual environments; if you do not, this resource might be helpful.
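As a quick refresher, local port forwarding over ssh looks like the sketch below; the port number 8888 is only an illustrative assumption (use whatever port your application actually serves on).
# Forward local port 8888 to port 8888 on the cluster (the port is an example)
ssh -L 8888:localhost:8888 <your-net-id>@hpcc.msu.edu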
To begin, you will access the cluster via ssh:
ssh <your-net-id>@hpcc.msu.edu
You will be prompted for a password; this should be the same password you use for other MSU functions.
Next, create (or use an existing) public ssh key and add it to the set of authorized keys on the cluster so that you do not need to type your password on every login:
vim ~/.ssh/authorized_keys
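Alternatively, you can generate a key on your local machine and copy it to the cluster in one step. The sketch below assumes you want a new ed25519 key and that you are on Mac or Linux; on Windows, paste the contents of your public key into ~/.ssh/authorized_keys by hand, as above.
# Run these on your local machine, not on the cluster
ssh-keygen -t ed25519
ssh-copy-id <your-net-id>@hpcc.msu.edu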
Choose a development node from the list here. For instance:
ssh dev-intel18
Your cluster account has two kinds of storage: your personal drive /mnt/home/<your-net-id> (50GB limit), and a shared drive for the lab /mnt/research/ghassemi-lab/ (1TB). We can always request more space on the shared drive if needed, but your personal drive's quota is fixed. You should create a symbolic link to the lab shared drive from your home directory so it's easy to access:
ln -s /mnt/research/ghassemi-lab/ lab-directory
You can then change into the lab directory with cd lab-directory.
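You can confirm the link points where you expect with a quick sanity check:
ls -ld lab-directory   # -d shows the link itself rather than the directory contents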
Within the lab directory, create a personal folder to house your research work and change into that directory.
mkdir <your-first-name>-<your-last-name>
cd <your-first-name>-<your-last-name>
The cluster uses shared machines (gateways and development nodes) that serve as intermediaries to other nodes on the cluster. As such, you are not allowed to install packages (e.g. with pip) on the gateway machines directly. You can, however, install any packages you might need in a virtual environment. To create the virtual environment, run the following command:
python3 -m venv venv
Once the virtual environment is created, you can bind to it:
source venv/bin/activate
Once bound, you should see (venv) show up on the left of your command prompt. You can now pip install any packages you might need to run your code. When you are done installing packages, detach from the virtual environment using the deactivate command.
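For example, a typical install cycle looks like this (numpy is just an illustrative package, not something this guide requires):
source venv/bin/activate
pip install numpy
deactivate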
Ultimately, software takes inputs and generates outputs. Let's generate folders to hold each.
mkdir input
mkdir output
Within the input directory, let's create a dummy input dataset input.jsonl with the following contents (one JSON object per line):
{"x": 1, "y": 2}
{"x": 2, "y": 4}
{"x": 3, "y": 1}
Next, let's create a simple python script python_script.py that takes as an input argument the line number from input.jsonl that we want to process, applies a function to that row, and stores the result in the output/ directory:
import sys
import json
import subprocess

# Gets a specific (1-indexed) line from the input file using sed
def getInput(line_number):
    sed_command = """sed '""" + line_number + """q;d' input/input.jsonl"""
    proc = subprocess.Popen(sed_command, stdout=subprocess.PIPE, shell=True)
    line = json.loads(proc.communicate()[0].decode())
    return line

# Computes a function of the input row
def func(line):
    result = line['x'] * line['y'] * 2 + 1
    return result

# Saves the result as JSON
def save(result, savename):
    with open(savename, 'w') as f:
        json.dump(result, f)

#---------------------------------------
# MAIN
#---------------------------------------
line_number = sys.argv[1]
savename = 'output/row_' + line_number + '_output.json'

# Get the input, and apply the function
line = getInput(line_number)
result = func(line)

# Save the result
save(result, savename)
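Before submitting anything to the scheduler, it is worth sanity-checking the script directly on the development node (this assumes your virtual environment is active and you are in the directory that contains input/ and output/):
python3 python_script.py 1
cat output/row_1_output.json   # row 1 is {"x": 1, "y": 2}, so this should print 5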
The point of a parallel computing cluster is that we can compute the value of the function func on each row in parallel instead of sequentially. To use the cluster for this, we must specify the parameters of the machines we want to use for the parallel computation. To do this, let's create slurm_job.sb:
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:0
#SBATCH --mem=1G
#SBATCH --time=1:00
#SBATCH --output=python_out/row_%a.out
#SBATCH --error=python_err/row_%a.err
#----------------------------------------------------
source /mnt/research/ghassemi-lab/mohammad-ghassemi/venv/bin/activate
python3 /mnt/research/ghassemi-lab/mohammad-ghassemi/python_script.py $SLURM_ARRAY_TASK_ID
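A couple of notes: %a in the --output and --error paths and $SLURM_ARRAY_TASK_ID inside the script are both replaced with the array index of each task, so every row of the input gets its own log files and its own Python process. Remember to change the two absolute paths above to point at your own venv and python_script.py. You can also try a single array element before launching the full batch; the sketch below assumes you want to test row 2:
mkdir -p python_out python_err   # the log directories must exist before the job starts
sbatch --array=2 slurm_job.sb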
Now all that's left to do is to execute the python_script.py according to the parameters in slurm_job.sb. We will do this by creating a third script execute.sh:
#!/bin/bash
echo '-----------------------------------------------'
echo ' SUBMITTING JOB TO CLUSTER'
echo '-----------------------------------------------'
mkdir -p python_out
mkdir -p python_err
rm -rf python_out/*
rm -rf python_err/*
####################################################
# GET THE SIZE OF THE INPUT DATA (NUMBER OF LINES)
####################################################
INPUT_SIZE=$(wc -l input/input.jsonl | cut -f1 -d' ')
####################################################
# SEND A BATCH OF DATA
####################################################
sbatch --array=1-$INPUT_SIZE slurm_job.sb   # sed line numbers start at 1
watch squeue -u $USER
Finally, make the script executable and run it:
chmod +x execute.sh
./execute.sh
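This submits one array task per row of input.jsonl and then watches the queue until you exit with Ctrl-C. Once the tasks finish, you can inspect the results and logs (row 1 is used as an example below), or cancel a misbehaving job with scancel:
cat output/row_1_output.json   # result for row 1
cat python_out/row_1.out       # stdout of the task that processed row 1
scancel <job-id>               # cancel a job using the id printed by sbatch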
- The MSU HPCC Wiki (https://wiki.hpcc.msu.edu/) is actually very well put together, and contains example scripts and other resources that show how to use the cluster.