This is a guide to using ClusterMQ on Sharcnet via SSH. It's a work in progress, so let me know if something is unclear! The goal is to get you to the point where you can submit jobs using the Q() function from a local machine.
What you do here depends on your local machine. A complete guide can be found here.
The ComputeCanada clusters should already have ZeroMQ installed. You should install ClusterMQ in the R module you plan on using by starting a session and running:
install.packages("clustermq")
You need to add the following to your local .Rprofile file. You can open this file by running file.edit("~/.Rprofile") in R. If the file doesn't exist, create it in your home directory.
options(
clustermq.scheduler = "ssh",
clustermq.ssh.host = "user@host", # e.g. bob@graham.computecanada.ca
clustermq.ssh.log = "~/cmq_ssh.log" # ssh log location
)
You need to add the following to your cluster .Rprofile file. You can find this file in your home directory:
options(
clustermq.template = "/path_to_file/slurm.tmpl"
)
You also need to create a .tmpl file (in the example above, this is slurm.tmpl) at the path given in clustermq.template. Here's an example, taken from the one I use (via mschubert):
#!/bin/sh
# Modified from https://github.com/mschubert/clustermq/blob/master/inst/SLURM.tmpl
# under the Apache 2.0 license.
#SBATCH --job-name=superconservative_halcyon_{{ job_name }}
#SBATCH --output={{ log_file | /dev/null }}
#SBATCH --error={{ log_file | /dev/null }}
#SBATCH --mem-per-cpu={{ memory | 3G }}
#SBATCH --array=1-{{ n_jobs }}
#SBATCH --cpus-per-task={{ cores | 4 }}
#SBATCH --account=ACCOUNT NAME YOU WANT GOES HERE e.g. def-abc <--- change this
#SBATCH --time=0-00:10 <--- change this
module load R # Comment out if R is not an environment module.
# ulimit -v $(( 1024 * {{ memory | 3G }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
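The two lines marked "<--- change this" are placeholders, and a template that still contains them does not hold a usable account or time limit. As a quick pre-flight check on the cluster, here is a minimal shell sketch. The name check_template is hypothetical (it is not part of clustermq or Slurm), and the helper only looks for the marker text used in the example above.

```shell
#!/bin/sh
# check_template: hypothetical helper, not part of clustermq or Slurm.
# Returns 0 only when the file exists and no line still carries the
# "<--- change this" marker from the example template.
check_template() {
    tmpl="$1"
    if [ ! -f "$tmpl" ]; then
        echo "template not found: $tmpl" >&2
        return 2
    fi
    # grep -n prints any offending lines together with their line numbers.
    if grep -n -F -e '<--- change this' "$tmpl" >&2; then
        echo "edit the lines above before submitting jobs" >&2
        return 1
    fi
    return 0
}

# Usage (the path is the placeholder from the .Rprofile example):
# check_template /path_to_file/slurm.tmpl && echo "template looks edited"
```

A return code of 1 means at least one placeholder line is still present, and 2 means the file was not found at the path you passed in.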
You need to load the R module you want to use at the end of the .bashrc on the cluster. If you don't, connecting from your local machine will fail with:
Error in initialize(...) :
Remote R process did not respond after 5000 seconds. Check your SSH server log.
For example, I use the following:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
module load r/4.2.2 # <--- this is the line you need to add
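If you would rather add that line from the command line than in an editor, here is a minimal sketch. The name ensure_line is hypothetical, and r/4.2.2 is simply the module version from the example above, so substitute the one you installed ClusterMQ into. The helper appends the line only when an identical line is not already present, so running it twice is harmless.

```shell
#!/bin/sh
# ensure_line: hypothetical helper that appends LINE to FILE unless an
# identical line is already there, so repeated runs do not duplicate it.
ensure_line() {
    file="$1"
    line="$2"
    # -x matches whole lines, -F treats the text literally, -q stays silent.
    if grep -qxF -e "$line" "$file" 2>/dev/null; then
        return 0
    fi
    # If the file is non-empty and lacks a final newline, add one first so
    # the new line is not glued onto the previous one.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage on the cluster (commented out so that sourcing this has no side effects):
# ensure_line "$HOME/.bashrc" 'module load r/4.2.2'
```

Note that the match is on the exact line, so a copy of the line with a trailing comment counts as different text.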
The following should now work from your local machine (you'll be prompted for your password or SSH key passphrase):
library(clustermq)
fx = function(x) x * 2
Q(fx, x=1:3, n_jobs=1)