CIMENT is a high-performance computational platform accessible to all staff members of the Univ. Grenoble Alpes institutes (and their students / invitees). CIMENT is part of GRICAD, the Grenoble Alps Research Infrastructure for Intensive Calculation And Data.
CIMENT consists of multiple platforms (also called clusters):
- dahu: the main computation cluster. Runtime is limited to 48 hours per job.
- luke: for computations that do not fit on dahu (e.g. runtime > 48 hours). It consists of 63 heterogeneous nodes, most of which are reserved for specific projects. Runtime is limited to 240 hours per job. Two nodes are reserved for the epimed project.
- bigfoot: for GPU computation. Its nodes are similar to those of the dahu cluster, each with 32 cores, 4 GPUs, 192 GB RAM, and a 200 GB system SSD. Runtime is limited to 48 hours per job.

Note that memory (RAM) is split equally between all cores on a node. For example, on a 32-core dahu node each core has 6 GB of RAM. The node with the most RAM is luke41 (384 GB; 16 GB / core).
All nodes run Debian GNU/Linux.
This guide focuses on dahu, but the principles apply to the other clusters. You should use dahu unless you know another platform is more appropriate for your work.
- The GRICAD documentation is the authoritative reference.
- You can contact the GRICAD helpdesk at sos-gricad@univ-grenoble-alpes.fr or create a ticket via the web interface.
- For a brief introduction to Unix, see this Software Carpentry course or this GRICAD course (in French).
- GRICAD also offers doctoral-level courses (in French) on Linux, scientific programming, and parallel computation.
- Florent Chuffart is the CIMENT users committee representative for the IAB.
To use CIMENT, you first need to create an account on PERSEUS:
All computation on CIMENT must be allocated to a project. This allows fair sharing of the resources according to each project's past usage of and contribution to the CIMENT infrastructure. To join a project, open My projects then Join a project in PERSEUS and select a project, e.g.:

- pr-test: "Test project with 3000 hours quota" (allows 3 months of access and up to 3000 hours of usage)
- pr-epimed: "EPIgenetique MEDicale" (the EPIMED project)

If you are a permanent staff member, you may create a new project by contacting the GRICAD staff. Projects must be renewed each year by preparing a short activity report.
The CIMENT platforms are only accessible by SSH via a gateway server (bastion). For more details, see the GRICAD documentation. We will configure your SSH client to transparently connect via the gateway using an SSH key pair. This avoids having to repeatedly type your password.
Windows users must obtain an SSH client. If you are familiar with Git Bash, Cygwin, or Windows Subsystem for Linux, you may follow the instructions for Mac / Linux users to configure OpenSSH via the terminal. Otherwise, follow these instructions to configure PuTTY.
Install PuTTY (an open-source SSH client for Windows). Use the default installation settings.
Open PuTTYgen (in the same folder as PuTTY) and click Generate.

Enter a password in the Key passphrase and Confirm passphrase boxes. You will need this password to unlock the key. You may also enter a Key comment to help you identify the key (e.g. username@computer-name).

Save the private key as a file named id_rsa.ppk in a folder named .ssh within your home directory (i.e. %USERPROFILE%\.ssh\id_rsa.ppk).

Leave the PuTTYgen window open; we will return to it later to copy the public key.
Launch Pageant (also in the same folder as PuTTY). An icon will appear in your system tray. Right-click the icon and choose Add Key. Select the private key file you saved (id_rsa.ppk), enter the key password, and click OK. Your private key is now loaded and available for PuTTY to use.
Open PuTTY and configure the Session options to connect to the round-robin gateway:

- Host Name: access-gricad.univ-grenoble-alpes.fr
- Port: 22
- Connection type: SSH
Configure the Connection - Data options:

- Auto-login username: your-agalan-username
Configure the Connection - SSH - Auth options:

- Attempt authentication using Pageant: [checked]
- Private key file for authentication: [path to private key file]
Save the configuration: return to the Session options, type access-gricad in the Saved Sessions box, and click Save.
dahu

Finally, we will configure a connection to the dahu platform via the round-robin gateway.
Open PuTTY and load the saved access-gricad connection. In the Session options, change the Host Name to dahu. Then configure the Connection - Proxy options to route the connection to dahu through the gateway server:

- Proxy type: Local
- Proxy hostname: access-gricad.univ-grenoble-alpes.fr
- Port: 22
- Username: your-ciment-username
- Telnet command, or local proxy command: plink.exe %user@%proxyhost -nc %host:%port
- Print proxy diagnostics in the terminal window: Yes

Return to the Session options and save the configuration as dahu. Click Open to connect to the dahu cluster. You should not be prompted for your password.
To connect to another platform (e.g. luke), simply change dahu to the name of the platform in the PuTTY Session configuration. You will need to authorize your SSH key separately on each platform (if you followed these instructions you are already authorized on dahu and luke).
Mac / Linux users should use their preferred terminal to configure OpenSSH as follows.
[If you already have a private key (e.g. ~/.ssh/id_ed25519 or ~/.ssh/id_rsa) you may skip this step.]

Open your terminal and type the following, replacing username and computer-name with your username and local computer name (these are just to help you identify the key):

ssh-keygen -t ed25519 -C "username@computer-name"

Follow the prompts to save the key as ~/.ssh/id_ed25519 and create a password.
ssh-add ~/.ssh/id_ed25519
Enter your password when prompted.
If you get the error Could not open a connection to your authentication agent. then the SSH agent is not running; start it with eval "$(ssh-agent)" and run ssh-add again.
You may also wish to add the following to your ~/.bash_profile to automatically start the agent and add your keys when you open a terminal window:
# Start ssh-agent and add keys on login
env=~/.ssh/agent.env

agent_load_env () { test -f "$env" && . "$env" >| /dev/null ; }

agent_start () {
    (umask 077; ssh-agent >| "$env")
    . "$env" >| /dev/null ; }

agent_load_env

# agent_run_state: 0=agent running w/ key; 1=agent w/o key; 2=agent not running
agent_run_state=$(ssh-add -l >| /dev/null 2>&1; echo $?)

if [ ! "$SSH_AUTH_SOCK" ] || [ $agent_run_state = 2 ]; then
    agent_start
    ssh-add
elif [ "$SSH_AUTH_SOCK" ] && [ $agent_run_state = 1 ]; then
    ssh-add
fi

unset env
Open ~/.ssh/config using your preferred editor (e.g. nano ~/.ssh/config) and add the following (don't forget to replace your-ciment-username with your username):
# The CIMENT round-robin gateway
Host access-gricad
    HostName access-gricad.univ-grenoble-alpes.fr
    User your-ciment-username
    IdentityFile ~/.ssh/id_ed25519
    ServerAliveInterval 60

# Access CIMENT clusters via the gateway server
Host cargo dahu luke
    User your-ciment-username
    IdentityFile ~/.ssh/id_ed25519
    ProxyJump access-gricad
This will route an SSH connection to any CIMENT cluster through the round-robin gateway server (bastion).
Type ssh dahu to connect to the head node of dahu. To access any other cluster, replace dahu with the name of the cluster (e.g. ssh luke). You will need to authorize your SSH key separately on each platform (if you followed these instructions you are already authorized on dahu and luke).
The first time you connect to the gateway or a cluster you may be warned that the host's key fingerprint is not recognized. This is fine; click Accept (PuTTY) or type yes (OpenSSH) to continue. If you want, you can first confirm that the gateway's SHA256 key fingerprint matches the one listed here.
On Windows, the connection will only work if Pageant is running with your private key loaded.
If you get a proxy: Access denied message it is likely because Pageant is not running or hasn't loaded your private key. You can still connect to access-gricad (you will be prompted to type the password for your private key), and from there you may access dahu by typing ssh dahu and entering your CIMENT password.
If you want Pageant to start automatically when you log in to Windows you can add it to the list of startup programs: right-click the Pageant shortcut in the Start menu, choose More, then Open file location, and copy the shortcut. Then press Windows+R, type shell:startup, click OK, and paste the shortcut into the folder that opens.
Please read the GRICAD documentation on data management as storage policies may change over time.
Unfortunately, the IAB Equipe 12 summer volume cannot currently be accessed from CIMENT (as of 2022-06-07).
Beware: CIMENT is not made for storing files! YOU are responsible for maintaining a backup of your code and data. The space for storing files in CIMENT is limited, and may be needed for other users’ computations. Please remove your files as soon as you no longer need them.
Files in your home directory (~) will be deleted when your PERSEUS account expires (at the end of your contract). They are also only accessible from the platform to which you are connected.
GRICAD recommends storing files in the project directory of a scratch volume:

- /bettik/PROJECTS/your-project-name/your-username (accessible from dahu, bigfoot, and luke)
- /silenius/PROJECTS/your-project-name/your-username (accessible from dahu and bigfoot). FILES ON SILENIUS ARE DELETED AFTER 30 DAYS.

All project members can read files in these directories, but only you can modify them. There is also a COMMON directory within each project directory that all members can modify.
Note that storing many small files (e.g. no more than a few KB each) in a single directory degrades performance on bettik. It is better to combine these into a few large files. If you must use many small files, prefer the silenius volume.
Please check the GRICAD documentation for the latest recommendations.
If you need to transfer a large volume of data to or from CIMENT you should use cargo.univ-grenoble-alpes.fr. This is a special machine that is reserved for data transfer. It provides access to:

- dahu home directories: /home/your-username
- luke home directories: ~ or /home-luke/your-username
- the /bettik and /silenius scratch volumes

Connect to this node as you would a cluster, e.g. ssh cargo.
Windows users may use WinSCP to access files on CIMENT via a graphical user interface. When installing WinSCP, if you have already configured PuTTY access to CIMENT, you will see the message "You have stored session/sites in Putty SSH client. Do you want to import them in WinSCP?". Click Yes then OK to import the PuTTY settings.
To configure WinSCP to access cargo, set the Session options to:

- Host name: cargo
- Port number: 22
- User name: your-ciment-username
- Password: [blank]
Click Advanced... and configure the Connection - Tunnel options:

- Connect through SSH tunnel: [checked]
- Host name: access-gricad.univ-grenoble-alpes.fr
- Port number: 22
- User name: your-ciment-username
- Password: [blank]
- Private key file: [path to private key file]
Configure the SSH - Authentication options:

- Attempt authentication using Pageant: [checked]
- Private key file: [path to private key file]
Click OK then Save and save the session as your-ciment-username@cargo. Click Login to connect. You can now drag or copy / paste files between your local machine (left pane) and cargo (right pane).
Mac / Linux users may use command-line tools such as scp or rsync to transfer files to / from cargo, e.g.

scp ~/test.txt cargo:/dahu-home/your-ciment-username
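For larger or resumable transfers, rsync works the same way (a sketch; the project and directory names are placeholders to replace with your own):

# Sync a local results directory to your bettik project space via cargo
rsync -avz --progress ~/results/ cargo:/bettik/PROJECTS/your-project-name/your-username/results/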
For a graphical user interface, Mac users will need an SFTP client such as Transmit that supports the SSH options ProxyCommand or ProxyJump.
Ubuntu users should be able to browse files on cargo in their file manager by navigating to sftp://your-ciment-username@cargo.
Use Unix file permissions to limit other CIMENT users' access to your files. Note that the system administrators have access to all files on CIMENT. You can view file permissions with ls -Al. For example:
ihough@f-dahu:~$ ls -Al
total 112
...
-rw------- 1 ihough l-iab 3525 Apr 16 09:22 .bashrc
drwxr-xr-x 5 ihough l-iab 4096 Jun 18 10:53 .cache
...
Here -rw------- ihough l-iab .bashrc means the file is only readable and writeable by the owner (ihough), while drwxr-xr-x ihough l-iab .cache means the directory is readable, writeable, and executable by the owner (ihough) and readable and executable by any member of the group l-iab or any other user. See here for a summary of Unix file permissions.
Change permissions with chmod. For example, chmod 750 code or chmod o-rx code would remove read and execution permissions on code for users other than ihough and the members of l-iab:

ihough@f-dahu:~$ chmod o-rx .cache
ihough@f-dahu:~$ ls -Al
...
drwxr-x--- 5 ihough l-iab 4096 Jun 18 10:53 .cache
...
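To restrict an entire directory tree at once, chmod can be applied recursively (a sketch; mydir is a placeholder name):

# Remove all group and other permissions from mydir and everything inside it
chmod -R go-rwx ~/mydir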
You will need to install any software and dependencies needed for your computations (e.g. R). For most users, we recommend using the conda package management system. Advanced users may experiment with Nix, Guix, or containers; see the GRICAD documentation for further information.
To create a conda environment, connect to dahu and run the following:
source /applis/environments/conda.sh
conda create -n renv r-essentials
The first line configures your shell to use conda. The second line creates a conda environment named renv and installs the r-essentials package, which includes the latest version of R and several common packages such as the tidyverse, plus all of their dependencies.
If you need a specific version of R you can change the version of R in the renv environment:
conda install -n renv r-base=3.6.2
Conda will automatically determine what changes need to be made to keep all dependencies compatible with each other. Alternatively, you can create a separate environment:
conda create -n r3.6.2 r-essentials r-base=3.6.2
This will leave the original renv environment unchanged.
To activate the conda environment, use

conda activate renv

You can now launch an R console with R, or install further software with conda install.
You must activate your conda environment before using R every time you connect to a platform and in every job you run.
An easy way to do this is to create a file in your project directory named oarinit.sh (e.g. nano oarinit.sh) with the following:
#!/usr/bin/env bash
source /applis/environments/conda.sh
conda activate renv
YOU SHOULD NOT ACTIVATE CONDA IN YOUR ~/.bash_profile as this may have unintended consequences, e.g. interfering with data transfer via cargo.
Now you can source oarinit.sh to load the renv conda environment.
There are multiple ways to install R packages:

- From CRAN: launch R and run install.packages(). You will be prompted to choose a CRAN mirror the first time you run this; any mirror should work.
- From source: download the package source (e.g. wget source-url), then run R CMD INSTALL path-to-downloaded-tar-gz.
- With conda: activate the renv conda environment and run conda install r-package-name. This method is useful for packages that have external dependencies (e.g. sf which requires gdal, geos, proj, and udunits2).
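For example, to install sf via conda (a sketch; conda prefixes R package names with r-):

source /applis/environments/conda.sh
conda activate renv
conda install r-sf    # also pulls in gdal, geos, proj, and udunits2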
The CIMENT computing and storage resources are shared between many users. A resource manager is used to optimally allocate resources, taking into account the needs of each computation (cores, memory, computation time, etc.) and the fair-share usage policy (users that have performed many calculations in the past three months have a lower priority).
Computation is forbidden on the head nodes of each platform (e.g. the node you connect to with ssh dahu). Light management tasks are tolerated (e.g. viewing an output file, configuring a conda environment), but any task that consumes too much CPU time will be killed. All computation tasks must be scheduled using the OAR resource manager.
This section gives a basic overview of the main OAR commands. For more details see the GRICAD documentation.
All OAR commands start with oar. To list the commands, type oar and hit the Tab key twice. For usage instructions, run any command with the -h flag, e.g. oarsub -h.
An OAR “job” consists of a set of requested resources and a command to execute on them.
The resources you will most commonly specify are:

- /nodes = physical machines. If you are using R, you should always request 1 node to ensure your allocated cores are on the same physical machine.
- /core = logical processing units. Each core can perform a single task at a time.
- walltime = the maximum runtime. If the computation has not completed after this amount of time the job will be terminated (and any unsaved work will be lost).

There are three types of OAR jobs:
The oarsub command allows you to submit a new job. For example

oarsub -I --project epimed -t devel -l /nodes=1/core=1,walltime=00:05:00

will submit an interactive job (-I) belonging to the ‘epimed’ project. The job will run on the sandbox nodes (-t devel) and will be allocated a single core of a single node for up to 5 minutes:
ihough@f-dahu:/bettik/ihough/code/P063.FR.PM$ oarsub -I --project epimed -t devel -l /nodes=1/core=1,walltime=00:05:00
[ADMISSION RULE] Modify resource description with type constraints
[DEVEL] Adding devel resource constraints
Import job key from file: /home/ihough/.ssh/id_rsa_for_oar
OAR_JOB_ID=9474983
Interactive mode: waiting...
Starting...
Connect to OAR job 9474983 via the node dahu34
ihough@dahu34:/bettik/ihough/code/P063.FR.PM$
Since this is an interactive job, we have been allocated an interactive shell (here on the node dahu34). We can type any commands we wish in this shell and watch them execute. When we are finished, we type exit to end the job and return to the head node.
Here is an example of a normal job:
ihough@f-dahu:~$ oarsub --project epimed -t devel -l /nodes=1,walltime=0:1:0 'echo Hello World!'
[ADMISSION RULE] Modify resource description with type constraints
[DEVEL] Adding devel resource constraints
Import job key from file: /home/ihough/.ssh/id_rsa_for_oar
OAR_JOB_ID=9475064
Since we asked for 1 node but did not specify the number of cores, this job will be allocated an entire node (the dahu sandbox nodes have 32 cores). This is a non-interactive job, so the only output is the job ID. Once the job starts, two files named OAR.9475064.stdout and OAR.9475064.stderr will be created in the current directory; the standard and error outputs of our command, echo Hello World!, are written to these files:
ihough@f-dahu:~$ cat OAR.9475064.stderr
ihough@f-dahu:~$ cat OAR.9475064.stdout
Hello World!
Note that the command must be executable and accessible from the computation nodes. If you want to execute the shell script my_script.sh then you must first chmod +x my_script.sh. If you want to run an R script then you must first activate the conda environment in which we installed R (note the quotes, which ensure the whole command line is passed to oarsub):

oarsub --project epimed -t devel -l /nodes=1/core=1,walltime=0:5:0 "source oarinit.sh; Rscript my_script.R"
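For longer jobs it can be tidier to wrap everything in a small script and submit that instead (a sketch; my_job.sh and my_script.R are placeholder names):

#!/usr/bin/env bash
# my_job.sh: activate the conda environment, then run the analysis
source /applis/environments/conda.sh
conda activate renv
Rscript my_script.R

Make it executable and submit it:

chmod +x my_job.sh
oarsub --project epimed -l /nodes=1/core=4,walltime=12:0:0 ./my_job.sh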
To see a list of all running and scheduled jobs, use oarstat. To see only your jobs, use oarstat -u. To see full details of your jobs use oarstat -fu, or for a single job oarstat -f <job_id>.
ihough@f-dahu:~$ oarstat -u
Job id S User Duration System message
--------- - -------- ---------- ------------------------------------------------
9474983 R ihough 0:00:27 R=1,W=0:5:0,J=I,P=epimed,T=devel (Karma=-0.017,quota_ok)
To kill a running job, use oardel <job_id>.
To view the platform status, use chandler.
Some monitoring tools (e.g. Gantt chart of scheduled jobs) are available at ciment-grid.univ-grenoble-alpes.fr.
Remember to activate your conda environment in your OAR jobs before trying to use any software it contains.
Start a tmux terminal before running an interactive job:
tmux
oarsub -I --project epimed -l /nodes=1/core=1,walltime=1:0:0
This will keep the job running even if the SSH connection is closed (e.g. because you lose internet access). Detach from the tmux session with Ctrl-b d and reconnect later with tmux at.
If using R, always request 1 node (R requires all cores be on the same node for memory sharing).
Don’t request more cores than you will use. For example, do not request an entire node if you only need a few cores. Do not request more than 1 core if your code does not use parallel processing.
However, if you need a large amount of memory, you must request multiple cores. For example, most dahu nodes have 6 GB of RAM per core, so if you need 64 GB RAM you must request 11 cores. Note that R assumes it has access to a node's entire memory, but this is only the case if you reserve the entire node. Your job will be terminated if R tries to use more memory than the job has been allocated. If you need more than 192 GB RAM you must use luke41 (384 GB total; 16 GB / core).
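For example, a request sized for roughly 64 GB of RAM on a single dahu node (a sketch; the project, walltime, and script are placeholders):

# 11 cores x 6 GB/core = 66 GB of RAM on one node
oarsub --project epimed -l /nodes=1/core=11,walltime=24:0:0 ./my_job.sh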
The R function parallel::detectCores() returns the total number of cores on the node regardless of how many cores your job has been allocated. You can get the number of available cores by counting the number of lines in the OAR nodefile:

n_cores <- length(readLines(Sys.getenv("OAR_NODEFILE")))
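A minimal sketch of using that count to size a parallel cluster in R (function names are from base R's parallel package; the computation is illustrative):

# Size a cluster from the OAR allocation rather than detectCores()
n_cores <- length(readLines(Sys.getenv("OAR_NODEFILE")))
cl <- parallel::makeCluster(n_cores)
res <- parallel::parLapply(cl, 1:100, function(i) i^2)
parallel::stopCluster(cl)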
- On luke, to require a node with at least 6 GB of RAM per core, use -p "memcore >= 6"
- On luke, to require the luke41 node, use -l "{network_address='luke41'}"
- On dahu, to require a node with a dedicated SSD, use -p "scratch1_type='dedicated_ssd'"
- On dahu, the SSD scratch volume is mounted at /var/tmp. The secondary HDD scratch (when present) is mounted at /var/tmp2.
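Putting one of these together with a normal submission (a sketch; the walltime and script are placeholders):

# Request a dahu node with a dedicated SSD for one hour
oarsub --project epimed -p "scratch1_type='dedicated_ssd'" -l /nodes=1,walltime=1:0:0 ./my_job.sh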
Do not overload the head nodes. Use the dahu sandbox nodes for testing (jobs on the sandbox nodes are limited to 30 minutes to ensure high availability). Use the cargo node for data transfer.
Try to set an accurate walltime. Good walltime estimates improve job scheduling, and jobs with a large walltime may wait a long time before launching. But it is also important to give your job enough time to complete as the computation will be wasted if the job is terminated before it finishes.
Every publication that includes results from computations performed on CIMENT clusters should include the following acknowledgment:
“[All / most / some] of the computations presented in this paper were performed using the GRICAD infrastructure (https://gricad.univ-grenoble-alpes.fr), which is partly supported by the Equip@Meso project (reference ANR-10-EQPX-29-01) of the programme Investissements d’Avenir supervised by the Agence Nationale pour la Recherche.”
For citing specific GRICAD resources and citation updates please visit: https://ciment.ujf-grenoble.fr/wiki-pub/index.php/Cite_CIMENT