What is CIMENT?

CIMENT is a high-performance computational platform accessible to all staff members of the Univ. Grenoble Alpes institutes (and their students / invitees). CIMENT is part of GRICAD, the Grenoble Alps Research Infrastructure for Intensive Calculation And Data.

CIMENT consists of multiple platforms (also called clusters):

  • Dahu is a platform for mostly independent computations (little or no communication between nodes). It has 90+ nodes, most of which have 32 cores, 192 GB RAM, and 480 GB of SSD scratch storage. Runtime is limited to 48 hours per job.
  • Luke is a platform for jobs that do not fit well on dahu (e.g. runtime > 48 hours). It consists of 63 heterogeneous nodes, most of which are reserved for specific projects. Runtime is limited to 240 hours per job. Two nodes are reserved for the epimed project:
    • luke41: 24 cores, 384 GB RAM, 55 TB HDD
    • luke61: 32 cores, 196 GB RAM
  • Bigfoot is a platform for visualization and computations such as deep learning that benefit from GPU acceleration. It consists of 3 nodes in the dahu cluster, each with 32 cores, 4 GPUs, 192 GB RAM, and a 200 GB system SSD. Runtime is limited to 48 hours per job.
  • CiGri is a grid for “bag-of-tasks” jobs (many computations with varying parameters). The computations are performed by idle cores of the other platforms in “best-effort” mode: when a normal job requests the core, the CiGri job is killed, then relaunched as soon as a core becomes available. Total runtime is unlimited, and CiGri jobs may use cores on any node, including luke nodes that are reserved for a specific project.

Note that memory (RAM) is split equally between all cores on a node. For example on a 32-core dahu node each core has 6 GB of RAM. The node with the most RAM is luke41 (384 GB; 16 GB / core).

All nodes run Debian GNU/Linux.

This guide focuses on dahu but the principles apply for the other clusters. You should use dahu unless you know another platform is more appropriate for your work.

Getting help

Your first resource should be the GRICAD documentation.

You can contact the GRICAD helpdesk by email or create a ticket via the web interface.

For a brief introduction to Unix, see this Software Carpentry course or this GRICAD course (in French).

GRICAD also offers doctoral-level courses (in French) on Linux, scientific programming, and parallel computation.

Florent Chuffart is the CIMENT users committee representative for the IAB.

Registering to use CIMENT

Create a user account

To use CIMENT, you first need to create an account on PERSEUS:

  • Click “Create an account”
  • Click “CREATE ACCOUNT FROM AGALAN”
  • Enter your AGALAN username/password (same credentials as for UGA email)
  • Follow the instructions

Join a project

All computation on CIMENT must be allocated to a project. This allows fair sharing of the resources according to each project’s past usage of and contribution to the CIMENT infrastructure. To join a project:

  • Log in to your PERSEUS account
  • Click My projects
  • Click Join a project
  • Select a project from the list. Ask your supervisor if you are unsure which project to choose; it will likely be one of
    • pr-test: "Test project with 3000 hours quota" (allows 3 months of access and up to 3000 hours of usage)
    • pr-epimed: "EPIgenetique MEDicale" (the EPIMED project)
    • one of the team’s cohort projects

If you are a permanent staff member, you may create a new project by contacting the GRICAD staff. Projects must be renewed each year by preparing a short activity report.

Connecting to CIMENT

The CIMENT platforms are only accessible by SSH via a gateway server (bastion). For more details, see the GRICAD documentation. We will configure your SSH client to transparently connect via the gateway using an SSH key pair. This avoids having to repeatedly type your password.

Windows users

Windows users must obtain an SSH client. If you are familiar with Git Bash, Cygwin, or Windows Subsystem for Linux you may follow the instructions for Mac / Linux users to configure OpenSSH via the terminal. Otherwise, follow these instructions to configure PuTTY.

Create an SSH key pair

Install PuTTY (an open-source SSH client for Windows). Use the default installation settings.

Open PuTTYgen (in the same folder as PuTTY) and click Generate.

Enter a password in the Key passphrase and Confirm passphrase boxes. You will need this password to unlock the key. You may also choose a Key comment to help you identify the key (e.g. username@computer-name).

Save the private key as a file named id_rsa.ppk in a folder named .ssh within your home directory (i.e. %USERPROFILE%\.ssh\id_rsa.ppk).

Leave the PuTTYgen window open; we will return to it later to copy the public key.

Add your SSH key to Pageant

Launch Pageant (also in the same folder as PuTTY). An icon will appear in your system tray. Right-click the icon and choose Add Key. Select the private key file you saved (id_rsa.ppk), enter the key password, and click OK. Your private key is now loaded and available for PuTTY to use.

Configure the gateway

Open PuTTY and configure the Session options to connect to the round-robin gateway:

  • Host Name (or IP address): access-gricad.univ-grenoble-alpes.fr
  • Port: 22
  • Connection type: SSH

Configure the Connection - Data options:

  • Auto-login username: your-agalan-username

Configure the Connection - SSH - Auth options:

  • Attempt authentication using Pageant: [checked]
  • Private key file for authentication: [path to private key file]

Save the configuration:

  • Return to the Session options
  • Type access-gricad in the Saved Sessions box and click Save

Authorize your SSH key on the bastions

We will now authorize your ssh key on the two bastions (rotule and trinity). The access-gricad round-robin gateway automatically routes your connection through whichever bastion has the least traffic, so you should authorize your key on both of them.

In the PuTTY window Session options, change the Host Name to rotule.univ-grenoble-alpes.fr.

Click Open to connect.

You will be warned that the host’s key fingerprint is not recognized. Click Accept to continue. If you want, you can first confirm that the gateway’s SHA256 key fingerprint matches the one listed here.

Enter your password at the prompt.

In the PuTTYgen window, copy the entire contents of the Public key for pasting into OpenSSH authorized_keys file window. The copied text should end with == followed by the key comment.

In the PuTTY terminal type echo followed by a space, right-click (or press SHIFT + INSERT) to paste the text of your public key, then type >> ~/.ssh/authorized_keys and press Enter. The entire command should look like

echo public-key-text== key-comment >> ~/.ssh/authorized_keys

This will append your public key to the list of keys that are authorized to access the bastion.

Type exit to close the connection.

Reopen PuTTY and load the access-gricad connection settings. Change the Host Name to trinity.univ-grenoble-alpes.fr and repeat the above steps to authorize your SSH key on the trinity bastion.

Authorize your SSH key on the platforms

We will now authorize your ssh key on the computing platforms themselves.

Open PuTTY and again load the access-gricad connection settings. Click Open to connect. You should not need to type a password.

Type ssh dahu to connect to the dahu head node (you will need to type your password) and add your public key to the authorized_keys file:

echo public-key-text== key-comment >> ~/.ssh/authorized_keys

Type exit to close the connection to dahu.

Type ssh luke to connect to the luke head node (you will again need to type your password) and add your public key to the authorized_keys file using the same command as for dahu.

Type exit to close the connection to luke and exit again to close the connection to the gateway. You may also close the PuTTYgen window.

Configure the connection to dahu

Finally, we will configure a connection to the dahu platform via the round-robin gateway.

Open PuTTY and load the saved access-gricad connection. In the Session options, change the Host Name to dahu. Then configure the Connection - Proxy options to route the connection to dahu through the gateway server:

  • Proxy type: Local
  • Proxy hostname: access-gricad.univ-grenoble-alpes.fr
  • Port: 22
  • Username: your-ciment-username
  • Telnet command, or local proxy command: plink.exe %user@%proxyhost -nc %host:%port
  • Print proxy diagnostics in the terminal window: Yes

Return to the Session options and save the configuration as dahu.

Click Open to connect to the dahu cluster. You should not be prompted for your password.

Connecting to other platforms

To connect to another platform (e.g. luke), simply change dahu to the name of the platform in the PuTTY Session configuration. You will need to authorize your SSH key separately on each platform (if you followed these instructions you are already authorized on dahu and luke).

Mac / Linux users

Mac / Linux users should use their preferred terminal to configure OpenSSH as follows.

Create an SSH key pair

[If you already have a private key (e.g. ~/.ssh/id_ed25519 or ~/.ssh/id_rsa) you may skip this step.]

Open your terminal and type (replace username and computer-name with your username and local computer name; these are to help you identify the key)

ssh-keygen -t ed25519 -C "username@computer-name"

Follow the prompts to save as ~/.ssh/id_ed25519 and create a password.

Add your key to the SSH agent

ssh-add ~/.ssh/id_ed25519

Enter your password when prompted.

If you get an error Could not open a connection to your authentication agent. then the SSH agent is not running. Start it with eval "$(ssh-agent)" (running ssh-agent alone prints the environment variables but does not set them in your current shell). You may also wish to add the following to your ~/.bash_profile to automatically start the agent and add your keys when you open a terminal window:

# Start ssh-agent and add keys on login
env=~/.ssh/agent.env

agent_load_env () { test -f "$env" && . "$env" >| /dev/null ; }

agent_start () {
    (umask 077; ssh-agent >| "$env")
    . "$env" >| /dev/null ; }

agent_load_env

# agent_run_state: 0=agent running w/ key; 1=agent w/o key; 2= agent not running
agent_run_state=$(ssh-add -l >| /dev/null 2>&1; echo $?)

if [ ! "$SSH_AUTH_SOCK" ] || [ $agent_run_state = 2 ]; then
    agent_start
    ssh-add
elif [ "$SSH_AUTH_SOCK" ] && [ $agent_run_state = 1 ]; then
    ssh-add
fi

unset env

Configure OpenSSH

Open ~/.ssh/config using your preferred editor (e.g. nano ~/.ssh/config) and add the following (don’t forget to replace your-ciment-username with your username):

# The CIMENT round-robin gateway
Host access-gricad
  HostName access-gricad.univ-grenoble-alpes.fr
  User your-ciment-username
  IdentityFile ~/.ssh/id_ed25519
  ServerAliveInterval 60

# Access CIMENT clusters via the gateway server
Host cargo dahu luke
  User your-ciment-username
  IdentityFile ~/.ssh/id_ed25519
  ProxyJump access-gricad

This will route an SSH connection to any CIMENT cluster through the round-robin gateway server (bastion).

Authorize your SSH key

First authorize your key on each of the two bastions (enter your password when prompted):

ssh-copy-id your-ciment-username@rotule.univ-grenoble-alpes.fr
ssh-copy-id your-ciment-username@trinity.univ-grenoble-alpes.fr

Then, authorize your key on dahu and luke (again enter your password when prompted):

ssh-copy-id your-ciment-username@dahu
ssh-copy-id your-ciment-username@luke

Test the connection

Type ssh dahu to connect to the head node of dahu.
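
If everything is configured correctly you should land on the head node without being asked for a password. A quick sanity check (the exact hostname printed will vary):

ssh dahu
hostname   # should print the name of the dahu head node (e.g. f-dahu)
exit       # return to your local machine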

Connecting to other platforms

To access any other cluster, replace dahu with the name of the cluster (e.g. ssh luke). You will need to authorize your SSH key separately on each platform (if you followed these instructions you are already authorized on dahu and luke).

Troubleshooting

The first time you connect to the gateway or a cluster you may be warned that the host’s key fingerprint is not recognized. This is fine; click Accept (PuTTY) or type yes (OpenSSH) to continue. If you want, you can first confirm that the gateway’s SHA256 key fingerprint matches the one listed here.

Windows

On Windows, the connection will only work if Pageant is running with your private key loaded.

If you get a proxy: Access denied message it is likely because Pageant is not running or hasn’t loaded your private key. You can still connect to access-gricad (you will be prompted to type the password for your private key), and from there you may access dahu by typing ssh dahu and entering your CIMENT password.

If you want Pageant to start automatically when you log in to Windows you can add it to the list of startup programs:

  • Find Pageant in the start menu, right click, select More, Open file location
  • Press the Windows key + R, type shell:startup, and click OK
  • Copy and paste the shortcut to Pageant from its folder to the startup folder

Data management

Please read the GRICAD documentation on data management as storage policies may change over time.

Unfortunately, the IAB Equipe 12 summer volume cannot currently be accessed from CIMENT (as of 2022-06-07).

Beware: CIMENT is not made for storing files! YOU are responsible for maintaining a backup of your code and data. The space for storing files in CIMENT is limited, and may be needed for other users’ computations. Please remove your files as soon as you no longer need them.

Files in your home (~) will be deleted when your PERSEUS account expires (at the end of your contract). They are also only accessible from the platform to which you are connected.

GRICAD recommends storing files in the project directory of a scratch volume:

  • /bettik/PROJECTS/your-project-name/your-username (accessible from dahu, bigfoot, and luke)
  • /silenius/PROJECTS/your-project-name/your-username (accessible from dahu and bigfoot) FILES ON SILENIUS ARE DELETED AFTER 30 DAYS

All project members can read files in these directories, but only you can modify them. There is also a COMMON directory within each project directory that all members can modify.
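
For example, to set up a personal working directory on bettik (a sketch assuming the pr-epimed project; substitute your own project name, and skip the mkdir if the directory already exists):

mkdir -p /bettik/PROJECTS/pr-epimed/$USER   # your per-user directory in the project
cd /bettik/PROJECTS/pr-epimed/$USER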

Note that storing many small files (e.g. no more than a few KB each) in a single directory degrades performance on bettik. It is better to combine these into a few large files, as in the example below. If you must use many small files, prefer the silenius volume.
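
For example, you could bundle a directory of small output files into a single archive (a minimal sketch; results/ is a hypothetical directory name):

tar -czf results.tar.gz results/                        # combine many small files into one archive
tar -tzf results.tar.gz > /dev/null && rm -r results/   # remove the originals only if the archive lists cleanly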

Data transfer

Please check the GRICAD documentation for the latest recommendations.

If you need to transfer a large volume of data to or from CIMENT you should use cargo.univ-grenoble-alpes.fr. This is a special machine that is reserved for data transfer. It provides access to:

  • Your home on dahu: /home/your-username
  • Your home on luke: ~ or /home-luke/your-username
  • The /bettik and /silenius scratch volumes

Connect to this node as you would a cluster, e.g. ssh cargo.
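
For example, a quick check that the scratch volumes are visible from cargo (assuming the SSH configuration described above):

ssh cargo
ls /bettik/PROJECTS   # project directories on the bettik scratch volume
exit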

Windows users

Windows users may use WinSCP to access files on CIMENT via a graphical user interface. When installing WinSCP, if you have already configured PuTTY access to CIMENT, you will see the message “You have stored session/sites in Putty SSH client. Do you want to import them in WinSCP?”. Click Yes then OK to import the PuTTY settings.

To configure WinSCP to access cargo, set the Session options to:

  • Host name: cargo
  • Port number: 22
  • User name: your-ciment-username
  • Password: [blank]

Click Advanced... and configure the Connection - Tunnel options:

  • Connect through SSH tunnel: [checked]
  • Host name: access-gricad.univ-grenoble-alpes.fr
  • Port number: 22
  • User name: your-ciment-username
  • Password: [blank]
  • Private key file: [path to private key file]

Configure the SSH - Authentication options:

  • Attempt authentication using Pageant: [checked]
  • Private key file: [path to private key file]

Click OK then Save and save as your-ciment-username@cargo. Click Login to connect. You can now drag or copy / paste files between your local machine (left pane) and cargo (right pane).

Mac / Linux users

Mac / Linux users may use command-line tools such as scp or rsync to transfer files to / from cargo, e.g.

scp ~/test.txt cargo:/home/your-ciment-username
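
For large transfers, rsync is often preferable because it can resume after an interruption. A sketch assuming a project directory on bettik (replace the project and username):

rsync -avP ~/data/ cargo:/bettik/PROJECTS/your-project-name/your-username/data/

Here -a preserves permissions and timestamps, -v is verbose, and -P shows progress and keeps partially transferred files, so an interrupted transfer can be resumed by rerunning the same command.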

For a graphical user interface, Mac users will need an SFTP client such as Transmit that supports the SSH ProxyCommand or ProxyJump options. Ubuntu users should be able to browse files on cargo in their file manager by navigating to sftp://your-ciment-username@cargo.

File permissions

Use Unix file permissions to limit other CIMENT users’ access to your files. Note that the system administrators have access to all files on CIMENT. You can view file permissions with ls -Al. For example:

ihough@f-dahu:~$ ls -Al
total 112
...
-rw------- 1 ihough l-iab  3525 Apr 16 09:22 .bashrc
drwxr-xr-x 5 ihough l-iab  4096 Jun 18 10:53 .cache
...

Here -rw------- ihough l-iab .bashrc means the file .bashrc is only readable and writeable by the owner (ihough), while drwxr-xr-x ihough l-iab .cache means the directory .cache is readable, writeable, and executable by the owner (ihough) and readable and executable both by members of the group l-iab and by all other users. See here for a summary of Unix file permissions.

Change permissions with chmod. For example, chmod 750 code or chmod o-rx code would remove read and execute permissions on code for users other than ihough and the members of l-iab:

ihough@f-dahu:~$ chmod o-rx .cache
ihough@f-dahu:~$ ls -Al
...
drwxr-x--- 5 ihough l-iab  4096 Jun 18 10:53 .cache
...

Software environment

You will need to install any software and dependencies needed for your computations (e.g. R). For most users, we recommend using the conda package management system. Advanced users may experiment with Nix, Guix, or containers; see the GRICAD documentation for further information.

Conda setup

To create a conda environment, connect to dahu and run the following:

source /applis/environments/conda.sh
conda create -n renv r-essentials

The first line configures your shell to use conda. The second line creates a conda environment named renv and installs the r-essentials package, which includes a recent version of R and several common packages such as the tidyverse, plus all of their dependencies.

If you need a specific version of R you can change the version of R in the renv environment:

conda install -n renv r-base=3.6.2

Conda will automatically determine what changes are needed to keep all dependencies compatible with each other. Alternatively, you can create a separate environment:

conda create -n r3.6.2 r-essentials r-base=3.6.2

This will leave the original renv environment unchanged.

Using the conda environment

To activate the conda environment, use

conda activate renv

You can now launch an R console with R, or install further software with conda install.

You must activate your conda environment before using R every time you connect to a platform and in every job you run.

An easy way to do this is to create a file in your project directory named oarinit.sh (e.g. nano oarinit.sh) with the following:

#!/usr/bin/env bash

source /applis/environments/conda.sh
conda activate renv

YOU SHOULD NOT ACTIVATE CONDA IN YOUR ~/.bash_profile as this may have unintended consequences, e.g. interfering with data transfer via cargo.

Now you can source oarinit.sh to load the renv conda environment.
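
For example, at the start of a session (the version printed will depend on what you installed):

source oarinit.sh
which R       # should point into the renv conda environment
R --version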

Installing R packages

There are multiple ways to install R packages:

  1. In an R console: use install.packages(). You will be prompted to choose a CRAN mirror the first time you run this; any mirror should work.
  2. From the command line: download the source of the package (e.g. wget source-url). Then run R CMD INSTALL path-to-downloaded-tar-gz
  3. Using conda: activate the renv conda environment and run conda install r-package-name. This method is useful for packages that have external dependencies (e.g. sf which requires gdal, geos, proj, and udunits2); see the example after this list.
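
For example, a conda install of sf might look like the following (a sketch; in the conda channels we are aware of the package is named r-sf, and conda pulls in gdal, geos, etc. as dependencies):

conda activate renv
conda install r-sf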

Performing computations

The CIMENT computing and storage resources are shared between many users. A resource manager is used to optimally allocate resources taking into account the needs of each computation (cores, memory, computation time, etc.) and the fairshare usage policy (users that have performed many calculations in the past three months have a lower priority).

Computation is forbidden on the head nodes of each platform (e.g. the node you connect to with ssh dahu). Light management tasks are tolerated (e.g. viewing an output file, configuring a conda environment), but any task that consumes too much CPU time will be killed. All computation tasks must be scheduled using the OAR resource manager.

Using OAR

This section gives a basic overview of the main OAR commands. For more details see the GRICAD documentation.

All OAR commands start with oar. To list the commands, type oar and hit the Tab key twice. For usage instructions, run any command with the -h flag e.g. oarsub -h.

An OAR “job” consists of:

  • A description of the resources needed
  • The commands to execute the computation
  • The project to which the job should be allocated

The resources you will most commonly specify are:

  • /nodes = physical machines. If you are using R, you should always request 1 node to ensure your allocated cores are on the same physical machine.
  • /core = logical processing units. Each core can perform a single task at a time.
  • walltime = the maximum runtime. If the computation has not completed after this amount of time the job will be terminated (and any unsaved work will be lost).

There are three types of OAR jobs:

  • Normal jobs execute a command or script and then exit
  • Interactive jobs allow you to run commands in an interactive shell
  • Best-effort jobs run only when there are cores not allocated to a normal or interactive job

The oarsub command allows you to submit a new job. For example

oarsub -I --project epimed -t devel -l /nodes=1/core=1,walltime=00:05:00

will submit an interactive job (-I) belonging to the ‘epimed’ project. The job will run on the sandbox nodes (-t devel) and will be allocated a single core of a single node for up to 5 minutes:

ihough@f-dahu:/bettik/ihough/code/P063.FR.PM$ oarsub -I --project epimed -t devel -l /nodes=1/core=1,walltime=00:05:00
[ADMISSION RULE] Modify resource description with type constraints
[DEVEL] Adding devel resource constraints
Import job key from file: /home/ihough/.ssh/id_rsa_for_oar
OAR_JOB_ID=9474983
Interactive mode: waiting...
Starting...

Connect to OAR job 9474983 via the node dahu34
ihough@dahu34:/bettik/ihough/code/P063.FR.PM$

Since this is an interactive job, we have been allocated an interactive shell (here on the node dahu34). We can type any commands we wish in this shell and watch them execute. When we are finished, we type exit to end the job and return to the head node.

Here is an example of a normal job:

ihough@f-dahu:~$ oarsub --project epimed -t devel -l /nodes=1,walltime=0:1:0 'echo Hello World!'
[ADMISSION RULE] Modify resource description with type constraints
[DEVEL] Adding devel resource constraints
Import job key from file: /home/ihough/.ssh/id_rsa_for_oar
OAR_JOB_ID=9475064

Since we asked for 1 node but did not specify the number of cores, this job will be allocated an entire node (the dahu sandbox nodes have 32 cores). This is a non-interactive job, so the only output is the job ID. Once the job starts, two files will be created in the current directory named OAR.9475064.stdout and OAR.9475064.stderr; the standard and error outputs of our command, echo Hello World!, are written to these files:

ihough@f-dahu:~$ cat OAR.9475064.stderr
ihough@f-dahu:~$ cat OAR.9475064.stdout
Hello World!

Note that the command must be executable and accessible from the computation nodes. If you want to execute the shell script my_script.sh then you must first chmod +x my_script.sh. If you want to run an R script then you must first activate the conda environment in which we installed R:

oarsub --project epimed -t devel -l /nodes=1/core=1,walltime=0:5:0 "source oarinit.sh; Rscript my_script.R"

To see a list of all running and scheduled jobs, use oarstat. To see only your jobs, use oarstat -u. To see full details of your jobs use oarstat -fu, or for a single job oarstat -f <job_id>.

ihough@f-dahu:~$ oarstat -u
Job id    S User     Duration   System message
--------- - -------- ---------- ------------------------------------------------
9474983   R ihough      0:00:27 R=1,W=0:5:0,J=I,P=epimed,T=devel (Karma=-0.017,quota_ok)

To kill a job that is running use oardel <job_id>.

To view the platform status use chandler.

Some monitoring tools (e.g. Gantt chart of scheduled jobs) are available at ciment-grid.univ-grenoble-alpes.fr.

OAR tips and best practices

  • Remember to activate your conda environment in your OAR jobs before trying to use any software it contains

  • Start a tmux terminal before running an interactive job:

    tmux
    oarsub -I --project epimed -l /nodes=1/core=1,walltime=1:0:0

    This will keep the job running even if the SSH connection is closed (e.g. because you lose internet access). Reconnect with tmux attach.

  • If using R, always request 1 node (R requires all cores to be on the same node for memory sharing).

  • Don’t request more cores than you will use. For example, do not request an entire node if you only need a few cores. Do not request more than 1 core if your code does not use parallel processing.

  • However, if you need a large amount of memory, you must request multiple cores. For example, most dahu nodes have 6 GB of RAM per core, so if you need 64 GB RAM you must request 11 cores (see the example at the end of this list). Note that R assumes you have access to a node’s entire memory, but this is only the case if you reserve the entire node. Your job will be terminated if R tries to use more memory than the job has been allocated. If you need more than 192 GB RAM you must use luke41 (384 GB total; 16 GB / core).

  • The R function parallel::detectCores() returns the total number of cores on the node regardless of how many cores your job has been allocated. You can get the number of available cores by counting the number of lines in the OAR nodefile:

    n_cores <- length(readLines(Sys.getenv("OAR_NODEFILE")))
  • On luke, to require a node with at least 6 GB of RAM per core use -p "memcore >= 6"

  • On luke, to require the luke41 node use -l "{network_address='luke41'}"

  • On dahu, to require a node with a dedicated SSD use -p "scratch1_type='dedicated_ssd'"

  • On dahu, the SSD scratch volume is mounted at /var/tmp. The secondary HDD scratch (when present) is mounted at /var/tmp2

  • Do not overload the head nodes. Use the dahu sandbox nodes for testing (jobs on the sandbox nodes are limited to 30 minutes to ensure high availability). Use the cargo node for data transfer.

  • Try to set an accurate walltime. Good walltime estimates improve job scheduling, and jobs with a large walltime may wait a long time before launching. But it is also important to give your job enough time to complete as the computation will be wasted if the job is terminated before it finishes.
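
As an example of the memory rule above, a job that needs about 64 GB RAM on a standard dahu node (6 GB per core) could request 11 cores (a sketch; adjust the project, walltime, and script to your case):

oarsub --project epimed -l /nodes=1/core=11,walltime=12:0:0 "source oarinit.sh; Rscript my_script.R"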

Citing CIMENT

Every publication that includes results from computations performed on CIMENT clusters should include the following acknowledgment:

“[All / most / some] of the computations presented in this paper were performed using the GRICAD infrastructure (https://gricad.univ-grenoble-alpes.fr), which is partly supported by the Equip@Meso project (reference ANR-10-EQPX-29-01) of the programme Investissements d’Avenir supervised by the Agence Nationale pour la Recherche.”

For citing specific GRICAD resources and citation updates please visit: https://ciment.ujf-grenoble.fr/wiki-pub/index.php/Cite_CIMENT