Data spaces



File spaces available at CINES

| Space name | Quota (default) ¹      | Target of quota | Accessible from             | Usage                                                                                            | Aggregated performance | Backup           |
| /home      | 100 GB, 20 000 files ² | user            | login nodes + compute nodes | Dedicated to application sources, binaries and libraries needed for job execution                | ~10 GB/s               | yes (once a day) |
| /scratch   | 20 TB, 200 000 files ² | group           | login nodes + compute nodes | Temporary work space: read/write of temporary files and/or large data sets during job processing | ~100 GB/s              | no               |
| /store ³   | 1 TB, 10 000 files     | group           | login nodes + compute nodes | Secured storage space for large data sets                                                        | ~50 GB/s ³             | yes              |

(¹) “Hard” limit, set to guarantee file system stability and performance. These limits can be adjusted on substantiated request.

(²) Quotas active since 16/11/2017.

(³) Space based on a Hierarchical Storage Management (HSM) system using disks and tapes to store data. Accessing old files may incur high latency while the data is recovered from tape.

Disk quotas

Default limits are defined to fit the needs of most projects.

CINES remains attentive to users’ specific needs in order to set appropriate limits for each project.

/store space (standard quota mechanism)

The main constraint of the /store quota is the number of files (default limit: 10 000). This limit guarantees that the data-securing process can run often and quickly. That is why it is so important to keep the file count low by creating archives with the “tar” command.
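As an illustration (the directory, file and archive names below are hypothetical), many small result files can be bundled into a single compressed tarball, which counts as only one file toward the quota:

```shell
# Hypothetical example: bundle a directory of small result files into
# one compressed archive so it counts as a single file toward the quota.
mkdir -p results
touch results/run_001.dat results/run_002.dat results/run_003.dat
tar czf results_2017.tar.gz results/   # one archive instead of many files
tar tzf results_2017.tar.gz            # list the archive contents to check
```

The archive can later be unpacked with `tar xzf results_2017.tar.gz`.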

An initial quota of 1 TB is also set on the data volume (this limit can evolve according to needs). These limits apply to a unix group: if one user exceeds the quota, the other users of the same group can no longer create files either.

Every user can check their group’s usage by reading the file: « /store/[CT]/[group]/.occupation_store »

/home and /scratch spaces (CINES specific quota mechanism)

Two limits, “soft” and “hard”, are set on both the amount of used space and the number of files (see table above).

  • When a “soft” limit is reached:
    • a warning email is sent, advising that the project is about to reach the “hard” limit.
  • When a “hard” limit is reached (see the table above to determine whether the user alone or their whole group is affected):
    • currently running jobs are not affected and run until their normal completion;
    • new job submission is refused;
    • queued jobs are held;
    • an email is sent telling the user which “hard” limit has been reached, the consequences for their work, and the actions to take to solve the problem.

Held jobs can be identified by their partition name, BLOCKED_xxx (command: squeue -u $USER).

superman@login0:~$ squeue -u $USER

351007 BLOCKED_H    job3   superman  PD 0:00     1             (PartitionDown)
351009 BLOCKED_H    job4   superman  PD 0:00     1             (PartitionDown)
354005 all          job2   superman  R  8:29:17  1    n2957
354006 all          job1   superman  R  8:29:17  1    n3004

Your current usage

To check the current usage and limits of each data space at any time, use the command etat_projet.

Example of the command output:

Etat des consommations du projet abc1234 sur le cluster OCCIGEN (au 06/10/2017 00:00:00)
Allocation :   530000 heures 
Consommation : 195772 heures soit 36% de votre allocation (sur une base de 12 mois)
Allocation bonus : 106000 heures 
Consommation bonus:     0 heures soit 0% de votre allocation (sur une base de 12 mois)

Etat des systemes de fichiers de l'utilisateur superman (abc1234) (au 06/10/2017 10:11:53)
| Filesystem | Mode   |  Type  |   ID      |   Usage  |   Limite | Unite |
| /home      | Space  | User   | superman  |    40.61 |    40.00 |    GB | depassement
|            |------------------------------------------------------------
|            | Files  | User   | superman  |     5211 |    10000 | Files | 
| /scratch   | Space  | Group  | abc1234   |      760 |     4000 |    GB | 
|            |------------------------------------------------------------
|            | Files  | Group  | abc1234   |    26221 |   200000 | Files | 
| /store     | Space  | Group  | abc1234   |D     308 |     1907 |    GB | 
|            |------------------------------------------------------------
|            | Space  | Group  | abc1234   |D+T  2232 |        * |    GB | Fri Oct 6 02:00:13 CEST 2017
|            |------------------------------------------------------------
|            | Files  | Group  | abc1234   |    29712 |    30000 | Files | 
Vous pouvez aussi consulter le site pour avoir le détail de la consommation du projet
D   = disques uniquement
D+T = disques + bandes

For /store, every user can also check their group’s usage by reading the file: « /store/[CT]/[group]/.occupation_store »

Space management in Lustre

Occigen uses a Lustre file system for its $SCRATCHDIR. To improve file access performance, files can be “striped”. Striping can improve the performance of serial I/O code running on a single node, as well as of parallel I/O from multiple nodes writing to a single shared file, as with MPI-IO, parallel HDF5 or parallel NetCDF.

The Lustre file system consists of a set of Object Storage Servers (OSS) and I/O disks called Object Storage Targets (OST).

A file is “striped” when read and write operations simultaneously access multiple OSTs. Striping files is a means of increasing I/O performance since read or write access to several OSTs increases the available I/O bandwidth.

The following commands let you inspect and set the striping of a file or directory:

lfs getstripe <filename/directory>

lfs setstripe -c <number of "stripes"> <filename/directory>
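As a hedged sketch (the directory name and stripe count are illustrative, and the snippet guards on the presence of `lfs` so it is a no-op outside a Lustre client), striping a scratch directory before writing large shared files could look like:

```shell
# Illustrative only: stripe a scratch directory across 4 OSTs so that
# large files created inside it are spread over 4 storage targets.
DIR="${SCRATCHDIR:-/tmp/scratch}/big_io"   # hypothetical directory
mkdir -p "$DIR"
if command -v lfs >/dev/null 2>&1; then
    lfs setstripe -c 4 "$DIR"   # new files in $DIR use 4 stripes
    lfs getstripe "$DIR"        # show the resulting layout
else
    echo "lfs not available: not on a Lustre client"
fi
```

Note that striping applies to files created after `lfs setstripe`; existing files keep the layout they were written with.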


Best practices

To manage your disk spaces, we encourage you to perform these simple actions regularly:

  • remove unnecessary files (/home, /scratch, /store);
  • keep only temporary files in the /scratch working space: move files that must be preserved and/or are no longer required for computation to the /store data space;
  • create tarballs of your results and move them to /store (required by the underlying HSM mechanisms);
  • move files to your local site (e.g. your laboratory) with scp, …;
  • do not use /store for computation (batch jobs): it is not suited to that context. For details, please contact CINES.
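To illustrate the last two practices above (every path, the archive name and the remote host are hypothetical placeholders to be replaced with your own), a finished run can be archived once and then copied both to /store and back to your laboratory:

```shell
# Hypothetical workflow: archive a finished run, then keep one copy on
# /store and one at your home laboratory. Replace all paths/hosts below.
RUN_DIR="${SCRATCHDIR:-/tmp}/myrun"
mkdir -p "$RUN_DIR" && touch "$RUN_DIR/out.dat"
tar czf myrun.tar.gz -C "$(dirname "$RUN_DIR")" "$(basename "$RUN_DIR")"
# mv  myrun.tar.gz /store/[CT]/[group]/        # one file toward the quota
# scp myrun.tar.gz user@my-lab-host:backups/   # copy back to your site
```

The `mv` and `scp` lines are left commented because the /store path and the laboratory host depend on your own project and site.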


WARNING! Access (tar, scp, sftp, cat, edit, …) to old files on /store can incur high latency due to data recovery from tape.

Files recovery

on /store

Files on /store are automatically duplicated to tape once a day. Hence, in case of loss or accidental deletion, files can be recovered within 10 days of the event.

In that case, please contact CINES.

on /home

Files on /home are automatically duplicated once a day. Hence, in case of loss or accidental deletion, files can be recovered within 10 days of the event.

In that case, please contact CINES.

Last modified: 10 May 2021