CINES offers public laboratories, offices and agencies a powerful data storage service.
The data-centric architecture hosted at CINES is designed to meet users’ need to share their data. It does so through a set of hardware and software components that are presented to users as a single large, high-performance disk space.
Moreover, these software and hardware components secure the data stored on this architecture. Sharing them therefore gives a high level of data security to every project that has access to the infrastructure.
The architecture provides, on one side, a 2 PB secured Lustre filesystem with an aggregate bandwidth of 50 GB/s and, on the other side, CXFS filesystems exposed through an NFS export.
Occigen accesses the Lustre filesystem through 12 Lustre gateways connected to an InfiniBand FDR fabric. An NFS export is also provided to other machines connected to the 10 Gb/s Ethernet network.
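To put these figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above (2 PB, 50 GB/s, 12 gateways) in decimal units. The even spread of traffic across gateways is an assumption made for illustration, not a statement about how the fabric is actually balanced, and the function names are hypothetical.

```python
# Figures taken from the text above (decimal units).
CAPACITY_BYTES = 2 * 10**15   # 2 PB Lustre filesystem
AGGREGATE_BW = 50 * 10**9     # 50 GB/s aggregate bandwidth
GATEWAYS = 12                 # Lustre gateways serving Occigen


def full_scan_seconds(capacity=CAPACITY_BYTES, bandwidth=AGGREGATE_BW):
    """Time to stream the whole filesystem once at peak aggregate bandwidth."""
    return capacity / bandwidth


def per_gateway_bandwidth(bandwidth=AGGREGATE_BW, gateways=GATEWAYS):
    """Bandwidth share per gateway, ASSUMING an even spread of traffic."""
    return bandwidth / gateways


if __name__ == "__main__":
    print(f"full scan:   {full_scan_seconds() / 3600:.1f} h")
    print(f"per gateway: {per_gateway_bandwidth() / 1e9:.2f} GB/s")
```

Under these assumptions, reading the full 2 PB once at peak rate takes about 11 hours, and each gateway would carry roughly 4.2 GB/s.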
Hierarchical storage management (HSM) optimizes disk usage. File data is automatically moved to tapes in two distinct libraries after a given time. This hierarchical management offers two main benefits:
- disk space belonging to files that are not often accessed can be released because the file data is stored on tape (note: the file’s metadata, i.e. its inode, remains on disk); the total space can therefore grow dynamically by adding new tapes to the libraries,
- files that are frequently accessed stay on the Lustre disk space and so benefit from the good performance of this distributed filesystem.
This hierarchical management is seamless for users: the whole set of files remains visible even when the data is on tape. However, users must be aware that accessing “old” files whose data has been migrated to tape may incur some latency.
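Because a migrated file still shows up in directory listings, it can be useful to guess beforehand whether opening it is likely to trigger a recall from tape. The sketch below is a heuristic only. It assumes that a file whose data has been released from disk reports far fewer allocated blocks than its size implies, which is also true of ordinary sparse files. The function names and the 10% threshold are hypothetical. The authoritative state is reported by the filesystem's own HSM tools.

```python
import os


def is_probably_released(size_bytes, blocks_512, ratio=0.1):
    """Heuristic: True when allocated space is much smaller than the file size.

    blocks_512 is the number of 512-byte blocks reported by stat (st_blocks).
    ratio is an arbitrary threshold chosen for illustration.
    """
    if size_bytes == 0:
        return False  # an empty file has nothing to recall
    return blocks_512 * 512 < size_bytes * ratio


def looks_offline(path):
    """Apply the heuristic to a real file (POSIX systems only)."""
    st = os.stat(path)
    return is_probably_released(st.st_size, st.st_blocks)
```

A script could call `looks_offline(path)` on each input file before starting a job and warn the user that the first access may be slow.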
Data is backed up to tape once a day. Copies are kept for one week after a deletion. Once the copy process has completed, an accidentally deleted file can be restored, provided the retention period has not expired.
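The restore rule can be summarized as two conditions: a copy must exist that was made before the deletion, and the one-week retention window must not have elapsed. The following Python sketch encodes just that logic. The function and parameter names are hypothetical, and the real backup system may apply additional rules not described here.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # "one week after a deletion", from the text


def can_restore(last_backup, deleted_at, now, retention=RETENTION):
    """Return True if a deleted file should still be restorable.

    last_backup: datetime of the last completed copy of the file, or None
                 if the file was never copied (e.g. created and deleted
                 between two daily backup runs).
    deleted_at:  datetime of the deletion.
    now:         datetime at which the restore is requested.
    """
    has_copy = last_backup is not None and last_backup <= deleted_at
    within_retention = now - deleted_at <= retention
    return has_copy and within_retention
```

For example, a file backed up on Monday night and deleted on Tuesday can be restored on Friday, but not two weeks later. A file created and deleted within the same day, before any backup ran, cannot be restored at all.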
Our data infrastructure is built from several components:
- a Lustre cluster (pictured in red below): the high-performance filesystem that users access (also known as “store”),
- a pDMF cluster and its libraries (pictured in orange below): the pDMF software suite is in charge of moving data between the different spaces and of exporting the filesystems over NFS,
- a RobinHood server connected to the Lustre cluster: this server monitors Lustre activity (file creation or deletion) and applies the data migration policies.
The data management policies for the CINES filesystems are detailed on the “Data spaces” page.
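As an illustration of what a data migration policy does, the sketch below selects files that have not been accessed for a given period and orders them from least to most recently used. This is not RobinHood's configuration syntax or API. The `FileEntry` type, the function name and the 30-day threshold are all hypothetical, and the actual CINES thresholds are those given on the “Data spaces” page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileEntry:
    path: str
    size: int               # bytes
    last_access: datetime


def migration_candidates(entries, now, min_idle=timedelta(days=30)):
    """Return entries idle for at least min_idle, oldest access first.

    min_idle is an illustrative value, not a documented CINES threshold.
    """
    idle = [e for e in entries if now - e.last_access >= min_idle]
    return sorted(idle, key=lambda e: e.last_access)
```

A policy engine would typically evaluate such a rule against its own database of file metadata rather than scanning the filesystem, which is why the RobinHood server tracks file creations and deletions.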
Machines involved in the Lustre cluster (right) and in the pDMF cluster (left)
The HSM Lustre cluster
This cluster consists of the following items:
- 2x DDN SFA12k controllers
- 420x 3TB SAS disks
- 2x MDS
- 12x OSS, 84 OST
- 12x LNET gateways
- An InfiniBand FDR fabric (56 Gb/s)
- An HSM-enabled Lustre filesystem
The pDMF and CXFS cluster
This cluster consists of the following items:
- 2x CXFS and DMF cluster with HA (High Availability)
- 3x DMF datamovers
- 2x NFS servers with a 10 GbE interface each
- 2x SGI IS5500 disk arrays (NetApp E5400)
- 120x 2TB SAS disks
- A Fibre Channel fabric based on two 8Gb switches
The Robinhood server
The server that runs the RobinHood software has:
- Intel Xeon E5-2620v2 processors (6x 2.1 GHz cores each)
- 128GB DDR3 memory
- 460GB SSD disks for the databases
- 1 TB SAS disks in a RAID1 for the system
- The RobinHood policy engine developed by the CEA
The IBM 3584 libraries
- Primary library: 8 Jaguar 4 drives [4 TB per tape (300 tapes)] and 9 Jaguar 3 drives [1.6 TB per tape (3150 tapes)], two accessors
- Secondary library: 6 LTO6 drives [2.5 TB per tape (1500 tapes)] and 10 LTO4 drives [800 GB per tape (2900 tapes)], a single accessor
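The nominal capacity of each library follows directly from the tape counts and per-tape capacities listed above. The short Python sketch below does the arithmetic in decimal units. It assumes that the quoted figures are native (uncompressed) capacities and that every tape is counted once, neither of which is stated explicitly here. The names are hypothetical.

```python
# (number of tapes, capacity per tape in GB), taken from the list above.
LIBRARIES = {
    "primary": [(300, 4000), (3150, 1600)],     # Jaguar 4, Jaguar 3
    "secondary": [(1500, 2500), (2900, 800)],   # LTO6, LTO4
}


def nominal_capacity_tb(pools):
    """Sum tapes x capacity over all tape pools and convert GB to TB."""
    return sum(count * gb for count, gb in pools) / 1000


if __name__ == "__main__":
    for name, pools in LIBRARIES.items():
        print(f"{name}: {nominal_capacity_tb(pools):.0f} TB")
    # prints:
    # primary: 6240 TB
    # secondary: 6070 TB
```

Each library therefore holds about 6 PB of nominal tape capacity.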
These two libraries belong to a SAN dedicated to backup and data migration.
Link to: cea-hpc/robinhood