Configuration

Occigen includes 34 racks:

  • 27 racks for HPC (see the architecture description)
  • 7 racks for login nodes, service nodes and disk nodes.

The cluster is composed of three “parts”:

  • the first part, with 50544 cores distributed over 2106 nodes
  • the second part, with 35280 cores distributed over 1260 nodes
  • the third part, with 224 cores in a single node

 

                       | Part 1                      | Part 2                 | Visualization                | Large
Manufacturer ref.      | Bull B720                   | Bull B720              | Bull R421-E4                 | Bull Sequana X800
Processor name         | Haswell                     | Broadwell              | Broadwell                    | Skylake
Processor ref.         | E5-2690 V3 @ 2.6 GHz        | E5-2690 V4 @ 2.6 GHz   | E5-2690 V4 @ 2.6 GHz         | Xeon Platinum 8176 @ 2.1 GHz
Number of nodes        | 2106                        | 1260                   | 4                            | 1
Processors per node    | 2                           | 2                      | 2                            | 8
Processor frequency    | 2.6 GHz                     | 2.6 GHz                | 2.6 GHz                      | 2.1 GHz
Cores per processor    | 12                          | 14                     | 14                           | 28
Cache size L1 (instr.) | 12 x 32 KB                  | 14 x 32 KB             | 14 x 32 KB                   | 28 x 32 KB
Cache size L1 (data)   | 12 x 32 KB                  | 14 x 32 KB             | 14 x 32 KB                   | 28 x 32 KB
Cache size L2          | 12 x 256 KB                 | 14 x 256 KB            | 14 x 256 KB                  | 28 x 1 MB
Cache size LLC         | 30 MB                       | 35 MB                  | 35 MB                        | 38.5 MB
Memory channels        | 4                           | 4                      | 4                            | 6
Memory per node        | 1053 x 64 GB, 1053 x 128 GB | 64 GB                  | 256 GB                       | 3 TB
Memory ref.            | DDR4-2133P-R                | DDR4-2400T-R           | DDR4-2400T-R                 | DDR4-2666V-R
Network attachment     | InfiniBand FDR 56 Gb/s      | InfiniBand FDR 56 Gb/s | InfiniBand FDR 56 Gb/s       | InfiniBand FDR 56 Gb/s
Type of GPU            | -                           | -                      | Nvidia Tesla P100 PCIe 12 GB | Nvidia Tesla P100 PCIe 12 GB
GPUs per node          | -                           | -                      | 1                            | 2
Total cores            | 50544                       | 35280                  | 112                          | 224
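
The total core count of each partition follows from the other rows of the table: nodes × processors per node × cores per processor. The short Python sketch below recomputes those totals; the `Partition` class and the `core_counts` helper are illustrative names, not part of any CINES tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """One column of the hardware table (illustrative structure)."""
    name: str
    nodes: int
    processors_per_node: int
    cores_per_processor: int

    @property
    def total_cores(self) -> int:
        # nodes x sockets per node x cores per socket
        return self.nodes * self.processors_per_node * self.cores_per_processor


# Values copied from the table above.
PARTITIONS = [
    Partition("Part 1", 2106, 2, 12),
    Partition("Part 2", 1260, 2, 14),
    Partition("Visualization", 4, 2, 14),
    Partition("Large", 1, 8, 28),
]


def core_counts() -> dict[str, int]:
    """Return the total number of cores for each partition."""
    return {p.name: p.total_cores for p in PARTITIONS}


if __name__ == "__main__":
    for name, cores in core_counts().items():
        print(f"{name}: {cores} cores")
```

The computed values agree with the totals listed in the table for all four columns.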

 

The Bull “Occigen” machine at CINES

The compute racks are connected to five racks hosting a shared Lustre filesystem with a total usable capacity of 5 PB.

Cooling is provided by a high-efficiency warm-water system that runs directly through the nodes (DLC, Direct Liquid Cooling).

Computing hours on this cluster are allocated through the DARI procedure, in two campaigns per year (autumn and spring).

Architecture of Occigen

Description

The machine is organized into racks. Each rack includes:

  • A power supply unit.
  • A block containing the redundant distribution system for the internal warm-water circuit.
  • 5 chassis. Each chassis is composed of 9 compute blades, and each blade contains 2 nodes.

The login nodes use 12-core Intel E5-2690 V3 processors.
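
The rack layout described above (5 chassis per rack, 9 blades per chassis, 2 nodes per blade) gives the number of nodes per chassis and per rack by simple multiplication. The sketch below makes that arithmetic explicit; the constant and function names are illustrative only.

```python
# Figures taken from the rack description above.
CHASSIS_PER_RACK = 5
BLADES_PER_CHASSIS = 9
NODES_PER_BLADE = 2


def nodes_per_chassis() -> int:
    """Nodes hosted by one chassis."""
    return BLADES_PER_CHASSIS * NODES_PER_BLADE


def nodes_per_rack() -> int:
    """Nodes hosted by one rack built from such chassis."""
    return CHASSIS_PER_RACK * nodes_per_chassis()


if __name__ == "__main__":
    print(f"{nodes_per_chassis()} nodes per chassis")
    print(f"{nodes_per_rack()} nodes per rack")
```

The per-chassis figure of 18 nodes is the same group size that shares one switch in the network description.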

In total, the Occigen cluster consists of 3367 compute nodes and therefore has 86048 cores.
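
These totals can be checked against the per-part figures given in the Configuration section. The sketch below assumes that the stated totals cover the three parts of the cluster (2106, 1260 and 1 nodes) and leave out the four visualization nodes; that reading is an inference from the numbers, not something the page states. The names used are illustrative.

```python
# (nodes, cores) for each part, as listed in the Configuration section.
# Assumption: the visualization nodes are not counted in the totals.
COMPUTE_PARTS = {
    "Part 1": (2106, 50544),
    "Part 2": (1260, 35280),
    "Large": (1, 224),
}


def cluster_totals() -> tuple[int, int]:
    """Return (total nodes, total cores) summed over the compute parts."""
    total_nodes = sum(nodes for nodes, _ in COMPUTE_PARTS.values())
    total_cores = sum(cores for _, cores in COMPUTE_PARTS.values())
    return total_nodes, total_cores


if __name__ == "__main__":
    nodes, cores = cluster_totals()
    print(f"{nodes} compute nodes, {cores} cores")
```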

The nodes are interconnected by an InfiniBand network (IB 4x FDR) with a pruned fat-tree topology. The network is non-blocking within a chassis: the 18 nodes that share a switch in a chassis can reach one another without restriction. The “ascending” links from a chassis switch to the higher-level switches are halved: for 18 nodes, only 9 uplinks are used.
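
The pruning described above can be expressed as an oversubscription ratio: 18 node links share 9 uplinks, so the ratio is 2:1. The sketch below computes that ratio and a rough upper bound on per-node bandwidth when all 18 nodes of a switch communicate outside their chassis at once. It assumes the nominal FDR link rate of 56 Gb/s and ignores encoding and protocol overhead; the function names are illustrative.

```python
NODES_PER_SWITCH = 18    # nodes sharing one chassis switch
UPLINKS_PER_SWITCH = 9   # links towards the higher-level switches
LINK_RATE_GBPS = 56.0    # nominal InfiniBand FDR 4x rate (assumption: no overhead)


def oversubscription_ratio(downlinks: int, uplinks: int) -> float:
    """Ratio of node-facing links to uplinks (1.0 means non-blocking)."""
    return downlinks / uplinks


def worst_case_per_node_gbps(downlinks: int, uplinks: int, link_rate: float) -> float:
    """Upper bound on per-node bandwidth when every node uses the uplinks."""
    return uplinks * link_rate / downlinks


if __name__ == "__main__":
    ratio = oversubscription_ratio(NODES_PER_SWITCH, UPLINKS_PER_SWITCH)
    share = worst_case_per_node_gbps(
        NODES_PER_SWITCH, UPLINKS_PER_SWITCH, LINK_RATE_GBPS
    )
    print(f"oversubscription {ratio:.0f}:1, about {share:.0f} Gb/s per node")
```

Traffic that stays within the 18 nodes of a switch is not affected by this limit, which is one reason to keep tightly coupled processes on neighbouring nodes where the scheduler allows it.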

There are two types of filesystems:

  • /scratch, used to store the results of the calculations, is a Lustre filesystem. It has more than 5 PB of usable space and a maximum bandwidth that exceeds 105 GB/s.
  • /home is a Panasas filesystem, used to store the codes to be executed. It has a capacity of 260 TB and a bandwidth of 10 GB/s.
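
To give a sense of scale for the bandwidth figures above, the sketch below estimates the minimum time needed to move a given volume of data at each filesystem's peak aggregate bandwidth. The 1 TB dataset is a hypothetical example, the helper name is illustrative, and real transfers will be slower because peak bandwidth is shared among all users and rarely reached by a single job.

```python
# Peak aggregate bandwidths quoted above, in GB/s.
SCRATCH_BANDWIDTH_GB_S = 105.0  # "/scratch" exceeds this value
HOME_BANDWIDTH_GB_S = 10.0


def min_transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on transfer time: data volume divided by peak bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    dataset_gb = 1000.0  # hypothetical 1 TB of results
    for name, bw in (("/scratch", SCRATCH_BANDWIDTH_GB_S),
                     ("/home", HOME_BANDWIDTH_GB_S)):
        seconds = min_transfer_seconds(dataset_gb, bw)
        print(f"{name}: at least {seconds:.1f} s")
```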

To store results more securely, each node of the machine has access to the /store filesystem. This is also a Lustre filesystem, but with additional safeguards (duplicate storage and tape storage). Use it for results that must be preserved.

Last modified: 19 February 2019
CINES