Cluster overview

Shared resource

Triton is a joint installation of a number of Aalto School of Science faculties within the Science-IT project, which was founded in 2009 to provide HPC infrastructure for the whole School of Science. It is now available to all Aalto researchers.

As of 2016, Triton is part of FGCI - the Finnish Grid and Cloud Infrastructure (successor of the Finnish Grid Infrastructure). Through the national grid and cloud infrastructure, Triton is also part of the European Grid Infrastructure.

Hardware

Different types of nodes:

  • 142 compute nodes, HP SL390s G7, each equipped with 2x Intel Xeon X5650 2.67GHz (Westmere, six cores each). 118 compute nodes, cn[113-224] and tb[003-008], have 48 GB of DDR3-1066 memory, while the others, cn[225-248], have 96 GB. Each node has a 4x QDR InfiniBand port. cn[113-224] and tb[003-008] have about 830 GB of local disk space (two striped 7.2k SATA drives), while cn[225-248] have about 380 GB on a single drive. 16 nodes have two additional SATA drives.
  • 19 compute nodes gpu[1-19] are HP SL390s G7 for GPU computing. Same configuration as above but they are 2U high. See the GPU computing page for details about GPU cards in use.
  • 3 compute nodes gpu[20-22] are Dell PowerEdge C4130 for GPU computing. CPUs are 2x 6-core Xeon E5-2620 v3 2.50GHz and the memory configuration is 128GB DDR4-2133. There are 4x K80 cards (two GPUs each) per node.
  • 2 fat nodes, HP DL580 G7 4U, each with 4x Xeon CPUs, 6x SATA drives, 1TB of DDR3-1066 memory and a 4x QDR InfiniBand port.
  • 48 compute nodes ivb[1-48] are HP SL230s G8 with 2x Xeon E5 2680 v2 10-core CPUs. First 24 nodes have 256 GB of DDR3-1667 memory and the other 24 are equipped with 64 GB.
  • 67 compute nodes Dell PowerEdge C4130 servers with 2x Xeon E5 2680 v3 2.5GHz 12-core CPUs. 51 nodes pe[1-48,65-67] have 128 GB of DDR4-2133 memory while 15 nodes pe[49-64] have 256GB.
  • 9 compute nodes pe[83-91] are Dell PowerEdge C4130 with 2x Xeon E5-2680 v4 2.4GHz 14-core CPUs.
  • 5 compute nodes gpu[23-27] for GPU computing. CPUs are 2x 12-core Xeon E5-2680 v3 @ 2.50GHz and the memory configuration is 256GB DDR4-2400. There are 4x Tesla P100 16GB cards per node.
  • 2 Nvidia DGX-1 compute nodes for GPU computing. CPUs are 2x 20-core Xeon E5-2698 v4 @ 2.2GHz and the memory configuration is 512GB DDR4-2133. There are 8x Tesla V100 16GB cards per node. These nodes run a special operating system, unlike the rest of the cluster, and thus require special considerations. See the DGX page.
Node name       | Nodes | Node type            | Arch | CPU type                           | Memory          | GPUs
----------------|-------|----------------------|------|------------------------------------|-----------------|---------------------
pe[1-48,65-81]  | 65    | Dell PowerEdge C4130 | hsw  | 2x 12-core Xeon E5-2680 v3 2.50GHz | 128GB DDR4-2133 | -
pe[49-64,82]    | 17    | Dell PowerEdge C4130 | hsw  | 2x 12-core Xeon E5-2680 v3 2.50GHz | 256GB DDR4-2133 | -
pe[84-91]       | 8     | Dell PowerEdge C4130 | bdw  | 2x 14-core Xeon E5-2680 v4 2.40GHz | 128GB DDR4-2400 | -
ivb[1-24]       | 24    | ProLiant SL230s G8   | ivb  | 2x 10-core Xeon E5-2680 v2 2.80GHz | 256GB DDR3-1667 | -
ivb[25-48]      | 24    | ProLiant SL230s G8   | ivb  | 2x 10-core Xeon E5-2680 v2 2.80GHz | 64GB DDR3-1667  | -
wsm[1-112]      | 112   | ProLiant SL390s G7   | wsm  | 2x 6-core Xeon X5650 2.67GHz       | 48GB DDR3-1333  | -
wsm[113-136]    | 24    | ProLiant SL390s G7   | wsm  | 2x 6-core Xeon X5650 2.67GHz       | 96GB DDR3-1333  | -
tb[007-009]     | 2     | ProLiant SL390s G7   | wsm  | 2x 6-core Xeon X5650 2.67GHz       | 48GB DDR3-1333  | -
gpu[1-8]        | 8     | ProLiant SL390s G7   | wsm  | 2x 6-core Xeon X5650 2.67GHz       | 24GB DDR3-1333  | 2x M2090
gpu[9-11]       | 3     | ProLiant SL390s G7   | wsm  | 2x 6-core Xeon X5650 2.67GHz       | 48GB DDR3-1333  | 2x M2090
gpu[12-16]      | 5     | ProLiant SL390s G7   | wsm  | 2x 6-core Xeon X5650 2.67GHz       | 24GB DDR3-1333  | 2x M2050
gpu[17-19]      | 3     | ProLiant SL390s G7   | wsm  | 2x 6-core Xeon X5650 2.67GHz       | 24GB DDR3-1333  | 2x M2070
gpu[20-22]      | 3     | Dell PowerEdge C4130 | hsw  | 2x 6-core Xeon E5-2620 v3 2.50GHz  | 128GB DDR4-2133 | 4x K80 (2 GPUs each)
gpu[23-27]      | 5     | Dell PowerEdge C4130 | hsw  | 2x 12-core Xeon E5-2680 v3 2.50GHz | 256GB DDR4-2400 | 4x P100
dgx[01-02]      | 2     | Nvidia DGX-1         | bdw  | 2x 20-core Xeon E5-2698 v4 2.20GHz | 512GB DDR4-2133 | 8x V100

All Triton compute nodes are identical with respect to software and access to the common file systems. Each node has its own unique host name and IP address.
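
A quick way to see how these node types appear to the scheduler is to query Slurm from the login node. The sketch below is a minimal example, assuming Python 3 and the sinfo command are available there; the exact feature tags and GRES (GPU) names reported depend on the Slurm configuration.

    import subprocess

    # One line per node: name, feature tags, CPU count, memory (MB) and
    # generic resources (GPUs), exactly as Slurm has them configured.
    out = subprocess.run(
        ["sinfo", "-N", "--noheader", "-o", "%N %f %c %m %G"],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in out.splitlines():
        name, features, cpus, mem_mb, gres = line.split(None, 4)
        mem_gb = int(mem_mb.rstrip("+")) // 1024
        print(f"{name:15s} {features:25s} {cpus:>3s} CPUs {mem_gb:>4d} GB  {gres}")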

Networking

The cluster has two internal networks: InfiniBand for MPI and Lustre filesystem traffic, and Gigabit Ethernet for everything else, such as NFS /home directories and SSH.

The internal networks are not accessible from the outside. Only the login node triton.aalto.fi has an additional Ethernet connection to the outside world.

The high-performance InfiniBand network has, in general, a fat-tree configuration. The wsm* nodes are connected with a 2:1 ratio, i.e. for every 2 downlinks there is 1 uplink to the spine switches. ivb[1-24], ivb[25-48], pe[1-48] and pe[49-67] are connected with only 4 uplinks each (ratio 6:1), which are mainly used for Lustre communication. Running MPI jobs is possible within each of these subsystems, but not across the whole cluster: large MPI jobs should run within a single segment, either wsm*, ivb[1-24], ivb[25-48], pe[1-48] or pe[49-67], but not across them, as shown in the sketch below.
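
To keep a large MPI job inside one of these segments, the batch job can ask Slurm for nodes carrying a common feature tag. The following Python sketch is illustrative only: the constraint name (ivb), the task counts and the program name my_mpi_app are assumptions, and the feature tags actually defined on Triton should be checked with sinfo.

    import subprocess
    import tempfile
    import textwrap

    # Batch script requesting eight nodes that all carry the same feature tag,
    # so every MPI rank stays inside a single InfiniBand island.
    batch_script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --nodes=8
        #SBATCH --ntasks-per-node=20
        #SBATCH --time=02:00:00
        #SBATCH --constraint=ivb

        srun ./my_mpi_app
        """)

    # Write the script to a temporary file and hand it to sbatch.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(batch_script)
        script_path = f.name

    subprocess.run(["sbatch", script_path], check=True)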

See the IB topology map on the cluster technical details page.

Disk arrays

All compute nodes and the front end are connected to a DDN SFA12k storage system: large disk arrays with the Lustre filesystem on top of them, cross-mounted under the /scratch directory. The system provides about 1.8 PB of disk space available to end users.
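
Since /scratch is shared by everyone, it is worth checking your own usage now and then. A minimal sketch, assuming the Lustre lfs client tool is available on the node and that per-user quota accounting is enabled on /scratch:

    import getpass
    import subprocess

    # Report the current user's usage and quota on the Lustre /scratch filesystem.
    user = getpass.getuser()
    subprocess.run(["lfs", "quota", "-u", user, "/scratch"], check=True)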

Software

The cluster runs an open-source software infrastructure: CentOS 7, with SLURM as the scheduler and batch system.
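
As a minimal illustration of the batch system, the sketch below submits a trivial one-task job from the login node and prints the job ID that SLURM assigns. The resource values are only examples and the partition is left to the site default.

    import subprocess

    # Submit a single-task, five-minute job that just prints the execution host.
    result = subprocess.run(
        ["sbatch", "--ntasks=1", "--time=00:05:00", "--mem=500M",
         "--wrap", "hostname"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())  # e.g. "Submitted batch job 1234567"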