Cluster technical overview

Shared resource

Triton is a joint installation of a number of Aalto School of Science faculties within the Science-IT project, which was founded in 2009 to provide HPC infrastructure for the whole School of Science. It is now available to all Aalto researchers.

As of 2016, Triton is part of FGCI - the Finnish Grid and Cloud Infrastructure (the successor of the Finnish Grid Infrastructure). Through the national grid and cloud infrastructure, Triton is also part of the European Grid Infrastructure.

Hardware

Node name         | Number of nodes | Node type            | Year       | Arch (--constraint)       | CPU type                               | Memory Configuration | Infiniband | GPUs          | Disks
------------------|-----------------|----------------------|------------|---------------------------|----------------------------------------|----------------------|------------|---------------|------------
pe[1-48,65-81]    | 65              | Dell PowerEdge C4130 | 2016       | hsw avx avx2              | 2x12 core Xeon E5 2680 v3 2.50GHz      | 128GB DDR4-2133      | FDR        | -             | 900GB HDD
pe[49-64,82]      | 17              | Dell PowerEdge C4130 | 2016       | hsw avx avx2              | 2x12 core Xeon E5 2680 v3 2.50GHz      | 256GB DDR4-2133      | FDR        | -             | 900GB HDD
pe[83-91]         | 8               | Dell PowerEdge C4130 | 2017       | bdw avx avx2              | 2x14 core Xeon E5 2680 v4 2.40GHz      | 128GB DDR4-2400      | FDR        | -             | 900GB HDD
skl[1-48]         | 48              | Dell PowerEdge C6420 | 2019       | skl avx avx2 avx512       | 2x20 core Xeon Gold 6148 2.40GHz       | 192GB DDR4-2667      | EDR        | -             | No disk
csl[1-48]         | 48              | Dell PowerEdge C6420 | 2020       | csl avx avx2 avx512       | 2x20 core Xeon Gold 6248 2.50GHz       | 192GB DDR4-2667      | EDR        | -             | No disk
milan[1-32]       | 32              | Dell PowerEdge C6525 | 2023       | milan avx avx2            | 2x64 core AMD EPYC 7713 @ 2.0GHz       | 512GB DDR4-3200      | HDR-100    | -             | No disk
fn3               | 1               | Dell PowerEdge R940  | 2020       | avx avx2 avx512           | 4x20 core Xeon Gold 6148 2.40GHz       | 2TB DDR4-2666        | EDR        | -             | No disk
gpu[1-10]         | 10              | Dell PowerEdge C4140 | 2020       | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667      | EDR        | 4x V100 32GB  | 1.5 TB SSD
gpu[11-17,38-44]  | 14              | Dell PowerEdge XE8545| 2021, 2023 | milan avx avx2 ampere a100| 2x24 core AMD EPYC 7413 @ 2.65GHz      | 503GB DDR4-3200      | EDR        | 4x A100 80GB  | 440 GB SSD
gpu[20-22]        | 3               | Dell PowerEdge C4130 | 2016       | hsw avx avx2 kepler       | 2x6 core Xeon E5 2620 v3 2.50GHz       | 128GB DDR4-2133      | EDR        | 4x2 GPU K80   | 440 GB SSD
gpu[23-27]        | 5               | Dell PowerEdge C4130 | 2017       | hsw avx avx2 pascal       | 2x12 core Xeon E5-2680 v3 @ 2.5GHz     | 256GB DDR4-2400      | EDR        | 4x P100       | 720 GB SSD
gpu[28-37]        | 10              | Dell PowerEdge C4140 | 2019       | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667      | EDR        | 4x V100 32GB  | 1.5 TB SSD
dgx[1-2]          | 2               | Nvidia DGX-1         | 2018       | bdw avx avx2 volta        | 2x20 core Xeon E5-2698 v4 @ 2.2GHz     | 512GB DDR4-2133      | EDR        | 8x V100 16GB  | 7 TB SSD
dgx[3-7]          | 5               | Nvidia DGX-1         | 2018       | bdw avx avx2 volta        | 2x20 core Xeon E5-2698 v4 @ 2.2GHz     | 512GB DDR4-2133      | EDR        | 8x V100 32GB  | 7 TB SSD
gpuamd1           | 1               | Dell PowerEdge R7525 | 2021       | rome avx avx2 mi100       | 2x8 core AMD EPYC 7262 @ 3.2GHz        | 250GB DDR4-3200      | EDR        | 3x MI100      | 32GB SSD

All Triton compute nodes are identical with respect to software and access to the common file systems. Each node has its own unique hostname and IP address.
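
The Arch (--constraint) column above lists the feature tags that can be passed to the scheduler to restrict a job to a particular CPU or GPU generation. A minimal sketch, assuming Slurm's --constraint and --gres options; the program names are placeholders:

    # Run on any node that supports AVX-512 (skl, csl or fn3 in the table above)
    srun --constraint=avx512 ./my_program

    # Request one GPU on a V100-equipped node ('volta' in the Arch column)
    srun --constraint=volta --gres=gpu:1 ./my_gpu_program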

Networking

The cluster has two internal networks: InfiniBand for MPI and the Lustre filesystem, and Gigabit Ethernet for everything else, such as the NFS-mounted /home directories and ssh.

The internal networks are not accessible from the outside. Only the login node, triton.aalto.fi, has an additional Ethernet connection to the outside world.
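
In practice this means all access goes through the login node; for example (with "username" as a placeholder for your own account):

    # Log in to the cluster from outside
    ssh username@triton.aalto.fi

    # Compute nodes are reachable only from inside the cluster network,
    # e.g. from the login node or through the scheduler.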

The high-performance InfiniBand network is, in general, configured as a fat tree. Triton has several InfiniBand segments (often called islands), distinguished by CPU architecture. The nodes within an island are connected with different blocking ratios such as 2:1, 4:1 or 8:1 (i.e. in the 4:1 case, for every 4 downlinks there is 1 uplink to the spine switches). The islands are ivb[1-45] with 540 cores, pe[3-91] with 2152 cores (keep in mind that pe[83-91] have 28 cores per node), four c[xxx-xxx] segments with 600 cores each, and skl[1-48] and csl[1-48] with 1920 cores each [CHECKME]. Uplinks from these islands are mainly used for Lustre communication. Running MPI jobs is possible within an entire island or a segment of it, but not across the whole cluster; see the sketch below.
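
As a rough illustration, an MPI job can be kept inside a single island by constraining it to one CPU architecture. The sketch below assumes Slurm's --constraint mechanism with the feature tags from the hardware table; the binary ./mpi_app is a placeholder, and partition or module settings are site-specific and omitted:

    #!/bin/bash
    #SBATCH --time=01:00:00
    #SBATCH --nodes=4              # four nodes from the same island
    #SBATCH --ntasks-per-node=40   # csl nodes have 2x20 cores
    #SBATCH --constraint=csl       # stay on the csl island (see table above)

    # srun launches the MPI ranks; replace ./mpi_app with your own binary
    srun ./mpi_app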

Disk arrays

All compute nodes and the front end are connected to a DDN SFA12k storage system: large disk arrays with the Lustre filesystem on top, cross-mounted under the /scratch directory. The system provides about 1.8 PB of disk space to end users.
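
For large or parallel I/O, the Lustre layout of a directory can be inspected and tuned with the standard lfs tools. A hedged sketch, assuming a directory of your own under /scratch (the exact path layout is site-specific):

    # Show quota and usage on /scratch for the current user
    lfs quota -u $USER /scratch

    # Stripe a directory over 4 OSTs so that new files written there
    # are spread across the disk arrays
    lfs setstripe -c 4 /scratch/path/to/my_dataset
    lfs getstripe /scratch/path/to/my_dataset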

Software

The cluster runs an open-source software infrastructure: CentOS 7, with Slurm as the scheduler and batch system.
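
As a minimal illustration of the batch system, a serial job can be submitted with sbatch. The script below is a generic sketch; the resource values and program name are placeholders, not site defaults:

    #!/bin/bash
    #SBATCH --time=00:10:00      # run time limit
    #SBATCH --mem=2G             # memory per node
    #SBATCH --cpus-per-task=1    # a serial job

    # Replace with the actual program to run
    srun ./my_program

The script is submitted with "sbatch job.sh" and can be monitored with "squeue -u $USER".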