Cluster technical overview
Hardware
Node name | Number of nodes | Node type | Year | Arch | CPU type | Memory configuration | InfiniBand | GPUs | Disks
---|---|---|---|---|---|---|---|---|---
pe[1-48,65-81] | 65 | Dell PowerEdge C4130 | 2016 | hsw avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 128GB DDR4-2133 | FDR | | 900GB HDD
pe[49-64,82] | 17 | Dell PowerEdge C4130 | 2016 | hsw avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 256GB DDR4-2133 | FDR | | 900GB HDD
pe[83-91] | 8 | Dell PowerEdge C4130 | 2017 | bdw avx2 | 2x14 core Xeon E5 2680 v4 2.40GHz | 128GB DDR4-2400 | FDR | | 900GB HDD
skl[1-48] | 48 | Dell PowerEdge C6420 | 2019 | skl avx2 avx512 | 2x20 core Xeon Gold 6148 2.40GHz | 192GB DDR4-2667 | EDR | | No disk
csl[1-48] | 48 | Dell PowerEdge C6420 | 2020 | csl avx2 avx512 | 2x20 core Xeon Gold 6248 2.50GHz | 192GB DDR4-2667 | EDR | | No disk
milan[1-32] | 32 | Dell PowerEdge C6525 | 2023 | milan avx2 | 2x64 core AMD EPYC 7713 @2.0 GHz | 512GB DDR4-3200 | HDR-100 | | No disk
fn3 | 1 | Dell PowerEdge R940 | 2020 | avx2 avx512 | 4x20 core Xeon Gold 6148 2.40GHz | 2TB DDR4-2666 | EDR | | No disk
gpu[1-10] | 10 | Dell PowerEdge C4140 | 2020 | skl avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD
gpu[11-17,38-44] | 14 | Dell PowerEdge XE8545 | 2021, 2023 | milan avx2 ampere a100 | 2x24 core AMD EPYC 7413 @ 2.65GHz | 503GB DDR4-3200 | EDR | 4x A100 80GB | 440 GB SSD
gpu[20-22] | 3 | Dell PowerEdge C4130 | 2016 | hsw avx2 kepler | 2x6 core Xeon E5 2620 v3 2.50GHz | 128GB DDR4-2133 | EDR | 4x K80 (2 GPUs each) | 440 GB SSD
gpu[23-27] | 5 | Dell PowerEdge C4130 | 2017 | hsw avx2 pascal | 2x12 core Xeon E5-2680 v3 @ 2.5GHz | 256GB DDR4-2400 | EDR | 4x P100 | 720 GB SSD
gpu[28-37] | 10 | Dell PowerEdge C4140 | 2019 | skl avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD
dgx[1-2] | 2 | Nvidia DGX-1 | 2018 | bdw avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 16GB | 7 TB SSD
dgx[3-7] | 5 | Nvidia DGX-1 | 2018 | bdw avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 32GB | 7 TB SSD
gpuamd1 | 1 | Dell PowerEdge R7525 | 2021 | rome avx2 mi100 | 2x8 core AMD EPYC 7262 @3.2GHz | 250GB DDR4-3200 | EDR | 3x MI100 | 32GB SSD
gpu[45-48] | 4 | Dell PowerEdge XE8640 | 2024 | saphr avx2 h100 hopper | 2x48 core Xeon Platinum 8468 2.1GHz | 1024GB DDR5-4800 | HDR | 4x H100 SXM 80GB | 21 TB SSD
All Triton compute nodes are identical with respect to software and access to the common file systems. Each node has its own unique hostname and IP address.
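The Arch column lists CPU and GPU architecture tags. On Slurm-based clusters such tags are commonly exposed as node features that can be used as job constraints; the sketch below is an assumption along those lines, not Triton-specific documentation, and simply lists the nodes whose feature list contains a given tag using `sinfo`.

```python
import subprocess

def nodes_with_feature(feature: str) -> list[str]:
    """Return node names whose Slurm feature list contains `feature`.

    Assumes the arch tags from the table above are configured as Slurm
    node features (a common, but site-specific, convention).
    """
    out = subprocess.run(
        ["sinfo", "-N", "-h", "-o", "%N %f"],  # one line per node: name + features
        capture_output=True, text=True, check=True,
    ).stdout
    nodes = []
    for line in out.splitlines():
        name, _, features = line.partition(" ")
        if feature in features.split(","):
            nodes.append(name)
    return nodes

if __name__ == "__main__":
    # e.g. list all AVX-512-capable nodes (skl, csl, fn3 and some gpu nodes above)
    print(nodes_with_feature("avx512"))
```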
Networking
The cluster has two internal networks: InfiniBand for MPI and the Lustre
filesystem, and Gigabit Ethernet for everything else, such as NFS /home
directories and SSH.
The internal networks are not accessible from the outside. Only the login node
triton.aalto.fi
has an additional Ethernet connection to the outside world.
The high-performance InfiniBand network has, in general, a fat-tree topology. Triton
has several InfiniBand segments (often called islands), distinguished by
CPU architecture. The nodes within those islands are connected with different
blocking ratios such as 2:1, 4:1 or 8:1 (i.e. in the 4:1 case, for every 4
downlinks there is 1 uplink to the spine switches). The islands are
ivb[1-45]
with 540 cores, pe[3-91]
with 2152 cores
(keep in mind that pe[83-91]
have 28 cores per node), four c[xxx-xxx]
segments
with 600 cores each, and skl[1-48] and csl[1-48] with 1920 cores each [CHECKME]. Uplinks from
those islands are mainly used for Lustre communication.
Running MPI jobs is possible within an entire island or a segment of it, but not
across the whole cluster.
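To make the blocking ratios concrete, the short sketch below computes the worst-case per-downlink share of uplink bandwidth when every node behind a leaf switch talks outside its island at once. The link speed used is a nominal InfiniBand data rate (roughly 100 Gbit/s for EDR); both the speed and the ratios are illustrative numbers, not measured values for any particular island.

```python
# Worst-case bandwidth out of an island under a down:up blocking ratio.
def worst_case_uplink_share(link_gbits: float, ratio: tuple[int, int]) -> float:
    """Per-downlink bandwidth (Gbit/s) if all downlinks behind a leaf switch
    send out of the island simultaneously, for a ratio like (4, 1)."""
    down, up = ratio
    return link_gbits * up / down

for ratio in [(2, 1), (4, 1), (8, 1)]:
    share = worst_case_uplink_share(100.0, ratio)  # ~100 Gbit/s EDR link
    print(f"{ratio[0]}:{ratio[1]} -> {share:.1f} Gbit/s per downlink")
```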
Disk arrays
All compute nodes and the front-end node are connected to a DDN SFA12k storage
system:
large disk arrays with the Lustre filesystem on top of them, cross-mounted
under the /scratch
directory. The system provides about 1.8 PB of disk
space available to end users.
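For a quick look at the overall capacity and usage of /scratch from any node, a minimal Python sketch like the following can be used. It reports filesystem-level figures for the mounted Lustre volume (the path is taken from the text above), not per-user or per-project quotas.

```python
import shutil

# Filesystem-level totals for the Lustre volume mounted at /scratch.
usage = shutil.disk_usage("/scratch")
PIB = 1024 ** 5  # bytes per pebibyte
print(f"total {usage.total / PIB:.2f} PiB, "
      f"used {usage.used / PIB:.2f} PiB, "
      f"free {usage.free / PIB:.2f} PiB")
```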
Software
The cluster runs an open-source software infrastructure: CentOS 7, with SLURM as the scheduler and batch system.
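As a rough illustration of how work reaches the scheduler, the sketch below writes a minimal batch script and hands it to `sbatch`. The resource requests are placeholders for illustration only, not recommended Triton settings.

```python
import subprocess
import tempfile

# A minimal batch script; time, memory and CPU values are placeholders.
script = """#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=1G
#SBATCH --cpus-per-task=1
srun hostname
"""

# Write the script to a temporary file and submit it with sbatch.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(script)
    path = f.name

result = subprocess.run(["sbatch", path], capture_output=True, text=True)
print(result.stdout.strip())  # e.g. "Submitted batch job <jobid>" on success
```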