Cluster overview
Hardware
Different types of nodes:
- 144 compute nodes, HP SL390s G7, each equipped with 2x Intel Xeon X5650 2.67GHz (six-core Westmere). 120 nodes wsm[1-112,137-144] have 48 GB of DDR3-1066 memory, the remaining 24 nodes wsm[113-136] have 96 GB. Each node has a 4xQDR InfiniBand port. wsm[1-112,137-144] have about 830 GB of local disk space (2 striped 7.2k SATA drives), while wsm[113-136] have about 380 GB on a single drive. 16 nodes have two additional SATA drives.
- 3 compute nodes gpu[20-22], Dell PowerEdge C4130, for GPU computing. CPUs are 2x 6-core Xeon E5 2620 v3 2.50GHz and the memory configuration is 128 GB DDR4-2133. Each node has 4 K80 cards (2 GPUs per card).
- 2 fat nodes, HP DL580 G7 4U, each with 4x Xeon CPUs, 6x SATA drives, 1 TB of DDR3-1066 memory and a 4xQDR InfiniBand port.
- 48 compute nodes ivb[1-48], HP SL230s G8 with 2x 10-core Xeon E5 2680 v2 CPUs. The first 24 nodes have 256 GB of DDR3-1667 memory and the other 24 are equipped with 64 GB.
- 67 compute nodes, Dell PowerEdge C4130 servers with 2x 12-core Xeon E5 2680 v3 2.5GHz CPUs. 51 nodes pe[1-48,65-67] have 128 GB of DDR4-2133 memory, while 16 nodes pe[49-64] have 256 GB.
- 9 compute nodes pe[83-91], Dell PowerEdge C4130 with 2x 14-core Xeon E5-2680 v4 2.4GHz CPUs.
- 5 compute nodes gpu[23-27] for GPU computing. CPUs are 2x 12-core Xeon E5-2680 v3 @ 2.50GHz and the memory configuration is 256 GB DDR4-2400. Each node has 4x Tesla P100 16GB cards.
- 2 Nvidia DGX-1 compute nodes for GPU computing. CPUs are 2x 20-core Xeon E5-2698 v4 @ 2.2GHz and the memory configuration is 512 GB DDR4-2133. Each node has 8x Tesla V100 16GB cards. These nodes run a different operating system from the rest of the cluster and therefore require special considerations; see the DGX page.
Node name | Number of nodes | Node type | Year | Arch (constraint) | CPU type | Memory Configuration | GPUs |
---|---|---|---|---|---|---|---|
wsm[1-112,137-144] | 120 | ProLiant SL390s G7 | | wsm | 2x6 core Intel Xeon X5650 2.67GHz | 48GB DDR3-1333 | |
wsm[113-136] | 24 | ProLiant SL390s G7 | | wsm | 2x6 core Intel Xeon X5650 2.67GHz | 96GB DDR3-1333 | |
ivb[1-24] | 24 | ProLiant SL230s G8 | | ivb,avx | 2x10 core Xeon E5 2680 v2 2.80GHz | 256GB DDR3-1667 | |
ivb[25-48] | 24 | ProLiant SL230s G8 | | ivb,avx | 2x10 core Xeon E5 2680 v2 2.80GHz | 64GB DDR3-1667 | |
pe[1-48,65-81] | 65 | Dell PowerEdge C4130 | 2016 | hsw,avx,avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 128GB DDR4-2133 | |
pe[49-64,82] | 17 | Dell PowerEdge C4130 | 2016 | hsw,avx,avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 256GB DDR4-2133 | |
pe[83-91] | 8 | Dell PowerEdge C4130 | | bdw,avx,avx2 | 2x14 core Xeon E5 2680 v4 2.40GHz | 128GB DDR4-2400 | |
skl[1-48] | 48 | Dell PowerEdge C6420 | 2019 | skl,avx,avx2,avx512 | 2x20 core Xeon Gold 6148 2.40GHz | 192GB DDR4-2667 | |
gpu[20-22] | 3 | Dell PowerEdge C4130 | 2016 | hsw,avx,avx2,kepler | 2x6 core Xeon E5 2620 v3 2.50GHz | 128GB DDR4-2133 | 4x K80 (2 GPUs per card) |
gpu[23-27] | 5 | Dell PowerEdge C4130 | 2017 | hsw,avx,avx2,pascal | 2x12 core Xeon E5-2680 v3 @ 2.5GHz | 256GB DDR4-2400 | 4x P100 |
dgx[01-02] | 2 | Nvidia DGX-1 | 2018 | bdw,avx,avx2,volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | 8x V100 |
gpu[28-37] | 10 | Dell PowerEdge C4140 | 2019 | skl,avx,avx2,avx512,volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | 4x V100 32GB |
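The hardware details in the table above can also be inspected directly on the cluster. A minimal sketch using standard Slurm commands (gpu23 is just an example node name from the table; exact output fields vary slightly between Slurm versions):

```bash
# Show Slurm's view of a single node: CPU count, memory, feature tags and GPUs (GRES).
scontrol show node gpu23

# List all nodes with their CPU count, memory, feature tags and generic resources.
sinfo --Node --format="%N %c %m %f %G"
```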
All Triton compute nodes are identical with respect to software and access to the common file systems. Each node has its own unique hostname and IP address.
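The tags in the "Arch (constraint)" column can be used to restrict a job to a particular CPU generation. A minimal batch-script sketch, assuming the default partition and a binary ./my_prog of your own (both are placeholders, not part of the cluster setup):

```bash
#!/bin/bash
#SBATCH --time=01:00:00        # requested run time
#SBATCH --mem=4G               # memory per node
#SBATCH --constraint=avx512    # only skl nodes carry the avx512 feature tag
#SBATCH --output=my_prog.out

# ./my_prog is a placeholder for your own program.
srun ./my_prog
```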
Networking
The cluster has two internal networks: InfiniBand for MPI and the Lustre
filesystem, and Gigabit Ethernet for everything else, such as NFS /home
directories and ssh.
The internal networks are inaccessible from outside. Only the login node
triton.aalto.fi has an additional Ethernet connection to the outside world.
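In practice all outside access therefore goes through the login node. A minimal sketch, assuming your cluster username is yourusername and that site policy allows you to log in to a compute node (for example one where your job is running; pe1 is just an example name):

```bash
# Log in from outside: only triton.aalto.fi is directly reachable.
ssh yourusername@triton.aalto.fi

# Reach an internal node through the login node with an SSH jump host.
ssh -J yourusername@triton.aalto.fi yourusername@pe1
```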
The high-performance InfiniBand network has, in general, a fat-tree configuration. Triton
has several InfiniBand segments (often called islands), separated by CPU architecture.
Nodes within an island are connected with a 2:1 blocking ratio: for every 2 downlinks
there is 1 uplink to the spine switches. The islands are wsm[1-144] with 1728 cores,
ivb[1-45] with 540 cores, and pe[1-91] with 2192 cores (keep in mind that pe[83-91]
have 28 cores per node). Uplinks from those islands are mainly used for Lustre traffic.
Running MPI jobs is possible within an entire island or a segment of it, but not
across the whole cluster.
See the IB topology map on the cluster technical details page.
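Because MPI traffic should stay within a single island, an MPI job is normally pinned to one CPU architecture with a constraint. A minimal sketch, where the module name and ./my_mpi_prog are placeholders for whatever MPI stack and binary you actually use:

```bash
#!/bin/bash
#SBATCH --nodes=4              # all nodes are allocated from one island
#SBATCH --ntasks-per-node=24   # 24 cores per node on the hsw (pe) nodes
#SBATCH --constraint=hsw       # keep the whole job inside the pe island
#SBATCH --time=02:00:00

# Placeholder module and binary names; adjust to your own MPI installation.
module load openmpi
srun ./my_mpi_prog
```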
Disk arrays
All compute nodes and the front-end are connected to a DDN SFA12k storage
system: large disk arrays with the Lustre filesystem on top, cross-mounted
under the /scratch directory. The system provides about 1.8 PB of disk
space available to end users.
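The Lustre filesystem under /scratch can be inspected with the standard Lustre client tools. A minimal sketch (whether quotas are enforced per user or per group depends on the site configuration):

```bash
# Overall size and usage of the Lustre filesystem mounted under /scratch.
lfs df -h /scratch

# Your own usage and quota on /scratch (use -g <group> instead of -u if
# quotas are accounted per group on your system).
lfs quota -u $USER /scratch
```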
Software
The cluster runs an open-source software infrastructure: CentOS 7, with Slurm as the scheduler and batch system.
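Both can be verified from the login node; a minimal sketch:

```bash
# Operating system release of the node you are logged in to.
cat /etc/centos-release

# Slurm version and a one-line summary of the available partitions.
sinfo --version
sinfo --summarize
```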