Humboldt-Universität zu Berlin - Faculty of Mathematics and Natural Sciences - IT Service Group (RBG)

GPU-Server

A few servers equipped with Nvidia GPUs (Tesla V100, RTX6000) are currently available. They have CUDA, OpenCL, and TensorFlow (Python) installed.

The pool computers belonging to the pools "Berlin" and "Brandenburg" also have an Nvidia GPU and can be used with CUDA, OpenCL, and TensorFlow. The pool computers in the remaining pool rooms have an onboard Intel GPU, which can be used for computations via OpenCL.

The servers are also accessible via SSH or RDP (gruenau[9-10]). Please open a VPN connection when working from outside the HU network.

 

Overview of the GPU-Servers and Computers

The following table lists the currently available servers and computers with a GPU, including additional information on the GPU type.

 

Server/PCs                   | GPU                                     | CUDA     | Miscellaneous      | Slurm gres
gruenau1                     | 2x Nvidia Tesla V100, 1x Nvidia RTX6000 | Y (11.8) | OpenCL, TensorFlow |
gruenau2                     | 3x Nvidia RTX6000                       | Y (11.8) | OpenCL, TensorFlow | RTX6000
gruenau7                     | 4x Nvidia RTX A6000                     | Y (11.8) | OpenCL, TensorFlow |
gruenau8                     | 4x Nvidia RTX A6000                     | Y (11.8) | OpenCL, TensorFlow |
gruenau9                     | 3x Nvidia Tesla A100                    | Y (11.8) | OpenCL, TensorFlow | A10080GB
gruenau10                    | 3x Nvidia Tesla A100                    | Y (11.8) | OpenCL, TensorFlow | A10080GB
PC-Pool (Berlin/Brandenburg) | 1x GeForce GTX 745                      | Y (11.8) | OpenCL, TensorFlow | GTX745
remaining PC pools           | 1x Intel Skylake GT2 (onboard)          | N        | OpenCL             |

Current usage values of the GPUs.

 

Further information about the GPUs

The following table provides more detailed specifications of the available GPUs:

 

GPU               | RAM (GB) | RAM Bandwidth (GB/s) | GPU Speed (MHz) | CUDA cores | Tensor cores | Ray-tracing cores | Compute Cap.
GeForce GTX 745   | 4        | 28.8                 | 1033            | 384        | /            | /                 | 5.0
Nvidia Tesla V100 | 32       | 897.0                | 1530            | 5120       | 640          | /                 | 7.0
Nvidia Tesla T4   | 16       | 320.0                | 1515            | 2560       | 320          | 40                | 7.5
Nvidia RTX6000    | 24       | 672.0                | 1770            | 4608       | 576          | 72                | 7.5
GeForce RTX 3090  | 24       | 936.2                | 1695            | 10496      | 328          | 82                | 8.6
Nvidia RTX A6000  | 48       | 768                  | 2100            | 10752      | 336          | 84                | 8.6
Nvidia Tesla A100 | 80       | 1600                 | 1410            | 6912       | 432          | /                 | 8.0

Please also use the tools "clinfo" and "nvidia-smi" to obtain additional information that helps you choose the best fit for your project.
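For example, once logged in to one of the servers, the following commands give a quick overview of the installed OpenCL devices and the current load and memory usage of the Nvidia GPUs (they require the respective drivers, so they will only work on the GPU machines themselves):

```shell
# List the available OpenCL platforms and devices (short form)
clinfo -l

# Query name, total/used memory, and utilization of each Nvidia GPU
nvidia-smi --query-gpu=name,memory.total,memory.used,utilization.gpu --format=csv
```

A GPU that already shows high utilization or little free memory is usually a poor choice for a new job, even if it is the fastest model on paper.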

 

Selection Guide

Depending on the workload, it can pay off to prefer one system over another. The tables below provide an overview of the throughput of the different GPU types for the various input precisions.

 

Comparison of GPU High-End Systems:

GPU               | FP16 (TFLOPS) | FP32 (TFLOPS) | FP64 (TFLOPS) | Deep Learning (TOPS) | Ray Tracing (TFLOPS)
Nvidia Tesla V100 | 30.0          | 15.0          | 7.5           | 120                  | /
Nvidia Tesla T4   | 16.2          | 8.1           | 0.25          | 65                   | /
Nvidia RTX6000    | 32.6          | 16.3          | 0.5           | 130                  | 34
GeForce RTX 3090  | 35.58         | 35.58         | 1.11          | 142 / 284*           | 58
Nvidia RTX A6000  | 38.7          | 38.7          | 1.21          | 309.7                | 75.6
Nvidia Tesla A100 | 77.97         | 19.49         | 9.746         | ?                    | /

The recommendations for certain scenarios are highlighted in yellow.

Legend:
TFLOPS = tera floating-point operations per second
TOPS = tera operations per second
INTX = integer variable with X bits
FPX = floating-point variable with X bits
GRays = giga rays per second
* = double performance when the sparsity feature is used
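As a rough sanity check, the FP32 numbers above can be reproduced from the specification table: each CUDA core executes one fused multiply-add (two floating-point operations) per clock cycle, so peak FP32 throughput is approximately 2 × cores × clock. This first-order estimate is a standard rule of thumb, not part of the official tables; real boost clocks differ from the listed values, so results only match approximately:

```python
def fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Estimate peak FP32 throughput: one FMA (2 FLOPs) per CUDA core per cycle."""
    return 2 * cuda_cores * clock_mhz * 1e6 / 1e12

# Nvidia Tesla V100: 5120 CUDA cores at 1530 MHz
print(round(fp32_tflops(5120, 1530), 2))  # prints 15.67, close to the 15.0 TFLOPS listed
```

The same formula with the Tesla T4 values (2560 cores, 1515 MHz) yields about 7.8 TFLOPS, again in the neighborhood of the 8.1 TFLOPS listed above.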

 

Comparison of Servers:

 

Server                       | Geekbench5 CPU (Single) | Geekbench5 CPU (Multi) | GPUs            | Recommended Scenario
gruenau1                     | 1078                    | 25239 (36/72 cores)    | 2x RTX6000      | Multi GPU, Ray Tracing, Deep Learning, max. CPU
gruenau2                     | 1078                    | 25239 (36/72 cores)    | 2x RTX6000      | Multi GPU, Ray Tracing, Deep Learning, max. CPU
gruenau9                     | 854                     | 14169 (16/32 cores)    | 3x T4           | FP64 computation, max. RAM
gruenau10                    | 1078                    | 25239 (36/72 cores)    | 2x V100         | FP64 computation, max. CPU, max. RAM
PC-Pool (Berlin/Brandenburg) | 1109                    | 4308 (4C/8T)           | GeForce GTX 745 | /
gruenau[5-8]                 | 695                     | 27451 (60C/120T)       | /               | /

For additional information on the specifications of the compute servers, please click on the respective server name in the table.

 

General Remark:

Since all resources are shared among the currently active users, it can pay off, after careful consideration, to choose the second- or third-best option on the list of resources for the envisaged project, provided that option currently has considerably lower usage.

For better load balancing, the use of Slurm is recommended.
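A minimal batch script might look as follows. The gres name is taken from the overview table above; job name, resource amounts, and time limit are placeholders that you should adapt to your job, and further site-specific options (partition, modules) may be required:

```shell
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --gres=gpu:A10080GB:1   # request one A100 80GB (gres name from the overview table)
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=01:00:00

# Show which GPU Slurm allocated to this job
nvidia-smi
```

Submitting with sbatch lets the scheduler place the job on a machine with a free GPU instead of everyone logging in to the same server.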

 

 

Links

[1] https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9234-volta-and-turing-architecture-and-performance-optimization.pdf

[2] https://blogs.nvidia.com/blog/2019/11/15/whats-the-difference-between-single-double-multi-and-mixed-precision-computing/

[3] https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html

[4] https://en.wikipedia.org/wiki/Nvidia_Tesla