GPU compute server rosenblatt

The GPU compute server rosenblatt has been available since February 2022 for computations that benefit from graphics cards.

It was jointly financed by the professorships of Stochastics, Harmonic Analysis, Numerical Mathematics and Scientific Computing (one GPU each) as well as by faculty funds (base machine).

About the name rosenblatt

Frank Rosenblatt was a pioneer of neural networks.

Hardware

CPU: 2 × AMD EPYC 7313 16-core processors
256 GB RAM
250 GB SSD system and swap partition
3.5 TB SSD data partition
GPU: 4 × NVIDIA A40 (Ampere architecture)
Ubuntu 22.04

CUDA installation and usage

The CUDA environment is installed under /usr/local/cuda-12.2. To use it, a few environment variables must be set. This is most easily done by sourcing the prepared script:

source /usr/local/cuda-12.2/cudaenv.sh

After that, when you query the version of the nvcc compiler, you get the expected release 12.2:

rosenblatt:~$ source /usr/local/cuda-12.2/cudaenv.sh
rosenblatt:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

You can then start compiling your own programs.
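
As a first test of the toolchain, a minimal sketch could look like the following. The file name hello.cu, the function names write_hello_cu and build_hello, and the option -arch=sm_86 (compute capability 8.6 of the A40) are assumptions chosen for illustration; they are not part of the server setup.

```shell
# Write a minimal CUDA program to the given path (default: hello.cu).
write_hello_cu() {
    cat > "${1:-hello.cu}" <<'EOF'
#include <cstdio>

// Each of the 4 threads prints its own index.
__global__ void hello() {
    printf("Hello from thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();                         // 1 block, 4 threads
    cudaError_t err = cudaDeviceSynchronize(); // wait for the kernel
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
EOF
}

# Compile the source with nvcc (hypothetical helper).
# Requires that the CUDA environment script has been sourced beforehand.
build_hello() {
    src="${1:-hello.cu}"
    if ! command -v nvcc >/dev/null 2>&1; then
        echo "nvcc not found; source /usr/local/cuda-12.2/cudaenv.sh first" >&2
        return 1
    fi
    nvcc -arch=sm_86 -o "${src%.cu}" "$src"
}
```

With these two functions defined, write_hello_cu hello.cu && build_hello hello.cu && ./hello would write, compile and run the test program.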

The tool nvidia-smi is available for monitoring the GPU load. A simple call returns, for example:

rosenblatt:~$ nvidia-smi
Fri Jul  7 10:12:51 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A40                     On  | 00000000:01:00.0 Off |                    0 |
|  0%   34C    P0              80W / 300W |    274MiB / 46068MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A40                     On  | 00000000:61:00.0 Off |                    0 |
|  0%   23C    P8              12W / 300W |      7MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A40                     On  | 00000000:A1:00.0 Off |                    0 |
|  0%   23C    P8              12W / 300W |      7MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A40                     On  | 00000000:C1:00.0 Off |                    0 |
|  0%   23C    P8              12W / 300W |      7MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    146064      C   ./a.out                                     262MiB |
+---------------------------------------------------------------------------------------+
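
Since the four GPUs are shared, it can be useful to pick one that is currently idle before starting a job. The following is only a sketch: the function name pick_idle_gpu is hypothetical, and it assumes input lines of the form "index, utilization", as produced by nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits. Note that 0 % utilization does not guarantee that nobody has reserved memory on that GPU.

```shell
# Read lines of the form "index, utilization" on stdin and print the
# index of the first GPU whose utilization is 0.
# Returns a non-zero exit status if no idle GPU is found.
pick_idle_gpu() {
    awk -F', *' '
        # Only consider data lines that start with a numeric GPU index.
        NF >= 2 && $1 ~ /^[0-9]+$/ && $2 + 0 == 0 {
            print $1
            found = 1
            exit
        }
        END { exit !found }
    '
}
```

A possible call is nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits | pick_idle_gpu. In the state shown above (GPU 0 at 100 %, the others at 0 %) this would print 1, and a program could then be restricted to that GPU by starting it with CUDA_VISIBLE_DEVICES=1.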

Repeated output is more suitable for continuous monitoring. This can be achieved with

nvidia-smi -i 0 -lms 300 --format=csv --query-gpu=power.draw,utilization.gpu,fan.speed,temperature.gpu,memory.used

The parameters mean:

-i 0 ... show GPU 0; the other three GPUs are selected analogously with -i 1, -i 2, -i 3
-lms 300 ... print a new line every 300 milliseconds

This gives a continuous display:

rosenblatt:~$ nvidia-smi -i 0 -lms 300 --format=csv --query-gpu=power.draw,utilization.gpu,fan.speed,temperature.gpu,memory.used

power.draw [W], utilization.gpu [%], fan.speed [%], temperature.gpu, memory.used [MiB]
74.75 W, 0 %, 0 %, 36, 0 MiB
74.69 W, 0 %, 0 %, 36, 0 MiB
74.72 W, 0 %, 0 %, 36, 0 MiB
74.78 W, 0 %, 0 %, 36, 0 MiB
74.71 W, 0 %, 0 %, 36, 0 MiB
74.71 W, 0 %, 0 %, 36, 0 MiB
74.76 W, 0 %, 0 %, 36, 0 MiB
74.73 W, 0 %, 0 %, 36, 0 MiB
74.73 W, 0 %, 0 %, 36, 0 MiB
74.78 W, 0 %, 0 %, 36, 0 MiB
74.78 W, 0 %, 0 %, 36, 0 MiB
74.77 W, 0 %, 0 %, 36, 2 MiB
90.46 W, 5 %, 0 %, 40, 455 MiB
137.43 W, 100 %, 0 %, 43, 455 MiB
184.72 W, 100 %, 0 %, 44, 455 MiB
232.18 W, 100 %, 0 %, 45, 455 MiB
249.01 W, 100 %, 0 %, 46, 455 MiB
250.37 W, 100 %, 0 %, 46, 455 MiB
251.33 W, 100 %, 0 %, 47, 455 MiB
252.26 W, 100 %, 0 %, 47, 455 MiB
252.45 W, 100 %, 0 %, 47, 455 MiB
...

Here you can see clearly how power draw, GPU utilization, temperature and memory usage rise when the program starts.
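
To evaluate such a log afterwards, the CSV output can be redirected to a file and summarized. A minimal sketch, assuming the column order of the query above (power draw in the first column); the function name peak_power and the file name gpu0.csv are hypothetical.

```shell
# Print the highest power draw (in W) found in the first CSV column.
# Lines that do not start with a number (e.g. the header) are skipped.
peak_power() {
    awk -F', *' '
        $1 ~ /^[0-9]/ {
            p = $1 + 0               # "252.45 W" is read as 252.45
            if (p > max) max = p
        }
        END { if (max != "") print max }
    '
}
```

After recording with nvidia-smi -i 0 -lms 300 --format=csv --query-gpu=power.draw,utilization.gpu,fan.speed,temperature.gpu,memory.used > gpu0.csv, the call peak_power < gpu0.csv prints the maximum; for the lines shown above that is 252.45.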