Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

GPU memory nearly full after defining tf.distribute.MirroredStrategy?

I am coming across a strange issue when using TensorFlow (2.9.1). After defining a distributed training strategy, my GPU memory appears to fill.

Steps to reproduce are simple:

import tensorflow as tf
strat = tf.distribute.MirroredStrategy()

After the first line (importing TensorFlow), nvidia-smi outputs:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Fri Jun 10 03:01:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P6000        Off  | 00000000:04:00.0 Off |                  Off |
| 26%   25C    P8     9W / 250W |      0MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P6000        Off  | 00000000:06:00.0 Off |                  Off |
| 26%   20C    P8     7W / 250W |      0MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

After the second line of code, nvidia-smi outputs:

Fri Jun 10 03:02:43 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P6000        Off  | 00000000:04:00.0 Off |                  Off |
| 26%   29C    P0    59W / 250W |  23951MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P6000        Off  | 00000000:06:00.0 Off |                  Off |
| 26%   25C    P0    58W / 250W |  23951MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1833720      C   python                          23949MiB |
|    1   N/A  N/A   1833720      C   python                          23949MiB |
+-----------------------------------------------------------------------------+

The GPU memory is almost entirely full? There is also some terminal output:

2022-06-10 03:02:37.442336: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-10 03:02:39.136390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 23678 MB memory:  -> device: 0, name: Quadro P6000, pci bus id: 0000:04:00.0, compute capability: 6.1
2022-06-10 03:02:39.139204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 23678 MB memory:  -> device: 1, name: Quadro P6000, pci bus id: 0000:06:00.0, compute capability: 6.1
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')

Any ideas on why this is occurring would be helpful! Additional details about my configuration:

  • Python 3.10.4 [GCC 7.5.0] on linux
  • tensorflow 2.9.1
  • cuda/11.2.2 cudnn/v8.2.1

>Solution :

By default, Tensorflow will map almost all of your GPU memory: official guide. This is for performance reasons: by allocating the GPU memory, it reduces latency that memory growth would typically cause.

You can try using tf.config.experimental.set_memory_growth to prevent it from immediately filling up all its memory. There are also some good explanations on this StackOverflow post.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading