CUDA out of memory when training is done on multiple GPUs

My nvidia-smi output is as follows:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti      Off| 00000000:02:00.0 Off |                  N/A |
| 20%   54C    P2               83W / 250W|   4692MiB / 11264MiB |     45%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce GTX 1080 Ti      Off| 00000000:03:00.0 Off |                  N/A |
| 26%   60C    P2               73W / 250W|   4650MiB / 11264MiB |     44%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce GTX 1080 Ti      Off| 00000000:81:00.0 Off |                  N/A |
| 50%   71C    P0               84W / 250W|      0MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce GTX 1080 Ti      Off| 00000000:82:00.0 Off |                  N/A |
| 30%   53C    P0               75W / 250W|      0MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   3494144      C   python                                     4690MiB |
|    1   N/A  N/A   3494896      C   python                                     4648MiB |
+---------------------------------------------------------------------------------------+

I’m running a script to train a RoBERTa model from scratch (based on this article and this notebook), but when I run CUDA_VISIBLE_DEVICES=2,3 python script.py (this is a shared machine where other researchers run their scripts; killing the processes on GPUs 0 and 1 is not an option), I get the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.07 GiB (GPU 0; 10.91 GiB total capacity; 8.36 GiB already allocated; 1.93 GiB free; 8.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Why does the error report only one GPU’s memory (the 10.91 GiB total capacity)? By selecting more than one GPU, shouldn’t I be able to use the combined memory of both cards? I would like that extra space because it would let me train with a larger batch size; due to time constraints, lowering the train batch size is not an option for me.
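
For reference, PyTorch does see both free cards inside the script. A quick check along these lines (a minimal sketch, not taken from my actual training code) prints two devices, renumbered by CUDA_VISIBLE_DEVICES:

    import torch

    # Under CUDA_VISIBLE_DEVICES=2,3 the two idle cards are renumbered
    # as device 0 and device 1 inside this process, which is why the
    # error message refers to "GPU 0".
    print(torch.cuda.device_count())  # expected: 2
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(i, props.name, round(props.total_memory / 2**30, 2), "GiB")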


>Solution:

The batch size you set in PyTorch is the batch size used by each individual GPU. Data-parallel multi-GPU training replicates the model on every GPU and feeds each replica its own batch to speed up each epoch; the gradients computed on each GPU are then averaged so the replicas stay in sync. The GPUs’ memory is never pooled: every tensor, including a batch, must fit on a single card, which is why the error reports only one device’s 10.91 GiB capacity.
So you can’t use a bigger batch size just because the training employs more GPUs.
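
If what you actually need is a larger effective batch size (rather than raw throughput), gradient accumulation gives you that without extra GPU memory. A minimal sketch, assuming your script uses the Hugging Face Trainer as the linked notebook does (the output_dir and the concrete numbers here are placeholders):

    from transformers import TrainingArguments

    # Effective batch size = per_device_train_batch_size
    #                      * gradient_accumulation_steps
    #                      * number of visible GPUs
    # With CUDA_VISIBLE_DEVICES=2,3: 8 * 4 * 2 = 64 samples per
    # optimizer step, while each 11 GiB card only ever holds a
    # micro-batch of 8 in memory.
    training_args = TrainingArguments(
        output_dir="./roberta-from-scratch",  # placeholder path
        per_device_train_batch_size=8,        # must fit on a single card
        gradient_accumulation_steps=4,        # accumulate before each update
    )

Note also that the max_split_size_mb hint in the error only helps when reserved memory greatly exceeds allocated memory (i.e., fragmentation); here the allocation simply doesn’t fit on one card, so a smaller per-device batch combined with accumulation is the practical fix.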
