All of the training scripts specified in the README give errors like the following:
RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; 3.86 GiB free; 10.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Environment: CUDA 11.7, 8x Tesla V100-SXM2 (16 GB memory each).
Reducing --batch_size 32 didn't help.
Passing --microbatch 1 didn't help.
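As a rough sanity check on the numbers in the error above, here is a minimal Python sketch that extracts the GiB figures from the message and compares them. The helper name parse_oom is hypothetical and not part of PyTorch or this repo, and it assumes every figure is reported in GiB, as in this message.

```python
import re

# Error text copied from the report above (allocator hint omitted).
msg = (
    "RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB "
    "(GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; "
    "3.86 GiB free; 10.77 GiB reserved in total by PyTorch)"
)


def parse_oom(message):
    """Hypothetical helper: extract the GiB figures from a CUDA OOM message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {key: float(re.search(p, message).group(1)) for key, p in patterns.items()}


stats = parse_oom(msg)

# How much larger the single request is than the free memory reported.
shortfall = stats["requested"] - stats["free"]
# Memory held by PyTorch's caching allocator but not currently in use.
cached_unused = stats["reserved"] - stats["allocated"]

print(f"shortfall: {shortfall:.2f} GiB")                        # shortfall: 0.14 GiB
print(f"reserved but unallocated: {cached_unused:.2f} GiB")     # reserved but unallocated: 0.25 GiB
```

On this reading, the failing step is a single 4.00 GiB request that exceeds the reported free memory by about 0.14 GiB, while only about 0.25 GiB is reserved but unallocated. Reserved memory is therefore not much larger than allocated memory, which is the condition under which the message suggests max_split_size_mb, so that setting may not be the main lever here.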