Describe the bug
My machine runs out of GPU memory the first time I run ConvLearner.pretrained
from dl1/lesson1. The Python process in the conda env consumes 1754MiB of
GPU memory.
arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 2)
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /root/.torch/models/resnet34-333f7ec4.pth
100%|██████████| 87306240/87306240 [00:04<00:00, 17734184.12it/s]
0%| | 0/360 [00:00<?, ?it/s]
RuntimeError: CUDA out of memory. Tried to allocate 24.50 MiB (GPU 0; 1.96 GiB total capacity; 1.31 GiB already allocated; 7.25 MiB free; 3.23 MiB cached)
I am using the Docker image linked below now, but the issue also happened before when running inside a plain conda env.
In fact, I switched to Docker in the hope of working around a possible bug in the memory handling.
https://github.com/Paperspace/fastai-docker/blob/master/Dockerfile
Expected behavior
The process should not run out of memory.
Screenshots
nvidia-smi:
Thu Dec 13 12:54:41 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.18       Driver Version: 415.18       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P0    N/A /  N/A |   1997MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1448      G   /usr/lib/xorg/Xorg                           231MiB |
|    0     18172      C   /opt/conda/envs/fastai/bin/python           1754MiB |
+-----------------------------------------------------------------------------+
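To make the numbers easier to compare, here is a small sketch of the memory arithmetic. All figures are copied from the nvidia-smi output and the RuntimeError above; the variable names are just for illustration.

```python
# Figures copied from the nvidia-smi output and the RuntimeError (all in MiB).
TOTAL_MIB = 2004       # total memory reported for the GTX 960M
USED_MIB = 1997        # "Memory-Usage" column
XORG_MIB = 231         # /usr/lib/xorg/Xorg
PYTHON_MIB = 1754      # /opt/conda/envs/fastai/bin/python
REQUESTED_MIB = 24.50  # allocation that failed in the RuntimeError

free_mib = TOTAL_MIB - USED_MIB                # what is left on the card
attributed_mib = XORG_MIB + PYTHON_MIB         # memory listed per process
unattributed_mib = USED_MIB - attributed_mib   # used, but not listed per process

print(f"free on the card:      {free_mib} MiB")
print(f"listed per process:    {attributed_mib} MiB")
print(f"not listed per process: {unattributed_mib} MiB")
print(f"failed request fits:   {REQUESTED_MIB <= free_mib}")
```

So roughly 7 MiB are free when the 24.50 MiB allocation is attempted, which seems to match the "7.25 MiB free" in the error message.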
EDIT: Added more logs
Additional context
Is there a way to run the net without consuming so much memory?
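One workaround I could imagine is lowering the batch size until the run fits. Below is a minimal sketch of that idea, not something taken from the fastai docs: `largest_fitting_batch_size` and `fake_fit` are hypothetical names, and `fake_fit` only simulates the out-of-memory error so the sketch runs without a GPU.

```python
def largest_fitting_batch_size(try_fit, start_bs=64, min_bs=1):
    """Halve the batch size until try_fit(bs) stops raising an OOM error.

    try_fit is a hypothetical callable supplied by the caller; it should
    build the data and learner for the given batch size and run one fit.
    """
    bs = start_bs
    while bs >= min_bs:
        try:
            try_fit(bs)
            return bs
        except RuntimeError as err:
            # Only swallow out-of-memory errors; re-raise anything else.
            if "out of memory" not in str(err):
                raise
            bs //= 2
    raise RuntimeError("out of memory even at the minimum batch size")


def fake_fit(bs):
    # Stand-in for a real training call: pretend anything above 16 is too big.
    if bs > 16:
        raise RuntimeError("CUDA out of memory. (simulated)")


print(largest_fitting_batch_size(fake_fit))
```

In a real run, `try_fit` would rebuild the data object with the given batch size (assuming the fastai version in use accepts a `bs` argument in `ImageClassifierData.from_paths`) and then call `learn.fit`. It would probably also need to free GPU memory between attempts.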