Finally recognizing local GPU but cannot allocate memory

cullenaryArtist · June 2, 2023, 3:35pm

For the past week I have been jumping through flaming hoops trying to get VS Code to use my GPU. I am on Windows 11 with an NVIDIA GeForce RTX 3070 16GB. I finally figured out how to use a PyTorch 2.0 container that matches my CUDA version. I run WSL from Windows Terminal, then open VS Code and attach to the container. After creating the venv and installing everything, I finally got “True” and can use my GPU.

However, when I run this part of the code:

learn = cnn_learner(dls, resnet34, metrics=accuracy)

print(next(learn.model.parameters()).device)

# Find the optimal learning rate and plot the learning rate finder
lr_min = learn.lr_find(show_plot=True)
print(f"The suggested learning rate is: {lr_min}")

I receive:

OSError                                   Traceback (most recent call last)
Cell In[15], line 9
      3 print(next(learn.model.parameters()).device)
      6 # Find the optimal learning rate
      7 # lr_min,lr_steep = learn.lr_find()
      8 # Find the optimal learning rate and plot the learning rate finder
----> 9 lr_min = learn.lr_find(show_plot=True)
     10 print(f"The suggested learning rate is: {lr_min}")

File ~/cci/models/v1/.venv/lib/python3.10/site-packages/fastai/callback/schedule.py:293, in lr_find(self, start_lr, end_lr, num_it, stop_div, show_plot, suggest_funcs)

...

 File "/root/cci/models/v1/.venv/lib/python3.10/site-packages/PIL/Image.py", line 3236, in open
    fp = builtins.open(filename, "rb")
OSError: [Errno 12] Cannot allocate memory: '/dir_in_docker/18092 REDACTED/IMG_2303.JPEG'

I am sooo close I can feel it. Any advice is immensely appreciated.

I can’t use cloud services because of how much data I have on my external hard drive.

AllenK · June 2, 2023, 9:39pm

Assuming you are using the latest versions, use ‘vision_learner’ rather than ‘cnn_learner’.
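If the same notebook has to run on both older and newer fastai installs, one option is to look up whichever name is available. This is only a rough sketch: `pick_learner_factory` is a made-up helper, not part of fastai, and the stand-in namespaces exist only so the snippet runs without fastai installed. `dls` in the commented usage is the DataLoaders object from the original post.

```python
from types import SimpleNamespace


def pick_learner_factory(namespace):
    """Return namespace.vision_learner if it exists, else namespace.cnn_learner.

    Hypothetical helper: it only inspects attribute names on the object passed in.
    """
    for name in ("vision_learner", "cnn_learner"):
        factory = getattr(namespace, name, None)
        if callable(factory):
            return factory
    raise AttributeError("neither vision_learner nor cnn_learner was found")


# Intended use with fastai (left commented so the sketch runs without it):
#   import fastai.vision.all as fv
#   make_learner = pick_learner_factory(fv)
#   learn = make_learner(dls, fv.resnet34, metrics=fv.accuracy)

# Stand-ins that mimic an older and a newer API surface.
old_api = SimpleNamespace(cnn_learner=lambda *args, **kwargs: "cnn")
new_api = SimpleNamespace(
    vision_learner=lambda *args, **kwargs: "vision",
    cnn_learner=lambda *args, **kwargs: "cnn",
)

if __name__ == "__main__":
    print(pick_learner_factory(old_api)())
    print(pick_learner_factory(new_api)())
```

Note that this rename is unlikely to be related to the Errno 12 failure; it only removes the deprecated call.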

matdmiller · June 3, 2023, 10:22am

If you run htop from inside your container, how much system memory does it report? Maybe Docker is limiting the amount of RAM available to your container, which would explain why the error says it can’t allocate memory. You may also want to increase the default amount of shared memory available when you launch your container. I don’t think this is currently causing your error, but you’ll probably run into issues at some point. Below is an example; however, you probably don’t need 24G. A smaller number, maybe 4G or less, would be fine.

In general, getting up and running on Windows is likely going to be more challenging than on Linux.

Did you roughly follow this guide? Platform: Windows 10 using WSL2 w/GPU

You can try running docker stats from your terminal to see how much memory your container is using and what its limit is.
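The same checks can be roughed out from a notebook cell inside the container. The sketch below is Linux-only and rests on a few assumptions: the function names are invented for illustration, the cgroup files are looked for at the usual v2 and v1 locations, and `/dev/shm` is taken to be the shared-memory mount. Keep in mind that `/proc/meminfo` inside a container generally reports the memory of the host (here, the WSL2 VM) rather than the container's own limit, so the cgroup value is the one to compare against `docker stats`.

```python
import errno
import os
import shutil

# Usual cgroup v2 and v1 locations for a container's memory limit (assumed).
CGROUP_LIMIT_PATHS = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def read_meminfo(path="/proc/meminfo"):
    """Parse a meminfo-style file into {field: value}, converting kB to bytes."""
    info = {}
    with open(path) as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            parts = rest.split()
            if not parts:
                continue
            value = int(parts[0])
            if len(parts) > 1 and parts[1] == "kB":
                value *= 1024
            info[key.strip()] = value
    return info


def read_cgroup_limit(paths=CGROUP_LIMIT_PATHS):
    """Return the memory limit in bytes, or None if unlimited or not found."""
    for path in paths:
        try:
            with open(path) as fh:
                raw = fh.read().strip()
        except OSError:
            continue
        return None if raw == "max" else int(raw)
    return None


def memory_report(meminfo_path="/proc/meminfo", shm_path="/dev/shm",
                  cgroup_paths=CGROUP_LIMIT_PATHS):
    """Collect RAM, cgroup limit, and shared-memory sizes in GiB where available."""
    gib = 1024 ** 3
    report = {}
    if os.path.exists(meminfo_path):
        info = read_meminfo(meminfo_path)
        report["ram_total_gib"] = info.get("MemTotal", 0) / gib
        report["ram_available_gib"] = info.get("MemAvailable", 0) / gib
    limit = read_cgroup_limit(cgroup_paths)
    if limit is not None:
        report["cgroup_limit_gib"] = limit / gib
    if os.path.isdir(shm_path):
        usage = shutil.disk_usage(shm_path)
        report["shm_total_gib"] = usage.total / gib
        report["shm_free_gib"] = usage.free / gib
    return report


if __name__ == "__main__":
    # Errno 12 is ENOMEM: the OS refused a host-memory request, not the GPU.
    print(errno.ENOMEM, os.strerror(errno.ENOMEM))
    for name, value in memory_report().items():
        print(f"{name}: {value:.2f}")
```

If the reported shared memory or cgroup limit is small next to the size of the dataset and batch, that would point toward the container settings rather than the GPU.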