My problem is similar to this topic, but since that issue had been fixed I decided to create another topic.
First of all, I followed all sugestions on that topic, but none helped my case, I’m running the code of lesson 1, using my personal machine, Ubuntu 16.04, CUDA Version 8.0.61, cuDNN v6.0.
Here is the thing, the first time I call ConvLearner.pretrained it takes a long loooong time (no matter if precompute is set to True or False), then if I instantiate the model again it takes normal time. The same is true for the fit function.
I took a screenshot illustrating the problem and a screenshot of the stack if I interrupt the kernel while loading the model.
Would you want share your code and see how long it take to run on my local env with GTX 1070 8GB ? just to compare whether we have the same issue.
I forgot to mention I’m running the lesson 1 code, I will update the topic. Btw, I’m also running on a GTX 1070 =)
Ah let me put some timing, I have to add in bs param else I will get runtime exception queue.Full
Let me share out the timing with you once I run mine.
Wow this is an odd problem. It’s nothing to do with the previous thread you linked to - your issue is that actually creating the model is very slow. It’s not connected to anything in the fastai lib. You can check by calling the pytorch function directly:
m = resnet34(True)
You’ll find this takes a long time, based on what you’ve shown in your post. The best place to ask about that is on https://discuss.pytorch.org/ , since it’s a pytorch issue.
Hmmm seem like updating my package on conda and pip
manage to bump up my speed ! for the first part.
I updated all packages but the problem still persists.
It seems the problem comes from calling .cuda(), there’s a issue discussing that on the pytorch forum, I will follow there, ty all =)
Could you provide a link to the pytorch forum discussion? I’d be interested to learn about it.
I’ve spent the hole day trying to fix this, while using docker like described here seemed to fix the problem, using
conda env create -f environment.yml caused the problem once again.
If you are using Docker / nvidia-docker, try using
--ipc=host in your docker run command as specified here - https://github.com/pytorch/pytorch#docker-image, otherwise Docker instance may not have access to all the memory.
@jeremy you were right, messing with docker was a waste of time. I created a new virtual env and installed each package manually when requested by some fastai import (on lesson1), the problem seem to be gone, maybe some package installed by
conda env create -f environment.yml enters in conflict with torch?
I’ve used that environment.yml for every fastai install I’ve done. So it “should” work