My problem is similar to this topic, but since that issue was marked as fixed I decided to start a new topic.
First of all, I followed all the suggestions in that topic, but none of them helped in my case. I'm running the lesson 1 code on my personal machine: Ubuntu 16.04, CUDA 8.0.61, cuDNN v6.0.
Here is the thing: the first time I call ConvLearner.pretrained it takes a long, looong time (no matter whether precompute is set to True or False); if I then instantiate the model again, it takes a normal amount of time. The same is true for the fit function.
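To make the slowdown concrete, here is a minimal stdlib timing helper; the commented-out calls show hypothetical usage against the lesson-1 notebook objects (arch and data as defined there), and are an assumption, not output from my run:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn once and print how long the call took."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - t0:.2f}s")
    return result

# Hypothetical usage with the lesson-1 notebook variables:
# learn = timed("1st pretrained", ConvLearner.pretrained, arch, data, precompute=True)
# learn = timed("2nd pretrained", ConvLearner.pretrained, arch, data, precompute=True)
```

Wrapping both instantiations like this should show the first call taking far longer than the second.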
I took a screenshot illustrating the problem, and another of the stack trace when I interrupt the kernel while the model is loading.
Ah, let me add some timings. I have to pass in the bs parameter, otherwise I get a queue.Full runtime exception.
I'll share my timings with you once I run mine.
Wow, this is an odd problem. It has nothing to do with the previous thread you linked to - your issue is that actually creating the model is very slow. It's not connected to anything in the fastai lib. You can check by calling the pytorch function directly:
from torchvision.models import resnet34  # bypasses fastai entirely
m = resnet34(True)  # i.e. pretrained=True
You'll find this takes a long time, based on what you've shown in your post. The best place to ask about that is on https://discuss.pytorch.org/, since it's a pytorch issue.
I've spent the whole day trying to fix this. While using Docker as described here seemed to fix the problem, using conda env create -f environment.yml brought the problem back.
If you are using Docker / nvidia-docker, try adding --ipc=host to your docker run command, as specified here - https://github.com/pytorch/pytorch#docker-image - otherwise the Docker instance may not have access to enough shared memory.
@jeremy you were right, messing with Docker was a waste of time. I created a new virtual env and installed each package manually as it was requested by some fastai import (in lesson 1), and the problem seems to be gone. Maybe some package installed by conda env create -f environment.yml conflicts with torch?