PyTorch + Graphics Cards + Systems + Different CNN benchmarks

Justin Johnson from Stanford has some good hardware benchmarks showing how fast different CNNs run.

Sample Table from his page:


Happy new year everybody!

I am switching from Keras to PyTorch. I apparently installed everything fine, but when I try to run the first lesson’s block:

arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)

I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-42d5f498bf97> in <module>()
      1 arch=resnet34
      2 data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
----> 3 learn = ConvLearner.pretrained(arch, data, precompute=True)
      4 learn.fit(0.01, 3)

~/fastai/courses/dl1/fastai/conv_learner.py in pretrained(cls, f, data, ps, xtra_fc, xtra_cut, **kwargs)
     96     def pretrained(cls, f, data, ps=None, xtra_fc=None, xtra_cut=0, **kwargs):
     97         models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg, ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut)
---> 98         return cls(data, models, **kwargs)
     99 
    100     @property

~/fastai/courses/dl1/fastai/conv_learner.py in __init__(self, data, models, precompute, **kwargs)
     89         elif self.metrics is None:
     90             self.metrics = [accuracy_multi] if self.data.is_multi else [accuracy]
---> 91         if precompute: self.save_fc1()
     92         self.freeze()
     93         self.precompute = precompute

~/fastai/courses/dl1/fastai/conv_learner.py in save_fc1(self)
    135         m=self.models.top_model
    136         if len(self.activations[0])!=len(self.data.trn_ds):
--> 137             predict_to_bcolz(m, self.data.fix_dl, act)
    138         if len(self.activations[1])!=len(self.data.val_ds):
    139             predict_to_bcolz(m, self.data.val_dl, val_act)

~/fastai/courses/dl1/fastai/model.py in predict_to_bcolz(m, gen, arr, workers)
     12     m.eval()
     13     for x,*_ in tqdm(gen):
---> 14         y = to_np(m(VV(x)).data)
     15         with lock:
     16             arr.append(y)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     65     def forward(self, input):
     66         for module in self._modules.values():
---> 67             input = module(input)
     68         return input
     69 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/conv.py in forward(self, input)
    275     def forward(self, input):
    276         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 277                         self.padding, self.dilation, self.groups)
    278 
    279 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/functional.py in conv2d(input, weight, bias, stride, padding, dilation, groups)
     88                 _pair(0), groups, torch.backends.cudnn.benchmark,
     89                 torch.backends.cudnn.deterministic, torch.backends.cudnn.enabled)
---> 90     return f(input, weight, bias)
     91 
     92 

RuntimeError: CUDNN_STATUS_NOT_INITIALIZED

I checked the CUDA toolkit version with nvcc --version and it seems OK:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Like .theanorc for Theano, is there a PyTorch configuration file that needs to be adjusted to use the GPU?

Thanks for any help!

Sounds like you might not have cuDNN installed, or may have the wrong version. Try using the fastai AMI or Paperspace if you want to get up and running quickly.
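For anyone who wants to test this outside the notebook, here is a minimal sketch that exercises the same code path as the traceback above (a Conv2d forward pass on the GPU). The function name cudnn_smoke_test is just an illustrative name, and the sketch assumes a recent PyTorch release (torch.no_grad and the device= argument are not available in very old versions).

```python
# Minimal sketch: run one small convolution on the GPU and report what happens.
# Assumes a recent PyTorch release; cudnn_smoke_test is a hypothetical helper name.
try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def cudnn_smoke_test():
    """Return a short status string describing whether a GPU convolution works."""
    if torch is None:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return "torch cannot see a CUDA device"
    try:
        conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
        x = torch.randn(1, 3, 32, 32, device="cuda")
        with torch.no_grad():
            y = conv(x)
    except RuntimeError as err:
        # Errors such as CUDNN_STATUS_NOT_INITIALIZED surface here
        return "GPU convolution failed: {}".format(err)
    return "GPU convolution OK, output shape {}".format(tuple(y.shape))


print(cudnn_smoke_test())
```

If this fails with the same cuDNN error, the problem is likely in the CUDA/cuDNN install rather than in the fastai code.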

Hi, somehow on my new Paperspace machine, set up using your new script for v2 (curl http://files.fast.ai/setup/paperspace | bash),
I can’t seem to get PyTorch to recognize CUDA.

Simple test script:

import torch
import torch.utils.data
from torch import nn, optim
from torch.autograd import Variable
from torch.nn import functional as F
from torchvision import datasets, transforms
from torchvision.utils import save_image

print(torch.cuda.is_available())
exit();

It prints False.

I tried this because the lesson1 resnet portion was training very slowly (60s/it).
I wonder how to debug this.

I resolved the problem of the GPU not being found. I uninstalled pytorch and torchvision (conda uninstall) and then reinstalled them in the fastai environment. Now the GPU is found and training is quite fast.

That’s odd. Did you run the whole script on a fresh machine? It should have installed it into the fastai env for you…

Hi Jeremy, yes, I ran the whole script on a new machine and it did install pytorch etc. into the fastai env. But the installed versions didn’t use the GPU, and standalone tests failed when torch tried to use CUDA, as I said in my earlier message. I then created a new env with just torch, and that passed the GPU tests. After that I simply uninstalled and reinstalled torch and torchvision in the fastai env, and now things are good.
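A rough sketch of how one might compare two conda environments in a case like this: run the snippet below in each env and compare the output. The helper name torch_env_report is made up for illustration. One possible cause, which this thread does not confirm, is a CPU-only torch build, in which case torch.version.cuda is None.

```python
import sys


def torch_env_report():
    """Collect basic facts about the torch install seen by this interpreter."""
    report = {"python": sys.executable}  # shows which env is actually active
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["torch_location"] = torch.__file__
    # CUDA version torch was built against; None means a CPU-only build
    report["built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


for key, value in torch_env_report().items():
    print("{:>16}: {}".format(key, value))
```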

Thanks for the tip.

I reinstalled/upgraded CUDA and cuDNN and it fixed the error.

I have the same issue: PyTorch cannot recognize CUDA or the GPU after running the fastai script on a fresh Paperspace machine. By running the CUDA samples I found that the installed CUDA could not recognize the GPU either, and reinstalling PyTorch and CUDA did not help. So I created another fresh machine and followed these common steps: install CUDA and cuDNN -> reboot -> verify that CUDA works with the GPU by running the CUDA samples -> install Anaconda -> create a new Python environment -> install PyTorch with the conda command -> verify that torch.cuda.is_available() returns True -> pip install fastai and check out fastai from GitHub. In the end, the code in the notebooks runs on the GPU, though I don’t know why. I just ran the lines from Jeremy’s script in a different order.

Did you check the versions? My CUDA version was just the second-latest one, and that somehow caused a problem.

Both should be the latest. Check Nvidia’s website to find the latest versions. Also, the environment variables have to be set (as in the Paperspace script).

I checked the CUDA versions. The version in the script is 9.0, and the wget command for CUDA 9.0 downloads a very small file, which seems unreasonable, but the CUDA install command still downloads and installs the latest version (9.1) automatically.

That’s weird. CUDA is over 1 GB, I think. To download cuDNN directly from the Nvidia site you need to log in, I think. Jeremy’s script goes directly to a URL where you don’t need to log in.

Check whether your setup files are valid.
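One crude way to do that check is to compare the size of the downloaded file with the size you expect. The sketch below is only an illustration: looks_truncated is a hypothetical helper, the 1 GB threshold is a rough assumption, and the demo uses a tiny temporary file standing in for a bad download rather than a real installer.

```python
import os
import tempfile


def looks_truncated(path, min_bytes):
    """Return True if the file at `path` is smaller than `min_bytes`."""
    return os.path.getsize(path) < min_bytes


# Demo: a tiny stand-in file, e.g. an HTML redirect page saved by wget
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"<html>redirect</html>")
    demo_path = tmp.name

ONE_GB = 1024 ** 3  # assumed lower bound for a full CUDA installer
print(looks_truncated(demo_path, ONE_GB))  # True: the stand-in file is tiny
os.remove(demo_path)
```

A small file does not prove the download is broken (some installers are network stubs that fetch the rest later), so treat this only as a first hint.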

Are you uninstalling everything before reinstalling?

I am sorry, I made it confusing. I downloaded CUDA from the Nvidia site and the cuDNN file from the fastai site, using the same lines as in the script.

I tried to uninstall CUDA with the apt command after reading your tips, but I can’t be sure that my way of uninstalling is correct and complete.

Here is my nvcc --version output:

To check your version in a Jupyter notebook, you can use the following lines. Mine outputs 7005.

import torch
torch.backends.cudnn.version()

and

torch.backends.cudnn.version()
print(torch.backends.cudnn.is_acceptable(torch.cuda.FloatTensor(1)))
print(torch.backends.cudnn.version())

returns

True
7005

which means it recognizes CUDNN.
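For reference, the integer can be turned into a dotted version. The sketch below assumes the encoding used by cuDNN 7.x (major * 1000 + minor * 100 + patch); newer cuDNN releases may encode the number differently. decode_cudnn_version is just an illustrative name.

```python
def decode_cudnn_version(version):
    """Split a cuDNN 7.x style version integer into (major, minor, patch)."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(7005))  # (7, 0, 5)
print(decode_cudnn_version(7003))  # (7, 0, 3)
```

So 7005 corresponds to cuDNN 7.0.5 and 7003 to cuDNN 7.0.3.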

Try uninstalling everything and then do a clean install.


Thanks so much for your advice. I will run your code to verify it again. Actually, I already got torch to recognize cuDNN and run the model on the GPU by following the steps I posted.

It returns

True
7003

Do you suggest upgrading it to 7005?