I have two cards in my system, a 1080 Ti and a 2080. I wonder, is it possible to install PyTorch and fastai with the 4xx drivers and CUDA 10? As far as I can see, there are instructions on how to install PyTorch from source. However, I am not sure if fastai and everything else will work as expected in this setup.
Could someone advise how best to proceed in this case? Is it possible to keep the old 3xx driver for both cards and go with CUDA 9.2?
| CUDA Toolkit | Linux x86_64 Driver Version | Windows x86_64 Driver Version |
| --- | --- | --- |
| CUDA 10.0.130 | >= 410.48 | >= 411.31 |
| CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26 |
| CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44 |
| CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29 |
| CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54 |
| CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51 |
| CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30 |
| CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66 |
| CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62 |
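The table above is just a lookup of minimum driver versions per toolkit. A minimal Python sketch of that check (values copied from the table's Linux column; this is only an illustration of the comparison, not an official NVIDIA tool):

```python
# Minimum Linux x86_64 driver required for each CUDA toolkit,
# taken from the compatibility table above (base release per toolkit).
MIN_LINUX_DRIVER = {
    "10.0": (410, 48),
    "9.2": (396, 26),
    "9.1": (390, 46),
    "9.0": (384, 81),
    "8.0": (367, 48),
    "7.5": (352, 31),
    "7.0": (346, 46),
}

def driver_ok(cuda_version: str, driver_version: str) -> bool:
    """Return True if the installed driver meets the toolkit's minimum."""
    installed = tuple(int(p) for p in driver_version.split("."))
    return installed >= MIN_LINUX_DRIVER[cuda_version]

# A 410.48 driver satisfies CUDA 10.0, but a 396.x driver does not;
# the same 396.x driver is fine for CUDA 9.2.
print(driver_ok("10.0", "410.48"))  # True
print(driver_ok("10.0", "396.37"))  # False
print(driver_ok("9.2", "396.37"))   # True
```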
If you had been using containers, this would be trivial …
There is an alternative I can suggest.
Instead of installing CUDA from the fastai environment, install it from the .deb package. Then you will be free to use whatever CUDA version you like, together with the matching PyTorch build from the official website.
Ok, got it! My problem is that as soon as I install a 4xx driver, I can’t import torch anymore. The nvidia-smi tool shows both GPUs, but torch stops working.
Yes, I am going to try your advice and install CUDA from the .deb. Do you think I should pick the None option in the configurator below (when installing PyTorch after the CUDA driver has been installed from the .deb)?
Well, this happens because the nvidia-smi that comes with the fastai environment (inside the cuda9.2 package) is tied to a different driver version (it is not generic), so it conflicts with your actual driver and the nvidia-smi you already have.
The approach I believe will cause the fewest troubles: comment out the cuda9.2, cudnn, and pytorch entries in the fastai environment, then manually install the 410.xx driver, CUDA 10.0, and cuDNN 7.xx. Then, still using the fastai environment, install PyTorch via pip install … from the official site, or build your own for CUDA 10.
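To make the "comment out the CUDA bits" step concrete, here is a hypothetical excerpt of what the edited fastai environment.yml might look like. The exact entry names and versions in your copy may differ; the point is only to disable the bundled CUDA/cuDNN/PyTorch so the system-wide CUDA 10 install is used instead:

```yaml
# Hypothetical excerpt of fastai's environment.yml -- entry names may
# differ in your copy. Comment out the bundled CUDA/cuDNN/PyTorch:
dependencies:
  - python=3.6
#  - cuda92     # disabled: CUDA 10.0 is installed system-wide from the .deb
#  - cudnn      # disabled: cuDNN 7.xx installed manually
#  - pytorch    # installed afterwards via pip from pytorch.org (CUDA 10 build)
```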
Wow, you are on a Mac… this will not work!!! Your only option is to find some conda channel that has a PyTorch build for CUDA 9.x or CUDA 10 for Anaconda that you can install … I will look for it.
I believe this is just a wrapper that points to the local CUDA driver on your machine and makes it visible inside your conda environment, because it installs only a ~2 KB file.
I was able to run fp16 training. However, I need more tests to see if everything works as expected. I didn’t try to install a new version of PyTorch or compile it from source, so I guess it still uses CUDA 9.2.
@cpbotha Yes, I was looking for something like this. I guess I’ll try this approach soon.
By the way, it seems there is a little bug in fastai.show_install:
I believe you can report this as a bug on fast.ai so they can fix it. But if the SMI application is OK, everything should be fine for you.
Btw, you may have your fastai working, but be aware that CUDA 9.x or lower does not exist for Ubuntu 18.04; only CUDA 10 is available. As Professor Jeremy told us, using fastai v1.0 you can do whatever you want.
Hi
The other day I compared the source code of magma 2.3.0 and 2.4.0 with respect to the patch files, and I found that almost all the patches from the PyTorch build project for magma are already in the new version.
The only patches someone may still need to apply are cmakelists.patch and thread_queue.patch for magma 2.4.0.
@cpbotha Thanks a lot for your blog post, I’ll read it in more detail tomorrow.
You don’t have access to the V3 section of the forum, I understand; otherwise you’d see a post on the “RTX series + Fastai” thread, where some of us (especially me!) are struggling to get mixed precision to run properly on either a 2070 or a 2080Ti.
The most annoying/surprising part is that FP16 won’t run beyond a batch size HALF that of FP32 (248 max vs 512) on the fastai CIFAR10 notebook (the one in the GitHub repo), while in theory it should handle DOUBLE. It still runs 10% faster than FP32, despite the half batch size.
BTW, I posted a few runs of CIFAR10 with my 1080Ti, 2070 and 2080Ti here.
Durnit, I suspected there was a part of the forum I was not able to see.
Interesting, that batch-size constraint. Did sgugger also take a look? Could it be due to the divisible-by-8 FP16 constraint? (That would be strange, because there are too many clever people hanging out on this forum who would have diagnosed it first.)
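For context on why FP16 is fussy at all: the divisible-by-8 point refers to Tensor Cores wanting matrix dimensions that are multiples of 8 in FP16, and FP16's narrow dynamic range is the reason mixed-precision training normally needs loss scaling. A small NumPy sketch of the range issue (CPU-only and purely illustrative; it does not diagnose the batch-size problem above):

```python
import numpy as np

# FP16 has a tiny dynamic range: values below ~3e-8 round to zero
# ("underflow"), and values above 65504 overflow to inf.
tiny_grad = np.float16(1e-8)
print(tiny_grad)            # 0.0 -- a gradient this small is silently lost

big_val = np.float16(1e5)
print(big_val)              # inf -- overflow

# Loss scaling works around underflow: multiply the loss (and hence the
# gradients) by a scale factor before casting to FP16, then divide the
# FP32 master copy back down after the backward pass.
scale = 1024.0
scaled = np.float16(1e-8 * scale)       # representable in FP16 now
recovered = np.float32(scaled) / scale  # back to ~1e-8 in FP32
print(recovered)
```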