Fastai large model support (remove GPU memory limitations)

Problem: I need to install fastai from the commit at which it still used PyTorch 1.3. Could someone provide a how-to for non-experts?

Motivation: machine learning on a laptop with barely enough VRAM for the tutorial models is no fun.
IBM has “PyTorch Large Model Support”, which would be great if it worked.



Current: Watson Machine Learning conda channel (PyTorch 1.3)

Note: PyTorch 1.5 with large model support is in the early access channel.

Edit APR28: It turns out all that is needed is to clone the repository and git checkout the needed commit, then run the install command to get a dev install per https://github.com/fastai/fastai. However, pip still tries to install torch 1.8.1. I have no idea why, because I have already conda-installed pytorch 1.3.1.

$ git clone https://github.com/fastai/fastai.git
$ cd fastai
$ git checkout 11603b239c2bd03c0d28c4e883e59a35e94bb86f
$ cd ..
$ pip install -e "fastai[dev]"
  • Any ideas on how to avoid installing torch 1.8.1?

See image (https://imgur.com/7eLti4p.png):
Collecting torch>=1.3.0
Downloading torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl (804.1 MB)
| | 1.1 MB 1.6 MB/s eta 0:08:34^C
ERROR: Operation cancelled by user
(pytorch1.3lms) b@eiece:~/gitz$ conda list | grep torch

# packages in environment at /home/b/miniconda3/envs/pytorch1.3lms:
_pytorch_select 2.0 gpu_21941.g1a3a219 https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
pytorch 1.3.1 21941.g1a3a219 https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
pytorch-base 1.3.1 gpu_py37_21941.g1a3a219 https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda

Fastai’s requirement is torch>=1.3.0, and 1.8.1 satisfies that, so pip installs 1.8.1. You could either let the latest version be installed and then manually reinstall 1.3.0, or edit fastai/environment.yml and fastai/settings.ini to remove the parts dealing with PyTorch so that it is not installed or upgraded.
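
One way to check why pip does not count the conda install (a sketch, not a confirmed diagnosis): pip decides whether a requirement is already satisfied by looking for installed distribution metadata under the name torch, while the conda package is named pytorch. The snippet below only reports which of those names have metadata visible from Python. Whether the IBM build ships metadata under the name torch is an assumption to verify, and installed_version is a hypothetical helper name. On Python 3.7 it falls back to the importlib_metadata backport, assuming that is installed.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7 (as in the cp37 wheel above); assumes the backport is installed
    import importlib_metadata as metadata


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if no metadata is found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# pip looks for "torch"; conda lists the package as "pytorch"
for name in ("torch", "pytorch", "torchvision"):
    print(name, "->", installed_version(name))
```

If torch shows up as None here while conda list shows pytorch 1.3.1, pip has nothing to match torch>=1.3.0 against and will download a wheel.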

Have a great weekend!

Thanks Bob: changing environment.yml and settings.ini seems to be the best way.
Final edit: I have now noticed that !pip install -Uqq fastbook has snuck in a copy of torch 1.7.1 again, and not from the IBM repository, where that version is not available. Ugh. I’m going to go back to training on CPU, as the newer AMD Zen 2 and Zen 3 CPUs are beasts and good enough for scholastic training, at least compared to a laptop NVIDIA 1660 Ti Max-Q or lower parts.

Full procedure:
I’m on master (“checkout master”) for both fastai and fastbook, except for the edits to the two files as above.
fastai is installed without the dev extras (see first post):

pip install -e "fastai"

fastbook is installed with the following command to avoid the missing graphviz error:

conda install -c fastai fastbook

fastbook must then be installed AGAIN with the following command, or there will be an error:

!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

The final error: fastai.callback.all was missing when importing fastbook. This final forum post was quite helpful in solving that: close and halt the notebook, reopen it, and comment out the install line (“#!pip install -Uqq fastbook”).
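
Instead of commenting the install line in and out by hand, a notebook cell could skip pip whenever the module is already importable. A minimal sketch using only the standard library; ensure_installed is a hypothetical helper, not part of fastbook or fastai:

```python
import importlib.util
import subprocess
import sys


def ensure_installed(module_name, pip_args):
    """Run 'pip install <pip_args>' only if module_name cannot be imported.

    Returns True if pip was invoked, False if the module was already present.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    # Use the current interpreter's pip so the active conda environment is targeted
    subprocess.check_call([sys.executable, "-m", "pip", "install", *pip_args])
    return True


# Example call for a notebook cell (left commented out so nothing is installed here):
# ensure_installed("fastbook", ["-qq", "fastbook"])
```

With this guard, pip only runs when fastbook cannot be imported, so an environment that already works is left alone on later runs.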

$ diff settings.ini settings.ini.bak 
16,17c16,18
< requirements = fastcore>=1.3.8,<1.4 torchvision==0.6,<0.9 matplotlib pandas requests pyyaml fastprogress>=0.2.4 pillow>6.0.0 scikit-learn scipy spacy<3
< conda_requirements = pytorch>=1.5
---
> requirements = fastcore>=1.3.8,<1.4 torchvision>=0.8.2,<0.9 matplotlib pandas requests pyyaml fastprogress>=0.2.4 pillow>6.0.0 scikit-learn scipy spacy<3
> pip_requirements = torch>=1.7.0,<1.8
> conda_requirements = pytorch>=1.7.0,<1.8
$ diff environment.yml environment.yml.bak 
8c8
< - torchvision==0.6
---
> - torchvision>=0.8
18c18
< - pytorch>=1.5.0
---
> - pytorch>=1.7.0
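
The settings.ini part of the diff above could also be applied with a small script rather than by hand. This is only a sketch, assuming the three keys appear one per line as in the diff. patch_settings is a hypothetical helper, and the default version specs are the ones from the edited file shown in the diff.

```python
import re


def patch_settings(text,
                   torchvision_spec="torchvision==0.6,<0.9",
                   pytorch_spec="pytorch>=1.5"):
    """Return settings.ini text with the torch-related pins changed as in the diff."""
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "pip_requirements":
            # Drop the line so pip is not told to fetch its own torch wheel
            continue
        if key == "conda_requirements":
            line = "conda_requirements = " + pytorch_spec
        elif key == "requirements":
            # Replace only the torchvision token; other requirements stay untouched
            line = re.sub(r"torchvision\S*", torchvision_spec, line)
        out.append(line)
    return "\n".join(out) + "\n"


# Usage (the path is an assumption; keep a backup as in the diff above):
# from pathlib import Path
# p = Path("fastai/settings.ini")
# p.write_text(patch_settings(p.read_text()))
```

environment.yml would still need the matching torchvision and pytorch edits shown in the second diff.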

"""
Original attempt:
I tried one more thing before taking the advanced approach, because you told me to check settings.ini: it turns out that torchvision>=0.5 is listed (in the checkout which requires pytorch 1.3).

However, only torchvision 0.4.2 is available in the WML standard channel above; torchvision 0.6 is available in the early-access channel. So I made a new environment with IBM LMS pytorch 1.5 from the early-access channel, and I am now able to install the checkout 11603b239c2bd03c0d28c4e883e59a35e94bb86f without a hard error.

Although there was no hard error at the fastai install stage, matching the fastbook version to fastai caused further difficulty, so the procedure at the top of this post is now the much preferred solution.

If running an old checkout of fastai: I was never able to find a fastbook version that neither used “Tuple” (from the old fastcore version) nor imported fastai.vision.all in place of fastai2.vision.all (fastai2 is the old naming scheme; fastai is now effectively version 2).
