How can I load a pretrained model on Kaggle using fastai?

soco_loco · February 18, 2019, 5:11pm

I struggled with this for a while, so here are some tips for Kaggle kernels where you don’t have internet access to download pre-trained weights:

Under draft-environment in your kernel, click the “add-data” button and search for the relevant PyTorch model. For example, I wanted to use ResNet-50, so I added the ResNet-50 PyTorch (not the Keras) model to my kernel (click “Add”).
This gives you a new ResNet-50 directory with the *.pth weight file inside it. Now you need to copy the *.pth file into the torch model directory using the same filename it’s looking for. One way to find that filename is to try to run the model without copying anything. I got an error like:

failed download to /tmp/.torch/models/resnet50-19c8e357.pth

So we need to copy the resnet50.pth file from the ResNet50 directory that was just added to the directory where the download errored out. My resnet50 file was in:

../input/resnet50/resnet50.pth

Therefore, run a copy command that takes the file you have and puts it where the loader is looking for it, using the same model-sha_hash naming convention:

!cp ../input/resnet50/resnet50.pth /tmp/.torch/models/resnet50-19c8e357.pth
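
The model-sha_hash naming can be checked before copying. Below is a minimal sketch, assuming the torch.hub convention that the part after the last hyphen is the leading hex digits of the file's SHA-256. The three function names are hypothetical helpers, not part of torch or fastai, and the URL is assumed to be the torchvision download location for the file named in the error above.

```python
import hashlib
import os
import re
from urllib.parse import urlparse


def cache_name_from_url(url):
    """Return the file name a download of `url` would be stored under."""
    return os.path.basename(urlparse(url).path)


def hash_prefix_from_name(file_name):
    """Extract the hex prefix from a name like 'resnet50-19c8e357.pth' (None if absent)."""
    match = re.search(r"-([a-f0-9]+)\.[^.]+$", file_name)
    return match.group(1) if match else None


def file_matches_name(weights_path, expected_name):
    """Check that the SHA-256 of the file starts with the prefix in expected_name."""
    prefix = hash_prefix_from_name(expected_name)
    if prefix is None:
        return False
    digest = hashlib.sha256()
    with open(weights_path, "rb") as f:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(prefix)


url = "https://download.pytorch.org/models/resnet50-19c8e357.pth"
print(cache_name_from_url(url))                        # resnet50-19c8e357.pth
print(hash_prefix_from_name("resnet50-19c8e357.pth"))  # 19c8e357
```

If `file_matches_name("../input/resnet50/resnet50.pth", "resnet50-19c8e357.pth")` returns False, the dataset file is probably not the same weights the loader expects, and renaming it would only hide that.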

Remember that your …/input directory is read-only, and the learner writes model files during training, so you need to go up one level when creating your learner:
./ : put path here
./input : read only, don’t put path here

vasinkd · February 26, 2019, 8:15pm

Thanks a lot! I will just mention that in order to change the learner’s directory, one should add the model_dir kwarg to the create_cnn function, so it looks like this:
learn = create_cnn(data, models.resnet34, metrics=error_rate, model_dir='/tmp/models')

LIBER · March 9, 2019, 1:25pm

There’s a Kaggle dataset for wt103:
https://www.kaggle.com/mnpinto/fastai-wt103-1
But when I used the hyper-parameter fnames="…/input/fastai-wt103-1/wt103-1.tgz" for language_model_learner, it reported “FileNotFoundError: [Errno 2] No such file or directory: ‘…/models/…pkl’”. How can I deal with that?

rbarman · April 19, 2019, 1:50am

One thing to note is that `/tmp/.torch/models/` has to exist, so here is what I do:

!mkdir -p /tmp/.torch/models
!cp /kaggle/input/resnet50/resnet50.pth /tmp/.torch/models/resnet50-19c8e357.pth
learn = cnn_learner(data, models.resnet50, path='/kaggle/working/', .... )
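
The same two shell steps can also be written in Python. This is a minimal sketch using only the standard library; `stage_weights` is a hypothetical helper name, not a fastai or torch function, and the paths in the usage comment are the ones from the post above.

```python
import shutil
from pathlib import Path


def stage_weights(src, cache_dir, expected_name):
    """Copy src into cache_dir under expected_name, creating cache_dir if needed."""
    src = Path(src)
    if not src.is_file():
        raise FileNotFoundError(f"weights not found: {src}")
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    dest = cache_dir / expected_name
    shutil.copyfile(src, dest)                    # same effect as `cp src dest`
    return dest


# Usage (run before creating the learner):
# stage_weights("/kaggle/input/resnet50/resnet50.pth",
#               "/tmp/.torch/models",
#               "resnet50-19c8e357.pth")
```

Raising early when the source file is missing gives a clearer message than the download error that would otherwise appear later.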

cmac · June 12, 2019, 4:09pm

Thanks! This was very helpful. I wonder if it is worth updating the learner code to not force download a model if we are giving it a path to a local model.

amyku · July 9, 2019, 8:13pm

!cp /kaggle/input/resnet34/resnet34.pth /tmp/.cache/torch/checkpoints/resnet34333f7ec4.pth

learn = cnn_learner(data, models.resnet34, metrics=error_rate,
                    model_dir=Path('..input/working'),
                    path=Path("."))

The files seem to copy and everything runs fine until I try to commit in a Kaggle script kernel (which runs the whole file in one go). Then I get an error because cnn_learner attempts to download the resnet weights.

Any help is appreciated. Thank you

jasmeet · July 28, 2019, 3:46pm

This does not work for me, I am working in Kaggle and it is still trying to download the model. Can you please guide me?

portlyflounder · July 31, 2019, 9:46am

@amyku Did you find a solution to your problem? I am having the same problem. I am able to run the kernel successfully, but once I try and commit it, it attempts to download the resnet101 model, even though I have saved it in /kaggle/working/models/ folder.

learn = cnn_learner(data, models.resnet101, metrics=[error_rate, accuracy], model_dir="/kaggle/working/models")

amyku · July 31, 2019, 12:56pm

Yes, it eventually worked for me.

First try restarting the kernel. It’s probably better not to save the model in working since it’s temporary and I think the content is deleted after your kernel session ends. Take a look at what I did in this link

https://www.kaggle.com/aminyakubu/aptos-2019-blindness-detection-fast-ai
You can simply go to Line 3

portlyflounder · July 31, 2019, 1:41pm

@amyku Thanks for the fast response. I followed your method in the kernel, and I managed to commit it successfully.

Thanks again!

egm · August 23, 2019, 10:40am

Hi,
maybe my approach will be useful for someone.

Download the weights to input (…/input/resnet34…)

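import torch
from torchvision.models import resnet34

def my_resnet(pretrained=False, progress=True, **kwargs):
    # Ignore `pretrained` so nothing is downloaded; build the bare architecture instead.
    m = resnet34(pretrained=False, progress=True, **kwargs)
    # Load the weights from the local file added under input.
    m.load_state_dict(torch.load("…/input/resnet34/resnet34.pth"))
    return m

learn = cnn_learner(data, my_resnet, metrics=accuracy)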

(based on what I’ve figured out from pytorch and fastai code)

Thanks!

cap_rogers · February 12, 2020, 7:44am

!mkdir -p '/tmp/.cache/torch/checkpoints'
!cp ../input/fastai-pretrained-models/densenet121-a639ec97.pth /tmp/.cache/torch/checkpoints/densenet121-a639ec97.pth

learn_cd = cnn_learner(data_cd, models.densenet121, metrics=[error_rate, accuracy],model_dir = Path('../kaggle/working'),path=Path('.'),).to_fp16()

I am still getting a GAIError while trying to commit. Any advice?

obiwan · April 18, 2020, 6:40am

I want to use the AWD LSTM pretrained model but the competition doesn’t allow internet access. I have added the model as external data but I don’t know where to move it or how to load it from the directory

morgan · April 18, 2020, 9:19am

Create your learner (with pretrained=False if it has that option), then use learn.load(path/to/your/model) to load the pretrained weights

obiwan · April 18, 2020, 4:10pm

Alright, it worked, but now I am getting an error; apparently the sizes of the weights have changed.

RuntimeError: Error(s) in loading state_dict for SequentialRNN:
size mismatch for 0.encoder.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7224, 400]).
size mismatch for 0.encoder_dp.emb.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7224, 400]).
size mismatch for 0.rnns.0.weight_hh_l0_raw: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.0.module.weight_ih_l0: copying a param with shape torch.Size([4600, 400]) from checkpoint, the shape in current model is torch.Size([4608, 400]).
size mismatch for 0.rnns.0.module.weight_hh_l0: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.0.module.bias_ih_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
size mismatch for 0.rnns.0.module.bias_hh_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
size mismatch for 0.rnns.1.weight_hh_l0_raw: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.1.module.weight_ih_l0: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.1.module.weight_hh_l0: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.1.module.bias_ih_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
size mismatch for 0.rnns.1.module.bias_hh_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
size mismatch for 0.rnns.2.module.weight_ih_l0: copying a param with shape torch.Size([1600, 1150]) from checkpoint, the shape in current model is torch.Size([1600, 1152]).
size mismatch for 1.decoder.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7224, 400]).
size mismatch for 1.decoder.bias: copying a param with shape torch.Size([60002]) from checkpoint, the shape in current model is torch.Size([7224]).

I found this thread: “Language_model_learner not working as before?”
It removes the errors for the weights with shape 1152, but now I am getting:

RuntimeError: Error(s) in loading state_dict for SequentialRNN:
size mismatch for 0.encoder.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7248, 400]).
size mismatch for 0.encoder_dp.emb.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7248, 400]).
size mismatch for 1.decoder.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7248, 400]).
size mismatch for 1.decoder.bias: copying a param with shape torch.Size([60002]) from checkpoint, the shape in current model is torch.Size([7248]).
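
The mismatches in these errors fall into two groups: tensors whose first dimension is 60002 in the checkpoint (which suggests the checkpoint's vocabulary differs from the one built from the current data) and tensors that differ only in hidden size (1150 vs 1152). A minimal sketch for listing such differences follows. `shape_mismatches` is a hypothetical helper, and the shape dictionaries are typed in by hand from the first error message; with PyTorch they could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {key: (checkpoint_shape, model_shape)} for keys whose shapes differ."""
    return {
        key: (ckpt_shape, model_shapes[key])
        for key, ckpt_shape in checkpoint_shapes.items()
        if key in model_shapes and model_shapes[key] != ckpt_shape
    }


# A few entries copied from the first error message above.
checkpoint = {
    "0.encoder.weight": (60002, 400),
    "0.rnns.0.module.weight_ih_l0": (4600, 400),
    "1.decoder.bias": (60002,),
}
model = {
    "0.encoder.weight": (7224, 400),
    "0.rnns.0.module.weight_ih_l0": (4608, 400),
    "1.decoder.bias": (7224,),
}

for key, (ckpt, cur) in shape_mismatches(checkpoint, model).items():
    print(f"{key}: checkpoint {ckpt} vs model {cur}")
```

Comparing shapes before calling the loader shows at a glance whether the problem is the vocabulary, the architecture settings, or both.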

George2 · March 20, 2021, 5:57am

I searched for a long time for a solution to this issue. None of the options here helped me much (most likely because I did not go into much detail about the actual logic of the solutions and simply tried to tweak the code). What did help was looking into the logic employed by this Kaggle user:
The idea is that you have 2 notebooks, 1 for training the model (which can use internet) and the 2nd one for inference that is using the first one as input. Please see the examples below:
Notebook 1: https://www.kaggle.com/bjoernholzhauer/fastai-how-to-set-up-efficientnet-b4-0-945-lb
The notebook downloads and trains the model and outputs only the model.
Notebook 2: https://www.kaggle.com/bjoernholzhauer/inference-for-trained-fastai-efficientnet-b4
This notebook uses the model trained from notebook 1 as input (without any internet access) only for inference.

sinhak · September 27, 2021, 3:09pm

The following worked for me. I wanted to use a pretrained resnet18 model in a Kaggle competition.

I added the pretrained-pytorch-models dataset (“Pretrained PyTorch models” on Kaggle, which has a pretrained resnet18) to my notebook and ran:
!mkdir -p /root/.cache/torch/hub/checkpoints/
!cp /kaggle/input/pretrained-pytorch-models/resnet18-5c106cde.pth /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth

The shell commands copy over the pretrained resnet18 model to the location that Torch expects on Kaggle. The location can be determined from the message torch gives when it is downloading models over the internet.

After this I was able to submit my model without being connected to the internet.
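
Reading the target location off the download message can be automated. This is a minimal sketch, assuming the message has the form `Downloading: "<url>" to <path>`; the exact wording may differ between torch versions, and `parse_download_message` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath

# Assumed message format: Downloading: "<url>" to <destination path>
DOWNLOAD_MSG = re.compile(r'Downloading: "(?P<url>[^"]+)" to (?P<dest>\S+)')


def parse_download_message(message):
    """Return (cache_dir, file_name) from a download message, or None if it does not match."""
    match = DOWNLOAD_MSG.search(message)
    if match is None:
        return None
    dest = PurePosixPath(match.group("dest"))
    return str(dest.parent), dest.name


msg = ('Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" '
       'to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth')
print(parse_download_message(msg))
# ('/root/.cache/torch/hub/checkpoints', 'resnet18-5c106cde.pth')
```

The two returned values are the directory to create and the file name to copy the weights to.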

Kap · February 8, 2022, 6:55pm

The following worked for me on Kaggle:

Uploaded the model to the Kaggle notebook.
Created a copy of the uploaded model in the kaggle/working dir, in a directory named /models:
!cp …/input/tps-feb-22-xresnet18-model-fastai/xres18.pth ./models
Created a learner:
learn = Learner(dls, xresnet18(n_out=10), metrics=accuracy)
Loaded the model:
learn.load('xres18')
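
The copy step works because of where the learner looks for the file. Here is a minimal sketch of that lookup, assuming the name passed to learn.load is resolved as `<learner path>/<model_dir>/<name>.pth` with `models` as the default model_dir; both helper names are hypothetical, and the learner's actual path may differ depending on how the data was created.

```python
import shutil
from pathlib import Path


def staged_weight_path(learner_path, model_dir, name):
    """Path where a weights file called `name` is assumed to be looked up."""
    return Path(learner_path) / model_dir / f"{name}.pth"


def stage_for_load(src, learner_path=".", model_dir="models"):
    """Copy src into <learner_path>/<model_dir>/ and return the name to pass to learn.load."""
    src = Path(src)
    dest = staged_weight_path(learner_path, model_dir, src.stem)
    # Create the directory first: `cp file ./models` with no ./models directory
    # would create a plain file called "models" instead.
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return src.stem


print(staged_weight_path(".", "models", "xres18"))  # models/xres18.pth
```

With this helper, the value returned by `stage_for_load` is the string to pass to `learn.load`, without the `.pth` extension.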