I can’t do it: the file system is read-only.
My kernel is https://www.kaggle.com/sq5rix/hcdfastai/
It runs on my laptop, but it fails in a Kaggle kernel.
I struggled with this for a while, so here are some tips for Kaggle kernels where you don’t have internet access to download pretrained weights:
 Under Draft Environment in your kernel, click the “Add Data” button and search for the relevant PyTorch model. For example, I wanted to use ResNet50, so I added the ResNet50 PyTorch (not the Keras) model to my kernel (click “Add”).
 This will give you a new Resnet50 directory and the *.pth weight file inside of that directory. Now you need to copy the *.pth file into the torch model directory using the same filename it’s looking for. One way to do this is to try to run the model without copying anything. I would get an error like:
failed download to /tmp/.torch/models/resnet50-19c8e357.pth
 So we need to copy the resnet50.pth file from the ResNet50 directory that was just added to the directory where the download just errored out. My resnet50 file was in:
../input/resnet50/resnet50.pth
 Therefore, run a copy command that takes the file you have and puts it in the place where the library is looking for it, using the same model-sha_hash naming convention.
!cp ../input/resnet50/resnet50.pth /tmp/.torch/models/resnet50-19c8e357.pth
 Remember that your …/input directory is read-only, and the model files are going to be written during training, so you need to go up one level when creating your learner:
./ : put path here
./input : read-only, don’t put path here
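The copy step above can also be done from Python, which has the side benefit of creating the cache directory if it is missing. This is only a minimal sketch: stage_weights is a hypothetical helper name, and the example paths in the comment are the ones quoted in this post, so they only make sense inside a kernel with that dataset attached.

```python
import shutil
from pathlib import Path


def stage_weights(src, cache_dir, expected_name):
    """Copy a weight file into the cache directory under the expected name.

    Hypothetical helper: `expected_name` should be the file name shown in
    the "failed download to ..." error message.
    """
    cache_dir = Path(cache_dir)
    # The cache directory may not exist yet in a fresh kernel session.
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / expected_name
    shutil.copyfile(src, dest)
    return dest


# Example call with the paths from this post (not executed here):
# stage_weights("../input/resnet50/resnet50.pth",
#               "/tmp/.torch/models",
#               "resnet50-19c8e357.pth")
```

Passing the target name as an argument keeps the helper independent of any particular model; the name has to match what the library asks for, not what the dataset happens to call the file.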
Thanks a lot! I will just mention that to change the learner’s directory you should pass the model_dir kwarg to the create_cnn function, so it looks like this:
learn = create_cnn(data, models.resnet34, metrics=error_rate, model_dir='/tmp/models')
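If you are not sure which directory is writable in your kernel, you can probe for one before passing it as model_dir. The sketch below uses only the standard library; first_writable_dir is a hypothetical helper, and the candidate paths in the usage line are just the ones mentioned in this thread.

```python
import tempfile
from pathlib import Path


def first_writable_dir(candidates):
    """Return the first candidate directory that accepts a new file, else None."""
    for candidate in candidates:
        path = Path(candidate)
        try:
            path.mkdir(parents=True, exist_ok=True)
            # Really create a file: checking permission bits alone can be
            # misleading on read-only mounts.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
        except OSError:
            continue
        return path
    return None
```

Usage would be something like model_dir = first_writable_dir(["/tmp/models", "/kaggle/working/models"]), with the result passed on to the learner.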
There’s a kaggle dataset for wt103
https://www.kaggle.com/mnpinto/fastaiwt1031
But when I passed fnames="…/input/fastaiwt1031/wt1031.tgz" to language_model_learner, it reported “FileNotFoundError: [Errno 2] No such file or directory: ‘…/models/…pkl’”. How can I deal with that?
One thing to note is that /tmp/.torch/models/ has to exist, so here is what I do:
!mkdir -p /tmp/.torch/models
!cp /kaggle/input/resnet50/resnet50.pth /tmp/.torch/models/resnet50-19c8e357.pth
learn = cnn_learner(data, models.resnet50, path='/kaggle/working/', .... )
Thanks! This was very helpful. I wonder if it is worth updating the learner code to not force download a model if we are giving it a path to a local model.
!cp /kaggle/input/resnet34/resnet34.pth /tmp/.cache/torch/checkpoints/resnet34-333f7ec4.pth
learn = cnn_learner(data, models.resnet34, metrics=error_rate,
                    model_dir = Path('..input/working'),
                    path = Path("."))
The files seem to copy and everything runs fine until I try to commit in a Kaggle script kernel (which attempts to run the whole file in one go). Then I get an error because cnn_learner is attempting to download resnet.
Any help is appreciated. Thank you.
This does not work for me, I am working in Kaggle and it is still trying to download the model. Can you please guide me?
@amyku Did you find a solution to your problem? I am having the same problem. I am able to run the kernel successfully, but once I try to commit it, it attempts to download the resnet101 model, even though I have saved it in the /kaggle/working/models/ folder.
learn = cnn_learner(data, models.resnet101, metrics=[error_rate, accuracy], model_dir="/kaggle/working/models")
Yes, it eventually worked for me.
First try restarting the kernel. It’s probably better not to save the model in working, since it’s temporary and I think its content is deleted after your kernel session ends. Take a look at what I did in this link:
https://www.kaggle.com/aminyakubu/aptos2019blindnessdetectionfastai
You can simply go to Line 3
@amyku Thanks for the fast response. I followed your method in the kernel, and I managed to commit it successfully.
Thanks again!
Hi,
Maybe my way will be useful for someone…
 Download the weights to input (…/input/resnet34…), then:
import torch
from torchvision.models import resnet34

def my_resnet(pretrained=False, progress=True, **kwargs):
    # Build the architecture without triggering a download, then load the local weights.
    m = resnet34(pretrained=False, progress=progress, **kwargs)
    m.load_state_dict(torch.load("…/input/resnet34/resnet34.pth"))
    return m

learn = cnn_learner(data, my_resnet, metrics=accuracy)
(based on what I’ve figured out from pytorch and fastai code)
Thanks!
!mkdir -p '/tmp/.cache/torch/checkpoints'
!cp ../input/fastaipretrainedmodels/densenet121-a639ec97.pth /tmp/.cache/torch/checkpoints/densenet121-a639ec97.pth
learn_cd = cnn_learner(data_cd, models.densenet121, metrics=[error_rate, accuracy], model_dir=Path('../kaggle/working'), path=Path('.')).to_fp16()
I am still getting GAIError while trying to commit. Any advice?
I want to use the AWD LSTM pretrained model but the competition doesn’t allow internet access. I have added the model as external data but I don’t know where to move it or how to load it from the directory
Create your learner (with pretrained=False if it has that option), then use learn.load(path/to/your/model) to load the pretrained weights.
Alright, it worked, but now I am getting an error; apparently the sizes of the weights have changed.
RuntimeError: Error(s) in loading state_dict for SequentialRNN:
size mismatch for 0.encoder.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7224, 400]).
size mismatch for 0.encoder_dp.emb.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7224, 400]).
size mismatch for 0.rnns.0.weight_hh_l0_raw: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.0.module.weight_ih_l0: copying a param with shape torch.Size([4600, 400]) from checkpoint, the shape in current model is torch.Size([4608, 400]).
size mismatch for 0.rnns.0.module.weight_hh_l0: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.0.module.bias_ih_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
size mismatch for 0.rnns.0.module.bias_hh_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
size mismatch for 0.rnns.1.weight_hh_l0_raw: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.1.module.weight_ih_l0: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.1.module.weight_hh_l0: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
size mismatch for 0.rnns.1.module.bias_ih_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
size mismatch for 0.rnns.1.module.bias_hh_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
size mismatch for 0.rnns.2.module.weight_ih_l0: copying a param with shape torch.Size([1600, 1150]) from checkpoint, the shape in current model is torch.Size([1600, 1152]).
size mismatch for 1.decoder.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7224, 400]).
size mismatch for 1.decoder.bias: copying a param with shape torch.Size([60002]) from checkpoint, the shape in current model is torch.Size([7224]).
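The rnns.* lines in this message follow a pattern that is easy to miss. PyTorch stacks the four LSTM gates along the first dimension of weight_hh, so its shape is (4 * hidden_size, hidden_size). A small sketch (lstm_hidden_size is a hypothetical helper, fed with shapes copied from the error above) recovers the hidden size on each side:

```python
def lstm_hidden_size(weight_hh_shape):
    """Infer the hidden size from the shape of an LSTM weight_hh matrix.

    PyTorch stacks the four LSTM gates along the first dimension, so the
    shape is (4 * hidden_size, hidden_size).
    """
    rows, cols = weight_hh_shape
    if rows != 4 * cols:
        raise ValueError(f"not an LSTM weight_hh shape: {weight_hh_shape}")
    return cols


checkpoint_hidden = lstm_hidden_size((4600, 1150))  # shape stored in the checkpoint
model_hidden = lstm_hidden_size((4608, 1152))       # shape in the current model
print(checkpoint_hidden, model_hidden)
```

So the checkpoint appears to have been trained with 1150 hidden units while the current model was built with 1152, which points at a configuration difference rather than a corrupted file. The encoder and decoder lines differ for a separate reason: their first dimension is the vocabulary size.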
I found this thread: “Language_model_learner not working as before?”, which removes the errors when loading the weights with shape 1152. Now I am getting:
RuntimeError: Error(s) in loading state_dict for SequentialRNN:
size mismatch for 0.encoder.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7248, 400]).
size mismatch for 0.encoder_dp.emb.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7248, 400]).
size mismatch for 1.decoder.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([7248, 400]).
size mismatch for 1.decoder.bias: copying a param with shape torch.Size([60002]) from checkpoint, the shape in current model is torch.Size([7248]).
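To see what the remaining mismatches have in common, it can help to compare the shapes side by side. The sketch below works on plain dictionaries of shapes copied from the error message above; shape_mismatches is a hypothetical helper. With real state dicts you would first build such dictionaries yourself, for example from each tensor's shape.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for tensors that differ."""
    return {
        name: (shape, model_shapes[name])
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and model_shapes[name] != shape
    }


# Shapes copied from the error message above.
checkpoint = {
    "0.encoder.weight": (60002, 400),
    "0.encoder_dp.emb.weight": (60002, 400),
    "1.decoder.weight": (60002, 400),
    "1.decoder.bias": (60002,),
}
model = {
    "0.encoder.weight": (7248, 400),
    "0.encoder_dp.emb.weight": (7248, 400),
    "1.decoder.weight": (7248, 400),
    "1.decoder.bias": (7248,),
}

for name, (saved, current) in shape_mismatches(checkpoint, model).items():
    # Only the first dimension differs in every case.
    print(name, saved[0], "->", current[0])
```

Every remaining mismatch is in the first dimension of the embedding and decoder tensors, which is the vocabulary size. That suggests the checkpoint was trained with a 60002-token vocabulary while the current data has 7248 tokens, so the weights cannot be loaded as-is without the matching vocabulary.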
I searched for a long time for a solution to this issue. None of the options here helped me much (most likely because I did not go into much detail about the actual logic of the solutions and simply tried to tweak the code). What did help was looking into the logic employed by this Kaggle user:
The idea is that you have two notebooks: one for training the model (which can use the internet) and a second one for inference that uses the output of the first one as input. Please see the examples below:
Notebook 1: https://www.kaggle.com/bjoernholzhauer/fastaihowtosetupefficientnetb40945lb
The notebook downloads and trains the model and outputs only the model.
Notebook 2: https://www.kaggle.com/bjoernholzhauer/inferencefortrainedfastaiefficientnetb4
This notebook uses the model trained from notebook 1 as input (without any internet access) only for inference.
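In the inference notebook, the first practical question is where the output of the training notebook ended up under the input directory. A minimal sketch for locating it, assuming the model was saved with a .pth or .pkl extension (find_model_files and both extensions are assumptions, not something taken from the linked notebooks):

```python
from pathlib import Path


def find_model_files(input_root, suffixes=(".pth", ".pkl")):
    """List model files found anywhere under the attached input directory."""
    root = Path(input_root)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes)
```

Calling find_model_files("../input") in the second notebook should list the candidate files, and the chosen path can then be handed to whatever loading call the inference code uses.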
The following worked for me. I wanted to use a pretrained resnet18 model in a Kaggle competition.

I added the pretrainedpytorchmodels dataset (“Pretrained PyTorch models” on Kaggle, which has a pretrained resnet18) to my notebook.

!mkdir -p /root/.cache/torch/hub/checkpoints/
!cp /kaggle/input/pretrainedpytorchmodels/resnet18-5c106cde.pth /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth
The shell commands copy over the pretrained resnet18 model to the location that Torch expects on Kaggle. The location can be determined from the message torch gives when it is downloading models over the internet.
After this I was able to submit my model without being connected to the internet.
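If you have a copy of the download message from a run with internet enabled, the target directory and file name can be pulled out of it programmatically. The sketch below assumes the message has the form Downloading: "<url>" to <path>; that format and the sample message are assumptions based on what torch printed for me, and parse_download_message is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath

# Assumed message format: Downloading: "<url>" to <path>
_DOWNLOAD_RE = re.compile(r'Downloading: "(?P<url>[^"]+)" to (?P<path>\S+)')


def parse_download_message(message):
    """Return (cache_dir, file_name) from a torch download message, or None."""
    match = _DOWNLOAD_RE.search(message)
    if match is None:
        return None
    target = PurePosixPath(match.group("path"))
    return str(target.parent), target.name


message = ('Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" '
           'to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth')
print(parse_download_message(message))
```

The two returned values are exactly what the mkdir and cp commands need: the directory to create and the file name to copy to.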
The following worked for me on Kaggle:

Uploaded the model to the Kaggle notebook.

Created a copy of the uploaded model in the kaggle/working dir, in a directory named /models:
!cp …/input/tpsfeb22xresnet18modelfastai/xres18.pth ./models
Created a learner:
learn = Learner(dls, xresnet18(n_out=10), metrics=accuracy)
Loaded the model:
learn.load('xres18')
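One thing that can go wrong with these steps: if ./models does not exist yet, the cp command creates a file called models instead of copying into a directory, and the load then fails. The sketch below checks for that before loading. It assumes learn.load(name) looks for path/model_dir/name.pth with the defaults path='.' and model_dir='models'; both helper names are hypothetical.

```python
from pathlib import Path


def expected_weights_path(name, path=".", model_dir="models"):
    """Where a call like learn.load(name) is assumed to look for the file."""
    return Path(path) / model_dir / f"{name}.pth"


def check_weights_present(name, path=".", model_dir="models"):
    """Raise a descriptive error if the weight file is not where it is expected."""
    target = expected_weights_path(name, path, model_dir)
    if target.parent.exists() and not target.parent.is_dir():
        # `cp file ./models` creates a *file* called models when the
        # directory does not exist yet.
        raise NotADirectoryError(f"{target.parent} exists but is not a directory")
    if not target.is_file():
        raise FileNotFoundError(f"expected weights at {target}")
    return target
```

Running check_weights_present('xres18') right before learn.load('xres18') gives a clearer error than the one from the library if the copy did not land where expected.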