Platform: Kaggle Kernels

@ilovescience I can see the .pth files being created in the /kaggle/working/models directory while running through the notebook. I can commit the notebook and view it in Kernels --> My Work --> Output. I can click on the notebook and see the Notebook, Data, Log, and Comments links on the left-hand rail but under Data I only see the Source Data not the .pth files.

I also see Data Files under Output, but I just see the message “You haven’t created any kernels yet.” when I select Data Files. I’ve been poking at this for a while now and need a break. Thank you for your assistance!!

I think the questions raised are mostly around how to save your model and how to upload your dataset.

  1. Saving gives an error:
    Kernels load datasets from ../input/, which is read-only storage. If you create a learner from there and call learn.save(), the function tries to save in the same folder as the dataset, hence the error.

Fix: Set the path in learn.save() accordingly.
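A sketch of the path logic (the directory names are Kaggle conventions; `learn.model_dir` is fastai v1 API, and the helper below is purely illustrative):

```python
from pathlib import Path

# Kaggle conventions: datasets mount read-only under ../input, while
# /kaggle/working is writable and becomes the kernel's output on commit.
WORK = Path('/kaggle/working')

def checkpoint_path(name, model_dir=WORK):
    """Where a .pth checkpoint lands once the model directory points at
    writable storage (hypothetical helper, not part of fastai)."""
    return model_dir / f'{name}.pth'

# In fastai v1 you can redirect saves before calling learn.save('stage-1'):
#   learn.model_dir = '/kaggle/working'
print(checkpoint_path('stage-1'))  # /kaggle/working/stage-1.pth
```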

  2. I don’t see my kernel outputs:

To see the outputs, you need to “commit” the kernel, which will run the nb from end to end, thus creating your output files.

  3. Uploading custom datasets:

You will need to create a custom dataset by uploading it to Kaggle Datasets and then linking it to your kernel. Follow the Kaggle Kernels tutorial to understand this.

Hope these solve the pain points.
Please let me know if I’ve missed anything.

I’ve created a FAQ section above with these points.

this is beautiful and super useful!

You’re welcome :slight_smile:

I have copied and translated your FAQ and updated and tweaked your kernels a little for the Chinese version of the fastai course notes. Your work is very helpful, and I’m looking forward to reading more of your posts.

Thanks!
Do you mind linking the translated post at the top?
You can edit this wiki and link to the translation for our Chinese speaking friends.

@init_27 Thank you for providing that FAQ. While I think that is super useful, it still doesn’t address permanently saving and/or exporting files created during the session, i.e. the learned weights (.pth files) and, as in Lesson 2, the labels.csv created when you clean up your data. If this functionality exists, where is it? How is it used? If not, I think being able to permanently save and export are great capabilities that the community would benefit from having – thoughts?

If you need the files created in an “interactive session”, you can look up how to create a downloadable link in a Jupyter notebook. Then you can manually download them.

Otherwise, you’ll need to “commit” your notebook to have these produced as outputs.
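For the interactive-session route, IPython’s FileLink is one common way to surface a download link inside the notebook (the checkpoint path below is just an example; point it at your actual file):

```python
from IPython.display import FileLink

# Renders a clickable download link in the notebook output for a file
# created during the session.
FileLink('models/stage-1.pth')
```

Note the link only works while the interactive session is still alive.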

With Lesson 2 you cannot “commit” the nb, as it creates/downloads more than 500 images. Kaggle kernels have a known limit of 500 output files, so they crash when you try to run them.

There are 2 fixes:

Download <500 files in total.
OR
Download the .pth file by creating a downloadable link
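A quick stdlib check can tell you whether a kernel will stay under the 500-output-file cap before you commit (the limit comes from the point above; the helper itself is a sketch, not a Kaggle API):

```python
from pathlib import Path

OUTPUT_LIMIT = 500  # Kaggle's per-kernel output file cap

def under_output_limit(root, limit=OUTPUT_LIMIT):
    """Count files under root and report whether a commit should survive
    the output cap (hypothetical helper)."""
    n = sum(1 for p in Path(root).rglob('*') if p.is_file())
    return n, n <= limit
```

Run it against /kaggle/working at the end of the notebook before committing.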

Hope this clarifies your question!

I’ll check out the downloadable link method – thanks!

Also, one other workaround on the commit issue is to comment out all the code in the cells – yes, it’s annoying. So, could a commit-without-execution capability be added?

The issue with that is that it will throw errors, since the output files will exceed 500.

I’ll tell you the shortcut: use Ctrl + / to uncomment; that might make it a little less irritating, but apologies, I’m not sure of any other workaround.

I tried using < 500 images. That gave pretty bad accuracy values.
If you want to give it a shot, I’ll be happy to add you as a contributor (with credit) and you can see if it works better with <500 images.

@init_27 Thinking about this more, you are right. In spite of having nearly 600 image files spread almost evenly across three directories, my final confusion matrix only has a total of 81 predictions. In hindsight, I am surprised the number of predictions is that low: with a 20% validation split, ~600 images should give roughly 120 validation predictions. Perhaps ImageDataBunch() had trouble reading many of those files? I think they were mostly .jpg files.

That said, I ran a few dozen epochs and finally landed on an error rate of 0.148148 – not great, and at times I could see the model improving even though the loss surface looked lumpy. In the end, I am concerned the model was overfitting and wouldn’t generalize well in the real world. Sounds like I need to add more data.

BTW – It’s CTRL+/ to toggle comment / uncomment – but thanks for the tip!!

It might be possible that a large number of corrupt images were downloaded in that case.
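A crude way to spot such files is a magic-byte check. fastai’s own verify_images does a fuller job, but this stdlib sketch (hypothetical helper) flags files whose contents aren’t actually JPEG data:

```python
from pathlib import Path

def suspect_jpegs(folder):
    """Return .jpg paths whose first two bytes aren't the JPEG magic
    number 0xFFD8 (rough stand-in for fastai's verify_images)."""
    bad = []
    for p in Path(folder).rglob('*.jpg'):
        with open(p, 'rb') as f:
            if f.read(2) != b'\xff\xd8':
                bad.append(p)
    return bad
```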

Thanks for the correction!

Thanks! It is a great idea; I will do that!

Hi all,

I had a problem running ‘Lesson 2 Download’ notebook when calling the function:

data2 = ImageDataBunch.single_from_classes(path, classes, tfms=get_transforms(), size=224).normalize(imagenet_stats)

TypeError: transform() got multiple values for argument 'tfms'.

For anyone with the same problem, I fixed this by changing the argument of the function from tfms=get_transforms() to ds_tfms=get_transforms().
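The error itself is plain Python: the method already passes the transforms along positionally to an internal transform() call, so a stray tfms= keyword arrives a second time via **kwargs. A minimal mock (not fastai’s real code, just a sketch of the collision) reproduces it:

```python
def transform(tfms=None, size=None):
    return ('transformed', tfms, size)

def single_from_classes(path, classes, ds_tfms=None, **kwargs):
    # The transforms go through positionally; a tfms= keyword left in
    # kwargs then collides with the first positional argument.
    return transform(ds_tfms, **kwargs)

single_from_classes('path', ['a', 'b'], ds_tfms=['flip'], size=224)   # fine
try:
    single_from_classes('path', ['a', 'b'], tfms=['flip'], size=224)  # collides
except TypeError as e:
    print(e)  # transform() got multiple values for argument 'tfms'
```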

Best wishes and thanks for creating / maintaining the Fast.ai Kaggle Kernels!

Ross

@init_27: Just recently I got a strange error using the image item list:

data = (ImageItemList.from_folder(path)
.random_split_by_pct(0.2)
.label_from_folder()
.transform(tfms, size=128)
.databunch())

6 days ago this worked fine in my code, now I get the following error message:

 NameError                                 Traceback (most recent call last)
    <ipython-input-5-b92c0520489a> in <module>()
    ----> 1 data = (ImageItemList.from_folder(path)
          2         .random_split_by_pct(0.2)
          3         .label_from_folder()
          4         .transform(tfms, size=128)
          5         .databunch())

    NameError: name 'ImageItemList' is not defined. 

I also noticed that the ‘loading the library’ part now shows up in blue, not green anymore – is there a problem loading the fastai library?

%reload_ext autoreload
%autoreload 2
%matplotlib inline

Thankful for any help <3

I believe the latest fastai now uses ImageList.
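If you want a notebook to survive the rename across fastai versions, a small lookup shim works (the helper is hypothetical; only the two class names come from fastai v1):

```python
import types

def get_image_list(vision_module):
    """Return the image-list class under whichever name this fastai
    version exports: ImageList (newer) or ImageItemList (older)."""
    for name in ('ImageList', 'ImageItemList'):
        cls = getattr(vision_module, name, None)
        if cls is not None:
            return cls
    raise ImportError('no image list class found in module')

# Demo with a stand-in module; against the real library you would do:
#   from fastai import vision
#   ImageList = get_image_list(vision)
fake = types.SimpleNamespace(ImageItemList=object)
print(get_image_list(fake) is object)  # True
```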

Wow- thank you very much. I thought I tried that but apparently I didn’t… Thanks a lot <3

Thank you for maintaining those lessons! I started using Kaggle Kernels and Google Colab for my own work with fastai and realized why this effort of keeping the lesson notebooks working is so useful. In both environments you get some version of fastai (is it always the latest version in the repo, or do those environments point to some internally released version of the code?), so you never know when your code will break after incompatible fastai changes.