Lesson 1 official topic

[Screenshot: ScreenShot 2022-05-02 at 14.10.41]

Maybe things like this (from your notebook) are why it’s finding it hard to train. I also wonder whether crocodile / alligator is a case where people upload or publish pictures labelled ‘crocodile’ when the animal is actually an alligator (and vice versa), i.e. the problem is really in the data. It’s a nice example of how problems in your ground truth data can cause downstream issues.

6 Likes

I was not aware of this problem. Thanks for the information. Do you have any suggestions on how to fix it?

As far as I can see, you would either choose an example where randomly downloaded images are less likely to be mislabelled (e.g. cat vs dog), or find a dataset where you are sure the labels are correct. Perhaps there is a scientist online who studies crocodiles, whose images you can be sure really are crocodiles. I wouldn’t know where to go to find those images, however…

1 Like

Image search is a very interesting but often deeply flawed source of images. I tried searching for things like “woman in a blue t-shirt” and 60% of the results lack either a t-shirt or a woman, or the t-shirt is the wrong color. Digging deeper into these results makes it clear that, despite Google having the best computer vision models, its image search results are still mostly based on associating images with the surrounding text on a webpage.

The interesting thing is that all the biggest models nowadays (CLIP, DALL•E, etc.) are trained on images and text scraped from the web, yet they seem to work despite having (my guess) around 50% noise in the ground truth.

TL;DR: always take a long and careful look at the training data you are using.

5 Likes

Discovered that if I turn GPU OFF then GPU ON, it tells me I have 27 hours remaining, so answer is (b) - used time is HH:MM.

1 Like


Hi everyone,

I am trying to create a model to predict whether chest X-ray images show pneumonia or not. I am using this dataset:

There are 3875 images of pneumonia and 1341 images of normal cases in the training dataset. Right now, I am trying to get the model code to work, but I’m getting some errors with the dls, and will then check the remaining parts.
Here is my Kaggle notebook:

Please help me.

After that, I will look into sorting out the class imbalance.

Thanks so much in advance - much appreciated.

From what I can make out, you’re trying to pass each individual image into the get_path function, but what it really wants – see here – is a single string with the base path for your data.

I see that you’re using ImageDataLoaders.from_name_func, too – docs here – which takes its arguments in a different way from how you’re passing them in your code.

I think what you’ll probably want to use is ImageDataLoaders.from_folder (docs), since the data is already in a format that’s suitable for that way of ingesting it.
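
Something along these lines might work – just a sketch, where the path and folder names are guesses based on the standard chest_xray dataset layout (train/ and test/ folders, each with NORMAL/ and PNEUMONIA/ subfolders):

from fastai.vision.all import *

# Assumed layout: path/train/NORMAL, path/train/PNEUMONIA, path/test/NORMAL, path/test/PNEUMONIA
path = Path('../input/chest-xray-pneumonia/chest_xray')

dls = ImageDataLoaders.from_folder(
    path,
    train='train',           # folder holding the training images
    valid='test',            # use the test split for validation (or pass valid_pct=0.2 instead)
    item_tfms=Resize(224),   # resize everything to the same size before batching
    bs=32)

learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)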

I guess this probably gives you hints on how to move forward, but let me know if you want me to go into more specifics of how you’d make this work.

6 Likes

Thanks @strickvl - much appreciated!

Is anyone else having issues uploading .pkl files to Hugging Face? It seems to take ages (and hangs), both when uploading via git on the command line and via the manual upload button on Hugging Face.

Also:

learn.export('model.pkl') does not seem to work on Kaggle - no file gets created in the output folder. I did some quick Google searches and realized it’s because we don’t have edit permissions.

The following seems to work fine; the file gets created in the ‘output’ folder of your notebook:

import pickle

# Serialize the whole Learner object to a .pkl file
with open('model.pkl', 'wb') as f:
    pickle.dump(learn, f)

The file gets created in the current working folder. You can check this by running !pwd in a cell before exporting.
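
To load it back later, the counterpart is pickle.load – a quick sketch (the fastai version used to load should match the one used to save):

import pickle

# Load the pickled Learner back in
with open('model.pkl', 'rb') as f:
    learn = pickle.load(f)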

If you are using the model (resnet18) from the notebook, the model size alone will be around 45 MB.

Alternatively, you can use ‘mobilevit_xxs’ from timm, which gives similar metrics with a model size of only about 5 MB.
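
For example, with a recent fastai (2.7+) and timm installed, you can pass the architecture name as a string – a rough sketch, reusing the dls from before:

from fastai.vision.all import *

# timm architectures can be passed to vision_learner by name (requires timm to be installed)
learn = vision_learner(dls, 'mobilevit_xxs', metrics=error_rate)
learn.fine_tune(3)
learn.export('model.pkl')   # the exported file is much smaller than with resnet18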

It works for me… Make sure you click “Save Version” at the top right of the page, and then once it’s finished saving (which will take a few minutes since your notebook will be re-run from scratch), go to the “data” tab of your notebook, and you’ll find the model.pkl at the bottom.

I don’t recommend using pickle.dump.

Yup takes ages for me too, because they’re really big and my home upload speed is slow. Gets there eventually however!

2 Likes

Thanks a lot, @jeremy - much appreciated.

I tried using this - it is installed on my system and shows up in my git setup as well (via the command line):

But it says it fails to push the file to git on Hugging Face - this is what I get when trying to push the .pkl file (note that it’s been added and committed fine already):

$ git push
Uploading LFS objects: 100% (1/1), 47 MB | 0 B/s, done.
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 12 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 41.55 MiB | 189.00 KiB/s, done.
Total 9 (delta 3), reused 0 (delta 0), pack-reused 0
remote: Enforcing permissions...
remote: Allowed refs: all
remote: -------------------------------------------------------------------------
remote: Your push was rejected because it contains binary files.
remote: Please use https://git-lfs.github.com/ to store binary files.
remote: See also: https://hf.co/docs/hub/adding-a-model#uploading-your-files
remote: -------------------------------------------------------------------------
remote: Offending files:
remote:  - model.pkl (ref: refs/heads/main)
To https://huggingface.co/spaces/Zakia/chest_x_ray_pneumonia_predictor
 ! [remote rejected] main -> main (pre-receive hook declined)
error: failed to push some refs to 'https://huggingface.co/spaces/Zakia/chest_x_ray_pneumonia_predictor'

Can you try doing git lfs track "*.pkl" before committing and pushing?

If you look at my tutorial, what I did was the following:

git lfs install
git lfs track "*.pkl"
git add .gitattributes
git commit -m "update .gitattributes so git lfs will track .pkl files"

5 Likes

@ilovescience : thank you. Yes, I have done all of that, but the .pkl file is still getting rejected when I try to push it. I tried again just now and the same error persists.

If you have committed the .pkl file normally with git and only then tracked it with git-lfs, a reference to the .pkl file is still in the git history.
When you push the repository it will try to send that .pkl file from your earlier commit, and the server doesn’t like that.

You’ll need to rewrite your git history so that .pkl files are only ever tracked in LFS. It looks like git lfs migrate import should help you do that.
If you don’t have too many commits, it might just be easier to start with a fresh repository and use git-lfs to track the file from the first commit.
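
Roughly, that migrate step would look like this (note that it rewrites history, so you’d need to force-push afterwards):

# Move all .pkl files in the existing history into LFS, then force-push the rewritten branch
git lfs migrate import --include="*.pkl"
git push --force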

3 Likes

The issue was that the .gitattributes change was getting included with the other files in the same commit and push. So, basically, once the repository has been cloned to the local machine, the .gitattributes change needs to be committed and pushed first, on its own. Then, verify that this change shows up on Hugging Face in the .gitattributes file:

*.pkl filter=lfs diff=lfs merge=lfs -text

After that, you can add the other files, like app.py, etc.

Tried this and it worked now.
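
For anyone hitting the same thing, the flow that worked looks roughly like this (a sketch - the extra filenames such as requirements.txt are just examples):

# 1. In a fresh clone, set up LFS tracking and push .gitattributes on its own first
git lfs install
git lfs track "*.pkl"
git add .gitattributes
git commit -m "track .pkl files with git lfs"
git push    # then check on Hugging Face that .gitattributes contains the *.pkl rule

# 2. Only after that, add the binary and the rest of the app files
git add model.pkl app.py requirements.txt
git commit -m "add model and app"
git push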

8 Likes