Lesson 1 official topic

Np man. It can get quite confusing indeed.
If your output is of shape (118, 2), that means you have 118 images and 2 classes.
The 2 outputs per image are the probabilities of the 2 classes. Softmax (or sigmoid) has already been applied to the output of the last NN layer.
You can check if this is the case. They should sum up to 1, image-wise. Do they?
E.g. by definition preds should sum up to 118, given each row sums up to 1.

Having said that, what is the actual prediction of the model?
Well, that’s the class with the highest probability per-image.
So, if for image1 your preds are (0.91, 0.09) then class 0 is the prediction.
To do that programmatically, you need to apply argmax to the preds row-wise, e.g. preds.argmax(dim=0).
The output will be a tensor of integers (either 0 or 1, the predicted classes) of shape (118, 1).

Good, now you have predictions.
How do you measure accuracy?
To do that, you need the ground truths.
Ideally you’ll have somewhere the labels for the test set.
Either an encoded tensor of integers (0 or 1) of shape (118, 1), or a list of length 118 containing the actual label strings, e.g. [cat, dog, dog, dog, cat, …].
If it is the latter you need to encode them into integers.
But how do you know which class is 0 or 1?
learn.dls.vocab to the rescue.
You’ll get something like ['cat', 'dog'], which tells you that cat=0 and dog=1.
Let’s call the tensor of encoded ground truths gts.
Now the last step is:
accuracy = (gts == preds.argmax(dim=0)).average()

Something like that.
I didn’t check the syntax and I am sure it is wrong but you hopefully get the point.
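
A rough sketch of the whole thing in code (untested; it assumes the usual from fastai.vision.all import *, test_dl stands in for however you built your test DataLoader, and label_strings for your list of 118 ground-truth strings):

    # probabilities for the test set; preds has shape (118, 2)
    preds, _ = learn.get_preds(dl=test_dl)

    # sanity check: every row should sum to ~1, the whole tensor to ~118
    assert ((preds.sum(dim=1) - 1).abs() < 1e-4).all()

    # encode the string ground truths into integers using the model's vocab
    vocab = learn.dls.vocab                                      # e.g. ['cat', 'dog'] -> cat=0, dog=1
    gts = torch.tensor([vocab.o2i[l] for l in label_strings])    # shape (118,)

    # predicted class per image (argmax over dim=1, the class dimension), then accuracy
    accuracy = (preds.argmax(dim=1) == gts).float().mean()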

5 Likes

Hi all,

I have a question please.

So, the parameter valid_pct=0.2 means that fastai will hold out 20% of the input data and not use it for training. So in essence, this means that fastai assumes that the dataset we pass it will be the full dataset every time.

What if we already have separate datasets for training versus testing? In that case we would want to use the full set passed in (the training set), and fastai should not default valid_pct to 0.2. How do we get around this?

Thanks in advance. Much appreciated!

1 Like

Try this same approach on the validation set.
You should get the accuracy number you got on the last epoch of training.
E.g. run learn.validate(), then replicate the above procedure on the validation set and check whether you get the same number.

Each time I did that and the numbers didn’t match, it meant I had accidentally shuffled the images, so ground truths and predictions no longer aligned.
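
In code, the sanity check looks roughly like this (a sketch, assuming accuracy is the only metric on the Learner, so learn.validate() returns [loss, accuracy]):

    # accuracy as fastai reports it on the validation set
    val_loss, val_acc = learn.validate()

    # the same number computed by hand: get_preds() defaults to the validation set
    preds, targs = learn.get_preds()
    manual_acc = (preds.argmax(dim=1) == targs).float().mean()

    print(val_acc, manual_acc.item())  # these should match (up to rounding)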

the valid_pct=0.2 is the default behaviour.
You can change that.
Check this, for instance:

    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_x=get_x,
                       get_y=get_y,
                       splitter=IndexSplitter(df_fold.loc[df_fold.which=='valid'].index),
                       item_tfms=Resize(700),
                       batch_tfms=aug_transforms(size=size, max_rotate=30., min_scale=0.75, flip_vert=True, do_flip=True))

Here I was working with a df_fold pandas DataFrame where the which column contained the train/valid split. I had created the split myself, as a random split was not good in that case.
As you can see it gets as flexible as you want.

Check the docs: https://docs.fast.ai/data.transforms.html#Split

Here my (old) notebook: KagglePlaygrounds/Plant_Pathology.ipynb at master · FraPochetti/KagglePlaygrounds · GitHub
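
For instance, if your train/valid split is already encoded in the folder structure, GrandparentSplitter does the job. A minimal sketch, assuming images live under path/train/<class>/ and path/valid/<class>/:

    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_items=get_image_files,
                       get_y=parent_label,
                       # train/valid decided by the grandparent folder name
                       splitter=GrandparentSplitter(train_name='train', valid_name='valid'),
                       item_tfms=Resize(224))
    dls = dblock.dataloaders(path)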

I had a similar problem when I was trying to apply lesson 1 to this AIcrowd | Age Prediction | Challenges challenge. So I went this way:

# load csv files into DataFrames
train_df = pd.read_csv('data/train.csv')
val_df = pd.read_csv('data/val.csv')

# added field
train_df['is_valid'] = False
val_df['is_valid'] = True

# merged into one DataFrame
all_df = pd.concat((train_df, val_df))

# defined load x function
path = Path('data')
def get_x(x):
  if x['is_valid']:
    return path/"val"/f'{x[0]}.jpg'
  else:
    return path/"train"/f'{x[0]}.jpg'

# and finally created the datablock
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_x=get_x,
    get_y=ColReader(1),
    splitter=ColSplitter(),
    item_tfms=[Resize(192, method='squish')],
    batch_tfms=aug_transforms()
).dataloaders(all_df) 

BTW you can use a different splitter here, e.g. ColSplitter or RandomSplitter.
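
For comparison, the two options look like this (ColSplitter defaults to a boolean column named is_valid, which is what the code above relies on):

    col_split  = ColSplitter('is_valid')                 # respect the predefined train/valid flag
    rand_split = RandomSplitter(valid_pct=0.2, seed=42)  # or hold out a random 20% instead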

1 Like


Thank you. I think the only thing I had to meddle with a bit was the dim=0 argument passed into argmax, since that should be dim=1, otherwise we just get a single value. I’m writing up how I worked through this in a blog post, though I think and hope I’ll get some more practice manipulating tensors in upcoming lessons, since it felt a bit counterintuitive to get my head round how that worked. (I get ~92% accuracy on my held-out test set, which is pretty nice to see. I guess my cat is unique after all :slight_smile: )
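
For anyone else tripped up by the same thing, a tiny made-up example of what the two dims do on a preds-like tensor:

    import torch

    preds = torch.tensor([[0.91, 0.09],
                          [0.20, 0.80],
                          [0.55, 0.45]])

    preds.argmax(dim=0)  # tensor([0, 1])    -> best row per class, shape (2,): not what we want
    preds.argmax(dim=1)  # tensor([0, 1, 0]) -> best class per row, shape (3,): one prediction per image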

2 Likes

since that should be dim=1

It’s quite incredible how I can’t get these basic arguments right after so long :sweat:
I feel the more experience I get the more I Google basic stuff lol.

5 Likes

It’s the same for us all. More to the point, a few years ago I read a post on Terence Tao’s blog in which he stated more or less the same thing. So, you are in good company. :slight_smile:

3 Likes

Hi everyone,

For my own dataset, I used images of zucchini and cucumber to train a classification model. It correctly classified and predicted the classes.
I then took another dataset, alligator vs crocodile, to train a classification model. I downloaded the alligator and crocodile images, but when training the model and printing the dataloaders (dls) I am getting different images than expected.

Can someone help me figure out why the images in the dataloaders are different? This is the Kaggle notebook that I created for this exercise.

Your notebook isn’t public

In the sidebar of kaggle.com it shows my GPU usage. It’s not clear to me whether I’ve used:
(a) 2 minutes 12 seconds, or
(b) 2 hours 12 minutes.
I failed to find the answer in 10 minutes of googling, so I’m asking.

If the answer is (b), then I’d feed back to the Kaggle team that “30 hrs” should instead be written as “30:00 hrs”. Then I would not have needed to ask.

2 Likes

I rechecked my code again: I had missed the curly braces, so I was not getting relevant images.

    download_images(dest, urls=search_images(f'{o} photo'))

Now the code is working fine. However, I am not sure what is happening, because I am getting accuracy between 57% and 64% (in different runs).
Also I have made the kernel public now. :sweat_smile:
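
In case it helps anyone else, a tiny illustration of the difference the braces make (using 'crocodile' as a stand-in for whatever o is in the loop):

    o = 'crocodile'
    f'o photo'    # -> 'o photo'          every search uses the same literal string
    f'{o} photo'  # -> 'crocodile photo'  the braces substitute the variable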


Maybe things like this (from your notebook) are why it’s finding it hard to train. Also I wonder whether crocodile / alligator is a case where people upload or publish pictures and label them as ‘crocodile’ while in reality they’re actually alligators (and vice versa), i.e. the problem’s really in the data. It’s a nice example of why problems in your ground-truth data can cause downstream issues.

6 Likes

I was not aware of this problem. Thanks for the information. So, any suggestions on how to fix this?

As far as I can see, you would either choose an example where it is less likely that randomly downloaded images will be wrong (e.g. cat vs dog), or find a dataset where you are sure that the labels are correct. Perhaps there is some scientist online who studies crocodiles and you can be sure that those images are really crocodiles. I wouldn’t know where to go to find those images, however…

1 Like

Image search is a very interesting but often deeply flawed source of images. I tried searching for things like “woman in a blue t-shirt” and 60% of the results lack either a t-shirt or a woman, or the t-shirts are the wrong color. Going deeper into these results clearly shows that, despite the fact that Google has the best computer vision models, the image search results are still mostly based on associating the images with the surrounding text on a webpage.

The interesting thing is that all the biggest models nowadays (CLIP, DALL•E, etc.) are trained on images and text scraped from the web, but seem to work despite having 50% (my guess) noise in the ground truth.

TL;DR: Always take a long and careful look at the training data you are using.

5 Likes

Discovered that if I turn the GPU OFF and then ON again, it tells me I have 27 hours remaining, so the answer is (b): used time is HH:MM.

1 Like
