Lesson 1 official topic


If I want to use a held-out test set to do the validation for my model, what would be the way to do that with fastai? I’m looking at the docs for ‘Inference functions’, but the wording is a bit opaque to me at the moment.

I basically want to first make all the predictions for the test set, then evaluate those against the labels for that test set, then output a single accuracy or error rate for those predictions. I’m assuming I’d want to do this in a batched way vs one-by-one, but just not sure what the right function for that is…


I think you might want to do something like this.
E.g. assuming you have a list of images (files) somewhere, you can create a dataset on top of them and then run get_preds.
Once you have predictions, you can pretty much do anything you want with them.


Thanks. I have the predictions for a held-out test set now, a Tensor of shape torch.Size([118, 2]).

So for the 118 images that I held out, there are two values for each. As far as I’m aware, these correspond to the two classes I’m predicting.

So then to get the accuracy based on the test set, I should average all of the values at index 0 for the predictions. Would that approach be the right way to go about it? Or do I need to do some calculation that involves the predictions for the negative samples as well (i.e. at index 1)?

Come to think of it, at the moment I only have images of one of my classes in the test set, but perhaps that’s all I need? Getting confused trying to figure this out :slight_smile:


Np man. It can get quite confusing indeed.
If your output is of shape 118, 2 that means you have 118 images and 2 classes.
The 2 outputs per image are the probabilities of the 2 classes. Softmax (or sigmoid) has already been applied to the output of the last NN layer.
You can check if this is the case. They should sum up to 1, image-wise. Do they?
Equivalently, preds.sum() should come to 118, given that each of the 118 rows sums up to 1.

Having said that, what is the actual prediction of the model?
Well, that’s the class with the highest probability per-image.
So, if for image1 your preds are (0.91, 0.09) then class 0 is the prediction.
To do that programmatically, you need to apply argmax to the preds row-wise, e.g. preds.argmax(dim=0).
The output will be a tensor of integers (either 0 or 1, the predicted classes) of shape (118,).

Good, now you have predictions.
How do you measure accuracy?
To do that, you need the ground truths.
Ideally you’ll have somewhere the labels for the test set.
Either an encoded tensor of integers (0 or 1) of shape (118,) or a list of len 118 with the strings with the actual labels, e.g. ['cat', 'dog', 'dog', 'dog', 'cat', …].
If it is the latter you need to encode them into integers.
But how do you know which class is 0 or 1?
learn.dls.vocab to the rescue.
You’ll get something like ['cat', 'dog'] which is telling you that cat=0 and dog=1.
Let’s call the tensor of encoded ground truths gts.
Now the last step is:
accuracy = (gts == preds.argmax(dim=0)).float().mean()

Something like that.
I didn’t check the syntax and I am sure it is wrong but you hopefully get the point.
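
For reference, a small runnable sketch of the whole procedure in plain PyTorch. The probabilities, labels and vocab below are made up for illustration. Note that it takes the argmax over dim=1, because the classes sit along the second axis of a (n_images, n_classes) tensor.

```python
import torch

# Made-up probabilities for 4 images and 2 classes (stand-in for the real preds).
preds = torch.tensor([[0.91, 0.09],
                      [0.30, 0.70],
                      [0.55, 0.45],
                      [0.20, 0.80]])

# If softmax has already been applied, every row sums to 1.
assert torch.allclose(preds.sum(dim=1), torch.ones(len(preds)))

vocab = ["cat", "dog"]                  # the kind of list learn.dls.vocab returns
labels = ["cat", "dog", "dog", "dog"]   # hypothetical test-set labels as strings

# Encode the string labels with the same class order the model uses.
gts = torch.tensor([vocab.index(label) for label in labels])

# Highest-probability class per image: reduce over the class axis (dim=1).
pred_classes = preds.argmax(dim=1)

# Comparing gives booleans; cast to float before averaging.
accuracy = (pred_classes == gts).float().mean().item()
print(pred_classes.tolist(), accuracy)  # 3 of the 4 predictions match the labels
```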


Hi all,

I have a question please.

So, the parameter valid_pct=0.2 means that fastai will hold out 20% of the input data and not use it for training. So in essence, this means that fastai assumes that the dataset we pass it, will be the full dataset every time.

What if we already have separate datasets for training versus testing? So this means we would want to use the full set passed in (training set) and fastai should not default the valid_pct to 0.2. How do we get by this?

Thanks in advance. Much appreciated!


Try this same approach on the validation set.
You should get the accuracy number you got on the last epoch of training.
E.g. run learn.validate, then replicate the above procedure to the validation set and check if you get the same number.

Each time I did that and the numbers didn’t match, it meant I had accidentally shuffled the images, so ground truths and predictions didn’t align anymore.

The valid_pct=0.2 is the default behaviour.
You can change that.
Check this for instance

    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_x=get_x,
                       get_y=get_y,
                       batch_tfms=aug_transforms(size=size, max_rotate=30., min_scale=0.75, flip_vert=True, do_flip=True))

Here I was working with a pandas DataFrame, df_fold, whose 'which' column contained the train/valid split. I had created the split myself, as a random one was not good in that case.
As you can see it gets as flexible as you want.

Check the docs: https://docs.fast.ai/data.transforms.html#Split

Here is my (old) notebook: KagglePlaygrounds/Plant_Pathology.ipynb at master · FraPochetti/KagglePlaygrounds · GitHub

I had a similar problem when I was trying to apply lesson 1 to this AIcrowd | Age Prediction | Challenges challenge. So I went this way:

import pandas as pd
from fastai.vision.all import *

# load csv files into dataFrame
train_df = pd.read_csv('data/train.csv')
val_df = pd.read_csv('data/val.csv')

# added field
train_df['is_valid'] = False
val_df['is_valid'] = True

# merged to one dataFrame
all_df = pd.concat((train_df, val_df))

# defined load x function
path = Path('data')
def get_x(x):
  # validation images live in data/val, training images in data/train
  if x['is_valid']:
    return path/"val"/f'{x[0]}.jpg'
  return path/"train"/f'{x[0]}.jpg'

# and finally created the datablock
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    item_tfms=[Resize(192, method='squish')],

BTW, you can use a different splitter here, e.g. ColSplitter or RandomSplitter.



Thank you. I think the only thing that I had to meddle with a bit was the dim=0 argument passed into argmax since that should be dim=1 otherwise we just get a single value. I’m writing up how I worked this in a blogpost, though I think and hope I’ll get some more practice manipulating tensors in upcoming lessons since it felt a bit counterintuitive to get my head round how that worked. (I get ~92% accuracy on my held out test set, which is pretty nice to see. I guess my cat is unique after all :slight_smile: )


since that should be dim=1

It’s quite incredible how I can’t get these basic arguments right after so long :sweat:
I feel the more experience I get the more I Google basic stuff lol.


It’s the same for us all. More to the point, a few years ago I read a post on Terence Tao’s blog in which he stated more or less the same thing. So, you are in good company. :slight_smile:


Hi everyone,

Now for my own dataset, I used images of zucchini and cucumber to train a classification model. It classified the images correctly.
I then took another dataset, alligator vs crocodile, to train a classification model. I downloaded images of alligators and crocodiles, but when I print the dataloaders (dls) while training the model, I get different images.

Can someone help me understand why the images in the dataloaders are different? This is the Kaggle notebook that I created for this exercise

Your notebook isn’t public

In the sidebar of kaggle.com it shows my GPU usage. It’s not clear to me whether I’ve used:
(a) 2 minutes 12 seconds, or
(b) 2 hours 12 minutes.
I failed to find the answer in 10 minutes of googling, so I’m asking.

If the answer is (b), then I’d feed back to the Kaggle team that “30 hrs” should instead be written as “30:00 hrs”. Then I would not have needed to ask.


I rechecked my code again: I had missed the curly braces, so I was not getting relevant images.

    download_images(dest,urls=search_images(f'{o} photo'))

Now the code is working fine. However, I am not sure what is happening, because I am getting accuracy between 57% and 64% (in different runs).
Also I have made the kernel public now. :sweat_smile:

ScreenShot 2022-05-02 at 14.10.41

Maybe things like this (from your notebook) are why it’s finding it hard to train. Also I wonder whether crocodile / alligator is a case where people upload or publish pictures and label them as ‘crocodile’, while in reality it’s actually an alligator (and vice-versa), i.e. the problem’s really in the data. It’s a nice example of why problems in your ground truth data can cause downstream issues.


I was not aware of this problem. Thanks for the information. Any suggestions on how to fix this?

As far as I can see, you would either choose an example where it is less likely that randomly downloaded images will be wrong (i.e. like cat vs dog etc), or you find a dataset where you are sure that the labels are correct. Perhaps there was some scientist online who studies crocodiles and you can be sure that those images are really crocodiles. I wouldn’t know where to go to find those images, however…
