Lesson 1: File structure in train folder in \dogscats\

(Yog Mehta) #1

I am trying to make my own model based on the one in Lesson 1, with all the training and validation folders & file structure.

What I’m confused about is: do I have to label the folders in the train directory?
For example, if I have images of bears and penguins, do I create bear and penguin folders and store the relevant images in them? Or do I do that just for my valid folder?

(Sam Lloyd) #2

You do! See my folder structure for train in CIFAR-10.

Alternatively, you can use a CSV file for the labels (see the planet notebook for this).
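
For the folder approach, here is a rough sketch of the layout for the bears and penguins example from the question. The dataset root name and the make_class_folders helper are made up for illustration; the only point is that both train and valid get one subfolder per class.

```python
from pathlib import Path
import tempfile

def make_class_folders(root, classes, splits=("train", "valid")):
    """Create root/<split>/<class> for every split and class; return the paths."""
    created = []
    for split in splits:
        for cls in classes:
            folder = Path(root) / split / cls
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created

# Hypothetical dataset root; a temp dir keeps the sketch free of side effects.
root = Path(tempfile.mkdtemp()) / "bears_penguins"
folders = make_class_folders(root, ["bear", "penguin"])

# Show the resulting layout relative to the dataset root.
for f in folders:
    print(f.relative_to(root))
```

The images for each class then go into the matching subfolder of both train and valid.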

(Yog Mehta) #3



Hi, I have a follow-up question:
In ImageClassifierData.from_csv, I can fill in the names of the training folder and the test folder. What about the validation folder? Does val_idxs mean the validation images need to be in the PATH folder?
Thank you!

(Sam Lloyd) #5

Hi Anna,
So with from_csv the train and validation folder are the same (the folder argument, as you can see below), and val_idxs controls how this is split.
def from_csv(cls, path, folder, csv_fname, bs=64, tfms=(None,None),
             val_idxs=None, suffix='', test_name=None, continuous=False, skip_header=True, num_workers=8):
So you don’t even have to worry about a separate validation set, as this split is made for you. Does that help?
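
To make the idea of val_idxs concrete, here is a minimal sketch of choosing which rows of the labels CSV to hold out for validation. It assumes the indices refer to row positions in the CSV (header excluded); make_val_idxs is a made-up stand-in written for this post, not a library function, and the 20% share is just an example.

```python
import random

def make_val_idxs(n, val_pct=0.2, seed=42):
    """Pick a random val_pct share of the row indices 0..n-1 for validation."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    n_val = int(n * val_pct)
    return sorted(rng.sample(range(n), n_val))

# Hypothetical labels file with 10 rows (header excluded).
n_rows = 10
val_idxs = make_val_idxs(n_rows)

# Every row that is not held out stays in the training set.
held_out = set(val_idxs)
train_idxs = [i for i in range(n_rows) if i not in held_out]

print("validation rows:", val_idxs)
print("training rows:  ", train_idxs)
```

A list like val_idxs is what you would pass as the val_idxs argument, so all images can stay in the one folder.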


Hey Sam,

Thank you for your reply. I see.

When I download data from Kaggle, they sometimes provide a separate folder for the validation set. If I understand you correctly, I would have to put the validation dataset into the training data folder so that from_csv can read it through val_idxs?

Thank you again.

(Yog Mehta) #7

Hey Anna,
You actually don’t have to put validation data in the training data folder. You can have a separate folder for your validation set, like we did in dogs-vs-cats. Or you can use a number of other methods used in the course notebooks, if your data doesn’t have a separate validation set. I hope that answers your question.


Hey Yog,
I see. Thank you!