How to train on the full dataset using ImageClassifierData.from_csv

My belief was that Adam adapts the lr as a function of how the validation loss changes, but clearly I need to learn more about how it actually adapts the lr. :thinking:
(EDIT: I had got Adam completely wrong; understood thanks to the part 1 explanation, link in the post below)

Turns out the best resource to understand Adagrad/RMSprop/Adam was… FAST.AI !!! :grinning:
(and yes, I had completely misunderstood Adam)

So, thanks to @EricPB 's excellent video timelines of version 1 of this course here Part 1: complete collection of video timelines, I could find Adagrad explained with an Excel spreadsheet, just what I needed!

Thank you @Jeremy, this cleared up a big misconception I had, one less to go! :grinning:
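For anyone else who had the same misconception: here's a minimal single-parameter sketch of one Adam update, just to make the mechanism concrete. The point is that the effective step size is scaled per-parameter by running gradient statistics, not by the validation loss. (Variable names are my own, not from the lesson spreadsheet.)

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update for a single parameter w.
    # m, v: running (biased) first and second moments of the gradient
    # t: 1-based step count, used for bias correction
    m = b1 * m + (1 - b1) * grad            # update first moment
    v = b2 * v + (1 - b2) * grad * grad     # update second moment
    m_hat = m / (1 - b1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)               # bias-corrected second moment
    # the per-parameter "adaptive lr" comes from dividing by sqrt(v_hat)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# one step with a positive gradient moves w down
w, m, v = adam_step(w=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

Nothing about the validation set appears anywhere in the update, which was exactly the point of the spreadsheet walkthrough.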


This works, but I’m curious.

I thought one of the golden rules was to always have a validation data set (which we are essentially eliminating). So does this work because we have already determined our model and process is good enough with a validation set first? Or does it work simply because the process of training on smaller image sizes and then larger sizes just generalizes well, in which case, it would make sense that we could just start with training on the full data set and forego using a validation set completely?

Another question that I can’t believe I never realized …

In your example, you specify sizes of 224 and then 299 for training … BUT, the get_data() method above resizes both to 340. So what is being gained?


Exactly this!


The transforms downsize the images to 224 or 299. Reading the jpgs and resizing is slow for big images, so resizing them all to 340 first saves time.
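For intuition, here's a toy sketch of that two-step resize, with plain lists standing in for real images (in practice the savings come from decoding smaller jpgs, which this toy obviously doesn't capture). `resize_nn` is a hypothetical helper, not a fastai function:

```python
def resize_nn(img, new_w, new_h):
    """Nearest-neighbour resize of a 2-D list-of-lists 'image'."""
    old_h, old_w = len(img), len(img[0])
    return [[img[i * old_h // new_h][j * old_w // new_w]
             for j in range(new_w)]
            for i in range(new_h)]

# big original image (e.g. a high-resolution jpg)
big = [[(i + j) % 256 for j in range(1000)] for i in range(1000)]

pre = resize_nn(big, 340, 340)    # one-time pre-resize, cached (like data.resize(340, 'tmp'))
out = resize_nn(pre, 224, 224)    # cheap per-epoch resize from the small 340px copy
```

Every epoch then reads the small cached copies instead of re-decoding the full-size originals.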


Thank you for the pre-resizing clarification. It was a bit confusing for me too. Thought that was a bug in the code.

What’s interesting to me is that the practice seems to apply to related architectures as well.

Case in point. I started with resnet34 and a validation data set, followed the basic training process, set the data to 299, and went through a few final iterations. Things were looking good so I submitted to kaggle and placed around 60th.

Ran through the same steps without a validation set and things improved. I moved up in the competition somewhere in the 40’s.

So I thought, “Will more complex resnet models improve without a validation dataset if I follow the steps above?”

So I ran through the same process with resnext50, and my ranking improved. Ran through it one more time using resnext101, and it put me at 14th place.

def get_data(sz, bs):
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    # (the folder and CSV-filename arguments were omitted in the original post)
    data = ImageClassifierData.from_csv(path=PATH, tfms=tfms, bs=bs,
                                        suffix='.jpg')
    return data if sz > 300 else data.resize(340, 'tmp')

data_500 = get_data(500, bs)

data_224 = get_data(224, bs)
data_299 = get_data(299, bs)

Actually, the data.resize() code is still a bit unclear to me. The logic says:

  • If the image size is > 300, then resize according to tfms and nothing else, which means I get a 500x500 image generator back? So data_500 contains 500x500 images.
  • However, if the image size is <= 300, then tfms first resizes them to 224 or 299 and THEN resizes them again to 340x340 and stores the resized pics in a ‘340/tmp’ folder? So essentially data_224 and data_299 both have the same content, since they both get resized to 340 in the end?

Clearly I am not understanding this correctly.


I haven’t tested this, but since you aren’t doing a resize, you’ll just be using the images in the training and test folders.

If <= 300, the framework is going to create a bunch of 340x340 images under /tmp/340, and then USE those images to create the 224 and 299 images for training. The idea is to save computation time by having a cached set of images big enough to handle most of the sizes you want to train on.
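If I've understood it, the dispatch rule can be sketched like this (the function name is mine, purely for illustration):

```python
def source_for(sz, pre_resize_to=340):
    """Which images a DataLoader of size `sz` actually reads from.

    Mirrors the notebook's rule: sizes <= 300 train from the cached
    340px copies under tmp/; larger sizes read the originals directly,
    since a 500px crop can't come from a 340px image.
    """
    return f'tmp/{pre_resize_to}' if sz <= 300 else 'original folders'

for sz in (224, 299, 500):
    print(sz, '->', source_for(sz))
```

So data_224 and data_299 share the same 340px source files on disk, but the transforms still produce different 224 and 299 crops from them at training time.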


So you are using resnext101 by itself and getting to 14th, or are you combining all of these into one ensemble?

No ensembling. It’s pretty amazing.

Yikes I have been ensembling like crazy and I’m in 22nd place currently. I might need to go back to the drawing board.

Yep, that’s pretty amazing. Can you please share what steps worked for you?

Going to document everything in a Medium post tomorrow and will reply back here when it’s up.

Btw @sermakarevich , really enjoyed your posts on using K-Fold validation with this competition! It’s nice to be placing up there with you and the other fastai students doing the comp :slight_smile:


I don’t get your score with a single model, so maybe CV doesn’t work that well :wink: Thanks for intending to share. Can you please add my @ so I won’t miss your post?


Do you mean that when you switched to other architectures you just directly trained with the whole dataset (following your initial process learned from the first architecture)?

How are you ensembling?

Are you averaging model weights to create a super model? Or are you averaging predictions against an identically trained model that uses different training and validation data sets (e.g., like k-fold CV)?



I am averaging predictions currently.
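For anyone following along, a minimal sketch of that kind of prediction averaging (hypothetical helper, not a fastai API; assumes each model outputs one probability vector per test image):

```python
def average_predictions(model_preds):
    """Average class-probability predictions from several models.

    model_preds: list of per-model outputs; each output is a list of
    probability vectors, one per test image. All models must predict
    the same images and classes in the same order.
    """
    n_models = len(model_preds)
    n_images = len(model_preds[0])
    return [
        [sum(m[i][c] for m in model_preds) / n_models
         for c in range(len(model_preds[0][i]))]
        for i in range(n_images)
    ]

# two models, one test image, three classes
ensembled = average_predictions([
    [[0.8, 0.1, 0.1]],
    [[0.6, 0.2, 0.2]],
])
```

Averaging probabilities like this is the simplest ensemble; weight averaging generally doesn't work for independently trained networks, since their parameters aren't aligned.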