How to train on the full dataset using ImageClassifierData.from_csv

Yep, that's pretty amazing. Can you please share what steps worked for you?

Going to document everything in a Medium post tomorrow and will reply back here when it's up.

Btw @sermakarevich, really enjoyed your posts on using K-Fold validation in this competition! It’s nice to be placing up there with you and the other fastai students doing the comp :slight_smile:

2 Likes

I don't get your score with a single model, so maybe CV doesn't work that well :wink: Thanks for intending to share. Can you please add my @ so I won't miss your post?

1 Like

Do you mean that when you switched to other architectures you just directly trained on the whole dataset (following the process you learned from the first architecture)?

How are you ensembling?

Are you averaging model weights to create a super model? Or are you averaging predictions from identically trained models that use different training and validation data sets (e.g., like k-fold CV)?

Exactly.

1 Like

I am averaging predictions currently.
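
For anyone following along, here is a minimal sketch of what prediction averaging looks like (the array names and values below are hypothetical, not the exact pipeline used in this thread): each model outputs per-class probabilities for the same test set, and the ensemble is simply their element-wise mean.

```python
import numpy as np

# Hypothetical (n_samples, n_classes) probability arrays from two
# separately trained models, predicting on the same test set.
preds_a = np.array([[0.9, 0.1], [0.2, 0.8]])
preds_b = np.array([[0.7, 0.3], [0.4, 0.6]])

def average_predictions(*model_preds):
    """Element-wise mean of per-class probabilities across models."""
    return np.mean(model_preds, axis=0)

ensembled = average_predictions(preds_a, preds_b)  # [[0.8, 0.2], [0.3, 0.7]]
labels = ensembled.argmax(axis=1)                  # final class per sample
```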

Which Kaggle competition is this?

Yah, I’m interested to see what @jeremy has to say on the value of K-Fold CV in neural networks in general and CNN architectures specifically.

It looks like @sermakarevich used it to great effect. I’m not sure exactly what his process was (e.g., did he use the same architecture or multiple architectures, did he use the same process to train each model or did it vary, etc.).

I started with resnet34 (using a validation set) and got to rank 60 (the model was decent). I then moved to resnext101_64 with the same data split, and the ranking improved to 22.

I am now trying to train this resnext101_64 on the full dataset with val_idxs = [0], but I get an accuracy of 1.0 and a training loss of about 0.62. I am not quite comfortable with what I see.

Does this mean I'm overfitting (given there is effectively no validation set)? Am I right in assessing that this does not look as good as it should?

If you’re changing architecture, you need a proper validation set. Don’t use a validation set with just one image in it!

My bad. I think I got it now.

So just to clarify: in order to utilize the complete training data in this case, I should first train my model using some data split, then recreate the data object with val_idxs = [0] and train again, and that should give me better results?

Yes - you need to exactly replicate the full process that worked with the validation set in place. Otherwise you don’t know whether you’re over- or under-fitting!

These bigger models have far more parameters, so they overfit easily…

1 Like
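
To make the two-step process concrete, here is a minimal sketch assuming the fastai v0.7 API used in the course; the paths, sz, bs, and hyperparameters are placeholders, not anyone's exact setup:

```python
from fastai.conv_learner import *
import pandas as pd

PATH = 'data/comp/'          # placeholder path
sz, bs = 299, 32             # placeholder image size / batch size
label_csv = f'{PATH}labels.csv'
n = len(pd.read_csv(label_csv))
tfms = tfms_from_model(resnext101_64, sz,
                       aug_tfms=transforms_side_on, max_zoom=1.1)

# Step 1: tune the whole training process with a real validation split.
val_idxs = get_cv_idxs(n)    # 20% holdout by default
data = ImageClassifierData.from_csv(PATH, 'train', label_csv,
                                    val_idxs=val_idxs, tfms=tfms, bs=bs)
# ... find learning rates, cycle lengths, etc. here ...

# Step 2: replay exactly the same steps with (almost) all data in training.
data = ImageClassifierData.from_csv(PATH, 'train', label_csv,
                                    val_idxs=[0], tfms=tfms, bs=bs)
```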

Thanks! Makes much more sense now. :slight_smile:

hi @wgpubs,
to confirm my understanding:
return data if sz > 300 else data.resize(340, 'tmp')

If the requested size sz is greater than 300, we use the images as they are in the 'train', 'valid', and 'test' folders.
And if sz is 300 or less, we resize the images to 340px, place them in the /tmp folder, and use those.

Is my understanding correct?

Correct.

And any request we make for a size less than 300 will simply use the saved images in the /tmp folder. So if you first train at size 224 and then 299, the transforms will simply be resizing the same set of saved images.

3 Likes
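
For context, that return line typically lives in a small get_data helper like the sketch below (fastai v0.7, reusing the placeholder names PATH, label_csv, and val_idxs from the earlier sketch; everything except the quoted return line is illustrative):

```python
def get_data(sz, bs):
    tfms = tfms_from_model(resnext101_64, sz,
                           aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_csv(PATH, 'train', label_csv,
                                        val_idxs=val_idxs, tfms=tfms, bs=bs,
                                        test_name='test')
    # For sz <= 300, resize every image once to 340px, cache the copies
    # under /tmp, and read from that cache on all later calls.
    return data if sz > 300 else data.resize(340, 'tmp')
```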

ok. Thank you.

I tried to use val_idxs=[0] but I get an assertion error when trying to load the pretrained convnet.

@stathis I think it's because you have precompute=True. It's giving the error because you previously had activations generated on the validation set, but now the validation set isn't there. I believe if you switch to precompute=False then it should work. In other words, training on all of the data doesn't really work with precompute=True.

3 Likes
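
As a minimal sketch of that fix (fastai v0.7, reusing the full-data data object from the earlier sketch; the training schedule is a placeholder):

```python
# precompute=True would look for activations cached for the old validation
# split; with val_idxs=[0] that split no longer exists, hence the assertion.
learn = ConvLearner.pretrained(resnext101_64, data, precompute=False)
learn.fit(1e-2, 3, cycle_len=1)  # placeholder hyperparameters
```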