00:00 - Questions
00:05 - About the concept/capability of early stopping
04:00 - Different models, which one to use
05:25 - Gradient Boosting Machine with different model predictions
07:25 - AutoML tools
07:50 - Kaggle winners' approaches, ensembles
09:00 - Test Time Augmentation (TTA): why does it improve the score?
11:00 - Training loss vs validation loss
12:30 - Averaging a few augmented versions
13:50 - Unbalanced dataset and augmentation
15:00 - On balancing datasets
15:40 - WeightedDL, Weighted DataLoader
17:55 - Weighted sampling on the Diabetic Retinopathy competition
19:40 - Let's try something…
21:40 - Setting an environment variable when having multiple GPUs
21:55 - Multi-target model
23:00 - Debugging
27:04 - Revise transforms to 128x128 and 5 epochs. Fine tune base case.
28:00 - Progressive resizing
29:16 - Fine-tuning again but on larger 160x160 images
34:30 - Oops, small bug, restart (without creating a new learner)
37:30 - Re-run second fine-tuning
40:00 - How did you come up with the idea of progressive resizing?
41:00 - Changing things during training
42:30 - On the paper "Fixing the train-test resolution discrepancy"
44:15 - Fine-tuning again but on larger 192x192 images
46:11 - A detour about paper reference management
48:27 - Final fine-tuning 256x192
49:30 - Looking at WeightedDL, WeightedDataLoader
57:08 - Back to the results of fine-tuning 256x192
58:20 - Question leading to a look at callbacks
59:18 - About SaveModelCallback
01:00:56 - Contributing, documentation, and looking at "Docments"
01:03:50 - Final questions: lr_find()
01:04:50 - Final questions: training for longer, decreasing validation loss, epochs, error rate
01:06:15 - Final questions: progressive resizing and reinitialization
01:08:00 - Final questions: resolution-independent models, trick to make TIMM resolution independent by changing positional encodings
The aim of this was a quick check of whether it was worth the effort of looking into further, verifying that it does indeed provide better test results. The setup is extremely straightforward:
Model: resnet18, not pretrained
Batch Size: 64
I then fit for 5 epochs and took the validation accuracy, which wound up being 18.09%. This serves as my baseline.
From there I changed the augmentation size of the final Resize transform (see the snippet in the notebook for the caveat of doing so).
Jeremy if we want to do something like this to make progressive resize easier perhaps how the Pipeline's are created should make a deepcopy of each transform? (See nb for what I mean)
Actually, this is the perfect way to increase the progressive resize shape/size since it adjusts both train and val
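A minimal stdlib sketch of the trade-off being discussed (this is illustrative stand-in code, not the actual fastai `Pipeline` internals): if the train and validation pipelines hold the same transform object, mutating its `size` updates both at once, which is exactly what progressive resizing wants; a `deepcopy` per pipeline would decouple them instead.

```python
from copy import deepcopy

class Resize:
    """Stand-in for a resize transform with a mutable target size."""
    def __init__(self, size):
        self.size = size

shared = Resize(128)
train_pipe, valid_pipe = [shared], [shared]  # both pipelines share one object

# Mutating the shared transform changes train and val at once,
# so one assignment bumps the resolution for both.
shared.size = 448

# With a deepcopy per pipeline, the two would be decoupled instead:
train_pipe = [deepcopy(shared)]
valid_pipe = [deepcopy(shared)]
train_pipe[0].size = 224  # valid_pipe[0].size stays 448
```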
For my test I changed the final resolution to 448x448 rather than 320x320, as 2x the image size is typically what I'd use for progressive resizing.
This new upscale got 18.6%, which is an improvement! Phenomenal.
But the final test was whether it was worth running fit_one_cycle one last time for a single epoch at the new size. And the results might (or might not) surprise you.
This final accuracy at the upscaled size was 19.11%, beating the upscale-only-at-inference option by almost 2x when comparing each improvement against our benchmark.
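For concreteness, the "almost 2x" claim is just the ratio of the two improvements over the 18.09% baseline:

```python
baseline = 18.09            # 5 epochs at the original size
upscale_only = 18.60        # resize to 448 at inference only
upscale_plus_epoch = 19.11  # plus one fit_one_cycle epoch at 448

gain_inference = upscale_only - baseline          # about 0.5 points
gain_extra_epoch = upscale_plus_epoch - baseline  # about 1.0 points
ratio = gain_extra_epoch / gain_inference         # roughly 2x
```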
So, is it worth doing? I'm not entirely sure. Where someone could pick this up is to mimic Kaggle: keep a train/val/hidden test split, perform inference on the hidden test set at the very end, and compare the three options performed here again.
It could be worth it if you have a few spare minutes left in your GPU quota, but otherwise I wasn't too impressed with the results.
This is also, of course, an n of 1, but the difference wasn't large enough for me to try it across 5 runs and average, etc.
(I of course invite all criticism, and please try this yourself too! It's a fun little exercise.)
Just discovered a tiny bug in the fastkaggle library. I'm sure it's not a big deal: setup_comp raises an error if you use a different variable than comp to hold the competition name. See the following error:
The version on my instance is 2.6.3. As you can see in the gist, changing to a 448px crop size at test time drops the accuracy from 45% to 39%, and fitting one cycle gives an accuracy of 30%. So the claim from the paper is not replicated in my environment.
But even if it were, we might have a few more issues:
TTA seems to fix the issue raised in the paper, since the train and test pipelines are then the same.
They seem to assume that the apparent pixel sizes of objects don't vary much within the test set. This may be true for ImageNet, but I doubt it always holds; otherwise, training a model to recognize objects across more pixel sizes should improve generalization and test accuracy (since the train and test sets need not share the same distribution).
They observe an improvement when scaling the test size to 1.3x the train size (from 224px to 288px); at 2x scale (448px) the performance drops again. The notebook uses the 2x scale.
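The scales in question work out as follows (224px train size, as in the paper):

```python
train_size = 224
mild_test_size = round(train_size * 1.3)  # 291, close to the paper's 288px
double_test_size = train_size * 2         # 448, the scale the notebook used
```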
No problem! Thanks for noting the performance drop, which aligns with the behavior you describe. There may have been some lag or some weird bug with state that caused the notebook to produce odd results (my mistake, oops). I don't have time to test it today, but I'll trust what you say and accept that my analysis is flawed!
Following up on the issues above: training for longer (20 epochs) gives us a better model, where we can observe the improvement in performance when switching to 288px. (Fitting one more epoch degraded performance because the LR was too large.)
After lowering the LR and fitting for 5 more epochs at 224px, we get better performance than we previously had after switching to 288px; switching that model to 288px again gives better performance.
TTA at 224px is 4x slower but gives almost the same performance as switching to 288px.
TTA at 288px gives the best performance I've seen.
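Why TTA costs roughly 4x: it averages the model's predictions over several augmented views of each input, so inference does one forward pass per view. A toy stdlib sketch of that averaging (the predict function and augmentations here are made up purely for illustration):

```python
def predict(x):
    # Stand-in for a model forward pass; returns fake class scores.
    return [0.7 + 0.01 * x, 0.3 - 0.01 * x]

def tta_predict(x, augmentations):
    # One forward pass per augmented view, then average the outputs;
    # inference cost therefore scales with len(augmentations).
    preds = [predict(aug(x)) for aug in augmentations]
    n = len(preds)
    return [sum(p[i] for p in preds) / n for i in range(len(preds[0]))]

augs = [lambda x: x, lambda x: x + 1, lambda x: x - 1, lambda x: x + 2]
avg = tta_predict(0, augs)  # four passes instead of one, hence ~4x slower
```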
Here is the colab notebook with results:
I’ve tested the paper’s assumption on pretrained models: in most of the experiments it gives a 10%–15% improvement, but when a model is already pretty good it can be detrimental. So my guess is that once we have a more capable model, the trick won’t help. Here is a summary of the results in Excel. So I’m not sure it makes sense to implement the full paper.