Ok, finally got through running the model 5 times!
TL;DR
- Achieved 93.8% 5-run, 40epoch, mean test set accuracy on Stanford Cars using Mish EfficientNet-b3 + Ranger
- Beat the EfficientNet paper's b3 result by 0.2%
- The EfficientNet authors' best result with b3 was 93.6%; their best EfficientNet result overall was 94.8% (current SOTA), achieved with EfficientNet-b7
- Used MEfficientNet-b3, created by swapping the Swish activation function for the Mish activation function
- Used the Ranger optimiser (a combination of RAdam and Lookahead) and trained with FlatCosAnnealScheduler
- EfficientNet-b3 with Ranger but without Mish gave a test set accuracy of around 93.4% (-0.4%), but was still much more stable to train than my attempts to train EfficientNet with RMSProp (the optimiser used in the original paper)
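For reference, Mish is defined as x · tanh(softplus(x)), where softplus(x) = ln(1 + eˣ). A minimal pure-Python sketch of the function (illustrative only; the actual training used the PyTorch implementation credited below):

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x)), softplus(x) = ln(1 + e^x)."""
    return x * math.tanh(math.log1p(math.exp(x)))

# Smooth, non-monotonic, unbounded above, bounded below:
# mish(0) == 0, mish(1) ~= 0.865, and small negative inputs give small negative outputs.
```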
Quick Medium post here, my first post, feedback welcome!
Mean accuracy and standard deviation:
Validation set (=test set) accuracy, last 10 epochs:
Credits:
- Ranger - @lessw2020
- Lookahead paper: Lookahead Optimizer: k steps forward, 1 step back
- RAdam paper: On the Variance of the Adaptive Learning Rate and Beyond, RAdam
- @lessw2020 Ranger implementation https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/blob/master/ranger.py
- version dated 9.3.19 was used
- Mish @digantamisra98
- Paper: Mish: A Self Regularized Non-Monotonic Neural Activation Function
- Mish Repo: https://github.com/digantamisra98/Mish
- Mish blog: https://medium.com/@lessw/meet-mish-new-state-of-the-art-ai-activation-function-the-successor-to-relu-846a6d93471f
- Mish code implementation - @lessw2020 - https://github.com/lessw2020/mish/blob/master/mish.py
- EfficientNet - @lukemelas
- EfficientNet PyTorch implementation into which I swapped Mish: https://github.com/lukemelas/EfficientNet-PyTorch
- FlatCosAnnealScheduler - @muellerzr
- Code taken from the fastai thread below; it is currently being added to the fastai repo
- Inspirational fastai thread, credit to all the contributors here
Training Params used:
- 40 epochs
- lr = 15e-4
- start_pct = 0.10
- wd = 1e-3
- bn_wd=False
- true_wd=True
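For intuition, the lr and start_pct values above imply a schedule that stays flat at 15e-4 for the first 10% of training, then cosine-anneals towards zero. A minimal sketch of that shape (the function name and step-based formulation are my own; the actual run used FlatCosAnnealScheduler from the thread credited above):

```python
import math

def flat_cos_lr(step, total_steps, lr=15e-4, start_pct=0.10):
    """Flat LR for the first start_pct of training, then cosine anneal to zero."""
    flat_steps = int(total_steps * start_pct)
    if step < flat_steps:
        return lr
    # Progress through the annealing phase, 0 -> 1
    t = (step - flat_steps) / max(1, total_steps - flat_steps)
    return lr * 0.5 * (1 + math.cos(math.pi * t))

# Over 100 steps: flat at 15e-4 for steps 0-9, half the peak LR at the
# midpoint of the anneal, and ~0 at the end.
```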
Default Ranger params were used:
- alpha=0.5
- k=6
- N_sma_threshhold=5
- betas=(.95,0.999)
- eps=1e-5
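To illustrate what alpha and k control: Ranger wraps RAdam in Lookahead, which keeps a slow copy of the weights and, every k inner-optimiser steps, interpolates it towards the fast weights by alpha, then resets the fast weights to match. A minimal sketch of that slow-weight update (illustrative only, not the Ranger implementation itself):

```python
def lookahead_sync(slow_weights, fast_weights, alpha=0.5):
    """Lookahead's outer update, applied once every k inner steps:
    move each slow weight a fraction alpha towards its fast counterpart,
    then reset the fast weight to the new slow weight."""
    for i in range(len(slow_weights)):
        slow_weights[i] += alpha * (fast_weights[i] - slow_weights[i])
        fast_weights[i] = slow_weights[i]

# With alpha=0.5 the slow weights move halfway towards wherever the inner
# optimiser (RAdam, in Ranger's case) has taken the fast weights after k=6 steps.
```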
Augmentations used:
- Image size : 299 x 299
- Standard fastai transforms from get_transforms():
- do_flip = True, max_rotate = 10.0, max_zoom = 1.1, max_lighting = 0.2, max_warp = 0.2, p_affine = 0.75, p_lighting = 0.75
- ResizeMethod.SQUISH , which I found worked quite well from testing with ResNet152
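Putting the augmentation settings above together, the fastai v1 data pipeline would look roughly like this (a sketch, not the actual notebook: `path` and the batch size `bs` are assumptions, as neither is given in the post):

```python
from fastai.vision import get_transforms, ImageList, ResizeMethod

# The get_transforms() settings listed above
tfms = get_transforms(do_flip=True, max_rotate=10.0, max_zoom=1.1,
                      max_lighting=0.2, max_warp=0.2,
                      p_affine=0.75, p_lighting=0.75)

# `path` assumed to point at the Stanford Cars images; `bs` is a guess
data = (ImageList.from_folder(path)
        .split_by_folder()
        .label_from_folder()
        .transform(tfms, size=299, resize_method=ResizeMethod.SQUISH)
        .databunch(bs=32))
```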
Training Notes
- Unlike testing done on the fastai forums with XResNet and the Imagewoof dataset, this setup performed better with a shorter flat-lr phase followed by a longer cosine anneal.
- I used the full test set as the validation set, similar to the Imagewoof thread in the fastai thread linked above
- I manually restarted the GPU kernel and changed the run count between runs, as weights appeared to persist from one run to the next. This happened even when using learn.purge() and learn.destroy(). The forums suggested the Lookahead element of the Ranger implementation might be responsible, but the problem persisted even with version 9.3.19, which was supposed to address the issue.
- Ran on a Paperspace P4000 machine