[Project] Stanford-Cars with fastai v1

jianshen92 · September 2, 2019, 1:25pm

@morgan I have read about the links you have posted! Like what Jeremy says, the best practitioner are the one who are persistent.

How is the results with these new techniques?

morgan · September 2, 2019, 4:36pm

Hmmm the jury is still out. EfficientNet seems to be much more stable using ranger and rangerlars instead of RMSProp, but I’m not sure I can match the accuracy of the paper’s b3 model (93.somthing%).

I’m doing some 80epoch runs with a flat to cosine anneal lr and it seems to get slowly better, but still maxing out around 91.2% with ranger. I think I’ll have to go to 150epochs+ just in case its just a matter of brute force and time.

Just playing around with when the anneal starts as they seem to improve once the annealing period kicks in…also need to figure out what to drop the momentum to.

Once I have a baseline for ranger I’ll try rangerlars, and then Mish

This is my latest run:

jianshen92 · September 3, 2019, 6:27am

Why did you choose such scheduling is it suggested by the paper?

muellerzr · September 3, 2019, 6:35am

@jianshen92 We found in the imagenette/woof experiments that this form of scheduling showed the best results. One cycle was blowing everything up too quickly, and Grankin created this flat cosine annealing function which saw a dramatic increase in accuracy.

jianshen92 · September 3, 2019, 10:21am

@muellerzr I found out the thread where you guys talked about it, great stuff! I guess OneCycle will be suitable with only vanilla Adam for now. Might be completely replaced when newer optimiser becomes the standard i guess?

morgan · September 3, 2019, 3:22pm

Exactly as @muellerzr said Quick (dumb?) question for you both @muellerzr, @jianshen92 , when you are training are you using the fulling training set for training, with the test set as validation? Or are you splitting your train set into train + validation, keeping the test set only for a final evaluation after training is complete?

muellerzr · September 3, 2019, 3:29pm

@morgan when we’re running the imagenette/woof tests the test is the validation set for us. (It’s how Jeremy set it up in the example notebook)

jianshen92 · September 3, 2019, 4:46pm

@morgan I asked the exact same question to myself. I think for research purpose it is okay to use test set as validation set. For the stanford car dataset in the competition I entered, i thought it would be “cheating” to use the test set as validation, although not specified.

Is the fit_fc function new in the library? I can’t seem to find it in the current version (1.0.57) library that I am using.

muellerzr · September 3, 2019, 4:47pm

@jianshen92 run !pip install git+https://github.com/fastai/fastai.git to grab the most recent version to grab the absolute newest version to use it

morgan · September 3, 2019, 6:10pm

Thanks both, I had been splitting the train set, but I think I’ll switch to using the test set for validation. I copied it from that crazy thread, but great that its being pushed to fastai, nice!

jianshen92 · September 3, 2019, 6:39pm

I think if you want to compare the performance with other researcher (outside of fast.ai), it would be more accurate with an independent test set that is not used to benchmark your training. Being said im not sure how it is done when researcher report their results for benchmark dataset (imagenet etc.). @muellerzr do you have any insight for this?

muellerzr · September 3, 2019, 6:43pm

Generally how I do it is I use the labeled test set ‘trick’ that I found and I report two scores. A validation accuracy and a test set accuracy. If you do a search for labeled test sets on the forum and filter to responses from me you should be able to find the source code for my technique

morgan · September 3, 2019, 7:48pm

Thanks @muellerzr, nice tick, posting one of your answers here for future reference:

morgan · September 3, 2019, 7:50pm

I had been wondering the same @jianshen92, I don’t think I recall reading a paper where they specify whether or not they used the test set as validation. So I never new if it was just taken as a given or not…

muellerzr · September 3, 2019, 7:58pm

For another example of them doing that, look at IMDB and how we train it. Jeremy does the same thing

morgan · September 4, 2019, 6:42pm

Positive signs with Ranger + Mish for EfficientNet-b3, 1-run test set accuracy of 93.9% for Stanford cars with EfficientNet-b3 after 40e. Their paper quoted 93.6% for b3. Note I’m training on the full training set here, using the test set for validation.

I didn’t play around with the hyperparameters at all, just took what seemed to work well for Ranger:

40 epoch
lr=15e-4
start_pct=0.10
wd=1e-3,

Will kick off 4 additional runs so I can get a 5 run average, but its slow going, 2h20m per run

morgan · September 7, 2019, 6:10pm

Ok, finally got through running the model 5 times!

TL;DR

Achieved 93.8% 5-run, 40epoch, mean test set accuracy on Stanford Cars using Mish EfficientNet-b3 + Ranger
Beat the EfficientNet paper EfficientNet-b3 result by 0.2%
EfficientNet author’s best result using b3 was 93.6%, their best EfficientNet result was 94.8% (current SOTA) with EfficientNet-b7
Used MEfficientNet-b3, created by swapping the Squish activation function for the Mish activation function
Used the Ranger optimisation function (a combination of RAdam and Lookahead) and trained with FlatCosAnnealScheduler
EfficientNet-b3 with Ranger but without Mish was giving test set accuracy around 93.4% (-0.4%) but was still much more stable to train than my efforts to train EfficientNet with RMSProp (which was used in the original paper)

Quick Medium post here, my first post, feedback welcome!

Code in my github here

Mean accuracy and standard deviation:

meffnetb3_acc_std_dev

Validation set (=test set) accuracy, last 10 epochs:

Credits:

Ranger - @lessw2020
- Lookahead paper: Lookahead Optimizer: k steps forward, 1 step back
- RAdam paper: On the Variance of the Adaptive Learning Rate and Beyond, RAdam
- @lessw2020 Ranger implementation https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/blob/master/ranger.py
- version 9.3.19 used
Mish @digantamisra98
- Paper: Mish: A Self Regularized Non-Monotonic Neural Activation Function
- Mish Repo: https://github.com/digantamisra98/Mish
- Mish blog: https://medium.com/@lessw/meet-mish-new-state-of-the-art-ai-activation-function-the-successor-to-relu-846a6d93471f
- Mish code implementation - @lessw2020 - https://github.com/lessw2020/mish/blob/master/mish.py
EfficientNet - @lukemelas
- Efficient Pytorch implementation that I swapped in Mish for: https://github.com/lukemelas/EfficientNet-PyTorch
FlatCosAnnealScheduler - @muellerzr
- Code taken from fastai thread below, being added to the fastai rep atm
Inspirational fastai thread, credit to all the contributors here

Training Params used:

40 epoch
lr = 15e-4
start_pct = 0.10
wd = 1e-3
bn_wd=False
true_wd=True

Default Ranger params were used :

alpha=0.5
k=6
N_sma_threshhold=5
betas=(.95,0.999)
eps=1e-5

Augmentations used:

Image size : 299 x 299
Standard Fastai transforms from get_transforms() :
- do_flip = True, max_rotate = 10.0, max_zoom = 1.1, max_lighting = 0.2, max_warp = 0.2, p_affine: float = 0.75, p_lighting = 0.75
ResizeMethod.SQUISH , which I found worked quite well from testing with ResNet152

Training Notes

Unlike testing done on the fastiai forums with XResNet and the Imagewoof dataset, this setup performed better with a shorter amount of time with a flat lr, followed by a longer cosine anneal.
I used the full test set as the validation set, similar to the Imagewoof thread in the fastai thread linked above
I manually restarted the gpu kernel and changed the run count as weights seemed to be being saved between runs. This persisted even when using learn.purge() and learn.destroy(). There had been a mention on the forums that the lookahead element of the Ranger implementation might have been responsible, but the problem persisted even after using version 9.3.19 which was supposed to address the issue.
Ran on a Paperspace P4000 machine

invisprints · September 8, 2019, 3:24am

Hello, in you code I didn’t find any information about how to create EfficientNet with Mish, could you please give me more details about it? Thank you!

morgan · September 8, 2019, 7:43am

Hello, in you code I didn’t find any information about how to create EfficientNet with Mish, could you please give me more details about it? Thank you!

Oh yep of course, I just replaced the relu_fn in the model.py file in the EfficientNet_PyTorch with the below:

def mish_fn(x):
    return x *(torch.tanh(F.softplus(x)))

Easy!

ttsantos · September 11, 2019, 9:45am

For anyone that wants to go for more than b3, I found out that layers for others are:
b4 - 1792
b5 - 2048
b7 - 2560

If you want to change on model._fc