Big Transfer (BiT) paper from Google Brain

(Morgan McGuire) #1

Highlighting the Big Transfer (BiT) paper from Google that @jeremy pointed out on Twitter. It achieves SOTA on a wide variety of downstream computer vision tasks with fairly standard fine-tuning.

BiT-Large, pre-trained on the JFT-300M dataset, uses a fairly simple pre-training process and architecture. Interestingly (for me anyway), they ditched BatchNorm in favour of a combination of GroupNorm + Weight Standardisation in order to train with sufficiently large batches.
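Roughly, Weight Standardisation normalises each conv filter's weights to zero mean and unit variance before the convolution, and GroupNorm then normalises activations per group of channels instead of per batch. A minimal PyTorch sketch of the combination (the `StdConv2d` name, group count and layer sizes here are my own placeholders, not the paper's released code):

```python
import torch.nn as nn
import torch.nn.functional as F

class StdConv2d(nn.Conv2d):
    """Conv2d with Weight Standardisation: weights are normalised to
    zero mean / unit variance per output filter before the convolution."""
    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5
        return F.conv2d(x, (w - mean) / std, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# BatchNorm-free block: standardised conv followed by GroupNorm.
block = nn.Sequential(
    StdConv2d(64, 128, kernel_size=3, padding=1, bias=False),
    nn.GroupNorm(num_groups=32, num_channels=128),
    nn.ReLU(inplace=True),
)
```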

Fine-tuning uses SGD, Mixup, fixed resolution scaling rules, random square crops and image flips (where appropriate).
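As a rough illustration of that fine-tuning recipe (not the paper's exact hyperparameter rule; the resolution, learning rate and class count below are placeholders, with a plain torchvision ResNet standing in for the BiT weights):

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Augmentations in the spirit described above: resize, random square crop
# and horizontal flip (sizes are illustrative, not the paper's resolution rule).
train_tfms = transforms.Compose([
    transforms.Resize(160),
    transforms.RandomCrop(128),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Stand-in pretrained backbone; swap the final layer for the downstream classes.
num_classes = 10
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Plain SGD with momentum for fine-tuning; Mixup would be applied per batch
# inside the training loop (omitted here).
optimizer = torch.optim.SGD(model.parameters(), lr=3e-3, momentum=0.9)
```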

They will be releasing code + weights + examples soon

Aside from ditching BatchNorm, nothing seems to be too revolutionary (or maybe I’m wrong); glad to see these fastai basics are still killing it :smiley:

They mention that additional hyperparameter tuning could lead to even better results. With @LessW2020’s optimizer work and @Diganta’s Mish activation function, the community here could probably push these results further, I’m guessing :wink:

Tweet: https://twitter.com/giffmana/status/1214240746095730688?s=20

Paper: https://arxiv.org/pdf/1912.11370.pdf

8 Likes

(Asimo) #2

They also use no Dropout or Weight Decay…
But I agree with you, this boundary can be pushed further with new optimizers, training schedules, gradual unfreezing, gradual image size increases, etc. =)
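For the gradual image size increase idea, a minimal torchvision sketch of the usual progressive-resizing pattern (the sizes and the 1.25x resize factor are arbitrary placeholders):

```python
from torchvision import transforms

def tfms_for(size: int):
    """Augmentation pipeline at a given crop size, for progressive resizing."""
    return transforms.Compose([
        transforms.Resize(int(size * 1.25)),
        transforms.RandomCrop(size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])

# Train a few epochs at each size with the same model and optimizer,
# rebuilding the DataLoader with the new transforms at each stage.
stage_tfms = {size: tfms_for(size) for size in (128, 224, 320)}
```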

I think we as a fastai community can perform systematic experiments and make a nice repo of best practices on how to do transfer learning and achieve state-of-the-art results =)

2 Likes

(Morgan McGuire) #3

Ah true true, thanks for highlighting

Sounds like a paper opportunity, similar to the Bag of Tricks paper!

1 Like

(Asimo) #4

I wonder what is the most systematic way to go about this?

  1. Maybe design all the transfer learning experiments and choose a few datasets (classification).

  2. Create executable scripts

  3. And outsource it to people.

Any more ideas?

0 Likes

(Morgan McGuire) #5

All sounds reasonable to me.

Might be an idea to limit it to vision classification at first? How do we break out architecture tweaks from training techniques (e.g. training schedule)?

1 Like

(Asimo) #6

vision sounds good =)

Here is the list of things we could try, just substituting the last nn.Linear:

-training with SGD (baseline)
-Adam, One Cycle
-Mish + Adam, One Cycle
-Mish + RAdam (cos anneal)

Adding the fastai tail/head (AdaptiveConcatPool2d, Flatten, bn, drop, act, nn.Linear, etc.)

Trying to unfreeze everything, with the following:

-training with SGD (baseline)
-Adam, One Cycle
-Mish + Adam, One Cycle
-Mish + RAdam (cos anneal)

5 epochs frozen + 15 unfrozen:

-training with SGD (baseline)
-Adam, One Cycle
-Mish + Adam, One Cycle
-Mish + RAdam (cos anneal)

5 epochs frozen + 15 unfrozen (but with different learning rates):

-training with SGD (baseline)
-Adam, One Cycle
-Mish + Adam, One Cycle
-Mish + RAdam (cos anneal)

Don’t forget Mixup and label smoothing.

Anything else to add that I am missing? We should list all possible combinations and from there try to shorten this list (a rough sketch of the freeze/unfreeze variant is below).
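A rough plain-PyTorch outline of that variant, i.e. substituting the last nn.Linear, 5 epochs with the backbone frozen, then 15 epochs unfrozen with discriminative learning rates; the class count, learning rates and backbone are placeholders, and the actual training loops / One Cycle schedules are omitted:

```python
import torch
import torch.nn as nn
from torchvision import models

# Stand-in pretrained backbone with the last nn.Linear swapped out.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 10)  # 10 = downstream classes

def set_backbone_trainable(trainable: bool):
    """Freeze or unfreeze everything except the new head."""
    for name, p in model.named_parameters():
        if not name.startswith("fc."):
            p.requires_grad = trainable

# Phase 1: 5 epochs with the backbone frozen, training only the new head.
set_backbone_trainable(False)
head_opt = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

# Phase 2: 15 epochs fully unfrozen, with discriminative learning rates
# (smaller lr for the pretrained backbone, larger for the new head).
set_backbone_trainable(True)
full_opt = torch.optim.SGD([
    {"params": [p for n, p in model.named_parameters() if not n.startswith("fc.")],
     "lr": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-3},
], momentum=0.9)
```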

1 Like

(Michael) #7

Also testing a tail/head with a GeM pooling layer?

The idea is based on a comment by Dmytro Mishkin:

“ConcatPool is simple [avg, max] pool. It is the 2nd best option after GeM pooling in my experience.”

And of course the optimizers from @LessW2020:
Ranger & newer optimizers
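Since GeM came up, here is a minimal PyTorch sketch of a generalised-mean (GeM) pooling layer with a learnable exponent (p=3 and eps are commonly used defaults, not values from this thread):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalised-mean pooling: average-pool of x**p, then the p-th root.
    p=1 recovers average pooling; large p approaches max pooling."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))  # learnable exponent
        self.eps = eps

    def forward(self, x):
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.adaptive_avg_pool2d(x, output_size=1)
        return x.pow(1.0 / self.p)

# Drop-in replacement for a global average-pooling head.
pool = nn.Sequential(GeM(), nn.Flatten())
```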

1 Like

#8

Hi, I’m new to CV but I think this would be really interesting. Please message me; I am happy to help any way I can.

0 Likes

(Asimo) #9

I am working my way through the new version of fastai. After I am done, I will create a repo with a table of possible transfer learning experiments =)

1 Like

(Diganta Misra) #10

I’ll try to train BiT from scratch on ImageNet with Mish + AdamW for starters.
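For anyone following along, a hedged sketch of what the Mish + AdamW combination could look like in plain PyTorch; Mish is written out directly as x * tanh(softplus(x)) rather than imported from @Diganta’s repo, and the ResNet backbone and hyperparameters are stand-ins rather than the actual BiT setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

def swap_relu_for_mish(module: nn.Module):
    """Recursively replace every ReLU in the model with Mish."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, Mish())
        else:
            swap_relu_for_mish(child)

model = models.resnet50(pretrained=False)  # stand-in, not the actual BiT model
swap_relu_for_mish(model)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```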

3 Likes

(Morgan McGuire) #11

Awesome, I was out for a few days there. Your list above sounds like a good place to start, thanks @DrHB! Yep, it’s a good idea; I would like to get myself up to speed on fastai2 too before kicking off new work.

2 Likes