Big Transfer (BiT) paper from Google Brain

Highlighting the Big Transfer (BiT) paper from Google that @jeremy pointed out on Twitter. It achieves SOTA on a wide variety of downstream computer vision tasks with fairly standard fine-tuning.

BiT-Large, pre-trained on the JFT-300M dataset, appears to use a fairly simple pre-training process and architecture. Interestingly (for me anyway), they ditched BatchNorm in favor of a combo of GroupNorm + Weight Standardisation in order to train with sufficiently large batches.
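
To make the GroupNorm + Weight Standardisation combo a bit more concrete, here is a minimal plain-Python sketch (lists instead of tensors, no framework). The function names are mine, not from the paper or any library, and the learnable scale/shift parameters of GroupNorm are left out. The thing to notice is that neither step looks at other samples in the batch, which is what separates them from BatchNorm.

```python
import math

def standardize_weights(filters, eps=1e-5):
    """Weight Standardisation sketch: give each output filter's weights
    zero mean and (roughly) unit variance before they are used."""
    out = []
    for w in filters:  # w: flat list of weights for one output channel
        mean = sum(w) / len(w)
        var = sum((x - mean) ** 2 for x in w) / len(w)
        scale = math.sqrt(var + eps)
        out.append([(x - mean) / scale for x in w])
    return out

def group_norm(channels, num_groups, eps=1e-5):
    """GroupNorm sketch for ONE sample: channels is a list of per-channel
    activation lists; statistics are shared within each group of channels."""
    if len(channels) % num_groups:
        raise ValueError("channel count must be divisible by num_groups")
    size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = math.sqrt(var + eps)
        out.extend([[(v - mean) / scale for v in ch] for ch in group])
    return out
```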

Fine-tuning uses SGD, Mixup, fixed resolution scaling rules, random square crops and image flips (where appropriate).
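
For anyone who hasn't used Mixup: it blends pairs of training examples and their labels with a random weight drawn from a Beta distribution. Below is a rough plain-Python sketch for a single pair, assuming one-hot labels. The function name and the default `alpha` are placeholders of mine, so check the paper for the value they actually use.

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two examples and their one-hot labels.
    alpha is a placeholder default, not a value taken from the paper."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Example: mix a "class 0" example with a "class 1" example
x, y, lam = mixup_pair([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

The mixed label stays a valid distribution (it still sums to 1), so the usual cross-entropy loss against soft targets applies.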

They will be releasing code + weights + examples soon.

Aside from ditching BatchNorm, nothing seems too revolutionary (or maybe I’m wrong). Glad to see these fastai basics are still killing it :smiley:

They mention that additional hyperparameter tuning could lead to even better results. With @LessW2020’s optimizer work and @Diganta’s Mish activation function, the community here could probably push these results further, I’m guessing :wink:
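
As a quick reminder of what Mish is: mish(x) = x * tanh(softplus(x)). Here is a scalar plain-Python sketch, just to show the formula. In practice you would use a tensor version from your framework.

```python
import math

def softplus(x):
    # log(1 + e^x); for large x this is ~x, and the shortcut avoids overflow in exp
    return x if x > 20.0 else math.log1p(math.exp(x))

def mish(x):
    """Mish activation for a single float."""
    return x * math.tanh(softplus(x))
```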

Tweet: https://twitter.com/giffmana/status/1214240746095730688?s=20

Paper: https://arxiv.org/pdf/1912.11370.pdf

They also use no dropout or weight decay when fine-tuning…
But I agree with you, this boundary can be pushed further with new optimizers, training schedules, gradual unfreezing, gradual image size increase, etc. =)

I think we as the fastai community can run systematic experiments and make a nice repo of best practices for doing transfer learning and achieving state-of-the-art results =)

Ah true true, thanks for highlighting

Sounds like a paper opportunity, similar to the Bag of Tricks paper!

I wonder what the most systematic way to go about this is?

  1. Maybe design all the transfer learning experiments and choose a few datasets (classification).

  2. Create executable scripts.

  3. And outsource the runs to people.

Any more ideas?

All sounds reasonable to me.

Might be an idea to limit it to vision classification at first? How to break out architecture tweaks from training techniques (e.g. training schedule)?

vision sounds good =)

Here is the list of things we could try.

Just substituting the last nn.Linear:

- training with SGD (baseline)
- Adam, One Cycle
- Mish + Adam, One Cycle
- Mish + RAdam (cosine annealing)

Adding the fastai tail (AdaptiveConcatPool2d, Flatten, bn, drop, act, nn.Linear etc.)

Trying to unfreeze everything, with the following:

- training with SGD (baseline)
- Adam, One Cycle
- Mish + Adam, One Cycle
- Mish + RAdam (cosine annealing)

5 frozen + 15 unfrozen epochs:

- training with SGD (baseline)
- Adam, One Cycle
- Mish + Adam, One Cycle
- Mish + RAdam (cosine annealing)

5 frozen + 15 unfrozen epochs (but with different learning rates):

- training with SGD (baseline)
- Adam, One Cycle
- Mish + Adam, One Cycle
- Mish + RAdam (cosine annealing)

Don't forget Mixup and label smoothing.

Anything else to add, or anything I am missing? We should list all possible combinations and from there try to shorten the list.
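
One way to write down "all possible combinations" before pruning is a simple grid. This is only a sketch: splitting the list above into four axes (head, schedule, recipe, extra) is my reading of it, and the names are plain labels rather than real config keys.

```python
from itertools import product

# Axes taken from the list above; the split into axes is an assumption.
HEADS = ["replace last nn.Linear", "fastai tail"]
SCHEDULES = [
    "unfreeze everything",
    "5 frozen + 15 unfrozen",
    "5 frozen + 15 unfrozen, different learning rates",
]
RECIPES = [
    "SGD (baseline)",
    "Adam + One Cycle",
    "Mish + Adam + One Cycle",
    "Mish + RAdam (cosine annealing)",
]
EXTRAS = ["none", "Mixup", "label smoothing"]

def experiment_grid():
    """Return every combination as a dict, ready to be filtered down."""
    return [
        {"head": h, "schedule": s, "recipe": r, "extra": e}
        for h, s, r, e in product(HEADS, SCHEDULES, RECIPES, EXTRAS)
    ]

grid = experiment_grid()
print(len(grid), "candidate experiments")
```

With these axes the full grid has 2 x 3 x 4 x 3 = 72 entries, which shows why shortening the list matters before anyone starts burning GPU hours.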

Also testing a tail/head with a GeM pooling layer?

The idea is based on a comment by Dmytro Mishkin:

"ConcatPool is simple [avg, max] pool. It is the 2nd best option after GeM pooling in my experience."
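
To show how the two pooling options differ, here is a plain-Python sketch over the activations of a single channel. The function names are mine. GeM (generalized mean) pooling is (mean(x^p))^(1/p): p = 1 gives average pooling and large p approaches max pooling. In real implementations p is usually a learnable parameter, while here it is just an argument.

```python
def concat_pool(feature_map):
    """[avg, max] of one channel's activations, as in the quote above."""
    return [sum(feature_map) / len(feature_map), max(feature_map)]

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling of one channel's activations.
    Values are clamped to eps so that x ** p is well defined."""
    clamped = [max(v, eps) for v in feature_map]
    return (sum(v ** p for v in clamped) / len(clamped)) ** (1.0 / p)

acts = [1.0, 2.0, 3.0]
print(concat_pool(acts))        # average and max
print(gem_pool(acts, p=1.0))    # behaves like average pooling
print(gem_pool(acts, p=100.0))  # close to max pooling
```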

And of course the optimizers from @LessW2020:
Ranger & newer optimizers

Hi, I’m new to CV but I think this would be really interesting. Please message me. I am happy to help any way I can.

I am working my way through the new version of fastai. After I am done, I will create a repo with a table of possible transfer learning experiments =)

I’ll try to train BiT from scratch on ImageNet with Mish + AdamW for starters.

Awesome, was out for a few days there. Your list above sounds like a good place to start, thanks @DrHB! Yep, it's a good idea. I would like to get myself up to speed on fastai2 too before kicking off new work.

Yo. Just an update =) I will start running some experiments this weekend. So far I have come up with these experiments. Any suggestions?
[Image: Screen Shot 2020-02-07 at 10.55.49 AM]

I will limit this to Imagewoof and Imagenette, and will perform 3 runs.

I was planning to run all of these experiments for a total of 10 epochs: 3 frozen and 7 unfrozen.
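
A small sketch of the run budget this implies, assuming "3 runs" means 3 runs per dataset for every experiment (that reading is my assumption). The names are made up for illustration.

```python
from itertools import product

DATASETS = ["Imagewoof", "Imagenette"]
RUNS_PER_DATASET = 3  # assumption: 3 repeats per dataset per experiment
FROZEN_EPOCHS, UNFROZEN_EPOCHS = 3, 7

def epoch_phases(frozen=FROZEN_EPOCHS, unfrozen=UNFROZEN_EPOCHS):
    """Phase label for each epoch: body frozen first, then unfrozen."""
    return ["frozen"] * frozen + ["unfrozen"] * unfrozen

def training_runs(num_experiments):
    """Every (experiment index, dataset, repeat) that has to be trained."""
    return list(product(range(num_experiments), DATASETS, range(RUNS_PER_DATASET)))

def total_epochs(num_experiments):
    """Total epochs of compute across all runs."""
    return len(training_runs(num_experiments)) * len(epoch_phases())

print(total_epochs(1))  # epochs needed for a single experiment
```

Under that assumption each experiment costs 2 x 3 x 10 = 60 epochs, so the number of experiments kept from the table directly sets the compute bill.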

Hey, great start by the sounds of it! Sorry, I have been trying (unsuccessfully) to compete in the Google Quest comp on Kaggle. I will get to this tomorrow once it closes!
