EfficientNet

@mgloria @jeremy If you want a discussion, I’d raise an issue in the Google TPU repository https://github.com/tensorflow/tpu instead of Luke’s… that’s where the EfficientNet reference impl lives that Luke, myself, and others adhered to for our own EfficientNet impls. I doubt anyone tried to muddle through that scaling impl from the paper alone. Someone often responds to issues there, incl. questions about the papers for the models, within a few days to a week.

All that being said, it does seem to roughly make sense if you throw in some generous rounding to one decimal place on the width/depth factors and round (or floor/ceil?) the resolution to the nearest value where res % 16 == 0… as the models get larger it seems to diverge a bit, with some likely hand nudging…

Working through a few, the compound scaling factor (phi) for B1 is approx 0.5, B2 = 1, B3 = 2, etc. For example, for B3 (phi = 2):
1.0 (B0 depth) * 1.2^2 = 1.44 (≈ the published 1.4)
1.0 (B0 width) * 1.1^2 = 1.21 (≈ the published 1.2)
224 (B0 res) * 1.15^2 ≈ 296 (vs the published 300)
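
Roughly, in code (just a back-of-the-envelope check of that rounding guess, not the reference implementation; the phi values and the res % 16 == 0 snapping are my guesses from this discussion):

# Back-of-the-envelope EfficientNet compound scaling with alpha=1.2, beta=1.1, gamma=1.15.
alpha, beta, gamma = 1.2, 1.1, 1.15
base_depth, base_width, base_res = 1.0, 1.0, 224

for name, phi in [("B1", 0.5), ("B2", 1), ("B3", 2)]:
    depth = round(base_depth * alpha ** phi, 1)      # depth multiplier, rounded to 0.1
    width = round(base_width * beta ** phi, 1)       # width (channel) multiplier, rounded to 0.1
    res = round(base_res * gamma ** phi / 16) * 16   # resolution snapped to a multiple of 16
    print(f"{name}: depth={depth}, width={width}, res={res}")
# B1 lands on the published config exactly; the B2/B3 resolutions drift a little (hand nudging, presumably).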

2 Likes

Thanks a lot @rwightman for your reply. It was my first time going through a paper and its code, so I find it very interesting to have this discussion.

What you say makes sense. I repeated all the calculations and these are the results:

I was surprised to see that in some cases, especially for the resolution, the difference is not just some minor rounding.

Tell me (a newbie): why the res % 16 == 0 rounding constraint?

I also added some info regarding the computational cost, which may help others. If I understood the paper correctly:
alpha * beta^2 * gamma^2 ≈ 2 (equation 3).
Since FLOPs scale with depth * width^2 * resolution^2, any new phi will increase the original number of FLOPs by roughly (alpha * beta^2 * gamma^2)^phi ≈ 2^phi.

e.g. ~64x more FLOPs for B7 (phi = 6) compared to B0. Is this correct? Just want to make sure that I am completely on the same page.
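
As a quick numeric check of that (same rough phi values as above, nothing from the reference code):

# FLOPs grow with depth * width^2 * resolution^2, so the compound-scaling
# multiplier vs. B0 is (alpha * beta^2 * gamma^2) ** phi, i.e. roughly 2 ** phi.
alpha, beta, gamma = 1.2, 1.1, 1.15
for name, phi in [("B3", 2), ("B7", 6)]:
    exact = (alpha * beta ** 2 * gamma ** 2) ** phi
    print(f"{name}: ~{exact:.1f}x FLOPs vs B0 (2**phi = {2 ** phi}x)")

(With these constants the base works out to about 1.92 rather than exactly 2, so B7 comes out closer to ~50x than 64x, but the 2^phi rule of thumb holds.)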

I created a Colab so that any interested and confused soul like me can follow along. :wink:

2 Likes

As most computer hardware is built from bits, organized into 8-bit bytes, organized into WORDs (int16), DWORDs (int32), etc., you’ll often encounter constraints related to various power-of-2 numbers. Sometimes it’s just a fetish of sorts; developers learn to like the binary roundness of such numbers. In many cases there are real performance reasons, especially when you start talking about parallelization, loop unrolling, buffer alignment, and optimal mapping to hardware resources (which often exist in multiples of such numbers). The non-power-of-2 remainder would often have to be handled with another pass, or a special block of code that takes more time or underutilizes the hardware.

In this case there are heavily optimized (faster, less memory) variations of the convolution operations that work if your tensors adhere to certain constraints; having dimensions like batch, width, and height divisible by 8 (or possibly higher power-of-2 numbers) is often a criterion.
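
For example, the EfficientNet implementations round scaled channel counts to a multiple of a divisor (commonly 8) for exactly this reason; here’s a generic sketch of that pattern (not copied from any particular repo):

def round_channels(channels: float, divisor: int = 8) -> int:
    # Snap a width-scaled channel count to the nearest multiple of `divisor`,
    # but never drop more than ~10% below the requested value.
    rounded = max(divisor, int(channels + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * channels:
        rounded += divisor
    return rounded

print(round_channels(32 * 1.1))  # 35.2 -> 32
print(round_channels(40 * 1.1))  # 44.0 -> 48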

3 Likes

Another week and another set of EfficientNet weights released (AdvProp) with a B8 model specification… the Google team has been busy!

The new weights were trained with an adversarial training technique, using adversarial examples as training data augmentation (https://arxiv.org/abs/1911.09665). It’s pretty neat: they leverage separate BN layers for normal and adversarial examples, to prevent the adversarial examples from messing up the BN and reducing the model’s performance on normal examples. Something I’ve encountered when fiddling with adversarial training in the past… or pushing other data aug techniques too far.
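
The gist of the split-BN trick, as a toy sketch (my own illustration, not the AdvProp code):

import torch.nn as nn

class DualBatchNorm2d(nn.Module):
    # Clean and adversarial batches each get their own BN statistics and affine
    # parameters, so the adversarial distribution can't skew the "normal" BN.
    def __init__(self, num_features):
        super().__init__()
        self.bn_clean = nn.BatchNorm2d(num_features)
        self.bn_adv = nn.BatchNorm2d(num_features)

    def forward(self, x, adversarial=False):
        return self.bn_adv(x) if adversarial else self.bn_clean(x)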

Official weights released in the TPU repo: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

I’ve added them to my PyTorch ports and run them through the tests: https://github.com/rwightman/gen-efficientnet-pytorch

8 Likes

Hi there @rwightman

Just discovered your geffnet library. I’d love to experiment with your EfficientNet implementations in fast.ai (v2).
Is this possible yet? Too early for jumping into v2 with external stuff?

Would you have a notebook somewhere where I can get a grasp of how things work?

I already failed at creating the cnn :man_shrugging:

Cheers,
C

@cwerner I can’t speak for him, but I’ve been playing with v2 since dev first started and you certainly can :slight_smile: My repo has a ton of tutorial notebooks on how to do various tasks. This one shows custom models (along with optimizers and activation functions):

https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/06_SOTA.ipynb

4 Likes

Neato!

Will check it out right away! Thanks a bunch!

This link is not working anymore :frowning:

Apologies, I wound up switching around the repos. Here it is for EfficientNet, and then for SOTA.

4 Likes

Thank you! Do you know if there are any plans to include these models in fastai2? It seems that these models perform better than ResNet.

1 Like

It’s 2020…
In case someone is looking to use EfficientNets straight out of the box with FastAI V2, here it is: Plant Pathology Kaggle.
It’s nice to see fast ai whup some of the TPU-based scores too… [granted, GPU-based training takes much more time]
I am sure there can be improvements to this…
Among other things, the oversamplingCallback wasn’t particularly effective…

3 Likes

Thanks

Does anyone know how many epochs of EfficientNet-B0 training you need to reach 90%, 91%, 92%, and 93% top-5 on ImageNet?
In the TF code they train for 350 epochs. I can reach 90% in 50 epochs, but do I really need 350 epochs to reach 93%?

Well, I think the bottleneck here is the inverted residual blocks. I don’t know for sure, but a few days back I trained a Tiny YOLOv2, only with all the Conv2Ds replaced with depthwise and 1x1 convolutions. It had some 1M-odd parameters, compared to the normal Tiny YOLOv2, which has over 4M, yet it still took roughly double the time to train.
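
For reference, the depthwise + 1x1 swap looks roughly like this (toy channel sizes, not Tiny YOLOv2’s; approximate parameter counts in the comments):

import torch.nn as nn

c_in, c_out = 128, 256
regular = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)           # 128*256*3*3 + 256 ≈ 295k params
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, kernel_size=3, padding=1, groups=c_in),    # depthwise 3x3: 128*3*3 + 128 ≈ 1.3k
    nn.Conv2d(c_in, c_out, kernel_size=1),                           # pointwise 1x1: 128*256 + 256 ≈ 33k
)
# Far fewer parameters and FLOPs, but many small depthwise kernels map less
# efficiently onto GPUs, which is one reason wall-clock training can still be slower.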

Hi All,
I was wondering if it is possible to use EfficientNet like this

learn = cnn_learner(dls, efficientnet, metrics=accuracy) # resnet 34 or 18, also 50 works better

because I am getting this:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input> in <module>
----> 1 learn = cnn_learner(dls, efficientnet, metrics=accuracy) # resnet 34 or 18, also 50 works better

NameError: name 'efficientnet' is not defined
1 Like

It’s not. I would recommend using my port of timm: https://walkwithfastai.com/vision.external.timm

(EfficientNet is not in the fastai model zoo)

2 Likes

I will have a look, thanks @muellerzr much appreciated.

Hi there, it seems like I always get this error.

TypeError: forward() got an unexpected keyword argument 'pretrained'

I followed @muellerzr’s tutorial exactly. It seems that the newest version of fastai has this issue. (The tutorial’s fastai version: 2.0.14; my version: 2.2.2)

PS: I looked through the model and there’s no pretrained parameter anywhere either.

1 Like

from timm import create_model
from fastai.vision.all import *  # provides nn, has_pool_type, create_head, apply_init, etc.

def create_timm_body(arch:str, pretrained=True, cut=None):
  # Build the timm model and cut it just before the final pooling layer
  model = create_model(arch, pretrained=pretrained)
  if cut is None:
    ll = list(enumerate(model.children()))
    cut = next(i for i,o in reversed(ll) if has_pool_type(o))
  if isinstance(cut, int): return nn.Sequential(*list(model.children())[:cut])
  elif callable(cut): return cut(model)
  else: raise NameError("cut must be either integer or function")

^ The function from wwf tutorial.

from timm import create_model
body = create_timm_body('efficientnet_b0', pretrained=True)
model = nn.Sequential(body, create_head(1280, 10))
apply_init(model[1], nn.init.kaiming_normal_)
learn = cnn_learner(data, model,
                    metrics=[error_rate, accuracy, F1Score(average='micro'),
                             Precision(average='micro'), Recall(average='micro')],
                    cbs=[WandbCallback(log_dataset=True, log_model=True), SaveModelCallback()])

You’re not using timm_learner. If you want to use timm you need to use the entire pipeline, as intended. You should do:

learn = timm_learner(data, 'efficientnet_b0', metrics=...)

Or, if you want to get more advanced, you need to use Learner directly. (There is a reason I did not use cnn_learner at all.)

With what you showed, you would use Learner() directly rather than cnn_learner (or even timm_learner, if you were going that route).
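
For example, something along these lines (an untested sketch that assumes the create_timm_body / create_head pieces above, a DataLoaders called data, and fastai’s default_split body/head splitter):

from fastai.vision.all import *
from timm import create_model

body = create_timm_body('efficientnet_b0', pretrained=True)
model = nn.Sequential(body, create_head(1280, data.c))  # data.c = number of classes
apply_init(model[1], nn.init.kaiming_normal_)

# Plain Learner instead of cnn_learner, since the model is already fully assembled
learn = Learner(data, model, metrics=accuracy,
                splitter=default_split)  # body/head split so freeze()/fine_tune() behave
learn.fine_tune(3)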

1 Like