Adding EfficientNet to fastai vision

In this paper published by Google, the authors propose a new neural network architecture they call “EfficientNet”. EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.

A PyTorch implementation of EfficientNet is available, and through it we can easily add EfficientNet to fastai.

From the PyTorch implementation of EfficientNet:
“EfficientNet PyTorch is a PyTorch re-implementation of EfficientNet. It is consistent with the original TensorFlow implementation, such that it is easy to load weights from a TensorFlow checkpoint. At the same time, we aim to make our PyTorch implementation as simple, flexible, and extensible as possible.”

I will make a pull request. Is this the right repo for it? In the request, I’ll include the util and model scripts from the PyTorch implementation. Is there anything else I have to do?


There’s already a large discussion on EfficientNet here: EfficientNet

And how to use it


I believe the PyTorch implementation on GitHub came after all these discussions, and I just want to add this code to fastai.

Glad to see that!
I’ve installed the fastai dev version with pip install git+
and EfficientNet with pip install efficientnet-pytorch.

How can I use EfficientNet in the same way as
learn = cnn_learner(data, models.resnet34, metrics=error_rate)?

learn = cnn_learner(data, models.efficientnet.EfficientNetB1, metrics=error_rate)
doesn’t work?

You need to use Learner(), not cnn_learner.


Thank you so much for your reply!

I tried learn = Learner(data, models.efficientnet.EfficientNetB1(), metrics=error_rate), which raised “NameError: name ‘data’ is not defined”.
With learn = Learner(data, models.efficientnet.EfficientNetB1, metrics=error_rate) (no parentheses), it raised "AttributeError: ‘function’ object has no attribute ‘to’ ".

These may seem like silly questions… Could you tell me how to use it? Thanks again.

The first is right: you need to call the model function (with the parentheses) so that Learner receives a model instance. Try specifying data=data, arch= models.efficientnet.EfficientNetB1()

(And make sure data was defined beforehand?) :)
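
The AttributeError quoted above is what one would expect when a function, rather than the object it builds, is handed to code that calls `.to(device)` on it. Here is a minimal sketch in plain Python, with no fastai or PyTorch involved. `TinyModel`, `tiny_arch` and `move_to_device` are hypothetical stand-ins, and the idea that `Learner` moves the model to a device this way is an assumption inferred only from the error message:

```python
class TinyModel:
    """Hypothetical stand-in for a model object; it only knows how to move itself."""

    def __init__(self):
        self.device = None

    def to(self, device):
        self.device = device
        return self


def tiny_arch():
    """Hypothetical architecture function: calling it builds a model instance."""
    return TinyModel()


def move_to_device(model, device="cpu"):
    """Stand-in for the step that (by assumption) fails inside Learner."""
    return model.to(device)


# Passing the function itself: a function object has no .to attribute.
try:
    move_to_device(tiny_arch)
    error_message = None
except AttributeError as err:
    error_message = str(err)

# Passing the result of calling the function: an instance with a .to method.
moved = move_to_device(tiny_arch())

print(error_message)
print(moved.device)
```

This only explains the AttributeError. The NameError on `data` is a separate problem that this sketch does not address.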


Thank you. The code is as follows:

from fastai import *
from fastai.vision import *
path = untar_data(URLs.MNIST_TINY)
data = ImageDataBunch.from_folder(path)
# learn = cnn_learner(data, models.resnet18, metrics=accuracy)  # works 

# from efficientnet_pytorch import EfficientNet
# model = EfficientNet.from_pretrained('efficientnet-b0', num_classes=2)
# model._fc = nn.Linear(in_features=1280, out_features=2, bias=True)
# learn = Learner(data, model, metrics=accuracy)                # works

learn = Learner(data, models.efficientnet.EfficientNetB1(), metrics=accuracy) # NameError: name 'data' is not defined

Don’t know what’s wrong…


I have the same problem: it raises “NameError: name ‘data’ is not defined”. data is defined, and it makes no difference whether I use:
learn = Learner( data=data,arch=models.efficientnet.EfficientNetB5(),
or if I use
learn = Learner( data,arch=models.efficientnet.EfficientNetB5(),

Here are the steps I took to get EfficientNet working @gy0373 @agentili

!pip install efficientnet-pytorch

from fastai import *
from fastai.vision import *
from efficientnet_pytorch import EfficientNet

path = untar_data(URLs.PETS)
path_anno = path/'annotations'
path_img = path/'images'
fnames = get_image_files(path_img)
pat = r'/([^/]+)_\d+\.jpg$'  # label = file name without the trailing _<number>.jpg

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=32)

model = EfficientNet.from_name('efficientnet-b0')
model._fc = nn.Linear(1280, data.c)
learn = Learner(data, model)

I didn’t do this with MNIST, as the models are set up for 3-channel inputs whereas MNIST is 1-channel (grayscale). Sorry it took so long; I did not have a chance to run this until now.
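
The `1280` in `nn.Linear(1280, data.c)` matches the b0 model used above, and larger variants may have a different final feature width. One way to avoid hardcoding it is to read `in_features` from the layer being replaced. This is a hedged sketch: `TinyNet` and `replace_head` are hypothetical names, and `TinyNet` is only a small stand-in with an `_fc` attribute so that nothing has to be downloaded:

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model with an `_fc` head (not a real EfficientNet)."""

    def __init__(self, n_features=16, n_classes=1000):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, n_features, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )
        self._fc = nn.Linear(n_features, n_classes)

    def forward(self, x):
        x = self.body(x)              # shape: (batch, n_features, 1, 1)
        return self._fc(x.flatten(1))  # shape: (batch, n_classes)


def replace_head(model, n_classes):
    """Swap the final linear layer, keeping whatever input width it already had."""
    model._fc = nn.Linear(model._fc.in_features, n_classes)
    return model


model = replace_head(TinyNet(), n_classes=37)  # 37 is just an example class count
out = model(torch.randn(2, 3, 32, 32))
print(out.shape)
```

With the code from this post, the equivalent call would be `replace_head(model, data.c)` before building the `Learner`.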


Thanks, it works great!

I thought it was unnecessary to open a new thread for a quick question.

I expect that different EfficientNets (0-7) have been pretrained targeting different resolutions. I’ve been unable to find that piece of information, though.

Or perhaps they’ve all been trained targeting the usual ImageNet 224/299… But then I think the whole idea of upscaling the resolution would be a lot less effective, if the bulk of the net has only been exposed to low-res features.

Can anyone clarify? Thanks.


Hi Andrea,
The resolutions are in Luke Melas’s EfficientNet implementation:

Coefficients: width, depth, res, dropout
'efficientnet-b0': (1.0, 1.0, 224, 0.2),
'efficientnet-b1': (1.0, 1.1, 240, 0.2),
'efficientnet-b2': (1.1, 1.2, 260, 0.3),
'efficientnet-b3': (1.2, 1.4, 300, 0.3),
'efficientnet-b4': (1.4, 1.8, 380, 0.4),
'efficientnet-b5': (1.6, 2.2, 456, 0.4),
'efficientnet-b6': (1.8, 2.6, 528, 0.5),
'efficientnet-b7': (2.0, 3.1, 600, 0.5)

But in my experience, using a B0 with a higher resolution also helps, as it does with ResNets.
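
For convenience, the coefficients listed above can be kept in a small lookup table. This is only a sketch: the helper names `native_resolution` and `largest_variant_for` are hypothetical, and picking the largest variant whose listed resolution fits a given image size is just one possible heuristic, not a recommendation from the paper or the library:

```python
# (width, depth, resolution, dropout), copied from the list above.
EFFICIENTNET_PARAMS = {
    "efficientnet-b0": (1.0, 1.0, 224, 0.2),
    "efficientnet-b1": (1.0, 1.1, 240, 0.2),
    "efficientnet-b2": (1.1, 1.2, 260, 0.3),
    "efficientnet-b3": (1.2, 1.4, 300, 0.3),
    "efficientnet-b4": (1.4, 1.8, 380, 0.4),
    "efficientnet-b5": (1.6, 2.2, 456, 0.4),
    "efficientnet-b6": (1.8, 2.6, 528, 0.5),
    "efficientnet-b7": (2.0, 3.1, 600, 0.5),
}


def native_resolution(name):
    """Return the resolution listed for a variant, e.g. 'efficientnet-b3'."""
    return EFFICIENTNET_PARAMS[name][2]


def largest_variant_for(image_size):
    """Largest variant whose listed resolution does not exceed image_size.

    Returns None if even b0's resolution is larger than image_size.
    """
    candidates = [n for n in EFFICIENTNET_PARAMS if native_resolution(n) <= image_size]
    return max(candidates, key=native_resolution) if candidates else None


print(native_resolution("efficientnet-b7"))
print(largest_variant_for(512))
```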


Thanks @mmauri, I searched everywhere (and the time devoted was not insubstantial) except in the most obvious place :)


I’d like to check my results with you, fellas.

It’s strange, the whole point about EfficientNet is that it’s supposed to be, well… Efficient. It has a ludicrously low number of params w.r.t. the network capacity.

Now, I’m training on three Tesla V100 32 GB cards in parallel. All the variants train very slowly (a lot slower than an unfrozen ResNet). VRAM usage is also monstrous: with B7 and 600px images, I cannot raise the batch size above 24, since VRAM usage is around 90 GB.

Am I doing something wrong?



Which PyTorch implementation are you using?
Just as a test, have you tried measuring the memory consumption after swapping the Swish activation for ReLU?

I’m using PyTorch 1.1.0.

I’ll swap Swish for ReLU and let you know, thanks. But from your comment, I understand that you are not seeing this when you use EfficientNet, right?

I meant whether you were using rwightman’s or Luke Melas’ version.
I tried Luke’s, and B0 was a bit slower than ResNet-34 but with a much better loss. Trying B4 and B5 was somewhat complicated because they were very slow on a T4, and with 512px images the batch size was very small, so I gave up.
If the problem is the memory consumption of the Swish activation, you have 3 choices:

  1. Use the manual autograd version that was shown on Kaggle (sorry, I am on a mobile phone; you can find it by searching for lower EfficientNet memory consumption)

  2. Swap it for the Swish CUDA version made by @TomB

  3. Use the Mish CUDA version, also from Tom
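
As an illustration of the first option, a manual autograd Swish usually looks something like the sketch below. It is not the Kaggle code referred to above, and the names `SwishFunction` and `MemoryEfficientSwish` are chosen here for illustration. The idea is to save only the input for the backward pass and recompute the sigmoid there, rather than letting autograd keep the intermediate results of `x * torch.sigmoid(x)`:

```python
import torch
from torch import nn


class SwishFunction(torch.autograd.Function):
    """Swish with a hand-written backward pass that stores only the input."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sig = torch.sigmoid(x)  # recomputed instead of stored
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * (sig * (1 + x * (1 - sig)))


class MemoryEfficientSwish(nn.Module):
    def forward(self, x):
        return SwishFunction.apply(x)


# Check the hand-written gradient against plain autograd on the naive formula.
x = torch.randn(4, 8, dtype=torch.double, requires_grad=True)
(grad_ref,) = torch.autograd.grad((x * torch.sigmoid(x)).sum(), x)
(grad_custom,) = torch.autograd.grad(MemoryEfficientSwish()(x).sum(), x)
max_grad_diff = (grad_ref - grad_custom).abs().max().item()
print(max_grad_diff)
```

How such a module gets plugged into a particular EfficientNet implementation depends on that implementation and is not shown here.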

Best of luck

Yes, I’ve also found EfficientNet is quite slow in PyTorch. I think rwightman’s version is a bit faster, though that’s not based on particularly extensive testing. The code looks like it was written more with performance in mind, especially the padding handling, which is a bit odd in Luke’s. On that, you might want to make sure you’re using the fixed image size versions there, as they looked better (I think you just have to provide your image size).
I think it might be related to issues with depthwise convolutions in PyTorch; I’ve seen various reports of performance problems there on the forums and in the code. You might also want to try PyTorch 1.2 if possible, as there might be some improvements there (or 1.3, but I figured that if you are on 1.1 for some reason, 1.2 might be an easier jump).

And yeah, a pretty sizeable memory drop using either the autograd or cuda versions of Swish/Mish (time is one epoch, b0, bs48, 256x256, rwightman’s, Swish).

          alloc MB  time
Original  6879      01:11
Autograd  5421      01:14
CUDA      5400      01:02

From this notebook which has the autograd version of swish and the little wrapper you need to use swish cuda with rwightman’s (check my fork for the little change to allow specifying an activation function).
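
To put the table above in relative terms, the short calculation below works out the memory saved and the change in epoch time against the original Swish. It assumes the `time` column is in `mm:ss`, and the variable names are only illustrative:

```python
# Allocated MB and time per epoch, copied from the table above.
results = {
    "Original": (6879, "01:11"),
    "Autograd": (5421, "01:14"),
    "CUDA": (5400, "01:02"),
}


def to_seconds(mmss):
    """Convert an 'mm:ss' string to seconds (assumed format of the time column)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


base_mb, base_time = results["Original"]
base_s = to_seconds(base_time)

summary = {}
for name, (mb, epoch_time) in results.items():
    saved_pct = round(100 * (base_mb - mb) / base_mb, 1)  # memory saved vs. Original
    time_delta_s = to_seconds(epoch_time) - base_s         # negative means faster
    summary[name] = (saved_pct, time_delta_s)
    print(f"{name:9s} saved {saved_pct:5.1f}% memory, time change {time_delta_s:+d}s")
```

On these numbers, both alternatives save a little over a fifth of the allocated memory. The autograd version costs a few seconds per epoch, while the CUDA version is also faster.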


Luke Melas’. Should I try rwightman’s? From what you say below, though, I don’t think it’s worth a try.

And this seems to confirm that EfficientNets are quite resource-demanding. I naively supposed the contrary.
I am trying to use the B7 anyhow. The maximum batch size that fits into a V100 32 GB is 6-7 with 600px images. To reach 24, I had to allocate the whole bunch of 4 cards.

Thanks for your tips, I’ll certainly try and look into that stuff. It will be instructive, if nothing else.