EfficientNet - training tips within fastai and best models

Hi all,
I started work on a new fine-grained classification project and was very surprised by how quickly EfficientNet (B3 and B4) made progress on the problem, compared with the 2 days I had already spent with ResNet50.

So I wanted to start a thread to pool resources on best practices for training EfficientNets, including:
1 - Are people seeing much difference in results or ease of use between Luke Melas's models and rwightman's models within fastai?

2 - Are you using differential learning rates, and if so, how did you split up the param groups?

3 - I've lost the heatmaps since switching to EfficientNet. Has anyone integrated them?

4 - Any experience swapping in Mish instead of Swish? I think @morgan has the most experience here, but it would be great to hear any updates and details, as I'd prefer to use Mish if possible.
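For reference on question 4, the two activations differ only in the gating term: Swish (the activation EfficientNet uses) is x * sigmoid(x), while Mish is x * tanh(softplus(x)). Below is a small, dependency-free sketch of both as scalar functions, for illustration only. A real model would use tensor ops, and exactly where to swap Mish into a given EfficientNet implementation (e.g. replacing its Swish modules) is an assumption not covered here.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so this cannot overflow
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x: float) -> float:
    """Swish (a.k.a. SiLU): x * sigmoid(x)."""
    return x * sigmoid(x)


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


# Compare the two curves at a few points.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{x:+.1f}  swish={swish(x):+.4f}  mish={mish(x):+.4f}")
```

Both are smooth, non-monotonic just below zero, and approach the identity for large positive inputs, which is why Mish is usually treated as a drop-in replacement.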

I'll be working with it next week and will try to post my own findings here, but I'd appreciate it if you could share your tips and tricks so we can pool resources.



I was able to do a lot more work with EfficientNet, and here are a couple of quick notes that may help.
1 - B7 is a beast and will suck up all your GPU memory, lol. I dropped to B6 and that was better; B5 was better still, so I'm going to stick with B6 or below for now.

2 - File sizes for weights: I haven't seen this info published, but here's what I was getting for saved weights. You can see the huge size disparity:
B0: 16.8 MB
B3: 72.5 MB
B4: 282 MB
B5: 455 MB
B6: 653 MB
…wait for it:
B7: 1.02 GB
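One rough way to see why GPU memory climbs so steeply with the larger variants: each one is designed for a larger input resolution, and activation memory per image grows roughly with pixel count, before even counting the wider and deeper networks. This sketch assumes each model is trained at its default resolution (the values from the EfficientNet paper), which the post above doesn't state.

```python
# Default input resolutions per variant, as given in the EfficientNet paper.
DEFAULT_RES = {
    "b0": 224, "b1": 240, "b2": 260, "b3": 300,
    "b4": 380, "b5": 456, "b6": 528, "b7": 600,
}


def pixels_vs_b0(variant: str) -> float:
    """How many times more input pixels per image than B0, at default resolution."""
    return DEFAULT_RES[variant] ** 2 / DEFAULT_RES["b0"] ** 2


for v in ("b5", "b6", "b7"):
    print(f"{v}: {DEFAULT_RES[v]}px, {pixels_vs_b0(v):.2f}x the pixels of b0")
```

Pixel count alone puts B7 at roughly 7x B0 per image, so dropping to B6 or B5 (or lowering the batch size or resolution) is the usual way to fit in memory.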

I was just working with the Melas implementation with AutoAugment training. I'm going to use rwightman's with AdvProp next.

Best regards,


I was able to get heatmaps working for EfficientNets! I also improved the heatmap code a bit after reviewing some other heatmap implementations.
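The heatmap code from this post isn't included in the thread, so the following is only a hedged sketch of the standard Grad-CAM calculation, the approach used for the heatmaps in fastai's lesson 6. It assumes NumPy, and that you have already captured the activations of a late convolutional layer and the gradient of the target class score with respect to them. For Luke Melas's implementation, `_conv_head` is one reasonable layer to hook, but that choice is an assumption, not necessarily what was used here.

```python
import numpy as np


def grad_cam(acts: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one image.

    acts, grads: arrays of shape (C, H, W) with the chosen layer's activations
    and the gradient of the target class score w.r.t. those activations.
    Returns an (H, W) map scaled to [0, 1].
    """
    if acts.shape != grads.shape or acts.ndim != 3:
        raise ValueError("acts and grads must both have shape (C, H, W)")
    # Channel weights: average each channel's gradient over H and W.
    weights = grads.mean(axis=(1, 2))  # shape (C,)
    # Weighted sum of the feature maps; keep only positive evidence (ReLU).
    cam = np.maximum((weights[:, None, None] * acts).sum(axis=0), 0.0)
    # Normalise for display; an all-zero map stays all zeros.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Hypothetical fastai v1 wiring (not run here). Assumes a Melas-style model,
# and `xb` (one normalised image batch) and `cat` (target class index)
# prepared as in lesson 6:
# from fastai.callbacks.hooks import hook_output
# m = learn.model.eval()
# with hook_output(m._conv_head) as hook_a:
#     with hook_output(m._conv_head, grad=True) as hook_g:
#         preds = m(xb)
#         preds[0, int(cat)].backward()
# acts = hook_a.stored[0].cpu().numpy()
# grads = hook_g.stored[0][0].cpu().numpy()
# heatmap = grad_cam(acts, grads)
```

The resulting map is low resolution (the layer's H x W), so for display it is stretched over the input image, e.g. with `imshow(..., extent=..., interpolation='bilinear', alpha=0.6)` as in the lesson.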

Also, while the training files are 455MB for the B5 model, I'm happy to note that exporting for production drops the file size substantially, to 116MB.

I've found the Ranger optimizer works well with EfficientNet, and that's what I've been using all week, but I'm hoping to try a few others soon, including SLS and DeepMemory.


I think that's because learn.save saves a copy of the optimizer state by default, while learn.export does not. If you set with_opt=False, the saved size should be close to the exported size.

Google's B5 ImageNet weights are 116MB.


You are right! I never would have thought the optimizer would make that big a difference.

Anyway, I tested to verify: with the optimizer state (the default) the file is 455MB, and with with_opt=False it is 114MB.
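A quick back-of-envelope check of why the optimizer matters so much. This assumes the optimizer keeps its state in the same dtype as the weights, and that this run used Ranger as mentioned earlier in the thread. As I understand the common Ranger implementation (RAdam plus Lookahead), it stores Adam-style `exp_avg` and `exp_avg_sq` buffers plus a Lookahead `slow_buffer` for every parameter, i.e. three extra weight-sized copies, which would predict a checkpoint about 4x the weights-only size.

```python
def optimizer_overhead(with_opt_mb: float, weights_only_mb: float) -> tuple:
    """Return (size ratio, estimated extra weight-sized tensors per parameter)."""
    ratio = with_opt_mb / weights_only_mb
    return ratio, round(ratio) - 1


# Sizes reported above for B5: learn.save default vs with_opt=False.
ratio, extra = optimizer_overhead(455.0, 114.0)
print(f"checkpoint with optimizer is {ratio:.2f}x the weights-only file")
print(f"=> roughly {extra} extra weight-sized tensors stored per parameter")
```

A plain Adam-style optimizer without Lookahead would instead predict roughly 3x, so the measured ratio is at least consistent with the Ranger explanation.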

Thanks for posting this…that’s a useful tip!

I wanted to add this tip from @TomB here so we have our EfficientNet knowledge in one place: if you want to use differential learning rates, you can split the model into layer groups with this command:

learn = learn.split([learn.model._conv_stem,learn.model._blocks,learn.model._conv_head])

He tested this on B0. I'll test it tomorrow on B5, which is what I'm working with daily now, but I'm pretty sure it will work for all the models.
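Once the model is split into three groups, passing a `slice` as the learning rate gives each group its own rate. The sketch below re-implements, for illustration only, how I understand fastai v1 to spread a `slice` over layer groups: geometric spacing from start to stop when a start is given, otherwise stop/10 for every group except the last. It is not fastai's own code, and the group labels assume the module order of Luke Melas's implementation.

```python
def even_mults(start: float, stop: float, n: int) -> list:
    """n values spaced geometrically from start to stop, inclusive."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]


def lrs_for_groups(lr: slice, n_groups: int) -> list:
    """Expand a slice into one learning rate per layer group."""
    if lr.start:
        return even_mults(lr.start, lr.stop, n_groups)
    # No start given: every group except the last gets stop / 10.
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]


# Labels for the three groups produced by splitting on
# [_conv_stem, _blocks, _conv_head] (assumed module order).
GROUPS = ["stem (_conv_stem, _bn0)", "_blocks", "head (_conv_head to _fc)"]

for name, rate in zip(GROUPS, lrs_for_groups(slice(1e-5, 1e-3), len(GROUPS))):
    print(f"{name:28s} lr={rate:.0e}")

# In fastai v1 itself you would just pass the slice, e.g.
# learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-3))
```

The pretrained stem then moves slowest and the new head fastest, which is the usual reason for splitting at these three points.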


Hi, could you please share your solution for getting heatmaps working with EfficientNets?
Thanks a lot!