EfficientNet

etremblay · May 30, 2019, 1:39pm

This new paper from Google seems really interesting in terms of performance vs # of parameters for CNNs. They achieve that by basically balancing the width, depth and size of the input image of the CNN while scaling it.

The performance difference seems so big that this would seem something interesting to integrate in fastai eventually.

Jeremy focus a lot on super-convergence in his courses (with good reason) and this seems totally inline with this philosophy of faster training, better performance.

rohit_gr · May 30, 2019, 3:25pm

And one more by IBM research https://arxiv.org/pdf/1905.09788.pdf which uses Multi-Sample Dropout for better and faster generalization with no increase in computation.

LessW2020 · May 30, 2019, 4:08pm

They’ve released the source for EfficientNet:

github.com

tensorflow/tpu/blob/master/models/official/efficientnet/efficientnet_model.py

# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains definitions for EfficientNet model.

[1] Mingxing Tan, Quoc V. Le
  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.
  ICML'19, https://arxiv.org/abs/1905.11946
"""

This file has been truncated. show original

Of course it’s in TensorFlow but looks like it’s very possible to readily move it over to PyTorch and put into something similar to how the XResNet model is built (templatized).

I’d really like to get this up and running so we can test it out vs XResNet/ResNet on ImageNette.

Seb · May 30, 2019, 7:05pm

Below is the architecture. A quick search indicates MBConv is just a variant on a Resblock. (https://towardsdatascience.com/mobilenetv2-inverted-residuals-and-linear-bottlenecks-8a4362f4ffd5)

Architecture:

LessW2020 · May 30, 2019, 8:47pm

Thanks for the link. I started coding it - they do add a width parameter to widen it as it goes deeper.
I’m pretty excited to try and get this up and running, ideally within the FastAI framework.

Seb · May 30, 2019, 8:52pm

I still haven’t looked at this in detail.
But I’m assuming we can adapt the resblock from here: https://github.com/fastai/fastai/blob/master/fastai/vision/models/xresnet.py

Also fastai has an implementation of Wideresnet that could bring inspiration:https://github.com/fastai/fastai/blob/master/fastai/vision/models/wrn.py#L38

LessW2020 · May 30, 2019, 8:54pm

Yes, exactly what I want to leverage (XResNet). There’s a couple things in the TF code I don’t understand so I’m tracking those down but otherwise I think we can get this running soon.
Let me know if you review in more detail as the more input/help the better imo

LessW2020 · May 30, 2019, 8:55pm

I posted a short article on the paper on Medium to try and provide a summary of their findings (super summary = scale your architecture in 3d - depth and width and resolution, only going one dimension quickly saturates out).

Seb · May 30, 2019, 9:01pm

MBConv isn’t new, there must be some Pytorch implementations out there, like:

github.com

snakers4/mnasnet-pytorch/blob/master/src/models/mnasnet.py

import os
import sys
import torch
import torch.nn as nn
from torch.nn import init
import torch.utils.model_zoo as model_zoo

# default_activation = nn.ReLU6
default_activation = nn.ReLU
debug_global = False

__all__ = ['mnasnet', 'MNasNet']

pretrained_settings = {
    'mnasnet': {
        'imagenet': {
            'url': '',
            'input_space': 'RGB',
            'input_size': [3, 224, 224],
            'input_range': [0, 1],

This file has been truncated. show original

Seb · May 30, 2019, 9:07pm

Within the EfficientNet code, they refer to this paper where I assume the MBconv comes from: https://arxiv.org/abs/1807.11626

LessW2020 · May 30, 2019, 9:12pm

Thanks for the links!
I’ve just been doing a line by line translation into PyTorch. Most of it is just changing params and PyTorch calls, so it’s not really conceptually hard but a bit tedious.
My plan is to convert it directly, verify and then try and try and match it to like how XResNet was done.

Seb · May 30, 2019, 9:27pm

I’ll see if I can build it from the concepts rather than the code, maybe we’ll meet halfway through.

Seb · May 30, 2019, 10:30pm

Edit: my understanding of squeeze excitation:

Start with x: (N,C,H,W)
Do average pooling, 1 value per channel: (N,C) or (N,C,1)
Squeeze with a conv 1*1: (N,C’) (with bias it seems)
ReLU
Unsqueeze: (N,C)
Sigmoid

Then multiply result by original x (broadcasting 1 value per channel)

LessW2020 · May 31, 2019, 1:24am

looks right to me - here’s some code for it:

def se_block(in_block, ch, ratio=16):
x = GlobalAveragePooling2D()(in_block)
x = Dense(ch//ratio, activation=‘relu’)(x)
x = Dense(ch, activation=‘sigmoid’)(x)
return multiply()([in_block, x])

and the TF code from the ExciteNet:
def _call_se(self, input_tensor):
“”“Call Squeeze and Excitation layer.
Args:
input_tensor: Tensor, a single input tensor for Squeeze/Excitation layer.
Returns:
A output tensor, which should have the same shape as input.
“””
se_tensor = tf.reduce_mean(input_tensor, self._spatial_dims, keepdims=True)
se_tensor = self._se_expand(act_fn(self._se_reduce(se_tensor)))
#tf.logging.info(‘Built Squeeze and Excitation with tensor shape: %s’ %
# (se_tensor.shape))
return F.sigmoid(se_tensor) * input_tensor

Seb · May 31, 2019, 1:42am

Do I understand right that they use a skip connection only if id_skip is true and strides=1 and channels stay the same?

Seb · May 31, 2019, 1:55am

Also wondering about drop connect, could we use dropout instead?

Seb · May 31, 2019, 9:25pm

I find some of the parameters could be simplified a bit:

stride could have 1 value instead of 2
se has the same value through the network
id skip is not used?

ilovescience · June 1, 2019, 12:09am

This repository has an implementation in PyTorch

LessW2020 · June 1, 2019, 3:26am

Fantastic, thanks @ilovescience a ton for the link!

I’m going to try and extricate the EfficientNet out so we have a pure ENet codebase but looks like he’s already solved the TF issues that I wasn’t sure of how to translate inito PyTorch…so this makes it 10x easier now.

@Seb , here’s his implementation on the SqueezeExcite portion for reference:

class SqueezeExcite(nn.Module):
def __init__(self, in_chs, reduce_chs=None, act_fn=F.relu, gate_fn=torch.sigmoid):
    super(SqueezeExcite, self).__init__()
    self.act_fn = act_fn
    self.gate_fn = gate_fn
    reduced_chs = reduce_chs or in_chs
    self.conv_reduce = nn.Conv2d(in_chs, reduced_chs, 1, bias=True)
    self.conv_expand = nn.Conv2d(reduced_chs, in_chs, 1, bias=True)

def forward(self, x):
    # NOTE adaptiveavgpool can be used here, but seems to cause issues with NVIDIA AMP performance
    x_se = x.view(x.size(0), x.size(1), -1).mean(-1).view(x.size(0), x.size(1), 1, 1)
    x_se = self.conv_reduce(x_se)
    x_se = self.act_fn(x_se)
    x_se = self.conv_expand(x_se)
    x = self.gate_fn(x_se) * x
    return x

MicPie · June 1, 2019, 7:07am

Here is a repo with a pytorch implementation:

(Found via the reddit paper thread.)

Do you have your fastai implementation in a online repo?