Fantastic, thanks @ilovescience a ton for the link!
I’m going to try to extricate the EfficientNet code so we have a pure ENet codebase, but it looks like he’s already solved the TF issues that I wasn’t sure how to translate into PyTorch…so this makes it 10x easier now.
@Seb , here’s his implementation of the SqueezeExcite portion for reference:
```python
class SqueezeExcite(nn.Module):
    def __init__(self, in_chs, reduce_chs=None, act_fn=F.relu, gate_fn=torch.sigmoid):
        super().__init__()
        self.act_fn = act_fn
        self.gate_fn = gate_fn
        reduced_chs = reduce_chs or in_chs
        self.conv_reduce = nn.Conv2d(in_chs, reduced_chs, 1, bias=True)
        self.conv_expand = nn.Conv2d(reduced_chs, in_chs, 1, bias=True)

    def forward(self, x):
        # NOTE adaptiveavgpool can be used here, but seems to cause issues with NVIDIA AMP performance
        x_se = x.view(x.size(0), x.size(1), -1).mean(-1).view(x.size(0), x.size(1), 1, 1)
        x_se = self.conv_reduce(x_se)
        x_se = self.act_fn(x_se)
        x_se = self.conv_expand(x_se)
        x = self.gate_fn(x_se) * x
        return x
```
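If anyone wants to poke at it, here’s a self-contained copy with a quick shape sanity check (the imports, the `nn.Module` wrapper, and the demo at the bottom are mine, just a sketch to show the gating behavior — the block squeezes to a per-channel scalar, passes it through the two 1x1 convs, and multiplies the sigmoid gate back onto the input, so the output shape matches the input):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SqueezeExcite(nn.Module):
    def __init__(self, in_chs, reduce_chs=None, act_fn=F.relu, gate_fn=torch.sigmoid):
        super().__init__()
        self.act_fn = act_fn
        self.gate_fn = gate_fn
        reduced_chs = reduce_chs or in_chs
        self.conv_reduce = nn.Conv2d(in_chs, reduced_chs, 1, bias=True)
        self.conv_expand = nn.Conv2d(reduced_chs, in_chs, 1, bias=True)

    def forward(self, x):
        # global average pool via mean over flattened spatial dims
        # (avoids AdaptiveAvgPool2d, which reportedly hurts NVIDIA AMP perf)
        x_se = x.view(x.size(0), x.size(1), -1).mean(-1).view(x.size(0), x.size(1), 1, 1)
        x_se = self.conv_reduce(x_se)   # 1x1 conv: squeeze channels
        x_se = self.act_fn(x_se)
        x_se = self.conv_expand(x_se)   # 1x1 conv: restore channels
        return self.gate_fn(x_se) * x   # channel-wise re-weighting of input

# quick shape check: output should match input exactly
se = SqueezeExcite(16, reduce_chs=4)
out = se(torch.randn(2, 16, 8, 8))
print(out.shape)  # torch.Size([2, 16, 8, 8])
```

Since the gate is a sigmoid in (0, 1) applied per channel, every output activation is the input scaled down by that channel’s learned weight.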