EfficientNet

Seb · June 1, 2019, 12:21pm

Multiple implementations listed here: https://paperswithcode.com/paper/efficientnet-rethinking-model-scaling-for

Edit: some have been linked to already, but this website will automatically add new implementations as they come up.

LessW2020 · June 1, 2019, 6:05pm

Thanks for this link @Seb!

So basically there are 3 PyTorch implementations. I’m going to review all 3 and use what I learn to code up my own…and then look at wrapping it into FastAI v2 once I’ve verified my own implementation.

The one that looks the cleanest to me is this one:

https://github.com/zsef123/EfficientNets-PyTorch/blob/master/effnet.py

import math

import torch
import torch.nn as nn


def conv_bn_act(in_, out_, kernel_size,
                stride=1, padding=0, groups=1, bias=True,
                eps=1e-3, momentum=0.01):
    return nn.Sequential(
        nn.Conv2d(in_, out_, kernel_size, stride=stride, padding=padding, groups=groups, bias=bias),
        nn.BatchNorm2d(out_, eps, momentum),
        Swish()
    )


class Swish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)

This file has been truncated.

But I’m going to walk through all of them in more detail and see what differences, ultimately, exist.

Seb · June 1, 2019, 6:08pm

Sounds good.

The PyTorch projects seem to be more or less literal translations of the original TensorFlow code. I see a lot of potential for refactoring to make things more fastai-ish.

LessW2020 · June 1, 2019, 6:10pm

Thanks for this link @MicPie!

I have my GitHub repo set up, but nothing is checked in yet because I’m still working on it: https://github.com/lessw2020/EfficientNet-PyTorch/blob/master/README.md

Now that there are 3 implementations, I’m going through all of them to hopefully leverage the best of each in terms of cleanliness of implementation. Once I have my own, I’ll test it on MNIST as a basic check, and then try to rewrite it along the lines of how XResNet was done, for FastAI integration.
(*That’s a lot of work though, so I’d welcome any and all help!)

LessW2020 · June 1, 2019, 6:12pm

Yes, exactly: these are all standalone projects with no integration…so hopefully we can improve on that.
That said, I’m really happy to have these 3 implementations, as the authors solved a couple of TF translation questions I had yesterday.

LessW2020 · June 2, 2019, 4:21am

They are a bit different. Drop connect is not the same as dropout:
dropout is for the activations, and drop_connect is for the weights.

Here’s the code I just checked in for the drop_connect for my implementation:

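class Drop_Connect(nn.Module):
    """Create a random binary mask and apply it to the inputs during training; drop_ratio is the fraction dropped."""
    def __init__(self, drop_ratio=0):
        super().__init__()
        self.keep_percent = 1.0 - drop_ratio

    def forward(self, x):
        # self.training is a bool attribute of nn.Module, not a method
        if not self.training:
            return x

        batch_size = x.size(0)
        # floor(keep_percent + U[0, 1)) is 1 with probability keep_percent, else 0;
        # there is one mask value per sample, broadcast over channels, height and width
        random_tensor = self.keep_percent
        random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype)
        binary_tensor = torch.floor(random_tensor)
        # rescale the kept samples so the expected value of the output matches the input
        output = x / self.keep_percent * binary_tensor

        return output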

Seb · June 2, 2019, 5:47pm

I’ve published my WIP here:

So far I have EfficientNet-B0 running on Imagewoof, though I haven’t spent much time checking that my work is an accurate replication.
EfficientNet-B0 does train faster than xresnet50, but is not as good after 80 epochs. Things should get more interesting with B3+ if that accuracy chart is accurate.

LessW2020 · June 2, 2019, 6:43pm

@Seb - thanks for the initial results! Can you check whether using Swish, the activation function from the paper, makes a difference?

i.e. you have:
act_fn = nn.ReLU(inplace=True)

I have:
act_fn = eu.Swish() #eu is my utility file import

class Swish(nn.Module):
    def forward(self, x):
        x = x * torch.sigmoid(x)  # nn.functional.sigmoid is deprecated, use torch.sigmoid instead
        return x
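To make the ReLU vs Swish comparison a one-argument change, the activation can be passed in as a parameter. This is only a minimal sketch: `conv_bn_act` here is a hypothetical helper with made-up argument names, not the code from either repo.

```python
import torch
import torch.nn as nn


class Swish(nn.Module):
    """Swish activation: x * sigmoid(x)."""
    def forward(self, x):
        return x * torch.sigmoid(x)


def conv_bn_act(n_in, n_out, kernel_size=3, stride=1, act_cls=Swish):
    """Hypothetical helper: conv -> batchnorm -> activation, with the activation class as a parameter."""
    return nn.Sequential(
        nn.Conv2d(n_in, n_out, kernel_size, stride=stride,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(n_out),
        act_cls(),  # instantiate whichever activation was requested
    )


x = torch.randn(2, 3, 32, 32)
swish_block = conv_bn_act(3, 8, act_cls=Swish)
relu_block = conv_bn_act(3, 8, act_cls=nn.ReLU)
print(swish_block(x).shape, relu_block(x).shape)
```

Building both variants from the same helper keeps everything else (conv, batchnorm, initialisation path) identical, so a difference in results is easier to attribute to the activation.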

Seb · June 2, 2019, 8:02pm

Nice catch! Swish does seem to do better than ReLU. Updated my repo. I probably missed other details…

Seb · June 2, 2019, 8:23pm

I found another mistake: batchnorm momentum in PyTorch is 1 - batchnorm momentum from TensorFlow…

Edit to add: interestingly, results over 80 epochs are worse with BN momentum = 0.01 than with 0.99.
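The two momentum conventions can be compared in plain Python. This sketch assumes the documented update rules: TensorFlow weights the old moving statistic by its momentum (decay), while PyTorch weights the new batch statistic by its momentum. The function names are made up for illustration.

```python
def tf_update(moving, batch_stat, decay):
    """TensorFlow convention: decay weights the OLD moving statistic."""
    return decay * moving + (1.0 - decay) * batch_stat


def torch_update(running, batch_stat, momentum):
    """PyTorch convention: momentum weights the NEW batch statistic."""
    return (1.0 - momentum) * running + momentum * batch_stat


def tf_to_torch_momentum(decay):
    """Convert a TensorFlow batchnorm momentum to the PyTorch equivalent."""
    return 1.0 - decay


tf_decay = 0.99
pt_momentum = tf_to_torch_momentum(tf_decay)  # roughly 0.01

moving_tf = moving_pt = 0.0
for batch_mean in [1.0, 2.0, 3.0]:
    moving_tf = tf_update(moving_tf, batch_mean, tf_decay)
    moving_pt = torch_update(moving_pt, batch_mean, pt_momentum)

# Both running means follow the same trajectory once the momentum is converted.
print(moving_tf, moving_pt)
```

So a TensorFlow value of 0.99 corresponds to `nn.BatchNorm2d(..., momentum=0.01)` in PyTorch; passing 0.99 straight through makes the running statistics track almost only the latest batch.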

Seb · June 3, 2019, 1:49am

I got all models B0 to B7 implemented in my repo now, but I’m getting weird results so I probably got it wrong. Will look into it tomorrow.

One comment is that once you get past B3, image sizes force batch size to decrease, which slows down training. I guess we could still just increase image size progressively.
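A rough way to see why batch size has to drop: the sketch below assumes the B0 to B7 input resolutions of the reference TensorFlow implementation (224 up to 600; worth double-checking against whichever repo you use) and a hypothetical rule of thumb that shrinks the batch in proportion to pixels per image. It ignores the extra width and depth of the larger models, so real memory use grows faster than this.

```python
# Assumed input resolutions per model (to be verified against the repo in use).
RESOLUTIONS = {
    "b0": 224, "b1": 240, "b2": 260, "b3": 300,
    "b4": 380, "b5": 456, "b6": 528, "b7": 600,
}


def relative_pixels(name, base="b0"):
    """Pixels per image relative to the base model's input size."""
    return (RESOLUTIONS[name] / RESOLUTIONS[base]) ** 2


def rough_batch_size(name, base_batch_size=64):
    """Hypothetical rule of thumb: scale the batch size down by the pixel ratio."""
    return max(1, int(base_batch_size / relative_pixels(name)))


for name in RESOLUTIONS:
    print(name, RESOLUTIONS[name], round(relative_pixels(name), 2), rough_batch_size(name))
```

Progressive resizing would amount to training at a smaller entry of this table first and moving up to the model's full resolution later.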

LessW2020 · June 3, 2019, 2:13am

Glad to hear the Swish change is helping. I’m going to test out FTSwishPlus() once I’m up and running.
I’m about one error away from having B0 up and running.

I’m going to try to compare your impl, mine, and the two/three others out there and hopefully pick up any errors and/or design issues.

Re: XResNet comparison - XResNet50 is about the same as ResNet152…so a B1 should be just a bit better than an XResNet50 if that comparison holds true, and a B0 should underperform.

More interesting, of course, are two things:
1 - a B4 or B5 vs XResNet50 and XResNet152…and of course comparing total parameters.
2 - Even if accuracy is the same, if EffNet is doing it with 1/5 the params, and training faster as well, then that’s still a better arch imo.

And, if XResNet outperforms then all the better for FastAI
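For the parameter comparison, a small helper is enough. This is a generic sketch (`count_params` is a made-up name), shown on a tiny stand-in model rather than on EfficientNet or XResNet themselves.

```python
import torch.nn as nn


def count_params(model, trainable_only=True):
    """Total number of parameter elements; batchnorm running stats are buffers and are not counted."""
    return sum(p.numel() for p in model.parameters()
               if p.requires_grad or not trainable_only)


# Tiny stand-in model: conv (3*8*3*3 = 216) + batchnorm (8 + 8 = 16) + linear (8*10 + 10 = 90)
tiny = nn.Sequential(
    nn.Conv2d(3, 8, 3, bias=False),
    nn.BatchNorm2d(8),
    nn.Linear(8, 10),
)
print(count_params(tiny))
```

Calling the same helper on each architecture gives the numbers needed to put accuracy and parameter count side by side.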

illusionsofa · June 3, 2019, 4:00am

@Seb Would your current code be able to train on datasets other than imagewoof/imagenette by just substituting the ImageList with another?

I’m quite new to this so I’m sorry if I’m missing something obvious, but I’m currently getting RuntimeError: CUDA error: device-side assert triggered when I do that with my own data but it works on the imagewoof/imagenette ones just fine.

Seb · June 3, 2019, 12:16pm

Thanks for trying my code out! It should run on other datasets (although note I still have to confirm I built the models correctly).

My guess is you need to change c_out which is the number of classes your data set has. I haven’t created a parameter for that so you’ll need to change it directly in line 63 in train.py.

Otherwise, you’d need to get a more useful error message by doing the following:

First thing is to try to run the code on CPU. CPU code has more checks so it will possibly return a better error message.
If the CPU code runs without error, then run the same thing with CUDA_LAUNCH_BLOCKING=1 to get a proper error message and stack trace.
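A device-side assert in a classifier is very often a label index that is out of range for the output layer, which is what a wrong c_out produces. The sketch below shows both steps: setting CUDA_LAUNCH_BLOCKING from Python and checking labels up front. `check_labels` is a hypothetical helper, not part of fastai or PyTorch.

```python
import os

# Must be set before CUDA is initialised, so do it before importing torch
# (or set it in the shell instead: CUDA_LAUNCH_BLOCKING=1 python train.py).
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def check_labels(labels, c_out):
    """Raise a readable error if any class index falls outside [0, c_out)."""
    bad = sorted({int(label) for label in labels if not 0 <= int(label) < c_out})
    if bad:
        raise ValueError(f"labels {bad} are out of range for c_out={c_out}")
    return True


# 10 classes: indices 0..9 are fine, index 10 would trip the assert on the GPU.
print(check_labels([0, 3, 9], c_out=10))
```

Running a check like this on the dataset's labels before training turns the opaque CUDA error into a message that points directly at the class count.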

illusionsofa · June 3, 2019, 12:53pm

Oh yes, changing the c_out fixed my problem, thank you for the suggestion!

Do you plan on making pretrained Imagenet models for each of the networks as well in the near future?

Seb · June 3, 2019, 1:08pm

Great!
I am not sure about pretrained models. Maybe we can figure out how to convert the weights from TensorFlow (or reuse the conversion done by other PyTorch implementations).

Current goal is to have efficientnet.py code be closer in style to xresnet.py and integrated to fastai so that we can more easily experiment with the model. I like Jeremy’s goal of having the whole model fit on one screen.

If you have a need for pretrained models, I recommend checking out other Pytorch repos such as this one

Seb · June 3, 2019, 5:17pm

Don’t rush using my repo; I just fixed a couple issues, namely squeeze-ex and drop-connect were not being used in the model…

Seb · June 3, 2019, 6:23pm

LessW2020:

They are a bit different - drop connect is different than dropout:
dropout is for the activations, and drop_connect is for the weights .

Here’s the code I just checked in for the drop_connect for my implementation:

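class Drop_Connect(nn.Module):
    """Create a random binary mask and apply it to the inputs during training; drop_ratio is the fraction dropped."""
    def __init__(self, drop_ratio=0):
        super().__init__()
        self.keep_percent = 1.0 - drop_ratio

    def forward(self, x):
        # self.training is a bool attribute of nn.Module, not a method
        if not self.training:
            return x

        batch_size = x.size(0)
        # floor(keep_percent + U[0, 1)) is 1 with probability keep_percent, else 0;
        # there is one mask value per sample, broadcast over channels, height and width
        random_tensor = self.keep_percent
        random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype)
        binary_tensor = torch.floor(random_tensor)
        # rescale the kept samples so the expected value of the output matches the input
        output = x / self.keep_percent * binary_tensor

        return output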

IME this class implementation doesn’t play well with fp16 training… I had to go back to the function version.

LessW2020 · June 3, 2019, 8:28pm

Thanks for the update. I just checked your code. I see you are avoiding storing anything on self, is that to avoid a device conflict?
Ok I’ll update mine to match.

Seb · June 3, 2019, 9:36pm

I did get a device conflict on this line, which is why I added device=x.device:

random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype, device=x.device)

That same line caused issues with dtype when using fp16.

I’m a bit unsure as to what’s going on. That function worked fine without making the device explicit in another Pytorch implementation. And it’s the same code that works with fp16 in a function but not in a module.

I didn’t purposefully avoid using self for device conflicts.
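One way to sidestep both the device and the dtype question is to draw the random values in the default float32 on the input's device and only then cast the mask to the input's dtype. This is a sketch under that assumption, not the code from either repo. It uses `rand < keep` instead of `floor(keep + rand)`, which gives the same keep probability, and it assumes 0 <= drop_ratio < 1.

```python
import torch


def drop_connect(x, drop_ratio, training):
    """Zero whole samples with probability drop_ratio and rescale the rest."""
    if not training or drop_ratio == 0:
        return x
    keep = 1.0 - drop_ratio
    # Random values are created on x's device in float32, so there is no CPU/GPU
    # mismatch and no reliance on half-precision random number support.
    rand = torch.rand(x.size(0), 1, 1, 1, device=x.device)
    # One mask value per sample, cast to x's dtype so fp16 inputs stay fp16.
    mask = (rand < keep).to(x.dtype)
    return x / keep * mask


torch.manual_seed(0)
x = torch.ones(8, 2, 4, 4, dtype=torch.float64)
out = drop_connect(x, drop_ratio=0.5, training=True)
print(out.dtype, out.reshape(8, -1)[:, 0])  # each sample is either 0 or 1 / keep
```

Because the function keeps no state, it behaves the same whether it is called directly or from inside a module's forward with `self.training` passed in.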