EfficientNet

Multiple implementations listed here: https://paperswithcode.com/paper/efficientnet-rethinking-model-scaling-for

Edit: some have been linked to already, but this website will automatically add new implementations as they come up.


Thanks for this link @Seb!

So basically there are 3 PyTorch implementations. I'm going to try to review all 3 and leverage that to code up my own…and then look at wrapping it into FastAI v2 once I verify my own implementation.

The one that looks the cleanest to me is this one:

But I'm going to walk through all of them in more detail and see what differences, ultimately, exist.


Sounds good.

The Pytorch projects seem like literal translations of the original ones more or less. I see a lot of potential for refactoring to make things more fastai-ish.

Thanks for this link @MicPie!

I have my GitHub repo made but nothing is checked in yet b/c I'm still working on it :slight_smile: https://github.com/lessw2020/EfficientNet-PyTorch/blob/master/README.md

Now that there are 3 implementations, I'm trying to go through all of them and hopefully leverage the best of each in terms of cleanliness of implementation. Once I have my own, I'll test it out on MNIST for basic checking, and then try to rewrite it similar to how XResNet was done for FastAI integration.
(That's a lot of work though, so I'd welcome any and all help!)

Yes, exactly - these are all standalone projects with no integration…so hopefully we can build an improvement to it in that respect.
That said, I'm really happy to have these 3 implementations, as the authors solved a couple of TF-translation questions I had yesterday.

They are a bit different - drop connect is not the same as dropout:
dropout is for the activations, and drop_connect is for the weights.

Here's the code I just checked in for drop_connect in my implementation:

import torch
import torch.nn as nn

class Drop_Connect(nn.Module):
    """create a tensor mask and apply to inputs, for removing drop_ratio % of weights"""
    def __init__(self, drop_ratio=0):
        super().__init__()
        self.keep_percent = 1.0 - drop_ratio

    def forward(self, x):
        # self.training is a bool attribute set by model.train()/.eval(), not a method
        if not self.training:
            return x

        batch_size = x.size(0)
        # per-sample mask: floor(keep_percent + U[0,1)) is 1 with probability keep_percent
        random_tensor = self.keep_percent
        random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype)
        binary_tensor = torch.floor(random_tensor)
        # rescale the survivors so the expected activation is unchanged
        output = x / self.keep_percent * binary_tensor

        return output
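
For context on where this plugs in (as I understand the TF reference), drop connect gets applied to the output of each MBConv block's conv body, right before the identity skip is added back. A minimal illustrative sketch of that pattern - ToyResidualBlock is made up for illustration, not the real MBConv block:

import torch
import torch.nn as nn

class ToyResidualBlock(nn.Module):
    """Illustrative residual block: conv body + drop connect on the residual branch."""
    def __init__(self, channels, drop_ratio=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.drop_connect = Drop_Connect(drop_ratio)  # the module defined above

    def forward(self, x):
        # during training, whole samples of the residual branch get zeroed at random
        return x + self.drop_connect(self.body(x))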

I've published my WIP here:

So far I have EfficientNet-B0 running on Imagewoof, though I haven't spent much time checking that my work is an accurate replication.
EfficientNet-B0 does train faster than xresnet50, but is not as good after 80 epochs. Things should get more interesting with B3+ if that accuracy chart is accurate.


@Seb - thanks for the initial results! Can you check whether using Swish, the activation function from the paper, makes a difference?

i.e. you have:
act_fn = nn.ReLU(inplace=True)

I have:
act_fn = eu.Swish() #eu is my utility file import

class Swish(nn.Module):
    def forward(self, x):
        # nn.functional.sigmoid is deprecated; use torch.sigmoid instead
        return x * torch.sigmoid(x)

Nice catch! Swish does seem to do better than ReLU. Updated my repo. I probably missed other details…

I found another mistake: batchnorm momentum in Pytorch is 1 - the batchnorm momentum from Tensorflow…

Edit to add: interestingly, results over 80 epochs are not as good with BN momentum = 0.01 rather than 0.99.
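
Concretely, if I'm reading the TF reference right (its defaults are batch_norm_momentum=0.99 and eps=1e-3, quoting from memory, so double-check against the official repo), the Pytorch equivalent would be:

import torch.nn as nn

# TF momentum is the decay kept by the running stats (0.99 keeps 99% of the old
# average); Pytorch momentum is the weight given to the *new* batch statistics,
# so pytorch_momentum = 1 - tf_momentum.
tf_momentum, tf_eps = 0.99, 1e-3
bn = nn.BatchNorm2d(32, momentum=1 - tf_momentum, eps=tf_eps)  # i.e. momentum ~= 0.01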

I got all models B0 to B7 implemented in my repo now, but I'm getting weird results so I probably got it wrong. Will look into it tomorrow.

One comment is that once you get past B3, image sizes force batch size to decrease, which slows down training. I guess we could still just increase image size progressively.
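
For reference, these are the per-variant input resolutions I have noted down from the paper / official TF code (quoting from memory, so worth double-checking against the reference repo) - they are what squeeze the batch size past B3:

# EfficientNet train resolutions, B0-B7 (quoted from memory -- please verify)
efficientnet_resolutions = {
    'b0': 224, 'b1': 240, 'b2': 260, 'b3': 300,
    'b4': 380, 'b5': 456, 'b6': 528, 'b7': 600,
}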

Glad to hear the Swish change is helping. I'm going to test out FTSwishPlus() once I'm up and running.
I'm about one error away from having B0 up and running.

I'm going to try to compare your impl, mine, and the two/three others out there and hopefully pick up any errors and/or design issues.

Re: XResNet comparison - XResNet50 is about the same as ResNet152…so a B1 should be just a bit better than an XResNet50 if that comparison holds true, and a B0 should underperform.

More interesting, of course, are two things:
1 - a B4 or B5 vs XResNet50 and XResNet152…and of course comparing total parameters (a quick way to count those is sketched below).
2 - Even if accuracy is the same, if EffNet is doing it with 1/5 the params, and training faster as well, then that's still a better arch imo.
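
For the parameter comparison, a standard PyTorch one-liner does the job (nothing repo-specific here; the constructor names in the comment are just placeholders):

def count_params(model):
    # total number of trainable parameters
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g. compare count_params(xresnet50()) vs count_params(efficientnet_b4())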

And, if XResNet outperforms then all the better for FastAI :slight_smile:


@Seb Would your current code be able to train on datasets other than imagewoof/imagenette by just substituting the ImageList with another?

I'm quite new to this so I'm sorry if I'm missing something obvious, but I'm currently getting RuntimeError: CUDA error: device-side assert triggered when I do that with my own data, while it works on the imagewoof/imagenette ones just fine.


Thanks for trying my code out! It should run on other datasets (although note I still have to confirm I built the models correctly).

My guess is you need to change c_out, which is the number of classes your dataset has. I haven't created a parameter for that, so you'll need to change it directly on line 63 of train.py.

Otherwise, you'd need to get a more useful error message by doing the following:

1 - First, try to run the code on the CPU. The CPU code path has more checks, so it will possibly return a better error message.
2 - If the CPU code runs without error, then run the same thing with CUDA_LAUNCH_BLOCKING=1 to get a proper error message and stack trace (see the sketch below).
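
The second step can be done either from the shell (CUDA_LAUNCH_BLOCKING=1 python train.py) or at the very top of the script, before anything touches the GPU; a minimal sketch:

import os

# make CUDA kernel launches synchronous so the stack trace points at the real op;
# must be set before CUDA is initialised (i.e. before any .cuda()/.to('cuda') call)
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'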


Oh yes, changing the c_out fixed my problem, thank you for the suggestion!

Do you plan on making pretrained Imagenet models for each of the networks as well in the near future?


Great!
I am not sure about pretrained models. Maybe we can figure out how to convert the weights from Tensorflow (or reuse the conversion done by other Pytorch implementations)

Current goal is to have the efficientnet.py code be closer in style to xresnet.py and integrated into fastai so that we can more easily experiment with the model. I like Jeremy's goal of having the whole model fit on one screen.

If you have a need for pretrained models, I recommend checking out other Pytorch repos such as this one


Don't rush using my repo; I just fixed a couple of issues, namely squeeze-excitation and drop-connect were not being used in the model…

IME this class implementation doesn't play well with fp16 training… I had to go back to the function version.

Thanks for the update. I just checked your code - I see you are avoiding the self. usage/storage to avoid a device conflict?
Ok, I'll update mine to match.

I did get a device conflict with

random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype, device=x.device)

And thus added device = x.device. That same line caused issues with dtype when using fp16.

I'm a bit unsure as to what's going on. That function worked fine without making the device explicit in another Pytorch implementation. And it's the same code that works with fp16 in a function but not in a module.

I didn't purposefully avoid using self for device conflicts.
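
For reference, a function-style drop_connect that derives both dtype and device from its input might look like the sketch below - this is just my guess at the shape of the "function version", not necessarily what is in the repo:

import torch

def drop_connect(x, drop_ratio, training):
    """Zero out the whole residual branch for roughly drop_ratio of the samples."""
    if not training or drop_ratio == 0.:
        return x
    keep_prob = 1. - drop_ratio
    # build the mask with the same dtype and device as the input tensor
    rand = torch.rand([x.shape[0], 1, 1, 1], dtype=x.dtype, device=x.device)
    binary_mask = torch.floor(keep_prob + rand)
    return x / keep_prob * binary_mask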