Multiple implementations listed here: https://paperswithcode.com/paper/efficientnet-rethinking-model-scaling-for
Edit: some have been linked to already, but this website will automatically add new implementations as they come up.
Thanks for this link @Seb!
So basically there are 3 PyTorch implementations. I'm going to try and review all 3 and leverage that to code up my own... and then try to see about wrapping it into FastAI v2 once I verify my own implementation.
The one that looks the cleanest to me is this one:
But I'm going to walk through all of them in more detail and see what differences, ultimately, exist.
Sounds good.
The PyTorch projects seem to be more or less literal translations of the original ones. I see a lot of potential for refactoring to make things more fastai-ish.
Thanks for this link @MicPie!
I have my GitHub repo made, but nothing is checked in yet because I'm still working on it: https://github.com/lessw2020/EfficientNet-PyTorch/blob/master/README.md
Now that there are 3 implementations, I'm going through all of them to hopefully leverage the best of each in terms of cleanliness of implementation. Once I have my own, I'll test it out on MNIST for basic checking, and then try to rewrite it similar to how XResNet was done for FastAI integration.
(*That's a lot of work though, so I'd welcome any and all help!)
Yes, exactly - these are all standalone projects with no integration... so hopefully we can improve on them in that respect.
That said, I'm really happy to have these 3 implementations, as the authors solved a couple of translation-from-TF questions I had yesterday.
They are a bit different - drop connect is different than dropout:
dropout is for the activations, and drop_connect is for the weights.
Here's the code I just checked in for the drop_connect for my implementation:
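import torch
import torch.nn as nn

class Drop_Connect(nn.Module):
    """create a per-sample tensor mask and apply to inputs, dropping about drop_ratio of the samples and rescaling the rest"""
    def __init__(self, drop_ratio=0):
        super().__init__()
        self.keep_percent = 1.0 - drop_ratio
    def forward(self, x):
        if not self.training:  # self.training is a bool attribute, not a method
            return x
        batch_size = x.size(0)
        # one random value per sample, created with the same dtype and device as x
        random_tensor = self.keep_percent
        random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype, device=x.device)
        binary_tensor = torch.floor(random_tensor)  # 1 with probability keep_percent, else 0
        output = x / self.keep_percent * binary_tensor
        return output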
I've published my WIP here:
So far I have EfficientNet-B0 running on Imagewoof, though I haven't spent much time checking that my work is an accurate replication.
EfficientNet-B0 does train faster than xresnet50, but is not as good after 80 epochs. Things should get more interesting with B3+ if that accuracy chart is accurate.
@Seb - thanks for the initial results! Can you see if using Swish, which is the activation function in the paper, matters?
i.e. you have:
act_fn = nn.ReLU(inplace=True)
I have:
act_fn = eu.Swish() #eu is my utility file import
class Swish(nn.Module):
    def forward(self, x):
        x = x * torch.sigmoid(x)  # nn.functional.sigmoid is deprecated, use torch.sigmoid instead
        return x
Nice catch! Swish does seem to do better than ReLU. Updated my repo. I probably missed other details...
I found another mistake: batchnorm momentum in PyTorch is 1 - batchnorm momentum from TensorFlow...
Edit to add: interestingly, results over 80 epochs are not as good with BN momentum = 0.01 as with 0.99.
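To spell out the momentum point: the two frameworks weight the running statistic and the batch statistic the opposite way round, so a TensorFlow value of 0.99 corresponds to a PyTorch value of 0.01. Here is a minimal sketch in plain Python; the helper names (`tf_to_pytorch_bn_momentum`, `tf_update`, `pytorch_update`) are made up for illustration and are not from any of the repos.

```python
def tf_to_pytorch_bn_momentum(tf_momentum):
    """Convert a TensorFlow-style BN momentum to the PyTorch convention."""
    return 1.0 - tf_momentum

def tf_update(running, batch_stat, momentum):
    # TensorFlow convention: momentum is the weight kept on the old running stat
    return momentum * running + (1.0 - momentum) * batch_stat

def pytorch_update(running, batch_stat, momentum):
    # PyTorch convention: momentum is the weight given to the new batch stat
    return (1.0 - momentum) * running + momentum * batch_stat

tf_m = 0.99
pt_m = tf_to_pytorch_bn_momentum(tf_m)

# Both conventions give the same running statistic after one update
same = abs(tf_update(1.0, 3.0, tf_m) - pytorch_update(1.0, 3.0, pt_m)) < 1e-12
print(pt_m, same)
```

So with this mapping, passing the converted value as the `momentum` argument of `nn.BatchNorm2d` should reproduce the TensorFlow running-stat update.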
I have all models B0 to B7 implemented in my repo now, but I'm getting weird results so I probably got something wrong. Will look into it tomorrow.
One comment is that once you get past B3, image sizes force batch size to decrease, which slows down training. I guess we could still just increase image size progressively.
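As a rough illustration of that batch-size squeeze, here is a sketch that scales a B0 batch size by the ratio of input pixels, using the input resolutions listed for B0 to B7 in the reference implementation. The function name `rough_batch_size` and the "memory is proportional to pixel count" rule are assumptions of mine; the bigger variants are also wider and deeper, so the real batch sizes will likely be smaller than this.

```python
# Input resolutions for EfficientNet B0..B7 as listed in the reference implementation
EFFNET_RES = {
    "b0": 224, "b1": 240, "b2": 260, "b3": 300,
    "b4": 380, "b5": 456, "b6": 528, "b7": 600,
}

def rough_batch_size(base_bs, variant, base_variant="b0"):
    """Scale a batch size by the ratio of input pixel counts (optimistic estimate)."""
    scale = (EFFNET_RES[base_variant] / EFFNET_RES[variant]) ** 2
    return max(1, int(base_bs * scale))

for name in EFFNET_RES:
    print(name, EFFNET_RES[name], rough_batch_size(64, name))
```

Starting from a batch size of 64 on B0, this estimate already drops to 22 for B4 and 8 for B7 from image size alone.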
Glad to hear the Swish change is helping. I'm going to test out FTSwishPlus() once I'm up and running.
I'm about one error away from having B0 up and running.
I'm going to try to compare your impl, mine, and the two/three others out there and hopefully pick up any errors and/or design issues.
Re: XResNet comparison - so XResNet50 is about the same as ResNet152... so a B1 should be just a bit better than an XResNet50 if that comparison holds true, and a B0 should underperform.
More interesting of course is two things:
1 - a B4 or B5 vs XResNet50 and XResNet152... and of course comparing total parameters.
2 - Even if accuracy is the same, if EffNet is doing it with 1/5 the params, and training faster as well, then that's still a better arch imo.
And if XResNet outperforms, then all the better for FastAI.
@Seb Would your current code be able to train on datasets other than imagewoof/imagenette by just substituting the ImageList with another?
I'm quite new to this so I'm sorry if I'm missing something obvious, but I'm currently getting RuntimeError: CUDA error: device-side assert triggered
when I do that with my own data but it works on the imagewoof/imagenette ones just fine.
Thanks for trying my code out! It should run on other datasets (although note I still have to confirm I built the models correctly).
My guess is you need to change c_out, which is the number of classes your dataset has. I haven't created a parameter for that, so you'll need to change it directly in line 63 in train.py.
Otherwise, you'd need to get a more useful error message by doing the following:
First thing is to try to run the code on CPU. CPU code has more checks so it will possibly return a better error message.
If the CPU code runs without error, then run the same thing with CUDA_LAUNCH_BLOCKING=1
to get a proper error message and stack trace.
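A quick sanity check that goes with the c_out guess: with a classification loss, a device-side assert often means some label index is outside [0, c_out). Here is a minimal sketch, assuming the labels are integer class indices; `check_label_range` is a hypothetical helper, not something in train.py.

```python
import os

# Must be set before CUDA is initialized for it to take effect
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

def check_label_range(labels, c_out):
    """Return the sorted label values that fall outside [0, c_out)."""
    return sorted({int(y) for y in labels if not 0 <= int(y) < c_out})

# Example: a model built for 10 classes fed labels from a 13-class dataset
bad = check_label_range([0, 3, 9, 10, 12], c_out=10)
print(bad)
```

If this returns anything, the model head is too small for the dataset and c_out needs to be raised to match.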
Oh yes, changing the c_out fixed my problem, thank you for the suggestion!
Do you plan on making pretrained Imagenet models for each of the networks as well in the near future?
Great!
I am not sure about pretrained models. Maybe we can figure out how to convert the weights from TensorFlow (or reuse the conversion done by other PyTorch implementations).
Current goal is to have the efficientnet.py code be closer in style to xresnet.py and integrated into fastai so that we can more easily experiment with the model. I like Jeremy's goal of having the whole model fit on one screen.
If you have a need for pretrained models, I recommend checking out other Pytorch repos such as this one
Don't rush into using my repo; I just fixed a couple of issues, namely squeeze-ex and drop-connect were not being used in the model...
IME this class implementation doesn't play well with fp16 training... I had to go back to the function version.
Thanks for the update. I just checked your code; I see you are avoiding the self. usage/storage. Is that to avoid a device conflict?
Ok, I'll update mine to match.
I did get a device conflict with
random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype,device=x.device)
And thus added device = x.device. That same line caused issues with dtype when using fp16.
I'm a bit unsure as to what's going on. That function worked fine without making the device explicit in another PyTorch implementation. And it's the same code that works with fp16 in a function but not in a module.
I didn't purposefully avoid using self for device conflicts.
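For anyone following along, a function-style drop connect could look roughly like the sketch below. This is not the code from either repo: the name `drop_connect`, the explicit `training` flag, and the choice to sample the mask in float32 on the input's device and then cast it to the input's dtype are all assumptions on my part, and I have not confirmed that this is what resolves the fp16 issue described above.

```python
import torch

def drop_connect(x, drop_ratio, training):
    """Zero whole samples of x with probability drop_ratio and rescale the rest."""
    if not training or drop_ratio == 0.0:
        return x
    keep_percent = 1.0 - drop_ratio
    # One random value per sample, broadcast over channels, height and width.
    # Sampled in float32 on x's device (assumption: avoids device mismatches).
    random_tensor = keep_percent + torch.rand(x.size(0), 1, 1, 1, device=x.device)
    # floor gives 1 with probability keep_percent, else 0; cast to match x's dtype
    binary_tensor = torch.floor(random_tensor).to(x.dtype)
    return x / keep_percent * binary_tensor

x = torch.ones(8, 3, 4, 4)
y = drop_connect(x, drop_ratio=0.5, training=True)
print(y.shape)
```

Inside a block's forward pass you would call it with `training=self.training`, so it becomes a no-op at eval time.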