Really interesting paper on evolved normalization layers: Evolving Normalization-Activation Layers, which claims big performance gains over BatchNorm-ReLU across various image tasks. It proposes multiple versions of the layers: one batch-dependent, and another based only on individual samples (or groups of channels).
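For reference, the sample-based variant (EvoNorm-S0 in the paper) computes `x * sigmoid(v * x) / group_std(x)`, scaled by a learned gamma and beta, so it never touches batch statistics. A minimal PyTorch sketch of that forward pass; the group count and eps below are assumptions, not the paper's tuned values:

```python
import torch
import torch.nn as nn

class EvoNormS0(nn.Module):
    """Sketch of the sample-based EvoNorm-S0 layer:
    y = x * sigmoid(v * x) / group_std(x) * gamma + beta.
    groups=32 and eps=1e-5 are assumed defaults, not tuned values."""

    def __init__(self, channels, groups=32, eps=1e-5):
        super().__init__()
        self.groups = min(groups, channels)
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.v = nn.Parameter(torch.ones(1, channels, 1, 1))

    def group_std(self, x):
        # Std over each group of channels, computed per sample
        # (no batch statistics involved).
        n, c, h, w = x.shape
        g = self.groups
        grouped = x.view(n, g, c // g, h, w)
        var = grouped.var(dim=(2, 3, 4), keepdim=True)
        std = torch.sqrt(var + self.eps)
        return std.expand(-1, -1, c // g, h, w).reshape(n, c, h, w)

    def forward(self, x):
        num = x * torch.sigmoid(self.v * x)
        return num / self.group_std(x) * self.gamma + self.beta
```

Since every statistic here is per-sample, the layer's output for a given image is the same whatever the batch size.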
Thanks for sharing! They look promising.
I would like to see how they perform on ImageWoof. Theoretically, they should surpass all current results.
I have implemented EvoNorm in PyTorch for anyone to try here - https://github.com/digantamisra98/EvoNorm
Thanks for taking the time to put together a PyTorch implementation. I tried to use your work to replace every BatchNorm layer in a model with EvoNorm. However, your implementation takes an “input” argument, and I don’t know what to feed it when just replacing the layers…
Any thoughts ?
Hi. The input is simply the number of channels.
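If it helps, swapping every BatchNorm layer for another norm layer can be done by walking the module tree and passing each layer's channel count to the new constructor. A hedged sketch; `make_layer` stands in for the EvoNorm constructor from the repo above (assuming, as stated, that its argument is the channel count), and any callable taking a channel count works:

```python
import torch
import torch.nn as nn

def swap_batchnorm(model, make_layer):
    """Recursively replace every nn.BatchNorm2d in `model` with
    make_layer(num_channels). `make_layer` would be the EvoNorm
    constructor; here any norm-layer factory works."""
    for name, child in model.named_children():
        if isinstance(child, nn.BatchNorm2d):
            # num_features of the old layer is the channel count.
            setattr(model, name, make_layer(child.num_features))
        else:
            swap_batchnorm(child, make_layer)
    return model

# Example with GroupNorm as a stand-in for EvoNorm:
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
swap_batchnorm(model, lambda c: nn.GroupNorm(1, c))
```

With the actual repo you would pass its EvoNorm class in place of the `GroupNorm` lambda.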
Thank you for your answer.
I created a notebook where I tried your implementation on MNIST. I train a vanilla model (Conv+ReLU), then add BatchNorm, then replace BatchNorm with EvoNorm.
Short story: they all perform well at bs=64, but at bs=4 and bs=2, BatchNorm significantly outperforms EvoNorm (at bs=2, EvoNorm fails entirely).
The promise of EvoNorm, in the paper, is that it keeps a good level of performance as the batch size decreases. That doesn’t seem to happen here.
Did you run any experiments?
Here is the notebook: https://github.com/bdubreu/fastpractice/blob/master/BatchNorm_vs_EvoNorm.ipynb
Oh, interesting. I never ran any experiments at batch sizes as low as 4 or 2; in fact, the graph in the paper only goes down to a batch size of 32, if I recall correctly. I would personally still prefer BN over EvoNorm because of EvoNorm’s memory consumption; Google has recently been on a trend of proposing layers with very high memory cost. And in my own minimal testing I didn’t observe any significant difference in numbers between BN and EvoNorm. Thanks for sharing your notebook. I’m fairly confident my implementation is correct, though; I have seen other people raise concerns about EvoNorm failing altogether in certain settings as well.
Yes, the paper only goes as low as 32… I was hoping it would be a good norm layer for settings with bs=1 or 2 and gradient accumulation. Alas, it doesn’t seem to perform. I didn’t try S0 at bs=1, though. I’ll try that and report here if I find some time.
Sorry, I saw this late. Did you happen to try S0 at bs=1?
Hi @Diganta !
No, I never found the time to follow up on that, unfortunately. I just went to the cloud and rented some GCP instances to get a proper batch size. I might give it a try when I find some time, and I’ll let you know if I do.