I got a small improvement on 5 epochs.
I can't reproduce the results from the leaderboard, so my baseline on Colab, with the same arguments, is:
0.7412, std 0.011771156
[0.746 0.75 0.748 0.744 0.718]
Same, but with act_fn ReLU:
0.75720006 std 0.010007978
[0.744 0.766 0.758 0.77 0.748]
And with LeakyReLU:
0.7576 std 0.0058514797
[0.758 0.756 0.748 0.766 0.76 ]
Results here: https://gist.github.com/ayasyrev/eb91d64b219e36898ea952355f239586
Most important here: when I tested different activations, I got strange results, so I began checking everything.
And I found a bug in the xresnet implementation (and so in mxresnet too)!
In the init_cnn function, we initialize the model with nn.init.kaiming_normal_, but its default argument is nonlinearity='leaky_relu'.
So I changed it to nonlinearity='relu' and got a better result. Same for LeakyReLU.
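For reference, a minimal sketch of the change. The init_cnn body below is paraphrased from memory of the fastai xresnet helper, so treat it as an approximation; the nonlinearity parameter is the part I added:

```python
import torch.nn as nn

def init_cnn(m, nonlinearity='relu'):
    # Pass the nonlinearity explicitly instead of relying on
    # kaiming_normal_'s default (nonlinearity='leaky_relu').
    if getattr(m, 'bias', None) is not None:
        nn.init.constant_(m.bias, 0)
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, nonlinearity=nonlinearity)
    for child in m.children():
        init_cnn(child, nonlinearity)

# Usage sketch: init_cnn(model, nonlinearity='relu') for a ReLU model,
# or keep 'leaky_relu' when the model actually uses LeakyReLU.
```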
There is no nonlinearity option in torch for Mish, so maybe that's a place to find a better result!
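Since nn.init.calculate_gain has no entry for Mish, one option is a Kaiming-style init with a hand-picked gain. A minimal sketch, where kaiming_normal_with_gain_ is my own placeholder and the right gain value for Mish would have to be found empirically:

```python
import math
import torch

def kaiming_normal_with_gain_(weight, gain=math.sqrt(2.0)):
    # Kaiming-normal init with an explicit gain, for activations
    # (like Mish) that nn.init.calculate_gain does not know about.
    # For Conv2d/Linear weights, fan_in = inputs per output unit.
    fan_in = weight[0].numel()
    std = gain / math.sqrt(fan_in)
    with torch.no_grad():
        return weight.normal_(0.0, std)
```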