Imagenette/Imagewoof Leaderboards

Hi everyone,

I'm trying to replicate the Imagenette results shown in lesson 11, starting with 128px and 5 epochs, but I'm getting much lower accuracy than expected when running on my laptop. Running on GCP gives the expected results. It would be great if anyone could help me understand what I'm doing wrong (o:

Here's my notebook showing how I'm trying to replicate it: https://github.com/pete88b/data-science/blob/master/fastai-things/imagenette-replicate-2019-04-08.ipynb

Edit: I think the answer might be that Imagenette in fastai 1.0.58 (on the GCP VM) has 500 validation items, but in 1.0.60 (on my laptop) it has 3925. Edit 2: pretty sure this is it - if I use the imagenette2-160 data on GCP, we're back to 75% accuracy.
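
A quick way to check which version of the data you have (just a sketch, assuming a standard fastai v1 install where URLs.IMAGENETTE_160 points at the dataset you want to inspect) is to count the validation images:

from fastai.vision import *  # fastai v1

path = untar_data(URLs.IMAGENETTE_160)  # imagenette2-160 on 1.0.60; the old split on 1.0.58
print(path.name, len(get_image_files(path/'val')), 'validation items')
# old imagenette-160 split: 500 validation items; imagenette2-160: 3925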

Pete

Thanks to everybody participating in this! It's awesome!
I've tried a lot of the tricks from this thread and I like them a lot. Sometimes I find a small improvement, but by the time I've tested my solution somebody has improved things further, so I go back and implement that too. It's interesting how far this can go!
The last improvement from ducha-aiki is amazing - it works very well!
I could only beat it by taking the same approach. The gain isn't big, but anyway…
I tested only on Imagewoof, and only at 5 and 20 epochs.

So here are my results:

Size 128
Current leaderboard: 5 ep - 73.37%, 20 ep - 85.52%.
My results:
5 ep: 73.58%, std 0.0084 [0.751082, 0.734029, 0.728939, 0.727412, 0.737847]
20 ep: no improvement - 85.22%, std 0.0061 [0.862560, 0.853143, 0.853652, 0.844490, 0.847544]

Size 192
Current leaderboard: 5 ep - 75.94%, 20 ep - 87.25%, 80 ep - 89.21%.
My results:
5 ep (bs64): 76.55%, std 0.0028 [0.765335, 0.770934, 0.763808, 0.763044, 0.764571]
20 ep (bs32): 87.85%, std 0.0022 [0.874777, 0.877832, 0.878595, 0.880377, 0.881395]
20 ep (bs64): 87.44%, std 0.0014 [0.874014, 0.874014, 0.873505, 0.877322, 0.873250]

Size 256
Current leaderboard: 5 ep - 76.87%, 20 ep - 88.29%.
My results:
5 ep: 78.84%, std 0.0042 (3 runs) [0.783151, 0.788496, 0.793586]
20 ep: 88.58%, std 0.0029 [0.887503, 0.882667, 0.887758, 0.889285, 0.881904]

Here is a link to the size-128 notebook:
https://github.com/ayasyrev/imagenette_experiments/blob/master/Woof_MaxBlurPool_ResnetTrick_s128.ipynb
The others are in the repo too.


The notebooks run on Colab, so they're easy to rerun.
I refactored xresnet from fastai v1 to make the code easier to understand and the model easier to change, and thanks to nbdev I can now share this code easily. I've been using it for some time, but it's still not production-ready (and isn't intended to be). I change the code whenever I find something in the model I want to modify but can't do so easily. When I started moving it to GitHub with nbdev, I rewrote a lot of it and realized it was time for a bigger refactor. So I started rewriting; it's more powerful now, but still more of a concept. Have a look - I hope it can be helpful.

Back to my solution. I like the trick from "Bag of Tricks" where the stride-2 conv on the identity path is replaced by a pool followed by a stride-1 conv.
So I thought: why not do the same on the main path - replace the stride-2 conv with a stride-1 conv plus a pool? We already have the pool, so I changed the ResBlock to first apply the pool to the input and then split it into the conv and identity paths (see the sketch below). So have a look at the code. I wrote an explanation of how I create the model here:
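
A simplified sketch of what the downsampling block becomes (illustrative only - the pool choice and channel handling here are placeholders, not the exact code from my repo):

import torch
import torch.nn as nn

class PoolFirstResBlock(nn.Module):
    # Sketch of a downsampling ResBlock that pools the input once,
    # before splitting it into the conv path and the identity path.
    def __init__(self, ni, nf):
        super().__init__()
        self.pool = nn.AvgPool2d(2, ceil_mode=True)     # shared downsample (illustrative choice)
        self.convs = nn.Sequential(                     # main path: convs are all stride 1 now
            nn.Conv2d(ni, nf, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(nf), nn.ReLU(inplace=True),
            nn.Conv2d(nf, nf, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(nf))
        self.idconv = nn.Conv2d(ni, nf, 1, bias=False)  # identity path: 1x1 conv to match channels
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.pool(x)  # downsample once, before the split
        return self.act(self.convs(x) + self.idconv(x))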

@a_yasyrev would be great if you can submit a PR for the ones that show a reasonable improvement - which looks like the 192 and 256 px ones AFAICT. Congrats on the results! How does it impact training and inference speed?

Just ran tests on Colab, size 128, Tesla T4:
xresnet with SA - 1:15 per epoch
xresnet, SA, Mish - 1:18 p/e
new block, SA - 1:14 p/e
new block, SA, Mish - 1:17 p/e
So the speed is about the same here.
MaxBlurPool slows training down, but gives very good results:
new block, SA, Mish, MaxBlurPool - 1:20 p/e
I will check on another machine.

One more test on Colab, same Tesla T4.
size 256, bs 32:
xresnet - 2:48 per epoch
xresnet with SA - 3:06 p/e
xresnet, SA, Mish - 3:45 p/e
xresnet, SA, Mish, MaxBlurPool - 4:07 p/e

new block - 2:31 p/e
new block, SA - 2:50 p/e
new block, SA, Mish - 3:23 p/e
new block, SA, Mish, MaxBlurPool - 3:49 p/e

@Jeremy just a thought - should we also include inference times, such as batches/second or images/second? Part of this is also looking at how these models fare from a deployment standpoint. Let me know your thoughts on this :slight_smile: (obviously we'd focus on accuracy only, but it would be nice to know their real-time inference speeds too)
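
For example, a rough images/second measurement could look like this (just a sketch; model and input size are placeholders):

import time, torch

@torch.no_grad()
def images_per_second(model, img_size=256, bs=64, n_batches=20, device='cuda'):
    # Rough inference throughput on random input, with a short warm-up.
    model = model.eval().to(device)
    x = torch.randn(bs, 3, img_size, img_size, device=device)
    for _ in range(3):  # warm-up so lazy init / cudnn autotune don't skew the timing
        model(x)
    if device == 'cuda': torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_batches):
        model(x)
    if device == 'cuda': torch.cuda.synchronize()
    return bs * n_batches / (time.time() - start)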

It’s a bit hard, since everyone has different hardware.

Hi all, I’m excited to join in, even though I don’t quite have a better result yet.

As I had trouble with fastai2 (couldn't find the Mish module), I took @LessW2020's repo from 6 months ago, but I couldn't get 75% (ImageWoof, 5 epochs, 5 runs) - I was only getting 67-68%. Maybe fastai had an update, or the dataset isn't the same, per @pete88b? In any case, I was able to get a 1% improvement on that by replacing the 3x3 conv layer with something a little more complicated (but mathematically well-motivated). I wonder if anyone could add the couple of lines (TwistLayer) from my repo to your 75%-performing model and see how it fares. Thanks all (especially Jeremy for making it all possible).

(Apologies for the poor code and poor PyTorch practice. It's certainly wasteful to have two extra full-scale conv2d weights. I hope it's at least a little easier to understand what's going on, and to experiment with.)

Welcome! Cool to see your results - nice job :slight_smile: Yes, the dataset was changed to make it a bit harder, and the leaderboard percentages were adjusted accordingly (larger validation set).

Is that why the dataset is called imagewoof2? I'm confused about what the current leaderboard is based on.

I made an 80-epoch run (still size=128) and got 87.27% at the end (highest 87.42% at the second-to-last epoch), which is slightly higher than the current record of 87.20%. Hooray!

Yes it is :slight_smile: and it's what the current leaderboard is based on. The baselines were run with the original setup we'd found, to keep the comparison fair.

I've made a start by adding twist to some of the "standard" models using the fastai v2 dev version: https://github.com/pete88b/data-science/blob/master/fastai-things/train-imagewoof-with-TwistLayer.ipynb
Hope it helps.

Thanks, @pete88b. As I mentioned, there’s a lot of waste in parameters.

If you are testing on ResNeXt, did you give each conv2d the groups argument? (That's about as much as I understand of ResNeXt.)

Briefly, I'm adding two extra conv2ds (which I call convx and convy), but you can see that I "symmetrized" the weights, so instead of 9 parameters per filter/channel/feature map there are only 4 in effect. I also copied the convx weights into convy, so the entire convy is extraneous. Of course, once we have tested various possibilities, we could write the TwistLayer more efficiently.
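
To illustrate the parameter counting (a standalone example, not code from my repo): the "symmetrization" makes each 3x3 kernel anti-symmetric under a 180-degree rotation, which forces the center to zero and pairs up the remaining 8 entries, so only 4 are independent:

import torch

w = torch.randn(1, 1, 3, 3)           # one 3x3 kernel
w_skew = (w - w.flip(2).flip(3)) / 2  # the symmetrization applied to convx/convy
print(torch.allclose(w_skew, -w_skew.flip(2).flip(3)))  # True: anti-symmetric under 180-degree rotation
print(w_skew[0, 0, 1, 1].item())      # 0.0: the center entry vanishes, leaving 4 free parameters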

You asked where you can learn about TwistLayer. It’s related to the Neural ODE paper (and others) that interprets ResNet as solving a differential equation. I wrote about the mathematics here

but at the time I didn’t actually know ResNet (even now I know very little beyond ResNet) and I should do a complete rewrite.

I can open up a separate thread to answer questions. [Update: new thread here]

@liuyao that sounds very interesting. Since you're decreasing the param count, you should get the most benefit with more epochs (so try 200) and less regularization (so try less mixup and a larger random resize crop area).

Might not have followed the developments in this thread correctly, but can somebody briefly explain to me what’s a Twist Layer?

I’ve simplified it a bit and it seems to be doing better (I’ll update with results). Here’s the conv_twist layer, replacing each 3x3 convolution. I don’t know if I can explain more briefly than the code:

import numpy as np
import torch
import torch.nn as nn

class conv_twist(nn.Module):  # drop-in replacement for a 3x3 Conv2d
    def __init__(self, ni, nf, stride=1):
        super(conv_twist, self).__init__()
        self.conv = nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False)
        self.convx = nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False)
        self.convy = nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False)
        # anti-symmetrize convx and make convy its 90-degree rotation at init
        self.convx.weight.data = (self.convx.weight - self.convx.weight.flip(2).flip(3)) / 2
        self.convy.weight.data = self.convx.weight.transpose(2, 3).flip(2)
        # self.radii = nn.Parameter(torch.Tensor(nf), requires_grad=True)
        self.center_x = nn.Parameter(torch.Tensor(nf), requires_grad=True)  # learnable per-filter centers
        self.center_y = nn.Parameter(torch.Tensor(nf), requires_grad=True)
        # self.radii.data.uniform_(0.3, 0.7)
        self.center_x.data.uniform_(-0.7, 0.7)
        self.center_y.data.uniform_(-0.7, 0.7)

    def forward(self, x):
        # re-symmetrize on every forward pass so gradient updates don't break the constraint
        self.convx.weight.data = (self.convx.weight - self.convx.weight.flip(2).flip(3)) / 2  # make convx a first-order operator by symmetrizing it
        self.convy.weight.data = (self.convy.weight - self.convy.weight.flip(2).flip(3)) / 2
        # self.convy.weight.data = self.convx.weight.transpose(2,3).flip(2)                   # make convy a 90-degree rotation of convx
        x1 = self.conv(x)
        _, c, h, w = x1.size()
        # normalized coordinate grids, shifted by the learnable per-filter centers
        XX = torch.from_numpy(np.indices((1, h, w))[2] * 2 / w).type(x.dtype).to(x.device) - self.center_x.view(-1, 1, 1)
        YY = torch.from_numpy(np.indices((1, h, w))[1] * 2 / h).type(x.dtype).to(x.device) - self.center_y.view(-1, 1, 1)
        # mask = ramp_func((XX**2 + YY**2) / (self.radii.type(x.dtype).to(x.device).view(-1, 1, 1)**2))
        return x1 + (XX * self.convx(x) + YY * self.convy(x))  # * mask
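
A quick shape check showing it slots in where a stride-2 3x3 conv would go (illustrative usage only):

layer = conv_twist(64, 128, stride=2)
x = torch.randn(8, 64, 32, 32)
print(layer(x).shape)  # torch.Size([8, 128, 16, 16]), same as nn.Conv2d(64, 128, 3, stride=2, padding=1)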

Update: imagewoof2

Size (px) | Epochs | Model         | Mixup | Accuracy | # Runs
128       | 5      | (Leaderboard) | -     | 73.37%   | 5, mean
128       | 5      | RMS           | 0     | 68.54%   | 5, mean
128       | 5      | RMS + twist   | 0     | 70.95%   | 5, mean
128       | 20     | (Leaderboard) | -     | 85.52%   | 5, mean
128       | 20     | RMS           | 0     | 84.62%   | 5, mean
128       | 20     | RMS + twist   | 0     | 85.24%   | 5, mean
128       | 80     | (Leaderboard) | -     | 87.20%   | 1
128       | 80     | RMS + twist   | 0.2   | 87.81%   | 1
128       | 80     | RMS + twist   | 0.5   | 88.52%   | 1
128       | 200    | (Leaderboard) | -     | 87.20%   | 1
128       | 200    | RMS + twist   | 0.2   | 88.70%   | 1
256       | 200    | (Leaderboard) | -     | 90.38%   | 1
256       | 200    | RMS + twist   | 0.2   | 91.52%   | 1

imagenette2

Size (px) | Epochs | Model         | Mixup | Accuracy | # Runs
256       | 200    | (Leaderboard) | -     | 95.11%   | 1
256       | 200    | RMS + twist   | 0.5   | 95.87%   | 1

@a_yasyrev, if you could help test with your ResNet trick + MaxBlurPool, that would be very nice.

Any literature references for this?

Not that I know of. As I mentioned above, the initial observation in the Neural ODE paper (and probably others) is related, but I don't know of a reference for this particular implementation.

Maybe I can write about it in the fastpages blog :slight_smile:

I inadvertently replaced all the 1x1 convolutions (in addition to the 3x3s) with conv_twist, and it bumped the accuracy up…

ResNet-50 uses 1x1 convs in the bottleneck block (1x1, 3x3, 1x1), as well as in some of the skip connections.

I don’t know what to make of it.

(Update: It probably only happens with 5 epochs… Ok, not a fair comparison. Still, that it doesn’t just break down is a surprise.)

Found time to do the long runs.
I added nbdev to the repo with the notebooks, so now you can find all the results and links to the notebooks on the 'doc' page: https://ayasyrev.github.io/imagenette_experiments/.
On 128 the improvements are small, but on 192 and 256 they are good enough.
Size 128:
80 ep - 87.63% (current 87.20%)
200 ep - 88.30% (87.20%)
Size 192:
80 ep - 89.69% (89.21%)
200 ep - 90.35% (89.54%)
Size 256:
80 ep - 90.63% (90.48%)
200 ep - 91.14% (90.38%)
I did 3-4 runs.
Worth mentioning that I used start_pct 0.4-0.2 for the long runs.
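
For reference (if I understand the parameter correctly), start_pct is the fraction of training spent at a flat learning rate before the cosine anneal begins. In plain PyTorch the schedule looks roughly like this (a sketch of the idea, not the fastai implementation):

import math
import torch

def flat_cos_lambda(total_steps, start_pct=0.3):
    # LR multiplier: flat at 1.0 for start_pct of training, then cosine-anneal to 0.
    flat_steps = int(total_steps * start_pct)
    def f(step):
        if step < flat_steps:
            return 1.0
        progress = (step - flat_steps) / max(1, total_steps - flat_steps)
        return 0.5 * (1 + math.cos(math.pi * progress))
    return f

model = torch.nn.Linear(10, 10)  # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
sched = torch.optim.lr_scheduler.LambdaLR(opt, flat_cos_lambda(total_steps=1000, start_pct=0.3))
# call sched.step() once per batch; a lower start_pct means the anneal begins earlier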
