Imagenette/Imagewoof Leaderboards

Hi everyone,

I'm trying to replicate the Imagenette results shown in lesson 11, starting with 128px and 5 epochs, but I'm getting much lower accuracy than expected when running on my laptop. Running on GCP gives the expected results. It would be great if anyone could help me understand what I'm doing wrong (o:

Here's my notebook showing how I'm trying to replicate it: https://github.com/pete88b/data-science/blob/master/fastai-things/imagenette-replicate-2019-04-08.ipynb

Edit: I think the answer might be that Imagenette in fastai 1.0.58 (on the GCP VM) has 500 validation items, but in 1.0.60 (on my laptop) it has 3925. Edit 2: pretty sure this is it - if I use the imagenette2-160 data on GCP, we're back to 75% accuracy.
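
A quick way to check which version of the data you have (just a sketch, assuming a standard fastai v1 install where URLs.IMAGENETTE_160 points at the dataset you want to inspect) is to count the validation images:

from fastai.vision import *  # fastai v1

path = untar_data(URLs.IMAGENETTE_160)  # imagenette2-160 on 1.0.60; the old split on 1.0.58
print(path.name, len(get_image_files(path/'val')), 'validation items')
# old imagenette-160 split: 500 validation items; imagenette2-160: 3925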

Pete

Thanks to everybody participating in this! It's awesome!
I've tried a lot of the tricks from this thread and I like them a lot. Sometimes I find a small improvement, but by the time I've tested my solution somebody has improved things further, so I go back and implement that too. It's interesting how far this can go!
The last improvement from ducha-aiki is amazing - it works very well!
I could only beat it by taking the same approach. The gain isn't big, but anyway…
I tested only on Imagewoof, and only at 5 and 20 epochs.

So here are my results:

Size 128
Current leaderboard: 5 ep - 73.37%, 20 ep - 85.52%.
My results:
5 ep: 73.58%, std 0.0084 [0.751082, 0.734029, 0.728939, 0.727412, 0.737847]
20 ep: no improvement - 85.22%, std 0.0061 [0.862560, 0.853143, 0.853652, 0.844490, 0.847544]

Size 192
Current leaderboard: 5 ep - 75.94%, 20 ep - 87.25%, 80 ep - 89.21%.
My results:
5 ep (bs64): 76.55%, std 0.0028 [0.765335, 0.770934, 0.763808, 0.763044, 0.764571]
20 ep (bs32): 87.85%, std 0.0022 [0.874777, 0.877832, 0.878595, 0.880377, 0.881395]
20 ep (bs64): 87.44%, std 0.0014 [0.874014, 0.874014, 0.873505, 0.877322, 0.873250]

Size 256
Current leaderboard: 5 ep - 76.87%, 20 ep - 88.29%.
My results:
5 ep: 78.84%, std 0.0042 (3 runs) [0.783151, 0.788496, 0.793586]
20 ep: 88.58%, std 0.0029 [0.887503, 0.882667, 0.887758, 0.889285, 0.881904]

Here is a link to the size-128 notebook:
https://github.com/ayasyrev/imagenette_experiments/blob/master/Woof_MaxBlurPool_ResnetTrick_s128.ipynb
The others are in the repo too.


The notebooks run on Colab, so they're easy to rerun.
I refactored xresnet from fastai v1 to make the code easier to understand and the model easier to change, and thanks to nbdev I can now share this code easily. I've been using it for some time, but it's still not production-ready (and isn't intended to be). I change the code whenever I find something in the model I want to modify but can't do so easily. When I started moving it to GitHub with nbdev, I rewrote a lot of it and realized it was time for a bigger refactor. So I started rewriting; it's more powerful now, but still more of a concept. Have a look - I hope it can be helpful.

Back to my solution. I like the trick from "Bag of Tricks" where the stride-2 conv on the identity path is replaced by a pool followed by a stride-1 conv.
So I thought: why not do the same on the main path - replace the stride-2 conv with a stride-1 conv plus a pool? We already have the pool, so I changed the ResBlock to first apply the pool to the input and then split it into the conv and identity paths (see the sketch below). So have a look at the code. I wrote an explanation of how I create the model here:
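
A simplified sketch of what the downsampling block becomes (illustrative only - the pool choice and channel handling here are placeholders, not the exact code from my repo):

import torch
import torch.nn as nn

class PoolFirstResBlock(nn.Module):
    # Sketch of a downsampling ResBlock that pools the input once,
    # before splitting it into the conv path and the identity path.
    def __init__(self, ni, nf):
        super().__init__()
        self.pool = nn.AvgPool2d(2, ceil_mode=True)     # shared downsample (illustrative choice)
        self.convs = nn.Sequential(                     # main path: convs are all stride 1 now
            nn.Conv2d(ni, nf, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(nf), nn.ReLU(inplace=True),
            nn.Conv2d(nf, nf, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(nf))
        self.idconv = nn.Conv2d(ni, nf, 1, bias=False)  # identity path: 1x1 conv to match channels
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.pool(x)  # downsample once, before the split
        return self.act(self.convs(x) + self.idconv(x))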

@a_yasyrev would be great if you can submit a PR for the ones that show a reasonable improvement - which looks like the 192 and 256 px ones AFAICT. Congrats on the results! How does it impact training and inference speed?

Just ran tests on Colab, size 128, Tesla T4:
xresnet with SA - 1:15 per epoch
xresnet, SA, Mish - 1:18 p/e
new block, SA - 1:14 p/e
new block, SA, Mish - 1:17 p/e
So the speed is about the same here.
MaxBlurPool slows training down, but gives very good results:
new block, SA, Mish, MaxBlurPool - 1:20 p/e
I will check on another machine.

One more test on Colab, same Tesla T4.
size 256, bs 32:
xresnet - 2:48 per epoch
xresnet with SA - 3:06 p/e
xresnet, SA, Mish - 3:45 p/e
xresnet, SA, Mish, MaxBlurPool - 4:07 p/e

new block - 2:31 p/e
new block, SA - 2:50 p/e
new block, SA, Mish - 3:23 p/e
new block, SA, Mish, MaxBlurPool - 3:49 p/e

@Jeremy just a thought - should we also include inference times, such as batches/second or images/second? Part of this is also looking at how these models fare from a deployment standpoint. Let me know your thoughts on this :slight_smile: (obviously we'd focus on accuracy only, but it would be nice to know their real-time inference speeds too)
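
For example, a rough images/second measurement could look like this (just a sketch; model and input size are placeholders):

import time, torch

@torch.no_grad()
def images_per_second(model, img_size=256, bs=64, n_batches=20, device='cuda'):
    # Rough inference throughput on random input, with a short warm-up.
    model = model.eval().to(device)
    x = torch.randn(bs, 3, img_size, img_size, device=device)
    for _ in range(3):  # warm-up so lazy init / cudnn autotune don't skew the timing
        model(x)
    if device == 'cuda': torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_batches):
        model(x)
    if device == 'cuda': torch.cuda.synchronize()
    return bs * n_batches / (time.time() - start)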

It’s a bit hard, since everyone has different hardware.

Hi all, I’m excited to join in, even though I don’t quite have a better result yet.

As I had trouble with fastai2 (couldn't find the Mish module), I took @LessW2020's repo from 6 months ago, but I couldn't get 75% (ImageWoof, 5 epochs, 5 runs) - I was only getting 67-68%. Maybe fastai had an update, or the dataset isn't the same, per @pete88b? In any case, I was able to get a 1% improvement on that by replacing the 3x3 conv layer with something a little more complicated (but mathematically well-motivated). I wonder if anyone could add the couple of lines (TwistLayer) from my repo to your 75%-performing model and see how it fares. Thanks all (especially Jeremy for making it all possible).

(Apologies for the poor code and poor PyTorch practice. It's certainly wasteful to have two extra full-scale conv2d weights. I hope it's at least a little easier to understand what's going on, and to experiment with.)

Welcome! Cool to see your results - nice job :slight_smile: Yes, the dataset was changed to make it a bit harder, and the leaderboard percentages were adjusted accordingly (larger validation set).

Is that why the dataset is called imagewoof2? I'm confused about what the current leaderboard is based on.

I made an 80-epoch run (still size=128) and got 87.27% at the end (highest 87.42% at the second-to-last epoch), which is slightly higher than the current record of 87.20%. Hooray!

Yes it is :slight_smile: and it's what the current leaderboard is based on. The baselines were run with the original setup we'd found, to keep the comparison fair.

I've made a start by adding twist to some of the "standard" models using the fastai v2 dev version: https://github.com/pete88b/data-science/blob/master/fastai-things/train-imagewoof-with-TwistLayer.ipynb
Hope it helps.

Thanks, @pete88b. As I mentioned, there’s a lot of waste in parameters.

If you are testing on ResNeXt, did you give each conv2d the groups argument? (That's about as much as I understand of ResNeXt.)

Briefly, I'm adding two extra conv2ds (which I call convx and convy), but you can see that I "symmetrized" the weights, so instead of 9 parameters per filter/channel/feature map there are only 4 in effect. I also copied the convx weights into convy, so the entire convy is extraneous. Of course, once we have tested various possibilities, we could write the TwistLayer more efficiently.
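
To illustrate the parameter counting (a standalone example, not code from my repo): the "symmetrization" makes each 3x3 kernel anti-symmetric under a 180-degree rotation, which forces the center to zero and pairs up the remaining 8 entries, so only 4 are independent:

import torch

w = torch.randn(1, 1, 3, 3)           # one 3x3 kernel
w_skew = (w - w.flip(2).flip(3)) / 2  # the symmetrization applied to convx/convy
print(torch.allclose(w_skew, -w_skew.flip(2).flip(3)))  # True: anti-symmetric under 180-degree rotation
print(w_skew[0, 0, 1, 1].item())      # 0.0: the center entry vanishes, leaving 4 free parameters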

You asked where you can learn about TwistLayer. It’s related to the Neural ODE paper (and others) that interprets ResNet as solving a differential equation. I wrote about the mathematics here

but at the time I didn’t actually know ResNet (even now I know very little beyond ResNet) and I should do a complete rewrite.

I can open up a separate thread to answer questions. [Update: new thread here]

@liuyao that sounds very interesting. Since you're decreasing the param count, you should get the most benefit with more epochs (so try 200) and less regularization (so try less mixup and a larger random resize crop area).

Might not have followed the developments in this thread correctly, but can somebody briefly explain to me what’s a Twist Layer?

I’ve simplified it a bit and it seems to be doing better (I’ll update with results). Here’s the conv_twist layer, replacing each 3x3 convolution. I don’t know if I can explain more briefly than the code:

import numpy as np
import torch
import torch.nn as nn

class conv_twist(nn.Module):  # drop-in replacement for a 3x3 Conv2d
    def __init__(self, ni, nf, stride=1):
        super(conv_twist, self).__init__()
        self.conv = nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False)
        self.convx = nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False)
        self.convy = nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False)
        # anti-symmetrize convx and make convy its 90-degree rotation at init
        self.convx.weight.data = (self.convx.weight - self.convx.weight.flip(2).flip(3)) / 2
        self.convy.weight.data = self.convx.weight.transpose(2, 3).flip(2)
        # self.radii = nn.Parameter(torch.Tensor(nf), requires_grad=True)
        self.center_x = nn.Parameter(torch.Tensor(nf), requires_grad=True)  # learnable per-filter centers
        self.center_y = nn.Parameter(torch.Tensor(nf), requires_grad=True)
        # self.radii.data.uniform_(0.3, 0.7)
        self.center_x.data.uniform_(-0.7, 0.7)
        self.center_y.data.uniform_(-0.7, 0.7)

    def forward(self, x):
        # re-symmetrize on every forward pass so gradient updates don't break the constraint
        self.convx.weight.data = (self.convx.weight - self.convx.weight.flip(2).flip(3)) / 2  # make convx a first-order operator by symmetrizing it
        self.convy.weight.data = (self.convy.weight - self.convy.weight.flip(2).flip(3)) / 2
        # self.convy.weight.data = self.convx.weight.transpose(2,3).flip(2)                   # make convy a 90-degree rotation of convx
        x1 = self.conv(x)
        _, c, h, w = x1.size()
        # normalized coordinate grids, shifted by the learnable per-filter centers
        XX = torch.from_numpy(np.indices((1, h, w))[2] * 2 / w).type(x.dtype).to(x.device) - self.center_x.view(-1, 1, 1)
        YY = torch.from_numpy(np.indices((1, h, w))[1] * 2 / h).type(x.dtype).to(x.device) - self.center_y.view(-1, 1, 1)
        # mask = ramp_func((XX**2 + YY**2) / (self.radii.type(x.dtype).to(x.device).view(-1, 1, 1)**2))
        return x1 + (XX * self.convx(x) + YY * self.convy(x))  # * mask
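
A quick shape check showing it slots in where a stride-2 3x3 conv would go (illustrative usage only):

layer = conv_twist(64, 128, stride=2)
x = torch.randn(8, 64, 32, 32)
print(layer(x).shape)  # torch.Size([8, 128, 16, 16]), same as nn.Conv2d(64, 128, 3, stride=2, padding=1)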

Update: imagewoof2

Size (px) | Epochs | Model         | Mixup | Accuracy | # Runs
128       | 5      | (Leaderboard) | -     | 73.37%   | 5, mean
128       | 5      | RMS           | 0     | 68.54%   | 5, mean
128       | 5      | RMS + twist   | 0     | 70.95%   | 5, mean
128       | 20     | (Leaderboard) | -     | 85.52%   | 5, mean
128       | 20     | RMS           | 0     | 84.62%   | 5, mean
128       | 20     | RMS + twist   | 0     | 85.24%   | 5, mean
128       | 80     | (Leaderboard) | -     | 87.20%   | 1
128       | 80     | RMS + twist   | 0.2   | 87.81%   | 1
128       | 80     | RMS + twist   | 0.5   | 88.52%   | 1
128       | 200    | (Leaderboard) | -     | 87.20%   | 1
128       | 200    | RMS + twist   | 0.2   | 88.70%   | 1
256       | 200    | (Leaderboard) | -     | 90.38%   | 1
256       | 200    | RMS + twist   | 0.2   | 91.52%   | 1

imagenette2

Size (px) | Epochs | Model         | Mixup | Accuracy | # Runs
256       | 200    | (Leaderboard) | -     | 95.11%   | 1
256       | 200    | RMS + twist   | 0.5   | 95.87%   | 1

@a_yasyrev, if you could help test with your ResNet trick + MaxBlurPool, that would be very nice.

Any literature references for this?

Not that I know of. As I mentioned above, the initial observation in the Neural ODE paper (and probably others) is related, but I don't know of a reference for this particular implementation.

Maybe I can write about it in the fastpages blog :slight_smile:

I inadvertently replaced all the 1x1 convolutions (in addition to the 3x3s) with conv_twist, and it bumped the accuracy up…

ResNet-50 uses 1x1 convs in the bottleneck block (1x1, 3x3, 1x1), as well as in some of the skip connections.

I don’t know what to make of it.

(Update: It probably only happens with 5 epochs… Ok, not a fair comparison. Still, that it doesn’t just break down is a surprise.)

Found time to do the long runs.
I added nbdev to the repo with the notebooks, so now you can find all the results and links to the notebooks on the 'doc' page: https://ayasyrev.github.io/imagenette_experiments/.
On 128 the improvements are small, but on 192 and 256 they are good enough.
Size 128:
80 ep - 87.63% (current 87.20%)
200 ep - 88.30% (87.20%)
Size 192:
80 ep - 89.69% (89.21%)
200 ep - 90.35% (89.54%)
Size 256:
80 ep - 90.63% (90.48%)
200 ep - 91.14% (90.38%)
I did 3-4 runs.
Worth mentioning that I used start_pct 0.4-0.2 for the long runs.
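
For reference (if I understand the parameter correctly), start_pct is the fraction of training spent at a flat learning rate before the cosine anneal begins. In plain PyTorch the schedule looks roughly like this (a sketch of the idea, not the fastai implementation):

import math
import torch

def flat_cos_lambda(total_steps, start_pct=0.3):
    # LR multiplier: flat at 1.0 for start_pct of training, then cosine-anneal to 0.
    flat_steps = int(total_steps * start_pct)
    def f(step):
        if step < flat_steps:
            return 1.0
        progress = (step - flat_steps) / max(1, total_steps - flat_steps)
        return 0.5 * (1 + math.cos(math.pi * progress))
    return f

model = torch.nn.Linear(10, 10)  # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
sched = torch.optim.lr_scheduler.LambdaLR(opt, flat_cos_lambda(total_steps=1000, start_pct=0.3))
# call sched.step() once per batch; a lower start_pct means the anneal begins earlier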
