I’m trying to replicate the Imagenette results shown in lesson 11, starting with 128px and 5 epochs, but I’m getting much lower accuracy than expected when running on my laptop. Running on GCP gives the expected results. It would be great if anyone could help me understand what I’m doing wrong (o:
Edit: I think the answer might be that Imagenette in fastai 1.0.58 (on the GCP VM) has 500 validation items but 1.0.60 (on my laptop) has 3925? Edit2: pretty sure this is it - if I use the imagenette2-160 data on GCP, we’re back to 75% accuracy.
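A quick way to confirm this kind of mismatch is to count the validation images on each machine. This is a minimal sketch, assuming the dataset is unpacked as `<root>/val/<class>/<image>`; the helper name `count_images` and the example path are hypothetical and not part of fastai.

```python
from pathlib import Path

# File extensions treated as images (compared case-insensitively).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images(split_dir):
    """Count image files under split_dir, recursing into class folders."""
    return sum(
        1
        for p in Path(split_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

# Example (hypothetical path, adjust to wherever the data was unpacked):
# print(count_images(Path.home() / ".fastai" / "data" / "imagenette2-160" / "val"))
```

Running it against both copies of the data should show directly whether the two validation sets differ in size.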
Thanks to everybody who participated in this! This is awesome!
I tried a lot of tricks from here and I like them a lot! Sometimes I find a way to improve a little, but by the time I have tested my solution somebody has improved more, so I go back and implement that. It is interesting how far it can go!
The last improvement from ducha-aiki is amazing - it works very well!
I can beat it only with the same approach. The gain is not so big, but anyway…
I tested it only on woof and only for 5 and 20 epochs.
So here are my results:
size 128,
current leaderboard: 5 ep - 73.37%, 20 ep - 85.52%.
my results:
5: 73.58% 0.0084 std [0.751082, 0.734029, 0.728939, 0.727412, 0.737847]
20: no improvement - 85.22% 0.0061 std [0.862560, 0.853143, 0.853652, 0.844490, 0.847544]
size 192,
current leaderboard: 5 ep - 75.94%, 20 ep - 87.25%, 80 ep - 89.21%
my results (bs64):
5 bs: 76.55% 0.0028 std [0.765335, 0.770934, 0.763808, 0.763044, 0.764571]
20 bs32: 87.85% 0.0022 std [0.874777, 0.877832, 0.878595, 0.880377, 0.881395]
20 bs64: 87.44% 0.0014 std [0.874014, 0.874014, 0.873505, 0.877322, 0.873250]
size 256,
current leaderboard: 5 ep - 76.87%, 20 ep - 88.29%.
my results:
5: 78.84% (3 runs) 0.0042 std [0.783151, 0.788496, 0.793586]
20: 88.58% 0.0029 std [0.887503, 0.882667, 0.887758, 0.889285, 0.881904]
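The mean and std figures in the results above can be recomputed from the per-run lists. Here is a small sketch using only the standard library; the helper name `summarize` is made up for illustration. The population standard deviation (`pstdev`, the same convention as NumPy's default) matches the posted 0.0084 for the first list up to rounding, which suggests that is what was used, though the post does not say so.

```python
from statistics import mean, pstdev

def summarize(accuracies):
    """Return (mean, population std) of a list of per-run accuracies."""
    return mean(accuracies), pstdev(accuracies)

# Size 128, 5 epochs, five runs as posted above.
runs_128_5ep = [0.751082, 0.734029, 0.728939, 0.727412, 0.737847]
avg, std = summarize(runs_128_5ep)
print(f"{avg:.2%} +/- {std:.4f}")
```

With only five runs, the sample standard deviation (`statistics.stdev`) is noticeably larger than the population one, so it is worth stating which convention is used when comparing against the leaderboard.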
Here is a link to the notebook for size 128: https://github.com/ayasyrev/imagenette_experiments/blob/master/Woof_MaxBlurPool_ResnetTrick_s128.ipynb
The others are in the repo too.
The notebooks run on Colab, so it is easy to rerun them.
I refactored xresnet from fastai v1 to understand the code better and to make the model easy to change. And now, thanks to nbdev, I can easily share this code. I have used it for some time, but it is still not for production (and not intended for it). I change the code whenever I find that I want to change something in the model but can’t do it easily. When I started moving it to GitHub with nbdev, I rewrote a lot and found that it was time for more refactoring. So I started rewriting; now it is more powerful but still more of a concept. Have a look, I hope it can be helpful.
Back to my solution. I like the trick from “Bag of tricks” where we replace the stride-2 conv on the identity path with a pool followed by a stride-1 conv.
So I thought: why not do the same on the main path, changing the stride-2 conv to a pool and a stride-1 conv? We already have the pool, so I changed ResBlock to first apply the pool to the input and then split it into the conv and identity paths. So have a look at the code. I wrote an explanation of how I create the model here:
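For readers without the notebook at hand, the idea described in this post can be sketched in PyTorch roughly as follows. This is not the author's code: the class name `PoolFirstResBlock`, the choice of `AvgPool2d`, and the plain two-conv body are assumptions made only to illustrate pooling before the split.

```python
import torch
from torch import nn

class PoolFirstResBlock(nn.Module):
    """Illustrative residual block: downsample once, then branch.

    When stride == 2 the input is pooled first, so both the conv path
    and the identity path use stride-1 convolutions only.
    """

    def __init__(self, ni, nf, stride=1):
        super().__init__()
        # Shared downsampling applied before the two paths split (assumed AvgPool).
        self.pool = nn.AvgPool2d(2, ceil_mode=True) if stride == 2 else nn.Identity()
        # Main path: two 3x3 stride-1 convs.
        self.convs = nn.Sequential(
            nn.Conv2d(ni, nf, 3, padding=1, bias=False),
            nn.BatchNorm2d(nf),
            nn.ReLU(inplace=True),
            nn.Conv2d(nf, nf, 3, padding=1, bias=False),
            nn.BatchNorm2d(nf),
        )
        # Identity path: 1x1 conv only when the channel count changes.
        self.idconv = (
            nn.Identity()
            if ni == nf
            else nn.Sequential(nn.Conv2d(ni, nf, 1, bias=False), nn.BatchNorm2d(nf))
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.pool(x)  # pool once, before splitting
        return self.act(self.convs(x) + self.idconv(x))

block = PoolFirstResBlock(16, 32, stride=2)
print(block(torch.randn(2, 16, 32, 32)).shape)
```

Because the main path now sees an input with a quarter of the spatial positions in its first conv, a block like this does less work than one whose first conv has stride 1 on the full-resolution input.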
@a_yasyrev would be great if you can submit a PR for the ones that show a reasonable improvement - which looks like the 192 and 256 px ones AFAICT. Congrats on the results! How does it impact training and inference speed?
Just ran tests on Colab, size 128, Tesla T4:
xresnet with SA - 1:15 per epoch.
xresnet, SA, Mish - 1:18 p/e.
newblock, SA - 1:14 p/e.
newblock, SA, Mish - 1:17 p/e.
So the speed is the same here.
MaxBlurPool slows training but gives very good results.
newblock, SA, Mish, MaxBlurPool - 1:20 p/e.
Will check on another computer.
One more test on Colab, same Tesla T4.
size 256, bs 32
xresnet - 2:48 p/e.
xresnet with SA - 3:06 p/e.
xresnet, SA, Mish - 3:45 p/e.
xresnet, SA, Mish, MaxBlurPool - 4:07 p/e.
newblock - 2:31 p/e.
newblock, SA - 2:50 p/e.
newblock, SA, Mish - 3:23 p/e.
newblock, SA, Mish, MaxBlurPool - 3:49 p/e.
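To put the size 256 timings in relative terms, the per-epoch times can be converted to seconds and compared. This is a small sketch with made-up helper names (`to_seconds`, `relative_change`); it uses only the pairs whose times are unambiguous in the post.

```python
def to_seconds(mss):
    """Convert an 'm:ss' string such as '2:48' to seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + int(seconds)

def relative_change(baseline, candidate):
    """Fractional change in epoch time vs the baseline (negative = faster)."""
    b, c = to_seconds(baseline), to_seconds(candidate)
    return (c - b) / b

# Size 256, bs 32, Tesla T4: (xresnet time, newblock time) per epoch.
pairs = {
    "plain": ("2:48", "2:31"),
    "SA": ("3:06", "2:50"),
    "SA, Mish, MaxBlurPool": ("4:07", "3:49"),
}
for name, (xresnet, newblock) in pairs.items():
    print(f"{name}: {relative_change(xresnet, newblock):+.1%}")
```

On these numbers newblock takes roughly 7 to 10% less time per epoch than the matching xresnet variant at size 256, whereas at size 128 the two were about the same.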
@Jeremy just a thought: should we also include inference times, such as batches/second or images/second? Part of this is also seeing how realistic these models look from a deployment standpoint. Let me know your thoughts on this (obviously we’d focus on accuracy only, but it would be nice to know their real-time inference times too).
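If images/second were reported, one possible way to measure it is sketched below. This is only an assumption about how such a number could be collected, not an agreed leaderboard protocol; `images_per_second` and its parameters are hypothetical names, and `predict` stands for any callable that runs the model on one batch.

```python
import time

def images_per_second(predict, batch, batch_size, n_warmup=3, n_iters=20):
    """Time repeated calls of predict(batch) and return images per second.

    predict must block until the result is ready. On a GPU that means
    synchronizing inside predict (for example with torch.cuda.synchronize()),
    otherwise only the kernel launch time is measured.
    """
    for _ in range(n_warmup):  # warm-up calls are not timed
        predict(batch)
    start = time.perf_counter()
    for _ in range(n_iters):
        predict(batch)
    elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed

# Stand-in workload so the sketch runs without a model.
dummy_batch = list(range(1000))
print(f"{images_per_second(sum, dummy_batch, batch_size=64):.0f} images/s")
```

For comparable numbers, the batch size, image size, and hardware would need to be fixed across entries, in the same way the leaderboard fixes size and epochs.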
Hi all, I’m excited to join in, even though I don’t quite have a better result yet.
As I had trouble with fastai2 (can’t find the Mish module), I took @LessW2020’s repo from 6 months ago, but I couldn’t get 75% (imageWoof, 5 epochs, 5 runs). I was only getting 67 or 68-ish. Maybe fastai had an update or the dataset wasn’t the same, per @pete88b? In any case, I was able to get a 1% improvement on that by replacing the 3x3 conv layer with something a little more complicated (but mathematically well-motivated). I wonder if anyone could add the couple of lines (TwistLayer) in my repo to your 75%-performing model and see how it fares. Thanks all (especially to Jeremy for making it all possible).
(Apologies for the poor code, and poor PyTorch practice. It’s certainly a waste to have 2 extra full-scale conv2d weights. I hope it may be a little easier to understand what’s going on, and to experiment with.)
Welcome! Cool to see your results! Nice job! Yes, the dataset was changed to make it a bit harder (larger validation set), and so the leaderboard percentages were adjusted.
Is that why the dataset is called imagewoof2? I’m confused what the current leaderboard is based on.
I made an 80 epoch run (still size=128) and got 87.27 at the end (highest 87.42 at second to last epoch), which is slightly higher than the current record of 87.20. Hooray!
Yes it is, and it’s what the current leaderboard is based on. The baselines were run with the original setup, which we found gives a fair comparison.
Thanks, @pete88b. As I mentioned, there’s a lot of waste in parameters.
If you are testing on ResNeXt, did you give each conv2d a groups argument? (That’s as much as I understand of ResNeXt.)
Briefly, I’m adding two extra conv2d (that I call convx and convy), but you can see that I “symmetrized” the weights, so instead of 9 parameters for each filter/channel/feature-map, it’s only 4 in effect. I also copied the convx weights into convy so the entire convy is extraneous. Of course once we have tested various possibilities we could write the TwistLayer more efficiently.
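To see how a constraint on a 3x3 kernel can cut 9 parameters to 4, here is a small pure-Python illustration. It is an assumption, not the TwistLayer code: the post does not say which symmetrization is used. The sketch uses antisymmetry under a 180-degree rotation (center forced to zero, four opposite pairs with opposite signs); mirror symmetry about both axes would also leave 4 free values. All function names are hypothetical.

```python
import random

def rot180(k):
    """Rotate a 3x3 kernel (list of rows) by 180 degrees."""
    return [row[::-1] for row in k[::-1]]

def antisymmetrize(k):
    """Return (k - rot180(k)) / 2, so that out[i][j] == -out[2-i][2-j]."""
    r = rot180(k)
    return [[(k[i][j] - r[i][j]) / 2 for j in range(3)] for i in range(3)]

def free_parameters(k):
    """Count distinct non-zero magnitudes, i.e. independent values in k."""
    return len({abs(v) for row in k for v in row if v != 0})

random.seed(0)
kernel = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
sym = antisymmetrize(kernel)
print(free_parameters(kernel), "->", free_parameters(sym))
```

A generic kernel has 9 independent entries, while the constrained one is fully determined by 4 of them, which is the kind of reduction the post describes.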
You asked where you can learn about TwistLayer. It’s related to the Neural ODE paper (and others) that interprets ResNet as solving a differential equation. I wrote about the mathematics here
but at the time I didn’t actually know ResNet (even now I know very little beyond ResNet) and I should do a complete rewrite.
I can open up a separate thread to answer questions. [Update: new thread here]
@liuyao that sounds very interesting. Since you’re decreasing the param count, you should get the best benefits with more epochs (so try 200) and less regularization (so try less mixup and larger random resize crop area).
I’ve simplified it a bit and it seems to be doing better (I’ll update with results). Here’s the conv_twist layer, replacing each 3x3 convolution. I don’t know if I can explain more briefly than the code:
Not that I know of. As I mentioned above, the initial observation in Neural ODE paper (and probably others) is related, but I don’t know about this particular implementation.
Found time to make long runs.
I added nbdev to the repo with the notebooks, so now you can find all results and links to the notebooks on the ‘doc’ page: https://ayasyrev.github.io/imagenette_experiments/.
At 128 the improvements are small, but at 192 and 256 they are good enough.
size 128:
80 eps - 87.63% (now 87.20%)
200 eps - 88.30% (87.20%)
size 192:
80 eps - 89.69% (89.21%)
200 eps - 90.35% (89.54%)
size 256:
80 eps - 90.63% (90.48%)
200 eps - 91.14% (90.38%)
Did 3-4 runs.
Worth mentioning that I used start_pst 0.4-0.2 for the long runs.
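The post does not define `start_pst`. If it is the fraction of training after which cosine annealing begins in a flat-then-anneal schedule (an assumption), then values of 0.4-0.2 mean the learning rate starts decaying much earlier than in the short runs. A minimal sketch of such a schedule, with hypothetical names throughout:

```python
import math

def flat_cos_lr(step, total_steps, base_lr, start_frac):
    """Learning rate that stays flat, then cosine-anneals to zero.

    start_frac is the fraction of training spent at base_lr before
    annealing begins (assumed meaning of the setting in the post).
    """
    anneal_start = int(total_steps * start_frac)
    if step < anneal_start:
        return base_lr
    progress = (step - anneal_start) / max(1, total_steps - anneal_start)
    return base_lr * (1 + math.cos(math.pi * progress)) / 2

# Compare an early and a late annealing start over 100 steps.
for frac in (0.2, 0.4, 0.75):
    lrs = [flat_cos_lr(s, 100, 1e-3, frac) for s in (0, 25, 50, 75, 100)]
    print(frac, [f"{lr:.5f}" for lr in lrs])
```

In fastai2 the comparable setting is the `pct_start` argument of `fit_flat_cos`; whether the notebooks here use that exact function is not stated in the post.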