Okay, just to keep you updated: we’ve isolated the issue, and I’m working on a PR now. What’s happening is that when points go off-screen, we don’t actually clamp them properly (and my own clamp function didn’t solve this either; you can still get a result such as [1.0661, -0.0437] when our points need to be between -1 and 1).
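For reference, a minimal sketch of what proper clamping should look like, assuming the points live in a torch tensor of normalized coordinates (the values here are just the example from above, not real model output):

```python
import torch

# Keypoints in normalized coordinates; valid values must lie in [-1, 1].
pts = torch.tensor([[1.0661, -0.0437],
                    [-1.2000, 0.3000]])

# torch.clamp bounds every coordinate to the valid range, so no point
# can end up off-screen after augmentation.
clamped = pts.clamp(-1.0, 1.0)
print(clamped)
```

Any coordinate outside the range is snapped to the nearest bound (1.0661 becomes 1.0, -1.2 becomes -1.0), while in-range values pass through unchanged.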
I don’t deny something else may be afoot here, but first let’s fix the bug we know is a bug. The other option would be to try training without warp, etc., and see if the problem is still present; we may have simply gotten lucky before. There were also some hyperparameters not being passed down, and some defaults in the image augmentations weren’t quite the same (the probabilities we fixed), so that could be another factor.
OK, so it looks like there’s definitely something going on with Flip() in the batch transforms. I’m going to show them all, because 0.0.17 with all batch tfms looks like it’s the best, compared to 0.0.29 using just one of the batch tfms. You’d expect it to be easier for 0.0.29 to come out ahead, since it’s only 3 epochs and a single batch tfm.
@pattyhendrix the next step is to look at the outputs from just using Flip: are any of them going outside [-1, 1]? I’ll check whether Flip’s behavior changed lately, but I haven’t noticed a major change to that specific augmentation. The only big difference is that Flip’s default p went from 0.5 to 1.0, so to properly recreate the old behavior we should use Flip(p=0.5), and verify that it’s actually 0.5 at runtime by checking dls.after_batch. This may turn out not to matter, but it’s the first thing I’d check.
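The verification step above could look roughly like the sketch below. The `Flip` class here is a hypothetical stand-in (a real run would import it from the library and inspect the actual `dls.after_batch` pipeline); the point is just to show checking the stored `p` rather than trusting the constructor call:

```python
# Hypothetical stand-in for a batch transform that stores its probability,
# mimicking how Flip keeps its `p`. Assumed: the default changed from 0.5 to 1.0.
class Flip:
    def __init__(self, p=1.0):
        self.p = p

# Stand-in for dls.after_batch: the list of batch transforms actually applied.
after_batch = [Flip(p=0.5)]

# Walk the pipeline and confirm the probability that will really be used.
for tfm in after_batch:
    if isinstance(tfm, Flip):
        assert tfm.p == 0.5, f"expected p=0.5, got {tfm.p}"
        print(f"Flip p={tfm.p}")
```

Checking the live pipeline instead of the call site catches the case where a default or a missed keyword silently overrides what you thought you passed, which matches the "hps not being passed down" issue mentioned earlier.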
It’s about 3 seconds slower than 0.0.17 and the valid loss is way worse, but the predictions look OK: not as good as 0.0.17’s, though better than what you’d expect from 0.0.17 if it had 0.0.29’s valid loss.