Hey, thanks for checking it out. There is a gap between the learn.save at cell run-order 16 and the learn.load at 25, so maybe you did something in the 17-24 range that was the game changer. Can you recall doing anything in that gap that might have given you the big jump? Maybe you stumbled onto a good move that we can try in other models. Whatever you did dropped the error rate way below anything I could think to try.
@ronaldokun I was able to reproduce the 1% error rate (in fact even below 1%) and I am quite sure I found the cause for it.
Each time we create a new ImageDataBunch it generates a new random split of the dataset. This means that if you create a new ImageDataBunch and then reload trained weights, the new validation set will almost certainly include data from the previous training set. In other words, the model has already seen that data, which is why it can achieve such a low error rate.
In order for the error rate results to be valid we would need to ensure that the validation split is always the same.
To ensure that you always get the same validation split, don’t forget to re-run np.random.seed(2) (substituting whatever seed you used), because the seed is not retained between iterations unless you re-run it.
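As a minimal sketch of what I mean (assuming the lesson-1 style ImageDataBunch.from_name_re call from fastai v1; path_img, fnames, pat and the seed value 2 are placeholders, not my actual setup):

```python
import numpy as np
from fastai.vision import *

# Re-run this seed in the same cell (or immediately before) every time the
# DataBunch is re-created; otherwise the next random split will differ and
# images the model has already trained on can leak into the validation set.
np.random.seed(2)

# Placeholder lesson-1 style setup: path_img, fnames and pat are hypothetical.
data = ImageDataBunch.from_name_re(
    path_img, fnames, pat,
    ds_tfms=get_transforms(), size=224, bs=64
).normalize(imagenet_stats)
```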
The only thing I did was to train with the layers unfrozen first, using a specific learning rate (two orders of magnitude below the one I found with learn.lr_find() on the frozen layers). This is something I saw people doing in Kaggle competitions.
The following run with learn.fit_one_cycle() would be the fine-tuning step with cyclic learning rate.
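Roughly this order of operations (a sketch only, assuming a fastai v1 cnn_learner; the learning rates and epoch counts are illustrative, not the actual values from my notebook):

```python
# Sketch of the workflow described above, with illustrative values.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

learn.lr_find()           # LR finder on the frozen model
learn.recorder.plot()     # suppose the plot suggests roughly 1e-2

learn.unfreeze()
learn.fit(2, lr=1e-4)     # first pass: unfrozen, fixed LR ~two orders of magnitude lower

# then the fine-tuning pass with the cyclical learning-rate schedule
learn.fit_one_cycle(4, max_lr=slice(1e-6, 1e-4))
```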
Is there a thread/doc somewhere detailing which parts of the whole process are reproducible when using the same random seed and which aren’t? I haven’t dug into it yet, but at first glance nothing is the same across multiple runs with the same seed.
I intentionally avoid using reproducible runs, because I don’t think it’s a good idea to hide the randomness. However, it is important to use the same validation set each time. That’s why I set the random seed before I create the validation set.
Ok, that makes sense, and I agree with your choice of default behaviour. But is there an easy way to force all the (non-CUDA related) randomization to be consistent between runs? For debugging purposes, for example.
Yes, but you need to re-run the np.random.seed(2) cell before you re-create the ImageDataBunch; otherwise the seed is not retained on the next iteration.
Thank you to @sgugger for pointing this out in the dev channel!
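For the broader “make everything consistent” case, a common pattern looks like the following (this is not an official fastai utility, and CUDA/cuDNN kernels can still introduce some nondeterminism):

```python
import random
import numpy as np
import torch

def seed_everything(seed=2):
    # Pin the Python, NumPy and PyTorch RNGs; re-call this before anything
    # that consumes randomness is (re)created, e.g. the ImageDataBunch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)            # no-op without a GPU
    torch.backends.cudnn.deterministic = True   # some GPU ops may still vary
    torch.backends.cudnn.benchmark = False

seed_everything(2)
```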
I have had the same experience (outside of the fastai context). Although the sklearn docs say its estimators obey NumPy random seeds, I found that definitely not to be the case (e.g. for random forests). Not sure how fastai behaves/uses this. I never investigated it further, but I think it has to do with the scoping of the variables/context.
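For what it’s worth, the way around it in sklearn is to pass the estimator’s own random_state explicitly rather than relying on the global NumPy seed (a minimal illustration, not fastai-specific):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data just for illustration.
X, y = make_classification(n_samples=200, random_state=0)

# Passing random_state explicitly pins the estimator's own RNG, so the result
# no longer depends on whatever the global NumPy seed happens to be.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
```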
Well, I have finally been able to break the 4% barrier without having to touch “wd” or the like. The only “advanced” thing I did was to enable both of my GPUs using “DataParallel”. I will try to post a link to GitHub once I figure that out later tonight.
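In case it helps anyone before I post the notebook, the multi-GPU part boils down to wrapping the model in PyTorch’s DataParallel (a sketch, assuming a fastai v1 cnn_learner; saving/loading a wrapped model needs a bit of extra care because of the “module.” prefix in its state dict):

```python
import torch
import torch.nn as nn

# Rough sketch, not the exact notebook: wrap the learner's model so each batch
# is split across the available GPUs.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
if torch.cuda.device_count() > 1:
    learn.model = nn.DataParallel(learn.model)
```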
@FourMoBro - good job. Try the exact same thing with a single GPU, except make all your learning rates half as big. You should get about the same result. Let us know how you go!
Just the learning rates. (You can also try with the same learning rates - now I think of it, actually I think the particular approach you used won’t require any change; but worth experimenting!)
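To make “half as big” concrete (purely illustrative numbers, not the ones from the actual notebook):

```python
# If the two-GPU run used, say:
#   learn.fit_one_cycle(4, max_lr=slice(1e-6, 1e-4))
# then the halved single-GPU version of the same call would be:
learn.fit_one_cycle(4, max_lr=slice(1e-6 / 2, 1e-4 / 2))
```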