Is there a difference in how fastai v1.0.3 and later versions train networks?

clarkeaa13 · December 10, 2018, 6:45pm

I have observed that, with the exact same learner setup (1-cycle policy, learning rate of 0.001, default head, 5 epochs frozen, 15 epochs unfrozen), the performance of the learner varies widely depending on whether I use fastai 1.0.3 or a later version. For instance, with fastai > 1.0.3 I get ~65% accuracy on Stanford Cars, whereas with 1.0.3 I get over 80%:

The circled green line is the v1.0.20 run and the circled red line is the v1.0.3 run (the other lines are from separate experiments).

I’ve found these results to be consistent over several trials, so they can’t be explained by “bad luck.”
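A rough way to sanity-check a claim like this is to compare the gap between the two versions with the run-to-run spread. The sketch below uses only the standard library; the function name `gap_vs_noise` and the accuracy lists are hypothetical placeholders chosen to resemble the ~80% and ~65% figures, not the actual per-trial results from these experiments.

```python
from statistics import mean, stdev

def gap_vs_noise(runs_a, runs_b):
    """Return (gap between mean accuracies, pooled run-to-run std, |gap| / std)."""
    gap = mean(runs_a) - mean(runs_b)
    na, nb = len(runs_a), len(runs_b)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    pooled_var = ((na - 1) * stdev(runs_a) ** 2 + (nb - 1) * stdev(runs_b) ** 2) / (na + nb - 2)
    pooled_std = pooled_var ** 0.5
    ratio = abs(gap) / pooled_std if pooled_std > 0 else float("inf")
    return gap, pooled_std, ratio

# HYPOTHETICAL placeholder accuracies (three trials per version), for illustration only.
old_runs = [0.81, 0.80, 0.82]   # stand-in for fastai 1.0.3
new_runs = [0.65, 0.66, 0.64]   # stand-in for fastai > 1.0.3

gap, noise, ratio = gap_vs_noise(old_runs, new_runs)
print(f"gap={gap:.3f}  run-to-run std={noise:.3f}  ratio={ratio:.1f}")
```

If the gap is many times larger than the run-to-run standard deviation, random seed variation alone is an unlikely explanation.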

What could be causing this discrepancy? Is there some kind of silent gradient clipping automatically added in some versions of fastai but not others? Or maybe the gradients are calculated differently across versions, for instance by silently using fp16?
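For context on why clipping would matter, here is a minimal pure-Python sketch of norm-based gradient clipping. It is an illustration of the general technique only, not fastai's or PyTorch's implementation, and nothing in this thread establishes that any fastai version clips by default. The function name `clip_by_global_norm` is made up for this example.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already within the limit: gradients pass through unchanged.
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

# Illustrative values: a gradient of norm 5 clipped to norm 1.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm_before, clipped)
```

Clipping keeps the gradient direction but shrinks large steps, so with a fixed learning rate schedule it can change how far the weights move in each update.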

sgugger · December 10, 2018, 9:59pm

I can’t remember which version exactly, but transfer learning was temporarily broken in one of them. Maybe that’s the explanation?

clarkeaa13 · December 10, 2018, 10:41pm

What specifically do you mean by “transfer learning was broken”? Was it broken in more than one version?

In my experiments, every version I’ve tried after 1.0.3 gives poorer results on Stanford Cars. 1.0.3 does not seem broken to me, because it has worked well on other datasets besides Cars. I’ve only achieved 80%+ on Stanford Cars using v1.0.3 and below.

I’m certain I’ve tried at least 1.0.18, 1.0.20, and 1.0.22, and they all have this problem. I also tried a 1.0.30+ version (can’t remember which one exactly), and it has the same issue. I’ll try again on a version > 1.0.30 to double-check.
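When comparing versions like this, it helps to log the exact installed packages next to each run so the version is never in doubt later. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_versions` and the choice to also record `torch` and `torchvision` are assumptions for illustration, since a different fastai release may pull in different dependency versions.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages=("fastai", "torch", "torchvision")):
    """Map each package name to its installed version string, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Print once at the start of every experiment and keep it with the results.
print(installed_versions())
```

On older interpreters, `pip show fastai` from the command line reports the same version information.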