Hello, I’ve been fighting with this for weeks and could use some advice on how best to debug this issue.
When working on the full MNIST classifier, I've found that when I run `learn.fit`, my batch accuracy is either very low or stuck at a single value. But when I then manually apply my model to a batch and calculate the batch accuracy on my own (after `learn.fit` has updated the weights), the value and the predictions look quite good.
I've shared a cached set of results to demonstrate what I'm seeing here: training example
Or there's the actual notebook.
If you look at In[19], you can see I'm running `learn.fit` and printing every batch result, which shows pretty bad-looking results. However, right after training has run, in In[20] and In[21], I manually run the same accuracy metric on a batch with the same model, and the accuracy and predictions look pretty decent. When I run `learn.fit` again (In[22]), the values look bad again.
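For reference, the manual check I'm doing is essentially this (a simplified pure-Python sketch with made-up numbers; my real code works on model output tensors, not lists):

```python
def batch_accuracy(preds, targets):
    """Fraction of examples whose highest-scoring class matches the target.

    preds:   one list of per-class scores per example
    targets: the true class index for each example
    """
    correct = sum(
        1 for p, t in zip(preds, targets)
        if max(range(len(p)), key=p.__getitem__) == t  # argmax of the scores
    )
    return correct / len(targets)

# Toy batch: predictions for examples with true labels 1, 0, 0
preds = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
targets = [1, 0, 0]
print(batch_accuracy(preds, targets))  # 2 of 3 correct
```

When I run this kind of check by hand after training, the numbers look reasonable; it's only the values reported inside `learn.fit` that look wrong.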
I'm currently trying various callbacks in the learn loop to get an idea of where things are going wrong, but I'm a bit lost. It seems to me like backpropagation isn't occurring, or possibly my batch_accuracy function just isn't working properly inside the learner.
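One sanity check I'm thinking of wiring into a callback is something like this (a toy, framework-free sketch; the real version would read each parameter's gradient tensor right after the backward pass):

```python
def grads_look_dead(grads, tol=1e-12):
    """Flag a batch where every gradient entry is (near) zero.

    grads is a list of per-parameter gradient value lists. If this returns
    True for every batch, backprop effectively isn't updating anything.
    """
    return all(abs(g) < tol for param_grads in grads for g in param_grads)

# All-zero gradients -> suspicious, backprop may not be running
print(grads_look_dead([[0.0, 0.0], [0.0]]))

# At least one nonzero gradient -> backprop is doing something
print(grads_look_dead([[0.0, 0.3], [0.0]]))
```

If the gradients look alive during `learn.fit`, that would point the finger at the metric computation inside the learner rather than at backprop.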
Any advice would be greatly appreciated, thanks!