It looks like you're fitting the training data better than the validation data (train loss < valid loss), but your accuracy is still increasing and your error rate is still going down, so this isn't real overfitting yet.
Based on your lr_find results, try setting up your training with an explicit max_lr parameter, like this:
```python
learn.fit_one_cycle(x, max_lr=1e-6)  # x = number of epochs
```
When you don't supply a max_lr parameter, fit_one_cycle defaults to 0.003, which may be too high in your case: you end up going down the valley toward the minimum but jumping around in it, or jumping out of it entirely (nice animation, see the last two images).
I'd be curious what happens when you train a couple more epochs on top with an explicit max_lr.
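In case it helps, here's a minimal sketch of what I mean, assuming you're on fastai v1 (where `learn.recorder.plot()` shows the LR finder curve); `learn` and `x` are placeholders for your own learner and epoch count:

```python
# Re-run the LR finder on the current weights and inspect the plot.
learn.lr_find()
learn.recorder.plot()  # pick a value where the loss is still clearly decreasing

# Continue training for a few epochs with an explicit max_lr
# instead of the 0.003 default.
learn.fit_one_cycle(x, max_lr=1e-6)
```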