Many thanks, Molly!
It is a pretty small dataset, so that could be increasing the swings.
I’ve tried cross-validation, but I still get significant variation,
and of course it takes much longer.
I do have some room to increase batch size, so I’ll try that. I hadn’t thought
of it, but it makes sense, as does momentum.
I already increased weight decay from the default values, and that visibly
improved performance even through the noise. This seems reasonable to me
(though I’m not sure if my intuition is correct) - with a small dataset,
I’d expect training to be prone to overfitting, which a higher weight
decay should help counter.
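
That intuition can be sanity-checked on a toy problem. Below is a minimal sketch in plain Python, not my actual model or data: the synthetic dataset, the `train` and `l2_norm` helpers, and every hyperparameter value are made up for illustration. It fits a linear model on 8 noisy samples with 5 features (only one of which matters) and compares the size of the learned weights with and without L2 weight decay.

```python
import random


def train(weight_decay, steps=500, lr=0.05, seed=0):
    """Fit y ~ w.x with plain gradient descent plus L2 weight decay.

    Everything here is hypothetical: a tiny synthetic dataset stands in
    for a "small dataset" where fitting noise is easy.
    """
    rng = random.Random(seed)
    n_features, n_samples = 5, 8
    xs = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    # Only feature 0 carries signal; the rest of the target is noise.
    ys = [x[0] + rng.gauss(0, 0.5) for x in xs]

    w = [0.0] * n_features
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = [0.0] * n_features
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(n_features):
                grad[j] += 2 * err * x[j] / n_samples
        # Weight decay adds weight_decay * w to the gradient,
        # pulling every weight toward zero on each step.
        w = [wi - lr * (g + weight_decay * wi) for wi, g in zip(w, grad)]
    return w


def l2_norm(w):
    return sum(wi * wi for wi in w) ** 0.5


w_plain = train(weight_decay=0.0)
w_decay = train(weight_decay=0.5)
print("no decay:  ", l2_norm(w_plain))
print("with decay:", l2_norm(w_decay))
```

With decay switched on, the weights end up smaller overall, which is the mechanism that should limit how much the model can fit noise in a small training set. Whether that actually improves validation performance still depends on the data and on how large the decay is.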
I’ll try the Weights and Biases callback - is that this one: