Poorer results with smaller batch sizes using ULMFiT

Something I have noticed is that when using ULMFiT, the smaller the batch size I am forced to use, the worse my results are (regardless of how long I train).

The metrics I am using are accuracy and AUROC.

Has anyone seen something similar? I also notice that the loss I get is higher the more I decrease the batch size.

I am using these lines to train the language model and the classifier, respectively:
learn.fit_one_cycle(10, lr, moms=(0.8,0.7), wd=0.1)  # language model fine-tuning
learn.fit_one_cycle(1, lr, moms=(0.8,0.7), wd=0.1)   # classifier
I use different learning rates and numbers of epochs accordingly as well.
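
One thing I am wondering about is whether I should also be scaling the learning rate with the batch size, since smaller batches give noisier gradient estimates. Something like this rough sketch (base_lr, base_bs, and bs are just placeholder numbers, not my actual settings):

# Linear-scaling sketch: adjust the lr in proportion to the batch size
# actually used, relative to a reference setup that worked well.
base_lr = 1e-2   # lr that worked at the reference batch size (placeholder)
base_bs = 64     # batch size used on the cloud GPU (placeholder)
bs = 16          # batch size that fits on the home card (placeholder)

lr = base_lr * bs / base_bs
learn.fit_one_cycle(10, lr, moms=(0.8, 0.7), wd=0.1)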

Does anyone have any suggestions? Should I be tuning the moms and wd as well?
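
In case it matters, after changing the batch size I have been re-picking the learning rate with the usual LR finder routine (fastai v1), roughly:

learn.lr_find()
learn.recorder.plot()   # pick the lr from the steepest downward part of the curve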

I’m getting the final results out of the recorder with:
results = dict(zip(learn.recorder.metrics_names, learn.recorder.metrics[-1]))  # metrics from the final epoch

Also, if someone knows a better way to get the results on the validation set, I'd love to know.
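
One alternative I have been looking at (assuming fastai v1, where Learner.validate seems to return the validation loss followed by the metrics in the order they were passed to the Learner) is something like this, though I'm not sure it's the intended approach:

# validate() runs the model over the validation set and returns
# [val_loss, metric_1, metric_2, ...] in fastai v1
val_loss, acc, auroc = learn.validate()
results = {'val_loss': val_loss, 'accuracy': acc, 'auroc': auroc}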

The reason this is an issue for me is that I have to reduce the batch size to be able to run on my home graphics card. Using a cloud GPU with 16GB of RAM, I can increase the batch size and get better results than on the one at home with 11GB of RAM.

I’d obviously like to be able to simply use my home computer where possible for long-running training.
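
One workaround I have seen mentioned is gradient accumulation: run several small forward/backward passes and only step the optimizer once, so the effective batch size on the 11GB card matches the larger one in the cloud. I have not wired this into the fastai callback system; the sketch below is plain PyTorch with made-up names (model, criterion, optimizer, train_dl) just to show the idea:

accum_steps = 4                     # e.g. 4 x bs=16 behaves like an effective batch of 64 (placeholder numbers)
optimizer.zero_grad()
for i, (xb, yb) in enumerate(train_dl):
    loss = criterion(model(xb), yb) / accum_steps   # average the loss across the accumulated mini-batches
    loss.backward()                                 # gradients add up over the small batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()                            # one optimizer step per effective batch
        optimizer.zero_grad()

Would something like that be expected to recover the large-batch results, or is there more to it?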