Stair-step loss in collaborative filtering models

I'm seeing the behavior described in this 2019 post with the vanilla DotProductBias collaborative-filtering model. I replied there, but I'm reposting here since it's unlikely anyone is still watching that old thread.
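For reference, the model is essentially the vanilla version from the book chapter (nothing custom that I'd expect to matter):

```python
from fastai.collab import *

class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0, 5.5)):
        self.user_factors  = Embedding(n_users, n_factors)
        self.user_bias     = Embedding(n_users, 1)
        self.movie_factors = Embedding(n_movies, n_factors)
        self.movie_bias    = Embedding(n_movies, 1)
        self.y_range = y_range

    def forward(self, x):
        # x is a (batch, 2) tensor of (user index, movie index) pairs
        users  = self.user_factors(x[:, 0])
        movies = self.movie_factors(x[:, 1])
        res = (users * movies).sum(dim=1, keepdim=True)
        res += self.user_bias(x[:, 0]) + self.movie_bias(x[:, 1])
        return sigmoid_range(res, *self.y_range)
```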

Any insights into why the loss is discontinuous at epoch boundaries, even with data shuffling?

(OK, now that I look more closely: the loss actually increases within each epoch. Perhaps this is because batches are sampled without replacement, which could bias the batches near the end of an epoch toward less-common items. The only discussion of this phenomenon I can find is here.)
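To test this, I'm thinking of logging the raw per-batch training loss along with the average popularity of the items in each batch, rather than relying on the smoothed loss plot. A rough sketch, assuming the fastai v2 Callback API and that each input batch is a (user, movie) index tensor as above; `item_counts` is a hypothetical tensor of per-movie rating counts built from the training set, and the callback would be attached via `cbs=BatchStatsLogger(item_counts)` when calling `fit`:

```python
import torch
from fastai.callback.core import Callback

class BatchStatsLogger(Callback):
    "Record raw training loss and mean item popularity for every batch."
    def __init__(self, item_counts):
        self.item_counts = item_counts  # tensor: movie index -> number of training ratings

    def before_fit(self):
        self.stats = []

    def after_batch(self):
        if not self.training: return
        movies = self.xb[0][:, 1]  # second column of the input batch is the movie index
        mean_pop = self.item_counts[movies.cpu()].float().mean().item()
        self.stats.append((self.epoch, self.iter, self.loss.item(), mean_pop))
```

If the hypothesis is right, mean popularity should drop and per-batch loss should rise toward the end of each epoch.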