I’m pretty sure it’s the number of batches. You could calculate how many batches are in one epoch and multiply that by the number of epochs.
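A quick sketch of that calculation (the dataset size, batch size, and epoch count below are made-up example values, not from your setup):

```python
import math

# Hypothetical example values -- substitute your own
dataset_size = 10_000   # number of training items
batch_size = 64
epochs = 5

# Each epoch processes ceil(dataset_size / batch_size) mini-batches
# (the last batch may be smaller than batch_size)
batches_per_epoch = math.ceil(dataset_size / batch_size)
total_steps = batches_per_epoch * epochs

print(batches_per_epoch)  # 157
print(total_steps)        # 785
```

Note that if your DataLoader drops the last incomplete batch, you'd use floor division instead of `math.ceil`.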
You might want to take a look at Chapter 4 of the fastbook. The following quote is from “SGD and Mini-Batches”:
> […] we calculate the average loss for a few data items at a time. This is called a mini-batch. The number of data items in the mini-batch is called the batch size. A larger batch size means that you will get a more accurate and stable estimate of your dataset’s gradients from the loss function, but it will take longer, and you will process fewer mini-batches per epoch. Choosing a good batch size is one of the decisions you need to make as a deep learning practitioner to train your model quickly and accurately. We will talk about how to make this choice throughout this book.