CPU RAM Usage Keeps Growing While Training with One Cycle

You could try running something in one of the batch callbacks (for instance, on_batch_begin). For inspiration, have a look at the SaveModelCallback from https://github.com/fastai/fastai/blob/master/fastai/callbacks/tracker.py, but instead of saving when some metric improves, you could save every set number of epochs.
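
For example, here is a minimal sketch of such a callback, assuming fastai v1's LearnerCallback API (the class name SaveEveryNEpochs and its arguments are made up for illustration, not something from the library):

from fastai.basic_train import LearnerCallback

class SaveEveryNEpochs(LearnerCallback):
    "Save the model every `every` epochs rather than when a metric improves."
    def __init__(self, learn, every=5, name='model'):
        super().__init__(learn)
        self.every, self.name = every, name

    def on_epoch_end(self, epoch, **kwargs):
        # epoch is 0-based, so this fires after epoch every, 2*every, ...
        if (epoch + 1) % self.every == 0:
            self.learn.save(f'{self.name}_{epoch + 1}')

You would then pass it to fit, e.g. learn.fit_one_cycle(20, callbacks=[SaveEveryNEpochs(learn, every=5)]).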

This is a sampler that I have been using for quite some time now. The new data block API makes it super simple to plug in a custom sampler, which I think is really nice, because it opens a way to easily experiment with things like hard negative mining or other ways of constructing batches. That seems like the natural next step on top of one cycle, and I don't think it has been explored to a great extent yet (I believe the fastai crew did something along those lines for some of the Imagenet-training-in-record-time work, but I might be misremembering).

Anyhow, here is the sampler:

import numpy as np
from torch.utils.data.sampler import Sampler

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

class RandomSamplerWithEpochSize(Sampler):
    """Yield epochs of epoch_size indices. Iterates over all examples in data_source
    in random order and ensures (nearly) all examples have been trained on before
    beginning the next pass over the data_source - the last chunk, which would
    likely be smaller than epoch_size, is dropped.
    """
    def __init__(self, data_source, epoch_size):
        self.n = len(data_source)
        self.epoch_size = epoch_size
        self._epochs = []

    def __iter__(self):
        return iter(self.next_epoch)

    @property
    def next_epoch(self):
        # Reshuffle and cut a fresh set of epochs once the current ones run out.
        if len(self._epochs) == 0: self.generate_epochs()
        return self._epochs.pop()

    def generate_epochs(self):
        idxs = [i for i in range(self.n)]
        np.random.shuffle(idxs)
        # Drop the last chunk, which would likely be smaller than epoch_size.
        self._epochs = list(chunks(idxs, self.epoch_size))[:-1]

    def __len__(self):
        return self.epoch_size
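
If you want to convince yourself of what it does, here is a quick sanity check with toy numbers (made up for illustration):

sampler = RandomSamplerWithEpochSize(range(10), epoch_size=4)
first, second = list(sampler), list(sampler)  # two consecutive "epochs"
assert len(first) == len(second) == 4         # each epoch yields epoch_size indices
assert set(first).isdisjoint(second)          # no example repeats before a reshuffle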

and this is how I construct the data_bunch:

from torch.utils.data import BatchSampler, DataLoader
from fastai.vision import ImageDataBunch

bs = 64  # batch size - an assumed value, use whatever you normally train with

train_dl = DataLoader(
    label_lists.train,
    num_workers=12,
    batch_sampler=BatchSampler(RandomSamplerWithEpochSize(label_lists.train, 200_000), bs, True)
)
valid_dl = DataLoader(label_lists.valid, batch_size=2*bs, shuffle=False, num_workers=12)
test_dl = DataLoader(label_lists.test, batch_size=2*bs, shuffle=False, num_workers=12)

# passing test_dl by keyword avoids any ambiguity in DataBunch's positional arguments
data_bunch = ImageDataBunch(train_dl, valid_dl, test_dl=test_dl)
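
Training then proceeds as usual; just keep in mind that every epoch the learner reports now covers 200_000 examples rather than a full pass over the dataset. A hypothetical continuation (my_model stands in for whatever architecture you are training):

learn = Learner(data_bunch, my_model, metrics=accuracy)
learn.fit_one_cycle(10)  # ten "epochs" of 200_000 examples each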