Training with the ActivationStats callback is very slow

I’m trying to visualize the activation stats of my model by passing the ActivationStats callback when creating my learner, as described in the convolutions chapter.

But doing so makes the training process extremely slow. Is it normal that storing these stats reduces training speed so drastically, or am I doing something wrong?
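For reference, this is roughly how I’m attaching the callback (a minimal sketch following the book; dls stands for my DataLoaders and resnet18 is just an example architecture):

learn = cnn_learner(dls, resnet18,
                    cbs=ActivationStats(with_hist=True))  # record activation stats during training
learn.fit(1)
learn.activation_stats.color_dim(-1)  # plot the histograms, as in the book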

I have the same issue:

I use:

torch version: 1.6.0
torchvision version: 0.7.0
fastai version: 2.0.15
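(Printed with the following snippet, in case anyone wants to compare versions:)

import torch, torchvision, fastai
print(torch.__version__, torchvision.__version__, fastai.__version__)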

Here is a reproducible example:

from fastai.vision.all import *

data = untar_data(URLs.IMAGENETTE)

imagenette = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_items=get_image_files,
                       get_y=lambda x: x.parent.name,  # label = parent folder name
                       item_tfms=Resize(448),
                       batch_tfms=aug_transforms(size=224))

dls = imagenette.dataloaders(data)

learn = cnn_learner(dls, resnet18)
learn.fit(5)

[Screenshot: epoch times for the run without ActivationStats]

learn = cnn_learner(dls, resnet18,
                    cbs=ActivationStats(with_hist=True))  # ActivationStats slows training significantly
learn.fit(5)

[Screenshot: epoch times for the run with ActivationStats]

I initially thought the training had fallen back to the CPU, but nvidia-smi shows the GPU is still being used, though only at 3-7% volatile GPU utilization.
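My guess is that the overhead comes from the callback hooking every module with parameters and moving each activation to the CPU on every batch. A possible mitigation, assuming ActivationStats forwards its keyword arguments to HookCallback (which accepts a modules argument, as far as I can tell from the source), is to hook only a subset of layers:

from fastai.vision.all import *

learn = cnn_learner(dls, resnet18)
# hook only every 4th Conv2d layer instead of every parameterized module
convs = [m for m in flatten_model(learn.model) if isinstance(m, nn.Conv2d)]
learn.add_cb(ActivationStats(with_hist=True, modules=convs[::4]))
learn.fit(5)

This is a sketch rather than a verified fix, but with fewer hooks the per-batch GPU-to-CPU copies should shrink accordingly.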