I have been trying to implement a Conditional Variational Autoencoder with FastAI v2. I came across a great example in another thread on these forums. So far, so good!
I am trying to run the original example notebook in Colab. Although I am using a GPU runtime, the GPU is being underutilised while the CPU is pegged at 100%: an epoch takes about 1:30 minutes, compared to roughly 20 seconds per epoch in the example notebook from the thread above. I am training on MNIST and have changed nothing in the notebook, yet I cannot get my setup to use the GPU.
I used W&B to monitor the system stats: GPU utilisation stays consistently under 15%, while CPU utilisation sits at 100%. My intuition is that the model is waiting on the DataLoaders, which run on the CPU, but I'm not an expert in the new API. The relevant code is the following:
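To test this hypothesis, one check I can think of (a hypothetical helper, not something from the notebook) is to time pure batch fetching with no model attached, and compare that against the full epoch time:

```python
import time

def time_batches(iterable, n_batches=10):
    """Time how long it takes just to pull batches from a loader,
    with no model involved. If this alone accounts for most of an
    epoch's wall time, the DataLoader (CPU side) is the bottleneck
    rather than the GPU compute."""
    start = time.perf_counter()
    for i, _batch in enumerate(iterable):
        if i + 1 >= n_batches:
            break
    return time.perf_counter() - start
```

E.g. comparing `time_batches(dls.train)` against the reported per-epoch time should show whether data loading dominates.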
mnist = DataBlock(
    blocks=(ImageBlock(cls=PILImageBWNoised), CategoryBlock, ImageBlock(cls=PILImageTarget)),
    get_items=get_image_files,
    get_x=[noop, parent_label],
    splitter=GrandparentSplitter(
        train_name='training' if url == URLs.MNIST else 'train',
        valid_name='testing' if url == URLs.MNIST else 'valid'),
    batch_tfms=[AddNoiseTransform(.3), Normalize()],
    n_inp=2)

dls = mnist.dataloaders(path, num_workers=4, bs=4096, device=device)
I encountered some other topics on the forum where people suggested changing the num_workers argument to speed things up. This led me to print dls.num_workers, which returns 1, even though I set it to 4 in the snippet above.
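For context on why I care about this: my (possibly wrong) understanding is that each worker is a separate process decoding and transforming items, so CPU-bound preprocessing scales with the worker count. A framework-free sketch of that effect (fake_transform and load_items are made-up stand-ins, not fastai functions):

```python
import time
from concurrent.futures import ProcessPoolExecutor

def fake_transform(i):
    # Stand-in for a CPU-heavy per-item transform (decode, noise, normalise).
    total = 0
    for k in range(200_000):
        total += (i + k) % 7
    return total

def load_items(n_workers, n_items=16):
    # Mimics a DataLoader farming items out to worker processes.
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        items = list(pool.map(fake_transform, range(n_items)))
    return time.perf_counter() - start, items
```

On a multi-core machine, load_items(4) should finish noticeably faster than load_items(1); if num_workers is silently being clamped to 1, that speedup is lost, which would match the symptoms I'm seeing.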
I am finding this very confusing. Why isn't the GPU being utilised, and why isn't num_workers being set to the value I passed?
Any help is appreciated!