In the Chapter 4 Jupyter notebook, there is a code block for transfer learning resnet18 on the 3s and 7s MNIST data set. Jeremy points out that running it as a script will let you run it without setting the number of workers to 0. I looked at the example in the repository here: https://github.com/fastai/fastai/blob/master/nbs/examples/dataloader_spawn.py but I could not work out how to convert the given code block (and the other code blocks it depends on) into something that runs successfully from a terminal and still uses the GPU.
Code from the notebook:
    dls = ImageDataLoaders.from_folder(path)
    learn = cnn_learner(dls, resnet18, pretrained=False, loss_func=F.cross_entropy, metrics=accuracy)
    learn.fit_one_cycle(1, 0.1)
I tried a few things and have run into the following issues:
- It looks like it will redo the data untar-ing and loading into memory step between each epoch (I changed epochs to 3 and there was a significant delay between each epoch)
- The GPU is not being utilized during the training.
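One way to narrow down the second point is to check what the interpreter that runs the script can actually see, since a terminal may pick up a different Python environment than the Jupyter kernel. The following is only a minimal sketch, assuming PyTorch is the library in question; `report_device` is a hypothetical helper name, not part of fastai or PyTorch.

```python
import importlib.util
import sys


def report_device() -> str:
    """Return 'cuda', 'cpu', or 'missing' for the interpreter running this script."""
    # If torch cannot be found, the terminal is probably using a different
    # environment than the notebook kernel.
    if importlib.util.find_spec("torch") is None:
        return "missing"
    import torch

    # 'cpu' here means torch is installed but cannot see a CUDA device
    # (for example a CPU-only build or a driver problem).
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # Show which Python executable is running, then what it can see.
    print(sys.executable)
    print(report_device())
```

If this prints `cuda` from the same terminal, the environment itself is probably fine and the problem is more likely in how the script builds the data loaders or the learner.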
I have tried multiple iterations of the script but here is the most recent:
    from multiprocessing.dummy import freeze_support
    import fastai
    from fastai.vision.all import *
    from fastbook import *
    from scipy.fft import set_workers

    matplotlib.rc('image', cmap='Greys')
    fastai.torch_core.default_device = torch.device('cuda')

    def get_data(path):
        return ImageDataLoaders.from_folder(path, persistent_workers=True, set_workers = 6)

    # def get_data():
    #     path = untar_data(URLs.MNIST_SAMPLE)
    #     return DataBlock(
    #         blocks=(ImageBlock, CategoryBlock), get_items=get_image_files,
    #         splitter=GrandparentSplitter(valid_name='val'),
    #         get_y=parent_label).dataloaders(path, bs=128)

    def get_model():
        return cnn_learner(dls, resnet18, pretrained=False, loss_func=F.cross_entropy, metrics=accuracy)

    if __name__ == '__main__':
        # freeze_support()
        multiprocessing.set_start_method('spawn')
        # fastai.torch_core.default_device = torch.device('cuda')
        path = untar_data(URLs.MNIST_SAMPLE)
        Path.BASE_PATH = path
        os.environ["PATH"] += os.pathsep + 'C:\\Program Files\\Graphviz\\bin'
        dls = get_data(path)
        # dls = get_data()
        print('done')
        learn = get_model()
        print('done')
        learn.fit_one_cycle(3, 0.1)
I’ve tried reverse engineering the functions as well but I couldn’t figure out the proper way to write the script so it would use the GPU or not run into memory issues (because it seems to keep the batches loaded in memory but still re-load new ones between epochs).