Exception when running in nn.DataParallel mode with precompute=False

Hi @jeremy

Interesting issue here, and it should be easy to recreate. I was modifying the source a little, mainly because I have a dual-GPU machine and wanted to make use of both cards. I understand this may be irrelevant for a lot of people (hence not a high-priority issue).

Steps to recreate

  • Modify ConvnetBuilder.py so that each nn.Sequential model it instantiates is wrapped in nn.DataParallel(…) (see the sketch after this list).

  • In lesson1.ipynb, attempt to create a learner with precompute=False.

  • Try running learn.fit(…)

  • An exception will be thrown from inside the torch library (stack trace in the screenshots below).
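
For reference, here is a minimal sketch of the change I made. The layer stack below is a hypothetical stand-in for what ConvnetBuilder actually assembles (the real code builds its nn.Sequential from the pretrained backbone plus the custom head); the point is just the one-line nn.DataParallel wrap:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the nn.Sequential that ConvnetBuilder builds.
base = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Linear(50, 10),
)

# The one-line modification: wrapping the Sequential makes forward()
# scatter each batch across all visible GPUs and gather outputs on device 0.
model = nn.DataParallel(base) if torch.cuda.device_count() > 1 else base
if torch.cuda.is_available():
    model = model.cuda()
```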

Strangely enough, I don’t see this when precompute=True, which is why I suspect the fastai library may be involved as well. I may also raise it with the PyTorch team directly.

My motivation for doing the code change above is to have multiple gpu utilization using PyTorch’s out of the box nn.DataParallel module. In a simple experiment, I was able to follow the above pattern to run the model on multiple GPUs. Additionally, by setting precompute=True and running the model, I can still seem to run the model against both the GPUs.
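
To be concrete, these are the lesson1-style calls I’m running (from memory, so paths and sizes are whatever your copy of lesson1.ipynb uses):

```python
from fastai.conv_learner import *

PATH = 'data/dogscats/'   # dataset path from lesson1.ipynb
arch, sz = resnet34, 224

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))

# precompute=False triggers the exception once learn.fit runs;
# precompute=True trains fine and utilizes both GPUs.
learn = ConvLearner.pretrained(arch, data, precompute=False)
learn.fit(0.01, 3)
```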

Let me know what you think. I’ve attached screenshots below:
