Exception when running in nn.DataParallel mode with precompute=False

Hi @jeremy

Interesting issue here, and it should be easy to recreate. I was modifying the source a little, mainly because I have a dual-GPU machine and wanted to make use of both cards. I understand this may be irrelevant for a lot of people (hence not a high-priority issue).

Steps to recreate

  • Modify ConvnetBuilder.py so that each nn.Sequential model it instantiates is wrapped in nn.DataParallel(…) (see the sketch after this list).

  • In lesson1.ipynb, attempt to create a learner with precompute=False.

  • Try running learn.fit(…)

  • An exception will be thrown from inside the torch library (stack trace in the screenshots below).
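
For reference, here is a minimal sketch of the change I made. The layer stack below is a hypothetical stand-in for what ConvnetBuilder actually assembles (the real code builds its nn.Sequential from the pretrained backbone plus the custom head); the point is just the one-line nn.DataParallel wrap:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the nn.Sequential that ConvnetBuilder builds.
base = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Linear(50, 10),
)

# The one-line modification: wrapping the Sequential makes forward()
# scatter each batch across all visible GPUs and gather outputs on device 0.
model = nn.DataParallel(base) if torch.cuda.device_count() > 1 else base
if torch.cuda.is_available():
    model = model.cuda()
```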

Strangely enough, I don’t see this when precompute=True, which is why I suspect the fastai library may be involved as well. I may also raise it with the PyTorch team directly.

My motivation for doing the code change above is to have multiple gpu utilization using PyTorch’s out of the box nn.DataParallel module. In a simple experiment, I was able to follow the above pattern to run the model on multiple GPUs. Additionally, by setting precompute=True and running the model, I can still seem to run the model against both the GPUs.
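
To be concrete, these are the lesson1-style calls I’m running (from memory, so paths and sizes are whatever your copy of lesson1.ipynb uses):

```python
from fastai.conv_learner import *

PATH = 'data/dogscats/'   # dataset path from lesson1.ipynb
arch, sz = resnet34, 224

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))

# precompute=False triggers the exception once learn.fit runs;
# precompute=True trains fine and utilizes both GPUs.
learn = ConvLearner.pretrained(arch, data, precompute=False)
learn.fit(0.01, 3)
```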

Let me know what you think. I’ve attached screenshots below:
