For context, I run dual 1080 Ti cards. Up to this point, I have been able to use both cards by adding learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])
and adjusting the batch size to maximize VRAM usage. This works great when the learn object comes from create_cnn.
However, in this notebook, using the latest fastai (1.0.21), I get the following error when running fit_one_cycle:
~/anaconda3/envs/course1018/lib/python3.6/site-packages/fastai/vision/models/unet.py in forward(self, up_in)
35 up_out = F.interpolate(up_in, s.shape[-2:], mode='bilinear')
36 up_out = self.upconv(up_out)
---> 37 cat_x = self.bn1(F.relu(torch.cat([up_out, s], dim=1)))
38 x = self.bn2(F.relu(self.conv1(cat_x)))
39 x = F.relu(self.conv2(x))
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 4 and 8 in dimension 0 at /opt/conda/conda-bld/pytorch-nightly_1540121100527/work/aten/src/THC/generic/THCTensorMath.cu:83
The notebook runs fine if the line enabling DataParallel is commented out, so it appears to be an incompatibility between unet.py
and data_parallel.py.
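One plausible reading of the "Got 4 and 8 in dimension 0" message: DataParallel splits each batch across the two GPUs (8 -> 4 per device), while the skip-connection tensor s that unet.py saves via a forward hook still holds the full batch, so the torch.cat on line 37 sees mismatched batch sizes. That is only a hypothesis on my part, but the failure itself can be reproduced in isolation with plain PyTorch (the shapes below are made up for illustration):

```python
import torch

# up_out: what one GPU replica would see after the batch is split in half.
up_out = torch.randn(4, 16, 32, 32)
# s: a hook-saved skip tensor still carrying the full batch of 8.
s = torch.randn(8, 16, 32, 32)

try:
    # Same call shape as unet.py line 37; cat along channels requires
    # all other dimensions (including batch, dim 0) to match.
    torch.cat([up_out, s], dim=1)
except RuntimeError as e:
    print(e)  # Sizes of tensors must match except in dimension 1 ...
```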
Any thoughts?
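For reference, the working create_cnn setup amounts to the following sketch (the device_count() guard and the toy model are my additions so it runs anywhere; on the dual-card box learn.model would be the actual fastai model):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for learn.model; any plain feed-forward
# module behaves fine under DataParallel.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())

# Wrap only when two GPUs are actually present; DataParallel then
# splits each input batch across device 0 and device 1.
if torch.cuda.device_count() >= 2:
    model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

out = model(torch.randn(4, 3, 16, 16))
print(out.shape)  # torch.Size([4, 8, 16, 16])
```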