I have two notebook cells that use the same data and model but a different number of GPUs. With 8 GPUs, I get much lower accuracy after one epoch (56%) than with a single GPU (94%).
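# Cell 1: 8 GPUs
from fastai.vision import *  # fastai v1 (assumed from the API used); also exposes nn and models

# Augmentation: flips, zoom, and warp disabled; cutout added
tfms = get_transforms(flip_vert=False, max_zoom=1.0, max_warp=0, do_flip=False, xtra_tfms=[cutout()])
data = ImageDataBunch.from_csv(path='data', folder='train',
                               csv_labels='train.csv', suffix='.jpg',
                               ds_tfms=tfms, size=244, bs=16
                               ).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet18, metrics=[error_rate, accuracy])
# Replicate the model on 8 GPUs; each batch is split among them
learn.model = nn.DataParallel(learn.model, device_ids=[0,1,2,3,4,5,6,7])
learn.fit_one_cycle(1)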
epoch | train_loss | valid_loss | error_rate | accuracy | time
0 | 3.240810 | 1.891489 | 0.439320 | 0.560680 | 01:50
# Cell 2: 1 GPU (identical setup; only device_ids differs)
tfms = get_transforms(flip_vert=False, max_zoom=1.0, max_warp=0, do_flip=False, xtra_tfms=[cutout()])
data = ImageDataBunch.from_csv(path='data', folder='train',
                               csv_labels='train.csv', suffix='.jpg',
                               ds_tfms=tfms, size=244, bs=16
                               ).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet18, metrics=[error_rate, accuracy])
# Same DataParallel wrapper, restricted to GPU 0
learn.model = nn.DataParallel(learn.model, device_ids=[0])
learn.fit_one_cycle(1)
epoch | train_loss | valid_loss | error_rate | accuracy | time
0 | 0.619747 | 0.232222 | 0.060140 | 0.939860 | 01:43
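
One configuration difference between the two runs that may be worth checking is the number of samples each GPU receives per step. The sketch below is plain arithmetic, assuming `nn.DataParallel` splits each batch along the batch dimension into chunks of roughly equal size. The helper `per_replica_batch_sizes` is hypothetical and is not part of fastai or PyTorch.

```python
import math

def per_replica_batch_sizes(batch_size, num_gpus):
    """Hypothetical helper: chunk sizes when a batch is split into
    at most num_gpus pieces of ceil(batch_size / num_gpus) samples."""
    chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        # The last chunk may be smaller if the batch does not divide evenly
        sizes.append(min(chunk, remaining))
        remaining -= sizes[-1]
    return sizes

print(per_replica_batch_sizes(16, 8))  # [2, 2, 2, 2, 2, 2, 2, 2]
print(per_replica_batch_sizes(16, 1))  # [16]
```

With bs=16, each of the 8 replicas would see only 2 samples per forward pass, while the single-GPU run sees all 16. Whether this accounts for the accuracy gap is not established here.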