Lesson-1-pets Benchmarks

I ran these benchmarks on a GCP instance.

Here are my installed library versions:

```python
from fastai.utils import *
show_install()
```


```text
=== Software ===
python version  : 3.7.0
fastai version  : 1.0.30
torch version   : 1.0.0.dev20181120
nvidia driver   : 410.72
torch cuda ver  : 9.2.148
torch cuda is   : available
torch cudnn ver : 7401
torch cudnn is  : enabled

=== Hardware ===
nvidia gpus     : 2
torch available : 1
  - gpu0        : 16130MB | Tesla V100-SXM2-16GB
  - gpu1        : 16130MB | Tesla V100-SXM2-16GB

=== Environment ===
platform        : Linux-4.9.0-8-amd64-x86_64-with-debian-9.6
distro          : #1 SMP Debian 4.9.130-2 (2018-10-27)
conda env       : base
python          : /opt/anaconda3/bin/python
sys.path        :
/home/jupyter/fastai-course-v3/nbs/dl1
/opt/anaconda3/lib/python37.zip
/opt/anaconda3/lib/python3.7
/opt/anaconda3/lib/python3.7/lib-dynload
/opt/anaconda3/lib/python3.7/site-packages
/opt/anaconda3/lib/python3.7/site-packages/IPython/extensions
/home/jupyter/.ipython
```

Hardware specs:
OS: Debian 9.6 (kernel 4.9.0-8-amd64)
RAM: 52 GB
CPU: 8 vCPUs (Skylake)
HD: 200 GB HDD
GPU: 2× Tesla V100 (16 GB each)

Benchmarks:
Training: resnet34
learn.fit_one_cycle(4): Total time: 01:47 (single gpu)
learn.fit_one_cycle(4): Total time: 01:56 (dual gpu)

After unfreezing and fine-tuning with discriminative learning rates:
learn.fit_one_cycle(1): Total time: 00:27 (single gpu)
learn.fit_one_cycle(1): Total time: 00:27 (dual gpu)

learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4)): Total time: 00:53 (single gpu)
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4)): Total time: 00:54 (dual gpu)
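A note on `max_lr=slice(1e-6, 1e-4)`: fastai v1 spreads the learning rate across the learner's layer groups, interpolating geometrically from the slice start (earliest layers) to its stop (head). A rough sketch of that spread (my own reimplementation for illustration, not fastai's actual code):

```python
def even_mults(start: float, stop: float, n: int) -> list:
    """Return n values geometrically interpolated from start to stop."""
    step = (stop / start) ** (1 / (n - 1))
    return [start * step**i for i in range(n)]

# With 3 layer groups (the resnet default in fastai v1):
lrs = even_mults(1e-6, 1e-4, 3)
print(lrs)  # ~[1e-06, 1e-05, 1e-04]
```

So the pretrained early layers get the smallest learning rate and the newly added head gets the largest.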

Training: resnet50
learn.fit_one_cycle(5): Total time: 03:11 (single gpu)
learn.fit_one_cycle(5): Total time: 03:16 (dual gpu)

After unfreezing:
learn.fit_one_cycle(1, max_lr=slice(1e-6,1e-4)): Total time: 00:44 (single gpu)
learn.fit_one_cycle(1, max_lr=slice(1e-6,1e-4)): Total time: 00:41 (dual gpu)

As these numbers show, running resnet34 on multiple GPUs did not improve performance; it trained in about the same time as on a single GPU, and the frozen epochs were in fact slightly slower. resnet50 behaved much the same. With `DataParallel`, the overhead of replicating the model and synchronizing across GPUs each batch can cancel out the gains on a small model and dataset like this.
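To put numbers on that, here is a quick helper of my own that converts the reported `mm:ss` times to seconds and computes the dual/single ratio:

```python
def to_seconds(t: str) -> int:
    """Convert a 'mm:ss' time string to total seconds."""
    m, s = t.split(":")
    return int(m) * 60 + int(s)

# resnet34, fit_one_cycle(4), frozen:
single = to_seconds("01:47")  # 107 s
dual = to_seconds("01:56")    # 116 s
print(f"dual/single: {dual / single:.2f}x")  # → dual/single: 1.08x
```

So the dual-GPU run was about 8% slower for the frozen resnet34 epochs.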

P.S.: I ran the notebook as-is. For a single GPU, I changed nothing. To test dual GPU, I simply added `learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])` before fitting.
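For reference, here is a minimal sketch of that dual-GPU wrapping in plain PyTorch, outside of fastai (the helper name `to_data_parallel` is my own, and it falls back gracefully on machines without two GPUs):

```python
import torch
import torch.nn as nn

def to_data_parallel(model: nn.Module) -> nn.Module:
    """Wrap a model in nn.DataParallel when more than one GPU is visible.

    DataParallel replicates the model on each device, splits each batch
    across the GPUs, and gathers the outputs back on the default device.
    """
    if torch.cuda.device_count() > 1:
        # The module must live on the first device before wrapping.
        model = model.to("cuda:0")
        return nn.DataParallel(model, device_ids=[0, 1])
    return model  # single GPU / CPU: nothing to wrap

model = to_data_parallel(nn.Linear(4, 2))
out = model(torch.randn(3, 4))
print(out.shape)  # torch.Size([3, 2])
```

Note that `DataParallel` only changes where the forward/backward passes run; the per-epoch times above suggest that, for this workload, the synchronization cost roughly matched the parallelism benefit.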
