Let’s try to keep this thread on topic…
The purpose of this thread is to share your experience running the “lesson-1-pets.ipynb” Jupyter notebook on various platforms. Many people are eager to get their local servers up and running, or even to build a brand new box with the latest hardware. Others may prefer a paid cloud option. Either way, you pay money upfront to build a local server, or you pay as you go with a cloud option (once any credits run out). Hopefully, in the end, people will see which platform best suits their current situation. All I am asking is that people share their processing times for the various sections of the notebook; that’s it. This is not an install-help thread, a “what does this do” thread, or a “mine is better” thread.
If you have a local server, please list the relevant components. For the cloud options, please note which configuration you chose, etc.
I will get this started:
I have a local server; here are the specs:
OS: Ubuntu 18.04.1 LTS
RAM: 64GB
CPU: Intel Core i7-6850K
Storage: Samsung 960 NVMe SSD
GPU: 2 x NVIDIA GTX 1080 Ti
Benchmarks:
Training: resnet34
learn.fit_one_cycle(4): Total time: 01:10 (single gpu)
learn.fit_one_cycle(4): Total time: 01:12 (dual gpu)
After unfreezing, fine-tuning, and setting new learning rates:
learn.fit_one_cycle(1): Total time: 00:21 (single gpu)
learn.fit_one_cycle(1): Total time: 00:19 (dual gpu)
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4)): Total time: 00:42 (single gpu)
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4)): Total time: 00:37 (dual gpu)
Training: resnet50
learn.fit_one_cycle(5): Total time: 04:21 (single gpu)
learn.fit_one_cycle(5): Total time: 03:03 (dual gpu)
After unfreezing:
learn.fit_one_cycle(1, max_lr=slice(1e-6,1e-4)): Total time: 01:09 (single gpu)
learn.fit_one_cycle(1, max_lr=slice(1e-6,1e-4)): Total time: 00:46 (dual gpu)
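For context, here is a rough sketch of the cells that produce the timings above, written from memory of the fastai v1 lesson-1-pets notebook (the batch size, image size, and variable names may differ slightly from the current notebook, so treat it as an outline rather than a verified copy):

```python
# Outline of the lesson-1-pets cells being timed (fastai v1 API).
# Names like `pat` and `bs` follow the course notebook from memory.
from fastai.vision import *

bs = 64
path = untar_data(URLs.PETS)              # Oxford-IIIT Pet dataset
path_img = path/'images'
fnames = get_image_files(path_img)
pat = r'/([^/]+)_\d+.jpg$'                # regex that pulls the breed out of the filename

data = ImageDataBunch.from_name_re(
    path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs
).normalize(imagenet_stats)

# resnet34 timings
learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)                            # "Total time: 01:10 / 01:12"
learn.unfreeze()
learn.fit_one_cycle(1)                            # "00:21 / 00:19"
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))  # "00:42 / 00:37"

# resnet50 timings (the notebook rebuilds the DataBunch at size=299 with a
# smaller batch size for this part, if I remember correctly)
learn = create_cnn(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(5)                            # "04:21 / 03:03"
learn.unfreeze()
learn.fit_one_cycle(1, max_lr=slice(1e-6, 1e-4))  # "01:09 / 00:46"
```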
As you can see from the timings above, running multiple GPUs for resnet34 did not improve performance; it was about the same as a single GPU. For resnet50, dual GPUs cut the training time by roughly 30%.
Please share your cloud experience, or your local server experience if you have one. Over time, I or someone else will create a spreadsheet to track them all.
Thanks again.
Edit: I ran the notebook as-is. For the single-GPU runs, I changed nothing. To test dual GPUs, I simply added “learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])” before fitting (see the sketch below). If you deviate from the notebook code in any way, please note it.
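To make the dual-GPU tweak concrete, this is roughly where that one line goes; device_ids=[0, 1] assumes PyTorch sees both cards as cuda:0 and cuda:1:

```python
import torch

# Build the learner exactly as in the notebook...
learn = create_cnn(data, models.resnet34, metrics=error_rate)

# ...then, for the dual-GPU runs only, wrap the model in DataParallel
# before any fit call. Everything else stays unchanged.
learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])

learn.fit_one_cycle(4)   # timed the same way as the single-GPU run
```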