Adding my own measurements to the list (the low end…)
and who needs spreadsheets, we have markdown
My 1060 measurements closely mirror those of @Edward (but I was able to use bs=32)
Single GPU Benchmarks
Task | 1050 | 1060 | 1080 Ti | 2080 Ti | K80 | V100* | |
---|---|---|---|---|---|---|---|
GPU Mem | 4 GB | 6 GB | 11 GB | 11 GB | 12 GB | 16 GB | |
CUDA - Driver | 9.2 - 396.54 | 9.2 - 396.54 | | | | | |
System | Dell XPS-15 | – | – | – | AWS p2.xl | SageMaker p3.2xl | |
CPU | i7-7700HQ@2.8 | i5-7600K@3.8 | Intel 6850K | i7-7700K | E5-2686 | ? | |
RAM | 16 GB | 16 GB | 64 GB | 32 GB | 61 GB | 61 GB | |
Storage for training data | Samsung NVMe | SSD | Samsung NVMe 960 | Samsung SM961 NVMe | SSD | SSD | |
OS | Ubuntu 18.04.1 | Ubuntu 18.04.1 | Ubuntu 18.04.1 | Ubuntu 18.04.1 | Ubuntu 16.04.5 | ? | |
resnet34 (bs=64) | fp | ||||||
F learn.fit_one_cycle(4) | 04:03 | 02:01 | 01:10 | 01:23 | 03:55 | 01:56 | |
U learn.fit_one_cycle(1) | 01:22 | 00:40 | 00:21 | | 01:02 | 00:29 | |
resnet50 (bs=48) | |||||||
F learn.fit_one_cycle(5) | 17:24¹ | 09:02² | 04:21 | 03:21 | 15:45 | 03:41 | |
F learn.fit_one_cycle(5) | fp16 | 02:46 | |||||
U learn.fit_one_cycle(1) sl(1e-6,1e-4) | 04:52¹ | 02:24² | 01:09 | 00:51 | 04:04 | 00:46 | |
U learn.fit_one_cycle(1) sl(1e-6,1e-4) | fp16 | 00:40 |
¹ bs=16
² bs=32
F = frozen, U = after learn.unfreeze(), sl(1e-6,1e-4) = max_lr=slice(1e-6,1e-4)
* V100 times are taken from the v3 lesson 1 video, where Jeremy runs an ml.p3.2xl instance on SageMaker.
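
If you want to compare the cards per epoch instead of by total wall-clock time, here is a small sketch that converts the mm:ss values into seconds per epoch and a speed ratio against a baseline card. The numbers are copied from the frozen resnet34 (bs=64) row; the helper names (`to_seconds`, `seconds_per_epoch`, `relative_speed`) are my own and not part of fastai.

```python
# Times copied from the table row "F learn.fit_one_cycle(4)" for resnet34 (bs=64).
FROZEN_RESNET34 = {
    "1050": "04:03",
    "1060": "02:01",
    "1080 Ti": "01:10",
    "2080 Ti": "01:23",
    "K80": "03:55",
    "V100": "01:56",
}
EPOCHS = 4  # fit_one_cycle(4) runs four epochs


def to_seconds(mmss):
    """Convert a 'mm:ss' string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_per_epoch(times, epochs):
    """Average seconds per epoch for every GPU in `times`."""
    return {gpu: to_seconds(t) / epochs for gpu, t in times.items()}


def relative_speed(times, baseline):
    """How many times faster each GPU is than `baseline` (baseline = 1.0)."""
    base = to_seconds(times[baseline])
    return {gpu: base / to_seconds(t) for gpu, t in times.items()}


if __name__ == "__main__":
    per_epoch = seconds_per_epoch(FROZEN_RESNET34, EPOCHS)
    speed = relative_speed(FROZEN_RESNET34, "1050")
    for gpu in FROZEN_RESNET34:
        print(f"{gpu:8s} {per_epoch[gpu]:6.2f} s/epoch  {speed[gpu]:.2f}x vs 1050")
```

The same helpers work for the other rows, but keep in mind that the ¹ and ² entries were run with smaller batch sizes, so those ratios are not a like-for-like comparison.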