Fastai notebook computation time benchmarks


Today (1/28/18), I decided to do a little benchmarking of the computation time it takes to run certain parts of the notebooks. With Paperspace and other cloud services being the preferred options for running this course, I wanted a place where people could compare and contrast the different cloud options, as well as setups created by students running their own “servers”. Does it make sense to run the class in the cloud? What about building your own server? What about the hardware you already have? My hope is that this is a place where we can simply share observations. It is not about being faster (yet); it is just about showing the options and how they compare.

Personally, I do not dabble in cloud computing, simply because I have a custom-built machine that handles DL/ML very well. It is currently set up as a dual-boot Win10/Ubuntu system with fastai running natively. I also recently purchased a laptop with an NVIDIA card for another purpose. Today I thought it would be good to see how the setups compare, starting with the two main operations in the lesson1 notebook.

For setup purposes, I pulled the latest fastai repo and also ran a conda env update. Here are my results:

While this is not a strict apples-to-apples test, the laptop came in last, even with its 6 GB 1060 card.
The custom-built desktop with the 1080 Ti was more than 50% faster than the laptop. I was quite surprised.
The same machine booted into Ubuntu was 40% faster than under Windows! It could be because Ubuntu runs on an NVMe drive while Win10 runs on an SSD, or it could be the drivers, but I was impressed.

As time allows, I will add tabs for each notebook with the computational operations identified. In the meantime, I have put my results into Google Slides. If anyone would like to contribute to the slides, send me a message and I will share the link.
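For anyone contributing numbers, here is a minimal sketch (my own assumption, not part of fastai) of timing a single step in plain Python and printing it in the m:ss format used in this thread. `time_step`, `format_mmss` and `fake_step` are hypothetical helpers; `fake_step` just stands in for whatever cell you actually want to time.

```python
import time


def format_mmss(seconds: float) -> str:
    """Format a duration in seconds as m:ss, the style used in this thread."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def time_step(step, *args, **kwargs):
    """Run step(*args, **kwargs) once and return (result, elapsed_seconds).

    time.perf_counter() is a monotonic, high-resolution clock, so the
    measurement is not affected by system clock adjustments.
    """
    start = time.perf_counter()
    result = step(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for a notebook cell such as the training call;
    # replace it with the real step you want to benchmark.
    def fake_step(n):
        return sum(i * i for i in range(n))

    _, elapsed = time_step(fake_step, 1_000_000)
    print(f"fake_step: {format_mmss(elapsed)} min ({elapsed:.2f} s)")
```

Inside a notebook, Jupyter's %time magic reports the same kind of wall time without any helper. When timing raw GPU work outside a training loop, call torch.cuda.synchronize() before reading the clock, because CUDA kernels are launched asynchronously.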

Community Results:

(Andrea de Luca) #2

Thanks, I’ll post my results for that part by tomorrow. I imagine that with PyTorch using the GPU I should land somewhere between your laptop and your monster rig (given that you are using just one 1080 Ti). I’ll keep you posted.

For the moment, let me say that I obtained the result I posted on a notebook I wrote from scratch (as I usually do). The same code executed in the standard notebook from the fastai repo runs almost instantaneously the first time you execute it. I’m clearly making some stupid mistake, having slept 6 hours in 3 days. My best guess is that PyTorch initializes the GPU but does not release it once it finishes its job. Either that, or I’m suffering from memory leaks.

(Bryan Daniels) #3
  • GPU: Titan V

  • Rig: Desktop, i7

  • Operating System: Ubuntu 16.04

  • RAM: 32 GB

  • User: @prairieguy

  • Date: 1/29/2018

  • Data Augmentation: …, 3, cycle_len=1), no load (… only): 2:01 min

  • Fine Tuning: …, 3, cycle_len=1, cycle_mult=2), no load (… only): 8:19 min

(Bryan Daniels) #4
  • GPU: Titan Xp

  • Rig: Desktop, i7

  • Operating System: Ubuntu 16.04

  • RAM: 32 GB

  • User: @prairieguy

  • Date: 1/31/2018

  • Data Augmentation: …, 3, cycle_len=1), no load (… only): 2:51 min

  • Fine Tuning: …, 3, cycle_len=1, cycle_mult=2), no load (… only): 13:16 min

(Andrea de Luca) #5

Uhm, the Titan V is 30% faster than the Titan Xp, even without using tensor cores.

However, great rig!
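Quick arithmetic on the two Titan results above. "X% faster" can mean either less wall time (1 - fast/slow) or more throughput (slow/fast - 1), and the two readings differ noticeably at these gaps; the 30% figure matches the time-saved reading for the augmentation step. A small sketch, with `to_seconds` and `compare` as hypothetical helpers:

```python
from typing import Tuple


def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string such as '2:01' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def compare(fast: str, slow: str) -> Tuple[float, float]:
    """Return (time_saved, throughput_gain) of `fast` relative to `slow`.

    time_saved:      fraction of wall time saved, 1 - fast/slow
    throughput_gain: extra work per unit time,   slow/fast - 1
    """
    t_fast, t_slow = to_seconds(fast), to_seconds(slow)
    return 1 - t_fast / t_slow, t_slow / t_fast - 1


# Times reported above: Titan V vs Titan Xp, same i7 desktop, Ubuntu 16.04.
for step, titan_v, titan_xp in [
    ("Data Augmentation", "2:01", "2:51"),
    ("Fine Tuning", "8:19", "13:16"),
]:
    saved, gain = compare(titan_v, titan_xp)
    print(f"{step}: {saved:.0%} less time, {gain:.0%} higher throughput")

# Data Augmentation: 29% less time, 41% higher throughput
# Fine Tuning: 37% less time, 60% higher throughput
```

Stating which reading you mean avoids the same gap being quoted as 29% by one person and 41% by another.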

(Andrea de Luca) #6

My benchmark on …, 3, cycle_len=1):

i7 Haswell, GTX 1070, 16 GB, Windows 10; no other load on the GPU, but a lot of load on the CPU.

3/3 [05:41<00:00, 113.99s/it]

Considering that the GTX 1080 Ti did it in 4:16 on Windows, my wall time seems OK.
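The tqdm line above can be turned into seconds to put a number on that comparison. A minimal sketch, assuming the `[elapsed<remaining, rate]` layout shown here (`tqdm_elapsed_seconds` is a hypothetical helper, not a tqdm API). As a sanity check, 3 iterations at 113.99 s/it is about 342 s, consistent with the 05:41 displayed.

```python
import re


def tqdm_elapsed_seconds(line: str) -> int:
    """Extract the elapsed wall time from a tqdm progress line.

    Assumes the '[elapsed<remaining, rate]' layout seen in this thread,
    with elapsed as MM:SS; an H:MM:SS value (used past one hour) is
    also accepted.
    """
    match = re.search(r"\[((?:\d+:)?\d+:\d+)<", line)
    if match is None:
        raise ValueError(f"no elapsed time found in {line!r}")
    seconds = 0
    for part in match.group(1).split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


gtx1070 = tqdm_elapsed_seconds("3/3 [05:41<00:00, 113.99s/it]")
gtx1080ti_windows = 4 * 60 + 16  # 4:16, reported for the 1080 Ti on Windows
print(
    f"GTX 1070: {gtx1070} s vs GTX 1080 Ti: {gtx1080ti_windows} s "
    f"({gtx1070 / gtx1080ti_windows - 1:.0%} longer)"
)
# GTX 1070: 341 s vs GTX 1080 Ti: 256 s (33% longer)
```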

The one thing I find most interesting is the big discrepancy between Windows and Linux on @FourMoBro's machine.
It cannot be attributed entirely (or even partly) to the NVMe SSD, all the more so because he has 64 GB of RAM (I rule out swapping).


  • GPU: Titan Xp

  • Rig: AMD 1950X

  • Operating System: Ubuntu 16.04, kernel 4.15

  • RAM: 64 GB

  • Disk: Samsung 960 Pro NVMe

  • Date: 2/19/2018

  • Data Augmentation: …, 3, cycle_len=1): 1:56 min

  • Fine Tuning: …, 3, cycle_len=1, cycle_mult=2): 10:27 min