Today (1/28/18), I decided to do a little benchmarking of the time it takes to run certain parts of the notebooks. With Paperspace and other cloud options being the preferred ways to run this course, I wanted a place where people could compare and contrast the different cloud options as well as setups created by students running their own “servers”. Does it make sense to run the class in the cloud? What about building your own server? What about the hardware you already have? My hope is that this is a place where we can simply share observations. It is not about being fastest (yet); it is just about showing the options and how they compare.
For me, I do not dabble in cloud computing, simply because I have a custom-built machine which does DL/ML very well. It is currently set up as a dual-boot Win10/Ubuntu system with fastai running natively. I have also recently purchased a laptop with an Nvidia card for another purpose. Today I thought it would be good to see how the setups compare, starting with the two main learn.fit operations in the lesson1 notebook.
For setup, I pulled the latest fastai repo and also ran a conda env update. Here are my results:
While not a complete apples-to-apples test, the laptop came in last place, even with its 6GB 1060 card.
The custom-built desktop with the 1080 Ti was more than 50% faster than the laptop. I was quite surprised.
The same machine booted into Ubuntu was 40% faster than Windows! It could be that Ubuntu runs on an NVMe drive while Win10 runs on an SSD, or it could be drivers, but I was impressed.
As time allows, I will add tabs for each notebook with the computational operations identified. At any rate, I have put my results into Google Slides. If anyone would like to contribute to the slides, send me a msg and I will share the link.
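As a side note, for anyone who wants to record comparable numbers: here is a minimal way to capture wall time around a fit call. This is just a sketch of my own; `time_call` and the stand-in workload are not from the notebook, and in a Jupyter cell you could equally use the `%%time` magic.

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn and report its wall time, e.g. time_call(learn.fit, 1e-2, 3)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"wall time: {elapsed:.2f}s")
    return result, elapsed

# Usage with a stand-in workload instead of learn.fit:
total, secs = time_call(sum, range(1_000_000))
```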
Thanks, I’ll post my results for that part by tomorrow. I imagine that with PyTorch using the GPU I should attain something in between your laptop and your monster rig (given that you are using just one 1080 Ti). I’ll keep you posted.
For the moment, let me say that I obtained the result I posted on a notebook I wrote from scratch (as I usually do). The same code executed in the standard notebook provided by the fastai repo runs almost instantaneously the first time you execute it. I’m clearly making some stupid mistake, having slept 6 hours in 3 days. My best guess is that the GPU is initialized by PyTorch but not released once it finishes its job. Either that, or I’m suffering memory leaks.
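If it helps with debugging, a quick way to check whether PyTorch is still holding GPU memory is to query its allocator. This is just a sketch of my own, guarded so it also runs on a machine without torch or a CUDA device:

```python
def gpu_memory_status() -> str:
    """Report how much memory PyTorch has allocated on the current GPU."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device available"
    mb = torch.cuda.memory_allocated() / 1e6
    # empty_cache() hands cached-but-unused blocks back to the driver;
    # it does not free tensors that are still referenced.
    torch.cuda.empty_cache()
    return f"allocated: {mb:.1f} MB"

print(gpu_memory_status())
```

If the allocated number stays high after training finishes, something in the notebook is still holding references to tensors on the GPU.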
i7 Haswell, GTX 1070, 16GB, Windows 10, no load on the GPU, but a lot of load on the CPU.
3/3 [05:41<00:00, 113.99s/it]
Considering that the GTX 1080 Ti did it in 4:16 on Windows, my wall time seems OK.
The thing I find most interesting is the big discrepancy @FourMoBro saw between Windows and Linux.
It cannot be entirely (or even partly) attributed to the NVMe SSD, a fortiori because he has 64GB of RAM (which rules out swapping).
Here is my benchmark data for different batch sizes on the lesson1 cats-and-dogs classification. I used 5 epochs. Not sure if this is the correct parameter to benchmark against. Any input would help.
| Batch Size | trn_loss | val_loss | Accuracy | Wall Time (s) |
|-----------:|---------:|---------:|---------:|--------------:|
| 64 | 0.031134 | 0.028481 | 0.989 | 15.7 |
| 128 | 0.028619 | 0.029348 | 0.989 | 14.1 |
| 256 | 0.032689 | 0.022995 | 0.991 | 13.2 |
| 512 | 0.038162 | 0.025427 | 0.9895 | 12.7 |
| 1024 | 0.055639 | 0.02597 | 0.988 | 12.2 |
| 2048 | 0.08693 | 0.034631 | 0.987 | 11.4 |
| 4096 | 0.165338 | 0.048062 | 0.983 | 11.5 |
| 8192 | 0.303578 | 0.060767 | 0.9795 | 10.1 |
| 16384 | 0.346356 | 0.091748 | 0.98 | 6.15 |
| 32768 | 0.651255 | 0.262653 | 0.927 | 4.66 |
| 65536 | 0.676977 | 0.250999 | 0.9475 | 4.74 |
| 131072 | 0.56841 | 0.24005 | 0.9415 | 4.73 |
I stopped at batch size 131072. Not sure how much more load my GPU can take. But before going further, I need to know whether I’m on the right track with this benchmarking.
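One thing worth noting about why wall time shrinks as batch size grows: each epoch needs fewer optimizer steps, so per-step overhead is paid fewer times (while the accuracy degrades, as your losses show). A quick sketch, assuming roughly 23,000 training images (the usual lesson1 train split; the exact count is my assumption):

```python
import math

def iters_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

# Assuming ~23,000 training images:
for bs in (64, 1024, 16384):
    print(bs, iters_per_epoch(23_000, bs))  # 64 → 360 steps per epoch
```

At batch size 16384 an epoch is only a couple of steps, which is also why the model barely learns anything per epoch at that size.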