How to run prediction faster?

My PC has an AMD Ryzen 5 3600X with 6 cores and no GPU. I trained my tabular regression model in Colab and I'm running the trained model on my PC, and it is slow. What makes it even worse is that I found it only uses about 1/20 of the total CPU time. My code is like the following:

from time import process_time, time

for i in range(cnt):
    print("learner ", i, " : ", process_time() - start, time() - start1)
    res.append(learn_opt[i].get_preds())

and the result is:

learner  0  :  0.140625 0.14303159713745117
learner  1  :  1.328125 18.774247884750366
learner  2  :  2.390625 37.759544372558594
timeused: 3.484375 56.34675097465515 

How can I make my program use more CPU? I tried psutil.nice(psutil.HIGH_PRIORITY_CLASS) but I didn't observe any change.

Thanks!

Hi jerron, hope all is well!
You will have to search for possible bottlenecks in every part of your system that is involved in running this line of code:

res.append(learn_opt[i].get_preds())

E.g., which part of your system is actually causing the delay? If it is disk access time, trying to improve CPU utilization won't make a difference, whereas changing from a slow hard disk to an SSD or running all the code in RAM would.
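One way to find out is to profile the slow section with Python's built-in cProfile. A minimal sketch (work() here is a hypothetical stand-in for your prediction loop, not fastai code):

```python
import cProfile
import io
import pstats

def work():
    # hypothetical stand-in for the slow prediction loop
    return sum(i * i for i in range(100_000))

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

# print the five most expensive calls; the top entries tell you
# where the time is actually going
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
print(s.getvalue())
```

If the hot entries are all in-memory computation, then it is a CPU problem; if they are read/IO calls, then faster storage is the fix.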

The links below show two possible approaches to speeding up your code.


hope this helps mrfabulous1 :smiley: :smiley:


You can try a bigger batch size for predictions.


The learner is a fastai.tabular.tabular_learner. There should be no I/O bottleneck; it is purely in-memory computation, because I didn't see any change in disk or network activity in the Performance tab of the Task Manager. When the calculation started, CPU usage increased from 10% to 25%, memory from 40% to 45%, disk stayed at 0% or 1%, and network stayed the same.

As @s.s.o said, change the batch size you use. Use as large a batch size as your CPU can handle; that is how you use more of it. Tabular NNs are tiny, so you can send them large amounts of data with only a small overhead, especially at inference time since we're not storing gradients.
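To see why larger batches help, here is a self-contained sketch. predict() is a hypothetical stand-in for a model's forward pass, with a simulated fixed per-call overhead; sending all 200 rows in one call pays that overhead once instead of 200 times:

```python
import time

def predict(batch):
    # hypothetical stand-in for a model forward pass:
    # a fixed per-call overhead plus cheap per-row work
    time.sleep(0.001)
    return [x * 2 for x in batch]

rows = list(range(200))

# one row per call: pays the overhead 200 times
t0 = time.perf_counter()
per_row = [predict([x])[0] for x in rows]
t_loop = time.perf_counter() - t0

# one big batch: pays the overhead once
t0 = time.perf_counter()
batched = predict(rows)
t_batch = time.perf_counter() - t0

print(f"loop: {t_loop:.3f}s  batch: {t_batch:.3f}s")
```

The results are identical either way; only the number of times you pay the fixed overhead changes.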

I tried res.append(learn_opt[i].get_preds(ds_type=DatasetType.Valid, n_batch=100)) and res.append(learn_opt[i].get_preds(ds_type=DatasetType.Valid, n_batch=500)) and didn't notice a difference. What value would be good? There are 200 entries in the prediction set.

Hi jerron, hope you are having a jolly day!

Unfortunately I don't understand the fastai code well enough to know what to tweak to make it go faster.

As an experiment I would try running your code on Google Colab to see if it runs quicker on a different machine.

Also, I have used the code from the tutorial A hands on guide to multiprocessing in python on my MacBook Pro 2016 to make some of my own programs run quicker.

The examples in the tutorial are simple, clear and complete, and on my machine they take 18, 4 and 2 seconds.
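In that spirit, here is a minimal multiprocessing.Pool sketch (square() and run_parallel() are hypothetical stand-ins, not from the tutorial). Since each of your learners predicts independently, the per-learner work could in principle be farmed out to separate worker processes like this:

```python
from multiprocessing import Pool

def square(x):
    # CPU-bound stand-in for one unit of prediction work
    return x * x

def run_parallel(n_workers=4):
    # spread the work over n_workers processes
    with Pool(processes=n_workers) as pool:
        return pool.map(square, range(10))

if __name__ == "__main__":
    print(run_parallel())
```

Note that each worker must be able to load (or receive) its own copy of whatever it needs, so this pays off only when the per-item work outweighs the process overhead.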

Hopefully you can use these with your code. Hope this helps, cheers mrfabulous1 :smiley: :smiley:

Yes, it runs very fast in Colab, where my model was trained. When predicting, the same job finishes in 480 ms.

%time res=learn2.get_preds(ds_type=DatasetType.Valid)
CPU times: user 237 ms, sys: 98.9 ms, total: 336 ms
Wall time: 480 ms

My own PC doesn't have a CUDA GPU; it only comes with a Radeon RX 580. I'm waiting for Black Friday to get an RTX/GTX under $200.