Why does parallel_trees take much longer than model.predict?


I’ve trained an RF with 200 estimators and I’m trying to find the optimal number of trees. I’m doing it with the parallel_trees function that @jeremy explains in Lesson 3, like:

preds = np.stack(parallel_trees(model, get_preds))
This takes about 1 minute (I have 230k rows). However, when I call model.predict(train_x), it takes just 5 seconds.

Why does this happen? Aren’t they doing the same thing internally, i.e. passing the 230k rows through each of the 200 trees, with the only difference being the final averaging of the results? Both cases are parallelized (8 cores), so where’s the difference? Some kind of vectorization, maybe?
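For context, here is a small self-contained sketch (toy data of my own, not the 230k-row set from the question) showing that for a RandomForestRegressor the forest prediction is exactly the mean of the stacked per-tree predictions, so the two code paths do compute the same numbers:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for the question's 230k-row dataset
X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
m = RandomForestRegressor(n_estimators=20, n_jobs=-1, random_state=0).fit(X, y)

# Predict with each tree individually, then stack: shape (n_trees, n_rows)
per_tree = np.stack([t.predict(X) for t in m.estimators_])

# The forest's prediction is just the mean over trees
assert np.allclose(per_tree.mean(axis=0), m.predict(X))
```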


They are not doing the same thing.

preds = np.stack(parallel_trees(model, get_preds))

actually does two things:

  1. It calls get_preds, which does m.predict(X_train) --> this is what you are comparing against
  2. It gets the prediction from each individual tree and stacks them into an array
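If I remember right, parallel_trees in the fastai 0.7 library (structured.py) is roughly this one-liner; note that it spins up its own pool of worker *processes*, separate from whatever n_jobs the model itself uses:

```python
from concurrent.futures import ProcessPoolExecutor

# Roughly the fastai 0.7 definition: each tree in the forest is handed
# to a worker process, and fn (e.g. get_preds) is called on it there.
def parallel_trees(m, fn, n_jobs=8):
    return list(ProcessPoolExecutor(n_jobs).map(fn, m.estimators_))
```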

Correct me if I’m wrong, but looking at the code:

def get_preds(t): 
    return t.predict(X_valid)

get_preds calls t.predict on each individual tree of the forest. So that essentially passes each row through each tree, in parallel (because n_jobs=-1). Pretty much the same thing as m.predict, isn’t it?

What am I missing? :thinking:

So you mean np.stack, which stacks the numpy arrays of individual tree predictions, should use essentially zero CPU and complete in 0 ms?

Indeed, most of the time is spent in the parallel_trees call itself; stacking the resulting list of arrays together afterwards is quick.
My question is: why does m.predict take so much less time if both are parallelized? The only difference I see is that m.predict averages the matrix of predictions, which the other doesn’t, but that’s just sums and divisions, which are fast too.
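To put a rough number on that last point, here is a sketch with array sizes matching the question (200 trees × 230k rows, random values standing in for real predictions), showing how little work the averaging step is:

```python
import time
import numpy as np

# Stand-in for the stacked per-tree predictions: 200 trees x 230k rows
preds = np.random.rand(200, 230_000)

t0 = time.perf_counter()
avg = preds.mean(axis=0)   # the only extra work m.predict does
elapsed = time.perf_counter() - t0

assert avg.shape == (230_000,)
# Averaging ~46M floats typically takes tens of milliseconds at most,
# so it cannot account for a gap of nearly a minute.
```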