TabularLearner np.array predictions

MCool · August 3, 2022, 3:54pm

I’m trying to use the DALEX package with a fastai TabularLearner. DALEX requires the prediction function to be a “function that takes two parameters (model, data) and returns a np.ndarray (1d) with model predictions (default is predict method extracted from the model). NOTE: This function needs to work with data as pd.DataFrame.” I know that learn.predict(df.iloc[0])[2] returns the prediction for a single row as a tensor. How can I get the predictions for all rows as a 1d np.array? Thanks! I’ve been struggling with this for a while.

manojmohan · August 3, 2022, 6:20pm

You can refer to this tutorial.

Use the learn.get_preds() method for batch prediction, then call the numpy() method on the predictions tensor it returns to convert it.

MCool · August 4, 2022, 1:53am

Thanks! However, DALEX takes fastai’s prediction function and a DataFrame as arguments. With .get_preds() there are two problems: 1) DALEX needs to pass a DataFrame into the prediction function inside the explainer, but .get_preds() seems to consume a DataLoader, not a DataFrame; and 2) the prediction function has to return a 1d array, and I haven’t figured out how to turn the tuple of tensors into one.

manojmohan · August 4, 2022, 10:25am

I looked at this TensorFlow example and tried to do the same in fastai:

from fastai.tabular.all import *
import pandas as pd
import dalex as dx

df = pd.read_csv("https://raw.githubusercontent.com/pbiecek/xai-happiness/main/happiness.csv", index_col=0)
X, y = df.drop('score', axis=1), df.score
splits = RandomSplitter(valid_pct=0.2)(range_of(df))
to = TabularPandas(df, procs=[Normalize], cont_names=list(X.columns), y_names='score', splits=splits)
dls = to.dataloaders(bs=16)
learn = tabular_learner(dls, layers=[12,8,12,6], metrics=[rmse,mae])
learn.fit_one_cycle(200)

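def my_predict(model, data):
    # DALEX passes a pd.DataFrame; wrap it in a test DataLoader that reuses the training procs
    tdl = model.dls.test_dl(data)
    # get_preds returns (predictions, targets); reshape(-1) stays 1d even for a single row
    return model.get_preds(dl=tdl)[0].numpy().reshape(-1)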

explainer = dx.Explainer(learn, X, y, label='happiness', predict_function=my_predict)
explainer.model_performance()
explainer.model_parts().plot()
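
Before handing a prediction function to dx.Explainer, it may be worth checking that it really meets the contract quoted in the first post: called as (model, data) with a pd.DataFrame, returning a 1d np.ndarray with one value per row. Below is a minimal sketch of such a check. StandInModel and check_predict_function are hypothetical names made up for illustration and are not part of fastai or DALEX. The stand-in returns an (n, 1) array, which is the shape I would expect from get_preds for a single regression target (an assumption, not something verified here). It also shows why reshape(-1) can be a safer flattening step than squeeze() when only one row is passed.

```python
import numpy as np
import pandas as pd


class StandInModel:
    """Hypothetical stand-in for a trained learner; not a fastai class."""

    def raw_predict(self, data: pd.DataFrame) -> np.ndarray:
        # Mimic a regression head with one output: shape (n_rows, 1).
        return data.sum(axis=1).to_numpy(dtype=float).reshape(-1, 1)


def predict_squeeze(model, data):
    # squeeze() drops every length-1 axis, including the row axis.
    return model.raw_predict(data).squeeze()


def predict_flat(model, data):
    # reshape(-1) always yields a 1d array, even for a single row.
    return model.raw_predict(data).reshape(-1)


def check_predict_function(predict_fn, model, data):
    """Return True if predict_fn gives a 1d array with one value per row."""
    for sample in (data, data.iloc[[0]]):  # full frame, then a single row
        out = predict_fn(model, sample)
        if not isinstance(out, np.ndarray) or out.shape != (len(sample),):
            return False
    return True


X_demo = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.5, 0.5, 0.5]})
model = StandInModel()

print(check_predict_function(predict_flat, model, X_demo))     # True
print(check_predict_function(predict_squeeze, model, X_demo))  # False: one row gives shape ()
```

With the real objects from the script above, the equivalent call would be check_predict_function(my_predict, learn, X).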