Problem in the predict function of Tabular Model

I am trying out the Tabular learner, and during the predict phase, even when I feed in the same training data, the following learn.predict call throws an error.

test = df.sample(n=100).reset_index(drop=True)
test_dl = learn.dls.test_dl(test.drop(['salary'], axis=1))
learn.predict(test_dl)

The error trace gist is here: https://gist.github.com/binga/2fabdc620a3a97b9d218b366df41b94c.

Am I missing any params or configuration settings?

P.S: This is on a Win10 machine.

P.S2: The corresponding github issue is here

I use the following function to get the predictions of an existing model on a new DataFrame:

def get_predictions(learn, df, cat_names, cont_names):
    # Wrap the new frame in a TabularList so the training-time preprocessing is applied
    tl = TabularList.from_df(df.copy(), path="./", cat_names=cat_names, cont_names=cont_names)
    learn.data.add_test(tl)
    # Run inference on the test set we just attached
    preds, _ = learn.get_preds(DatasetType.Test)
    return preds

EDIT: This function works in fastai v1, I don’t yet have a version for fastai v2

Well, do we have a TabularList API? I don’t see it in the documentation (https://dev.fast.ai/tabular.core). I guess I should check once again.

Oops. That function was written for FastAI v1, I hadn’t realized it no longer works in v2.

Oh, never mind. Really appreciate the quick response :slight_smile:
It looks like predict is meant for a single row, while get_preds should be used when inferring on an entire test set.

More on the github issue link above.

@binga yes you should use get_preds if using test_dl. Is there still any confusion or trouble? :slight_smile:

Hey @muellerzr, I was using the wrong dataloaders: instead of TabularDataLoaders, I was using plain DataLoaders to pack my train and valid dataloaders together. Sylvain helped me identify this issue.

More on this here - https://github.com/fastai/fastai2/issues/221

As I dug deeper, I found what looks like another issue when I try running this on the GPU. The github issue has the details. Do you think I’m missing anything?

P.S: Thank you for your amazing work for this community. I learnt the Tabular API from your notebooks here.

Re-posting here from the GitHub issue.

Using TabularDataLoaders fixes the problem.

However, I was interested in having different batch sizes for the train and validation sets. So I tried this.

trn_dl = TabDataLoader(to.train, bs=256, shuffle=True, drop_last=True)
val_dl = TabDataLoader(to.valid, bs=512)

dls = TabularDataLoaders(trn_dl, val_dl)

When I add device='cuda' in TabularPandas and use the above TabularDataLoaders, the model fails when I call learn.summary() or learn.lr_find().

Do you think I am missing anything?

An example gist is here - https://gist.github.com/binga/2fabdc620a3a97b9d218b366df41b94c

The model works fine once I do learn.model.to('cuda').
It looks like the user has to explicitly put the model on the GPU. Am I missing anything here?

The best way to test this is to first not specify CUDA explicitly at all, see how long it takes to validate on your data, and compare the two. IIRC, the model is pushed to the GPU automatically during training if one is available.


Oh yes. The model automatically picked the GPU. Thanks @muellerzr!