Problem in the predict function of Tabular Model

I am trying out the Tabular learner, and during the predict phase, even when I feed in the same training data, the following learn.predict call throws an error.

test = df.sample(n=100).reset_index(drop=True)
test_dl = learn.dls.test_dl(test.drop(['salary'], axis=1))
learn.predict(test_dl)

The error trace gist is here: https://gist.github.com/binga/2fabdc620a3a97b9d218b366df41b94c.

Am I missing any params or configuration settings?

P.S: This is on a Win10 machine.

P.S2: The corresponding github issue is here

I use the following function to get the predictions of an existing model on a new DataFrame:

def get_predictions(learn, df, cat_names, cont_names):
    # Wrap the new frame in a TabularList so the training-time preprocessing is applied
    tl = TabularList.from_df(df.copy(), path="./", cat_names=cat_names, cont_names=cont_names)
    learn.data.add_test(tl)
    # Run inference on the test set we just attached
    preds, _ = learn.get_preds(DatasetType.Test)
    return preds

EDIT: This function works in fastai v1, I don’t yet have a version for fastai v2

Well, do we have a TabularList API? I don’t see it in the documentation (https://dev.fast.ai/tabular.core). I guess I should check once again.

Oops. That function was written for FastAI v1, I hadn’t realized it no longer works in v2.

Oh, never mind. Really appreciate the quick response :slight_smile:
It looks like predict is meant for a single row, while get_preds should be used when inferring on an entire test set.

More on the github issue link above.

@binga yes you should use get_preds if using test_dl. Is there still any confusion or trouble? :slight_smile:

Hey @muellerzr, I was using the wrong dataloaders: instead of TabularDataLoaders, I was using plain DataLoaders to pack my train and valid dataloaders together. Sylvain helped me identify this issue.

More on this here - https://github.com/fastai/fastai2/issues/221

As I dug deeper, I found what looks like another issue when I try running this on the GPU. The github issue has the details. Do you think I’m missing anything?

P.S: Thank you for your amazing work for this community. I learnt the Tabular API from your notebooks here.

Re-posting here from the GitHub issue.

Using TabularDataLoaders fixes the problem.

However, I was interested in having different batch sizes for the train and validation sets. So I tried this.

trn_dl = TabDataLoader(to.train, bs=256, shuffle=True, drop_last=True)
val_dl = TabDataLoader(to.valid, bs=512)

dls = TabularDataLoaders(trn_dl, val_dl)

When I add device='cuda' in TabularPandas and use the above TabularDataLoaders, the model fails when I call learn.summary() or learn.lr_find().

Do you think I am missing anything?

An example gist is here - https://gist.github.com/binga/2fabdc620a3a97b9d218b366df41b94c

The model works fine once I do learn.model.to('cuda').
It looks like the user has to explicitly put the model on the GPU. Am I missing anything here?

The best way to test this is to first not specify CUDA explicitly at all, see how long it takes to validate on your data, and compare the two. IIRC, the model is pushed to the GPU automatically during training if one is available.


Oh yes. The model automatically picked the GPU. Thanks @muellerzr!