I am trying the Tabular learner, and during the predict phase, even if I feed in the same training data, the following learn.predict call throws an error:
test = df.sample(n=100).reset_index(drop=True)
test_dl = learn.dls.test_dl(test.drop(['salary'], axis=1))
The error trace gist is here: https://gist.github.com/binga/2fabdc620a3a97b9d218b366df41b94c.
Am I missing any params or configuration settings?
P.S.: This is on a Win10 machine.
P.S.2: The corresponding GitHub issue is here
I use the following function to get the predictions of an existing model on a new DataFrame:
def get_predictions(learn, df, cat_names, cont_names):
    # Build a fastai v1 TabularList from the new DataFrame
    tl = TabularList.from_df(df.copy(), path="./",
                             cat_names=cat_names, cont_names=cont_names)
    # tl must be registered as the learner's test set first,
    # e.g. by reloading with learn = load_learner(path, test=tl)
    preds, _ = learn.get_preds(DatasetType.Test)
    return preds
EDIT: This function works in fastai v1, I don’t yet have a version for fastai v2
Well, do we have a TabularList API in v2? I don’t see it in the documentation (https://dev.fast.ai/tabular.core). I guess I should check once again.
Oops. That function was written for fastai v1; I hadn’t realized it no longer works in v2.
Oh, never mind. Really appreciate the quick response
It looks like predict has to be used for a single row, while get_preds has to be used when inferring on the entire test data. More on the GitHub issue link above.
@binga yes, you should use get_preds if you are using a test_dl. Is there still any confusion or trouble?
Hey @muellerzr, I was using the incorrect dataloaders: instead of TabularDataLoaders, I was using the generic DataLoaders class to pack both my train and valid dataloaders. Sylvain helped me identify this issue.
More on this here - https://github.com/fastai/fastai2/issues/221
As I dig deeper, I think I found another issue when I try running this on the GPU. The github issue has the details. Do you think I’m missing anything?
P.S.: Thank you for your amazing work for this community. I learnt the Tabular API from your notebooks here.
Re-posting here from the GitHub issue.
TabularDataLoaders fixes the problem.
However, I was interested in having different batch sizes for the train and valid dataloaders, so I tried this:
trn_dl = TabDataLoader(to.train, bs=256, shuffle=True, drop_last=True)
val_dl = TabDataLoader(to.valid, bs=512)
dls = TabularDataLoaders(trn_dl, val_dl)
When I add TabularPandas and use the above TabularDataLoaders, the model fails when I call
Do you think I am missing anything?
An example gist is here - https://gist.github.com/binga/2fabdc620a3a97b9d218b366df41b94c
The model works fine when I do
Looks like the user has to explicitly put the model on the GPU. Am I missing anything here?
The best way to test this is to not say CUDA explicitly at all first, see how long it takes to validate on your data, and compare the two. IIRC, the model should be pushed to the GPU automatically during training if one is available.
Oh yes. The model automatically picked the GPU. Thanks @muellerzr!