Tabular model always predicting 0 label, but only on test set

Hey,
I’m trying a pretty standard, vanilla fast.ai tabular model on the Santander Kaggle competition… I’m not trying to compete, just trying out fast.ai :slight_smile:

The problem I’m facing is that after training my model and running it against the test data, it always predicts the 0 label (you can see from my link that on the validation set there are both 1’s and 0’s).

Am I missing something obvious?

Link: https://www.kaggle.com/osmano/fast-ai-tabular-library/notebook

The link is dead, so it’s not easy to see what’s going on. If you’re talking about the return value of get_preds: it returns (predictions, targets), so it’s normal to get a bunch of zeros for the second element, since you have no labels on the test set.
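
For reference, a minimal sketch of that call (fastai v1 API, assuming learn is the trained Learner):

preds, targets = learn.get_preds(ds_type=DatasetType.Test)
# preds: per-class probability tensor
# targets: all zeros here, because the test set carries no labels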

You’re right, I didn’t realize get_preds was returning the targets (not the actual label predictions). It’s a little confusing that it gives back a set of zero “targets” (I’d have assumed they’d be null or something).

(Sorry about the link - I had screwed up permissions and it was private… so I just fixed that).

Thanks for clarifying! I dug around in other threads, and it looks like people just argmax over the tensor of probabilities to infer the class?

That would probably be the right thing to do, yes.
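
For instance, a minimal sketch of that step, assuming preds is the probability tensor returned by get_preds above:

class_idxs = preds.argmax(dim=1)  # index of the most probable class for each row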


Hi @osman!
Funny, I had the same idea and the exact same problem, and I solved it in a similar way to what sgugger suggested.

from fastai.basic_data import DatasetType
import pandas as pd

# get_preds returns (probabilities, targets); keep only the probabilities
probs = learn.get_preds(ds_type=DatasetType.Test)[0]

# return the index of the largest probability, i.e. the predicted class
def probs2class(item):
    return max(range(len(item)), key=item.__getitem__)

test_df = pd.DataFrame({'ID_code': df_test['ID_code'],
                        'target': list(map(probs2class, probs))})

On a side note, I searched for a long time for a ready-made helper like probs2class, since this is something every classification problem needs at the end, right? I’m surprised no such function exists in the fastai library (?). Putting a lookup function in the code like I did kind of messes up the flow, and doing argmax and then mapping into a classes variable (the labels might be strings) wouldn’t be great for readability either. I’d love to contribute such a function, but I need some more coding experience first. Or maybe there are reasons why it doesn’t exist?
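
For illustration, a hypothetical helper along those lines (preds2labels is not part of fastai; this is just a sketch, assuming a fastai v1 setup where data.classes holds the label names in index order):

def preds2labels(probs, classes):
    # hypothetical helper: argmax each row, then look the index up in the class-name list
    return [classes[i] for i in probs.argmax(dim=1).tolist()]

labels = preds2labels(probs, data.classes)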


After hours and hours of searching, yours was the most comprehensive answer to this problem. You also raise an important question: why isn’t this the default? Thanks though!

Hi, I noticed a similar problem with my models. Is there any fastai function that lets me map labels to the argmax outputs? To put it into the context of the teddy bear classifier, is there any way to check how outputs [0, 1, 2] would map to [‘teddy bear’, ‘black bear’, ‘brown bear’]? Is there a built-in dictionary in the fastai library that I can access for this purpose?

I appreciate any help in this regard.

The labels are stored in data.classes if data is your DataBunch.
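
A minimal sketch, assuming data is the DataBunch behind the bear classifier:

print(data.classes)     # e.g. ['black bear', 'brown bear', 'teddy bear'] (sorted alphabetically)
print(data.classes[0])  # the label that output index 0 maps to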
