Tabular data Lesson 4 question

max3xis · July 17, 2019, 8:12pm

Hi there! Unfortunately, the info in the tabular data example is insufficient for me. In the given example we predict whether a person has an income over 50k. But what if there are very few such people, so the chance that any given person has that income is very small?
Is there a way of seeing what the probability is in that case?

muellerzr · July 17, 2019, 8:19pm

So you’re asking: if some individual is part of a small portion of group x, how will the model treat them? Correct?

max3xis · July 17, 2019, 8:27pm

Yes. As I understand it, the model should see it as the probability of >50k compared to <50k. So, for example, if I’m trying to predict how many people go bankrupt next year, there is a probability for every person, but there is no chance that the model will actually guess bankruptcy for anyone. So I’m wondering how to deal with these small percentages. Basically I want to build something like a scoring model that says there is a 2% probability of this person going bankrupt and 4% for someone who is married, for example.

muellerzr · July 17, 2019, 8:45pm

The closest thing is to look at the variable relationships; otherwise, restructure the problem to account for that. I’m working on a tool that can help with the first (looking at the confusion matrix), but analyzing the distributions of people who make <50k or >50k could help answer some of your questions. Others may chime in too.

max3xis · July 18, 2019, 11:53pm

Finally solved all of the issues, thx ) I didn’t figure out some things at first. Do you know if the structure of the NN is explained anywhere? Like how to understand what it looks like from the inside, what those 200 * 100 parameters are, and how they attach to our attributes?

muellerzr · July 19, 2019, 11:53pm

Think of the 200, 100 as the sizes of our hidden layers. You can run learn.model to look at the model itself. Otherwise, check the source code and the docs for the tabular models.
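
To make the 200, 100 a bit more concrete, here is a rough sketch in plain Python of how two hidden layers of those sizes chain together and how many weights and biases the linear layers alone would hold. The input width of 42 is a made-up number (in the real model it depends on the embedding sizes of the categorical columns plus the number of continuous columns), and the function names are hypothetical. learn.model will show the real thing, including the embedding, batchnorm and dropout layers that this count leaves out.

```python
def linear_layer_shapes(n_inputs, hidden_sizes, n_outputs):
    """Return (in_features, out_features) for each linear layer in order."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes):
    """Weights (in * out) plus one bias per output unit, summed over layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


# ASSUMPTION: 42 is a hypothetical input width, not taken from the lesson.
N_INPUTS = 42
HIDDEN_SIZES = [200, 100]   # the two hidden layers discussed above
N_CLASSES = 2               # e.g. <50k vs >=50k

shapes = linear_layer_shapes(N_INPUTS, HIDDEN_SIZES, N_CLASSES)
n_params = count_parameters(shapes)

for n_in, n_out in shapes:
    print(f"Linear({n_in} -> {n_out})")
print("linear parameters:", n_params)
```

So the attributes feed the first layer of 200 units, those 200 activations feed the next layer of 100 units, and the 100 activations feed one output per class.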

max3xis · August 15, 2019, 5:48am

Yep, didn’t think of that, thanks )

AjayStark · August 18, 2019, 2:51pm

Hi, interesting question @max3xis
How were you able to solve it, i.e. the prediction of bankruptcy?

AjayStark · August 18, 2019, 2:56pm

Hi @muellerzr,
In the lesson 4 tabular example, the model is used to predict ‘salary’, which is a categorical variable, and we use the tabular learner for that purpose.
But what if I want to predict a continuous variable, say age? Is it possible to predict it, maybe using regression?

Thanks

muellerzr · August 18, 2019, 3:00pm

@AjayStark Absolutely! Look at the Rossmann notebook from lesson 6 to see how to set that up
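
For a rough idea of what changes in the regression case: the network ends in a single number instead of one score per class, and, as far as I recall, fastai's tabular model can optionally squash that number into a chosen range (a y_range) with a scaled sigmoid. The sketch below shows only that arithmetic in plain Python. The function name, the age range of 0 to 100 and the raw output value are all made up for illustration, and the notebook remains the place to see the actual setup.

```python
import math


def scaled_sigmoid(raw_output, low, high):
    """Map an unbounded network output into the interval (low, high)."""
    return low + (high - low) / (1.0 + math.exp(-raw_output))


# ASSUMPTION: a plausible range for the target 'age', chosen for illustration.
AGE_RANGE = (0.0, 100.0)

# ASSUMPTION: a made-up raw value standing in for the final layer's output.
raw_output = -0.4
predicted_age = scaled_sigmoid(raw_output, *AGE_RANGE)

print(f"predicted age: {predicted_age:.1f}")
```

The loss would then be something like mean squared error between the predicted and true ages, rather than the cross-entropy used for the salary classes.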

max3xis · August 19, 2019, 2:11am

Yes, if you simply feed in true/false values as the target, you can later look at the predicted probability of each of those statuses.
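
A minimal sketch of where such probabilities come from, assuming a two-class target (not bankrupt / bankrupt): the classifier produces one raw score per class, and a softmax turns those scores into probabilities that sum to 1. The scores below are made up. In fastai v1, if I remember right, learn.predict(row) returns the predicted class, its index and a tensor of these per-class probabilities, so the softmax itself does not need to be written by hand.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# ASSUMPTION: made-up raw scores for the classes [not bankrupt, bankrupt].
scores = [3.0, -0.9]
probs = softmax(scores)
p_bankrupt = probs[1]

print(f"P(bankrupt) = {p_bankrupt:.1%}")
```

The predicted class here is still "not bankrupt", but the probability of the rare class is available as a score, which is what a scoring model needs.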

AjayStark · August 19, 2019, 5:58pm

Ohhh, Thanks @muellerzr