Hi there! Unfortunately, the info in the tabular data example isn't sufficient for me. In the given example we predict whether a person has an income over 50k. But what if there are very few such people, so the chance that any given person has that income is very small?

Is there a way to see what the probability is in this case?

So you’re asking: if some individual is part of a small portion of group x, how will the model treat them? Correct?

Yes. As I understand it, the model should treat it as the probability of >50k compared to <50k. So for example, if I'm trying to predict which people go bankrupt next year, there is a probability for every person, but there is no chance that the model will actually guess bankruptcy for anyone. So I'm wondering how to deal with these small percentages. Basically I want to build something like a scoring model that says there is a 2% probability of this person going bankrupt and 4% for someone who is married, for example.
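To make the scoring idea concrete, here is a minimal sketch in plain Python. It assumes the classifier produces two raw outputs (logits) per person and that a softmax turns them into probabilities. The people and the numbers are made up for illustration; they are not outputs of the lesson's model.

```python
import math

def softmax(logits):
    """Convert raw model outputs (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs per person: [no_bankruptcy, bankruptcy]
raw_outputs = {
    "person_a": [3.9, 0.0],
    "person_b": [3.2, 0.0],
    "person_c": [1.0, 0.0],
}

# The score is the probability of the rare class (index 1), not the argmax label.
scores = {name: softmax(out)[1] for name, out in raw_outputs.items()}

# The argmax label is "no bankruptcy" for everyone here,
# but the scores still rank people from highest to lowest risk.
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.1%}")
```

The point is that the predicted label can be the same for everyone while the probabilities still differ, and those probabilities are what a scoring model would report.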

The closest thing is to look at the variable relationships; otherwise, restructure the problem to account for that. I’m working on a tool that can help with the first (looking at the confusion matrix), but if we analyze the distributions of people who make <50k or >50k, that could help answer *some* of your questions. Others may chime in too.
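As a rough illustration of why the confusion matrix and the class distribution matter here, the sketch below uses made-up data with a 2% positive rate and a hypothetical model that always predicts the majority class. The helper `confusion_matrix` is written for this example only and is not a library function.

```python
def confusion_matrix(actual, predicted):
    """Count tp, fp, fn, tn for binary labels, where 1 is the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Made-up data: 2 positives out of 100 people (2% base rate).
actual = [1] * 2 + [0] * 98
# Hypothetical model that always predicts the majority class.
predicted = [0] * 100

cm = confusion_matrix(actual, predicted)
accuracy = (cm["tp"] + cm["tn"]) / len(actual)
recall = cm["tp"] / (cm["tp"] + cm["fn"])
print(cm, accuracy, recall)
```

Accuracy looks high because the negatives dominate, while recall on the rare class is zero, which is why accuracy alone says little about a rare outcome.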

Finally solved all of the issues, thx ) I didn't figure out some things at first. Do you know if the structure of the NN is explained anywhere? Like how to understand what it looks like from the inside, what those 200 * 100 parameters are, and how they attach to our attributes?

Consider the 200, 100 our hidden layer sizes. You can do learn.model to look at the model itself. Otherwise, check the source code and the docs for the tabular models.
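To give a feel for what hidden layers of size 200 and 100 mean, here is a small sketch that counts the weights and biases of the fully connected layers only. The input width of 42 is a hypothetical number, and the real model printed by learn.model typically also contains embedding and normalization layers, which this sketch ignores.

```python
def linear_param_counts(n_inputs, hidden_sizes, n_outputs):
    """Return (fan_in, fan_out, n_params) for each fully connected layer."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    counts = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # One weight per input/output pair, plus one bias per output unit.
        counts.append((fan_in, fan_out, fan_in * fan_out + fan_out))
    return counts

n_inputs = 42   # hypothetical width of the features fed into the first layer
layers = linear_param_counts(n_inputs, [200, 100], n_outputs=2)

for fan_in, fan_out, n_params in layers:
    print(f"{fan_in} -> {fan_out}: {n_params} parameters")

total = sum(n_params for _, _, n_params in layers)
print("total:", total)
```

So the attributes feed into the first layer of 200 units, those feed into 100 units, and the last layer maps the 100 units to one output per class.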

Yep, didn't think of that, thanks )

Hi @muellerzr,

In the lesson 4 tabular example, the model is used to predict ‘salary’, which is a categorical variable, and we use the tabular learner for that purpose.

But what if I want to predict a continuous variable, say age? Is it possible to predict it, maybe using regression?

thanks

Yes, if you simply feed in true/false values, you can later look at the probability of each of those statuses.