I was trying to set up a simple baseline model for the recent Kaggle competition lish-moa. It is a multilabel classification problem, and while setting up my TabularPandas object I ran into two issues and have a few doubts.
I had a list, dep_vars, containing the names of all the dependent variables. After constructing my TabularPandas object with options I believe are correct (cat_names, cont_names and y_names), I checked my to.train.y values, which showed only the first column of the dep_vars list.
to = TabularPandas(df, procs=procs, cat_names=cat, cont_names=cont,
y_names=dep_vars, splits=splits, device="cuda")
There is a column sig_id in the data that I don’t include in either cat_names or cont_names, yet it still shows up when I run to.items.head(5). Furthermore, when I run fit_one_cycle on a tabular_learner, an error pops up: RuntimeError: Found dtype Char but expected Float.
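Since to.items appears to expose every column of the underlying DataFrame (not just the ones listed in cat_names, cont_names and y_names), one workaround is to drop the identifier before building the TabularPandas object. A minimal sketch with a toy stand-in DataFrame (the column names besides sig_id are made up):

```python
import pandas as pd

# Toy stand-in for the competition DataFrame; sig_id is just a row identifier
df = pd.DataFrame({
    "sig_id": ["id_0", "id_1"],
    "cp_type": ["trt_cp", "ctl_vehicle"],  # a categorical feature
    "g-0": [0.5, -1.2],                    # a continuous feature
    "target_a": [0, 1],                    # dependent variables
    "target_b": [1, 0],
})

# Drop the identifier before handing the frame to TabularPandas,
# so it never appears in to.items
df = df.drop(columns=["sig_id"])
print(df.columns.tolist())
```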
Since the evaluation metric is log_loss, I’m not sure what loss_func and metric to use. From the course we know it’s either nn.BCELoss() or nn.BCEWithLogitsLoss(), and I’ve seen people use the first, but I was confused: don’t we have to include the sigmoid function, since log_loss requires predicted probabilities?
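For reference on the sigmoid question: nn.BCEWithLogitsLoss applies the sigmoid internally and so expects raw logits, while nn.BCELoss expects probabilities that have already been through a sigmoid. A quick check in plain PyTorch (the numbers are made up) shows the two agree when the sigmoid is applied explicitly for BCELoss:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.2, -0.7], [0.3, 2.1]])   # raw model outputs
targets = torch.tensor([[1.0, 0.0], [0.0, 1.0]])   # multilabel 0/1 targets, as floats

# BCEWithLogitsLoss applies the sigmoid internally, so it takes logits directly
with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# BCELoss expects probabilities, so the sigmoid must be applied first
plain = nn.BCELoss()(torch.sigmoid(logits), targets)

print(with_logits.item(), plain.item())  # the two values match
```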
Is it necessary for the categorical variables to be of type category before using them in a TabularPandas object, or does the Categorify proc do that for you?
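What Categorify effectively does can be sketched with pandas' own category encoding: build a vocabulary per column and replace the string values with integer codes. This is just an illustration of the idea, not the fastai source (fastai additionally reserves an index for #na#):

```python
import pandas as pd

# A plain object-dtype column, as it would come out of read_csv
df = pd.DataFrame({"cp_type": ["trt_cp", "ctl_vehicle", "trt_cp"]})

# Roughly what Categorify does: build a vocabulary and map values to codes
df["cp_type"] = df["cp_type"].astype("category")
codes = df["cp_type"].cat.codes  # integer codes backing the categorical column

print(dict(enumerate(df["cp_type"].cat.categories)), codes.tolist())
```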
And what is the n_out parameter in tabular_learner used for? Do we still have to set it even after specifying dep_vars when defining our TabularPandas object?
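Whatever tabular_learner infers on its own, n_out ultimately controls the width of the model's final linear layer, so for a multilabel problem it should match the number of targets. A plain-PyTorch sketch of that relationship (the layer sizes and counts here are illustrative, not fastai's defaults):

```python
import torch
import torch.nn as nn

n_cont, n_targets = 10, 206  # e.g. len(cont_names) and len(dep_vars); illustrative sizes

# The head of a tabular model: n_out decides how many outputs it produces,
# one logit per label in the multilabel setting
model = nn.Sequential(
    nn.Linear(n_cont, 64),
    nn.ReLU(),
    nn.Linear(64, n_targets),  # this is the layer n_out sizes in tabular_learner
)

x = torch.randn(4, n_cont)   # a batch of 4 rows
print(model(x).shape)        # one logit per target: (4, 206)
```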
I’ve narrowed my mistake down to using the wrong loss_func; it should be BCEWithLogitsLossFlat(). But if anyone could explain why the wrong loss causes the error RuntimeError: Found dtype Char but expected Float, that would be really helpful!
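My guess at the mechanism, sketched in plain PyTorch: the 0/1 targets end up stored as 8-bit integers (which PyTorch prints as Char), while the BCE losses require float targets, and as far as I can tell fastai's BCEWithLogitsLossFlat casts the targets to float for you before calling the underlying loss. The cast is what the bare nn losses are missing:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[0.3, -1.1]])
targets = torch.tensor([[1, 0]], dtype=torch.int8)  # int8 is printed as "Char" by PyTorch

loss_fn = nn.BCEWithLogitsLoss()

# Passing the int8 targets directly is what triggers the dtype complaint;
# casting them to float first, as BCEWithLogitsLossFlat appears to do, avoids it
loss = loss_fn(logits, targets.float())
print(loss.item())
```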