Multi label regression

mmr · January 11, 2020, 6:35am

Hi, I am training a model on a dataset from Kaggle that looks something like this:

   var1  var2  var3  ...  varN  target1  target2
0    56    67    45  ...    n1       12        0
1    14    15    41  ...    n2       23        3
2    45    10     4  ...    n3        5       35
3    23    12    34  ...    n4       13       45
4   ...

The dataframe has multiple target columns. Both the targets and the input variables are continuous.
So far I have managed to do this:

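import numpy as np
import pandas as pd
from fastai.tabular import *  # fastai v1 API

# Toy dataframe: 10 rows, 8 continuous columns
df = pd.DataFrame(np.random.rand(10,8), columns=['a','b','c','d','e','f','g','h'])
# 'a'-'e' are the inputs, 'f'-'h' are the float targets; rows 8 and 9 form the validation set
data = (TabularList.from_df(df, cont_names=['a','b','c','d','e'])
                    .split_by_idx(valid_idx=range(8,10))
                    .label_from_df(cols=['f','g','h'], label_cls=FloatList)
                    .databunch(bs=2))

# Cast the float targets to long so that accuracy() accepts them
def accuracy_cast(input:Tensor, targs:Tensor)->Rank0Tensor:
    targs = targs.long()
    return accuracy(input, targs)

learn = tabular_learner(data, layers=[200,100], metrics=accuracy_cast)
learn.fit(5, 1e-2)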

Is the code set up correctly, or am I missing something? I used the accuracy_cast function because there was an issue with the targets being float instead of long.

remapears · April 30, 2020, 4:47pm

Hi mmr,

I think this is a multi-output regression problem, as there are 2 outputs.
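
One way to see why a classification metric does not fit here is to look at what casting continuous targets to integers does. The following is a minimal sketch in plain Python with no fastai dependency. `int()` stands in for the `.long()` cast in `accuracy_cast` (both truncate toward zero), and the sample values are made up for illustration.

```python
# Hypothetical continuous targets in [0, 1), similar to what np.random.rand produces.
targets = [0.12, 0.57, 0.93, 0.40]

# int() truncates toward zero, like casting a float tensor to long.
as_int = [int(t) for t in targets]

print(as_int)            # every value collapses to 0
print(len(set(as_int)))  # only one distinct "class" remains
```

After the cast, the targets no longer carry the information the model is supposed to predict, so an accuracy score computed against them says little about the quality of the regression.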

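I am not familiar with accuracy_cast, but accuracy is a classification metric. Since this is a regression problem, you can use RMSE, MAE, or MSE instead. Each has its own trade-offs: MSE and RMSE penalize large errors more heavily, while MAE is less sensitive to outliers. I usually use MAE.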
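
To make the difference between these metrics concrete, here is a small framework-free sketch that computes all three over a multi-output target. The function names `mae`, `mse`, and `rmse` are illustrative helpers, not fastai functions. The targets are the first three rows of the sample table above, and the predictions are invented.

```python
import math

def _pairs(preds, targs):
    """Yield (prediction, target) for every element of two rows-by-targets lists."""
    for pred_row, targ_row in zip(preds, targs):
        for p, t in zip(pred_row, targ_row):
            yield p, t

def mae(preds, targs):
    """Mean absolute error, averaged over all rows and all target columns."""
    errs = [abs(p - t) for p, t in _pairs(preds, targs)]
    return sum(errs) / len(errs)

def mse(preds, targs):
    """Mean squared error, averaged over all rows and all target columns."""
    errs = [(p - t) ** 2 for p, t in _pairs(preds, targs)]
    return sum(errs) / len(errs)

def rmse(preds, targs):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(preds, targs))

# target1/target2 from the sample table (rows 0 to 2); predictions are made up.
targs = [[12.0, 0.0], [23.0, 3.0], [5.0, 35.0]]
preds = [[10.0, 1.0], [25.0, 3.0], [5.0, 25.0]]

print("MAE: ", mae(preds, targs))
print("MSE: ", mse(preds, targs))
print("RMSE:", rmse(preds, targs))
```

In this example a single large miss (25 predicted against a target of 35) pulls the RMSE well above the MAE, which is the sensitivity to large errors mentioned above. In fastai v1, the `fastai.metrics` module includes functions such as `mean_absolute_error` and `root_mean_squared_error` that can be passed to `metrics=` in place of `accuracy_cast`; it is worth checking that they are available in the version you have installed.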

Hope this helps future users as this post is 4 months old now…