Multiple Tabular Output Variables?

I am working with a dataset where I want to feed in two variables and get the same two variables back out, but when I predict, only one value comes out.

This is how I create my databunch object and learner:
test = TabularList.from_df(df.iloc[2:100].copy(), path=PATH, cat_names=cat_names, cont_names=cont_names)

data = (TabularList.from_df(df, path=PATH, cat_names=cat_names, cont_names=cont_names, procs=procs)
       .split_by_idx(list(range(2,100)))
       .label_from_df(cols=all_names)
       .add_test(test)
       .databunch())

learn = tabular_learner(data, layers=[200,100], metrics=F.mse_loss)

I am also not 100% sure my metric is right for this type of problem. I know I cannot use accuracy (even though that’s really what I want), so I am absolutely open to advice on the right metric.
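For a multi-output regression like this, mean squared error (or its root) is the usual choice rather than accuracy. Stripped of any framework, the metric is just the average squared difference across all outputs; a minimal pure-Python sketch (the mse function here is my own illustration, not a fastai API):

```python
def mse(preds, targs):
    """Mean squared error over flat lists of predictions and targets."""
    assert len(preds) == len(targs)
    return sum((p - t) ** 2 for p, t in zip(preds, targs)) / len(preds)

# Two predicted outputs vs. two known targets, as in the predict() output above:
print(mse([0.5043, 0.5197], [1.0, 1.0]))
```

Lower is better; taking the square root gives RMSE, which is in the same units as the targets and is often easier to interpret.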

learn.predict(row)

Output:
(MultiCategory x;y, tensor([1., 1.]), tensor([0.5043, 0.5197]))

cat_names is blank, and cont_names has two variables, ‘x’ and ‘y’. I have tried label_from_df with cols = [‘x’,‘y’], but that just gives the blank output shown above. Any advice? For those wondering, I am trying to build a tabular autoencoder.

Thanks,

Zach

You have a two-dimensional regression problem: make sure that your label column Y contains an array of two items.


Thanks @ste ! So if I wanted to scale this up, say with an input of 42 elements and an output of 42 elements, would I pass an array of the 42 column names? E.g. label_from_df(cols=all_names)

Where all_names is an array of the 42 column names, listed as before?

True. Bear in mind that the number of inputs can differ from the number of outputs.

Make sure the label items are floats, so the system knows you are asking for regression.
If you were addressing a classification problem, the items would need to be integers.
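The distinction can be illustrated outside fastai: what matters is the dtype of the label items. A hypothetical helper (the name and logic are mine, purely illustrative of the rule, not fastai code):

```python
def inferred_task(labels):
    """Mirror the rule of thumb: float labels -> regression,
    int labels -> classification. Purely illustrative."""
    flat = [v for row in labels
            for v in (row if isinstance(row, (list, tuple)) else [row])]
    if all(isinstance(v, float) for v in flat):
        return "regression"
    if all(isinstance(v, int) for v in flat):
        return "classification"
    return "mixed"

print(inferred_task([[0.1, 2.3], [4.5, 6.7]]))  # regression
print(inferred_task([[0, 1], [1, 0]]))          # classification
```

So casting the label columns to a float dtype before labelling is the simplest way to signal a regression target.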

learn.predict(row), where row is df.iloc[1] (just predicting on some data that is already known), only outputs a MultiCategory, followed by the names of each category, followed by the tensors.

What I am attempting to do is closer to this:

data = (TabularList.from_df(df, path=PATH, cat_names=cat_names, cont_names=cont_names, procs=procs)
       .split_by_idx(list(range(0,50)))
       .label_from_df(TabularList.from_df(df, path=PATH, cat_names=cat_names, cont_names=cont_names, procs=procs))
       .add_test(test)
       .databunch())

Where my output is exactly like my input. I can get this somewhat working with a FloatList passed into label_from_df, like so:

data = (TabularList.from_df(df, path=PATH, cat_names=cat_names, cont_names=cont_names, procs=procs)
       .split_by_idx(list(range(0,50)))
       .label_from_df(cont_names, label_cls=FloatList, log=False)
       .add_test(test)
       .databunch())

I can successfully make the numbers match up in a FloatList, without applying any transforms to the data. However, the FloatItem it prints at the end is FloatItem [-0.039882 -0.039267 0.107798 -0.05002 ], and all of these values should be well above zero. What is causing this issue?

This is likely a result of normalization via the ItemList processors.

Would I need to run a denorm to some degree?

If you don’t want to normalize, you would need to define your own processing pipeline.
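As a sketch of what “your own pipeline” means: a processor chain is just a sequence of transforms applied in order, and leaving the normalization step out leaves the raw values untouched. All names below are illustrative stand-ins, not fastai’s actual procs:

```python
def fill_missing(values, fill=0.0):
    """Replace None with a fill value (stand-in for a FillMissing-style proc)."""
    return [fill if v is None else v for v in values]

def normalize(values, mean, std):
    """Standardize values (stand-in for a Normalize-style proc)."""
    return [(v - mean) / std for v in values]

def run_procs(values, procs):
    """Apply each processor in order, like an ItemList's proc chain."""
    for proc in procs:
        values = proc(values)
    return values

raw = [10.0, None, 30.0]
# A pipeline that skips normalization: values stay on their original scale.
print(run_procs(raw, [fill_missing]))  # [10.0, 0.0, 30.0]
```

The point is that which transforms run is entirely determined by the list you pass, so omitting the normalization proc for the targets keeps them in their original units.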

Take a look at the source code and docs for how to do it, but essentially it seems to me that you don’t want normalization for your target values. Give it a try and let us know if you have success.
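And to answer the denorm question directly: standardization is the affine map (x - mean) / std, so recovering the original scale is just the inverse. A minimal sketch, with made-up statistics for a single column (in practice you would use the mean and std that the normalization proc computed from your training data):

```python
def denorm(values, mean, std):
    """Invert (x - mean) / std to recover the original scale."""
    return [v * std + mean for v in values]

normalized = [-0.039882, -0.039267, 0.107798, -0.05002]
# Hypothetical per-column stats; real ones come from the training set.
mean, std = 5.0, 2.0
print(denorm(normalized, mean, std))
```

Note that fastai normalizes each continuous column with its own statistics, so a real denorm would apply a different (mean, std) pair per column rather than one shared pair.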