TabularModel producing 2 output features when there is only one dependent variable

I am having an issue with TabularModel, which is producing 2 output features even though there is only one dependent variable. This then fails when the output is flattened. Any help is appreciated.

TabularModel(
  (embeds): ModuleList(
    (0): Embedding(180001, 600)
  )
  (emb_drop): Dropout(p=0.05)
  (bn_cont): BatchNorm1d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layers): Sequential(
    (0): Linear(in_features=620, out_features=1600, bias=True)
    (1): ReLU(inplace)
    (2): BatchNorm1d(1600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.001)
    (4): Linear(in_features=1600, out_features=800, bias=True)
    (5): ReLU(inplace)
    (6): BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.05)
    (8): Linear(in_features=800, out_features=200, bias=True)
    (9): ReLU(inplace)
    (10): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): Dropout(p=0.02)
    (12): Linear(in_features=200, out_features=2, bias=True)
  )
)

Hello,
Could you please provide some more information about your data? Maybe a sample row from your input dataframe, and an example of your independent variable (for example, what type it is).

Here are 3 rows of sample data. The 2nd column is the dependent variable; its values are either 0 or 1.

199759 1 10.5237 4.7965 10.1304 9.3253 9.4114 -4.7457 6.2351 10.8429 -1.4616 9.4703 -2.1009 -4.6201 14.0316 5.7248 7.4399 14.7922 12.3770 -8.5905 9.7220 20.5816 25.1204 7.7466 3.5452 3.2727 4.6172 13.8215 -14.4892 -1.4333 6.1524 6.2017 -0.5184 9.4584 2.8067 22.1086 10.7254 6.9576 1.5358 2.9775 16.1798 -1.1581 -9.4023 6.9065 10.4266 11.6113 5.8958 8.4600 14.2394 -26.9857 16.8233 25.0129 11.2034 4.9862 -8.4792 6.5437 -3.8409 3.2423 18.4656 6.8385 7.2955 9.7975 16.7350 -18.2589 4.7195 3.8674 5.5355 -2.9590 5.1583 9.6265 5.0186 -9.7849 42.8603 0.3008 2.1449 19.3360 16.4567 19.0344 10.5787 18.2526 4.7830 12.9492 2.5682 11.8811 2.3900 1.2092 -8.5651 12.2751 6.5873 9.4599 7.9783 3.5988 -10.4353 7.0910 15.5506 10.6520 11.2937 0.1032 12.1025 34.4568 3.0208 0.2313 4.6557 9.3888 23.6781 1.4542 10.2024 2.6975 9.9136 20.0642 14.0350 14.1472 2.5999 7.0600 3.2345 14.7091 2.4197 -0.2239 2.0687 4.7176 3.5745 -0.6136 3.7944 8.7308 6.3075 -7.2907 3.7128 12.3968 14.2371 1.2334 -5.8172 14.8453 14.2436 0.8166 6.0171 6.7294 -12.0477 2.7692 32.1025 5.1964 -5.7844 -2.8926 4.8620 8.0116 17.3674 11.2760 9.7258 0.8684 8.4282 -12.8874 4.1550 -0.2196 13.0773 12.0120 6.4300 13.6476 9.0520 0.9331 12.1470 -0.1096 3.7800 12.4133 39.8184 5.4111 3.7248 17.5192 -3.1267 9.6603 2.3530 7.7533 6.2000 5.6624 -4.2587 -5.2835 1.1933 4.7467 15.0927 8.2711 2.8916 13.9755 -8.5618 4.1535 8.7149 11.4382 -8.6374 5.8060 3.9588 -0.8331 5.6804 -17.1862 22.9891 -0.0581 13.6381 14.4864 2.3609 -0.8490 18.8579 0.4579 0.6325 7.0770 16.4903 -20.4578
199760 0 8.3461 -3.1356 10.9140 5.0295 14.4948 -6.0191 5.1814 15.6450 -4.2764 7.4871 -4.7391 5.1473 13.8109 8.6541 9.4811 14.6466 10.7758 -0.3142 21.8396 19.3036 18.1135 20.9583 6.7705 4.3217 6.2860 13.2111 -10.5455 -0.5241 5.6200 7.0240 -2.2362 9.2784 1.2067 14.7719 11.2687 4.6927 -0.9582 8.1340 14.4130 -3.9602 -12.7266 11.3982 10.3216 11.2978 11.4086 -5.8340 11.5448 8.3414 -3.0380 24.6373 11.8865 0.6495 0.1393 5.5449 -6.3090 12.5356 18.7714 5.9731 3.7241 8.9240 11.4649 -9.9421 1.8290 3.4680 5.2379 -3.7096 7.6942 18.8967 5.0161 3.4422 18.5819 0.8194 2.1073 32.6567 -5.0406 27.4469 6.1047 19.5315 7.0468 13.9194 13.3114 15.9262 -7.1629 3.7812 5.2883 19.2012 7.4238 2.0110 9.1678 3.7255 -11.2703 6.7968 12.6544 9.7944 7.1502 0.0096 10.4463 14.0702 1.8231 -2.9997 6.0174 14.8245 23.2611 1.7222 10.3610 3.4597 10.7459 10.8883 14.1163 17.3206 7.7394 6.1359 2.2826 2.1927 3.1056 -0.8435 3.2765 24.5159 -10.9072 -0.6803 6.1042 8.7995 -10.2294 4.0570 -0.9538 11.8579 12.7062 -5.3076 2.3931 13.2173 10.3377 0.1286 8.6889 6.2344 -14.4685 2.4922 9.9062 17.2767 10.4170 13.9907 -4.7442 9.4279 9.3616 11.9469 9.5571 9.6445 13.8724 2.2456 4.1596 6.2815 18.2727 10.1354 13.4114 14.9572 -5.1819 -9.3249 13.4090 -4.1721 11.2165 11.1516 26.0916 5.6182 6.0020 9.8993 -0.2080 8.4244 3.2761 -0.8841 5.3507 5.5664 -9.8875 -4.3540 27.8757 2.0015 32.1945 8.9980 -15.6517 11.6968 4.3392 6.2289 -3.1094 11.9303 12.4121 10.4848 -0.6749 -5.3948 4.7786 -28.0623 22.7568 0.6868 -1.2858 7.8473 3.3776 1.7575 10.7121 2.6265 7.0773 10.0061 21.7852 -7.8623
199761 0 10.8498 -6.9508 13.1740 8.0891 12.3278 -2.9784 4.7439 23.5438 -1.2621 7.6835 2.7797 -10.6695 14.1603 9.3403 8.6705 14.3257 10.6320 -14.6474 8.4799 3.4582 14.3815 22.3142 4.4066 2.3600 8.3958 13.2908 -12.1721 1.8667 6.3986 3.4871 -11.9549 9.5673 1.1376 13.6852 11.4322 4.6172 1.5329 5.9929 17.9770 5.6876 -10.1278 11.5141 12.0657 11.2494 12.4778 12.3242 9.9910 -8.6607 23.2893 1.2670 12.0163 11.8079 -3.0246 5.6209 3.8826 4.2991 12.6297 6.5911 5.9098 8.3813 16.2072 -28.3145 1.8893 2.7073 3.1876 1.2894 4.6428 11.2038 5.0318 -0.3100 3.9977 0.3938 -6.6687 30.8452 31.3567 22.9279 -1.7071 15.4288 7.2147 13.0744 -0.3120 8.9416 -4.3711 2.7874 -8.5865 14.4090 -12.5083 11.4882 10.2478 5.6173 -10.7580 6.8355 8.7921 10.6781 6.6052 -0.1883 11.5255 34.1852 2.1394 -0.7372 -3.7996 9.6126 14.9353 1.3596 11.5170 4.0420 12.8756 13.6825 13.9794 14.0735 6.4153 6.5081 4.0648 11.2209 4.8287 2.6320 2.2643 13.7816 2.6461 7.8225 20.2349 13.9471 -4.3976 10.3643 0.6116 12.8611 13.7546 5.2321 -6.3378 14.0503 12.1412 0.7079 6.6173 6.4151 -16.4866 8.4230 9.1862 21.5845 -0.2025 2.3555 2.3826 5.8985 17.6952 12.5947 7.7304 0.8236 13.5120 9.9095 4.0899 20.1796 15.9531 11.6309 6.5661 18.6567 11.2570 -0.0409 14.4745 -1.5392 11.5385 8.9826 17.3912 5.3592 5.1994 4.7310 -3.0477 18.0006 3.2506 -12.5815 7.2778 4.8310 6.4867 -0.6872 16.2197 -12.4416 26.0754 7.2304 3.1712 15.5908 8.2826 -1.3305 -1.5021 8.3902 2.8458 8.4630 -0.8319 -7.4646 8.4976 -40.3805 14.1717 -0.9750 2.8182 4.7460 3.1904 -3.2225 21.4370 -0.5573 6.9020 9.7923 15.5002 -2.8602

If I change the labelling call to .label_from_df(cols=dep_var, label_cls=FloatList, log=True), I get 1 out_features, but the dependent variable is an int type and I don’t want to take its log. I tried setting label_cls to CategoryList, but no luck: data.c still gives me 2.

# y_range
y_range = torch.tensor([0, 1], device=defaults.device)
learn = tabular_learner(data, layers=[1600, 800, 200], ps=[0.001, 0.05, 0.02], emb_drop=0.05, metrics=rmse, y_range=y_range)


Yes, if you have a categorical dependent variable that can have two values, it’s perfectly normal that your output has two values (which will be the probabilities of being 0 and 1).
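To make the "two values" point concrete, here is a minimal sketch in plain Python (no fastai needed). It assumes the final Linear layer emits two raw scores per row and that a softmax turns them into probabilities; the score values are made up for illustration. It also shows why a single output would carry the same information in the two-class case.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw outputs of the last Linear(200, 2) layer for one row.
logits = [0.3, 1.5]

# One probability per class: p0 for class 0, p1 for class 1.
p0, p1 = softmax(logits)

# With exactly two classes, softmax reduces to a sigmoid of the score difference.
p1_via_sigmoid = sigmoid(logits[1] - logits[0])

print(p0, p1, p1_via_sigmoid)
```

So the two outputs are not two dependent variables; they are one score per class of the single dependent variable.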

This seems similar to the problem I had in "TabularList: Training problems between CategoryList and Floatlist".

So if we have a binary classifier (0 or 1) we should use:

.label_from_df(cols=dep_var, label_cls=CategoryList)
Then the first number will be the probability of 0 and the second the probability of 1?

That’s correct.
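For reading a prediction back as a label, a small sketch in plain Python may help. It assumes the class list is ordered [0, 1] (worth checking against data.classes in your own setup) and uses a made-up probability pair; decode_prediction is a hypothetical helper, not part of fastai.

```python
def decode_prediction(probs, classes):
    """Return the most likely class label and its probability."""
    # Index of the largest probability (argmax).
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return classes[idx], probs[idx]

# Assumed class order: position 0 holds label 0, position 1 holds label 1.
classes = [0, 1]

# Hypothetical per-class probabilities for one row.
probs = [0.23, 0.77]

label, confidence = decode_prediction(probs, classes)
print(label, confidence)
```

If the class order were different, the same positions would map to different labels, which is why the order is passed in explicitly here.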