Can't create tabular_learner

jerron · February 29, 2020, 3:47am

When I run the following code, I get an error:

from fastai2.tabular.all import *
df = pd.read_csv('/content/gdrive/My Drive/myinput.csv')
np.random.seed(88)
dls = TabularDataLoaders.from_df(df, path='.', 
    cat_names=cat_names, cont_names=cont_names, procs=[Categorify, Normalize],
    y_names='target', valid_idx=list( np.random.permutation(len(df))[:int(len(df)*.1)]), bs=64)
learn = tabular_learner(dls, layers=[2000,2000,500,200,50], metrics=accuracy)

error message:

AssertionError                            Traceback (most recent call last)
      8     cat_names=cat_names, cont_names=cont_names, procs=[Categorify, Normalize],
      9     y_names='prempd', valid_idx=list( np.random.permutation(len(df))[:int(len(df)*.1)]), bs=64)
---> 10 learn = tabular_learner(dls, layers=[2000,2000,500,200,50], metrics=accuracy)
     11 learn.fit_one_cycle(3)

/usr/local/lib/python3.6/dist-packages/fastai2/tabular/learner.py in tabular_learner(dls, layers, emb_szs, config, n_out, y_range, ps, embed_p, use_bn, bn_final, **kwargs)
     31     emb_szs = get_emb_sz(dls.train_ds, {} if emb_szs is None else emb_szs)
     32     if n_out is None: n_out = get_c(dls)
---> 33     assert n_out, "n_out is not defined, and could not be infered from data, set dls.c or pass n_out"
     34     model = TabularModel(emb_szs, len(dls.cont_names), n_out, layers, y_range=y_range,
     35                          ps=ps, embed_p=embed_p, use_bn=use_bn, bn_final=bn_final, **config)

AssertionError: n_out is not defined, and could not be infered from data, set dls.c or pass n_out

How do I fix this?

sgugger · February 29, 2020, 5:36am

As the error message tells you: either set dls.c or pass n_out to indicate how many classes you have.

This is the sign that the fastai library did not understand your data. You might want to pass a block_y in your call to DataLoaders to indicate what type of targets you have.
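The check that fails is the fallback shown in traceback lines 32 and 33: an explicit `n_out` wins, otherwise the value inferred from the data (`dls.c`) is used, and if neither is available the assertion fires. Below is a minimal pure-Python sketch of just that logic; `resolve_n_out` is a hypothetical name for illustration, not a fastai function.

```python
from typing import Optional


def resolve_n_out(n_out: Optional[int], inferred_c: Optional[int]) -> int:
    """Hypothetical helper mirroring the check in tabular_learner's traceback.

    n_out: value passed explicitly to the learner (None if omitted).
    inferred_c: what could be read from the data, i.e. dls.c (None if unknown).
    """
    # An explicitly passed n_out takes priority over anything inferred.
    if n_out is None:
        n_out = inferred_c
    # Neither source gave a usable value: same failure as in this thread.
    assert n_out, ("n_out is not defined, and could not be infered from data, "
                   "set dls.c or pass n_out")
    return n_out


print(resolve_n_out(None, 2))  # 2, taken from dls.c
print(resolve_n_out(1, None))  # 1, passed explicitly
```

So either fixing what the data reports (via `dls.c` or a suitable target block) or passing `n_out` directly gets past the assertion.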

jerron · February 29, 2020, 4:32pm

Thank you. What exactly should I do? What does “classes” mean here, and what is dls.c?

muellerzr · February 29, 2020, 4:36pm

“Classes” here means the number of distinct outputs you’re expecting (0 and 1 if binary, or your labeled classes).

What may be happening is that your y’s are already encoded as numbers (0, 1, etc.), so fastai is picking the task up as regression. The second part of the answer above means you should do:

TabularDataLoaders.from_df(df, path='.', 
    cat_names=cat_names, cont_names=cont_names, procs=[Categorify, Normalize],
    y_names='target', block_y=CategoryBlock(), valid_idx=list( np.random.permutation(len(df))[:int(len(df)*.1)]), bs=64)

(Notice the block_y parameter here)
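A rough way to see why integer-encoded labels end up as regression: when no target block is given, fastai appears to choose one based on whether the target columns are numeric. The pandas sketch below approximates that heuristic; `guess_target_block` is a hypothetical helper, and the exact rule inside fastai may differ between versions. A column of 0/1 integers is numeric, so it is guessed as regression unless `CategoryBlock()` is passed.

```python
import pandas as pd


def guess_target_block(df: pd.DataFrame, y_names) -> str:
    """Hypothetical approximation of how a default target type could be guessed.

    Returns "RegressionBlock" when every target column is numeric,
    otherwise "CategoryBlock".
    """
    if isinstance(y_names, str):
        y_names = [y_names]
    ys = df[y_names]
    # select_dtypes keeps only the numeric (int, float, ...) columns.
    n_numeric = len(ys.select_dtypes(include="number").columns)
    return "RegressionBlock" if n_numeric == len(ys.columns) else "CategoryBlock"


df = pd.DataFrame({"target": [0, 1, 1, 0], "label": ["no", "yes", "yes", "no"]})
print(guess_target_block(df, "target"))  # RegressionBlock: 0/1 ints are numeric
print(guess_target_block(df, "label"))   # CategoryBlock: strings are not numeric
```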

jerron · February 29, 2020, 5:39pm

Thank you for your explanation! Where can I find this kind of information in the documentation?
My y’s look like the following:

Clearly they are not categorical. What should I pass to block_y?
Thanks again!

muellerzr · February 29, 2020, 5:42pm

It’s something I’ve picked up from working with fastai tabular a lot, though the error message also hints at what’s going on. Pass CategoryBlock() as I showed above; this overrides the default and turns the task into a classification problem.

hahmed988 · March 26, 2020, 4:04pm

Even for a regression problem I get the error below:

AssertionError: n_out is not defined, and could not be infered from data, set dls.c or pass n_out

Code Snippet

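from fastai2.tabular.all import *  # TabularPandas, RegressionBlock, tabular_learner, MSELossFlat
from fastai2.metrics import *      # msle
# dep_var is passed by keyword: the 4th positional parameter is cont_names
to = TabularPandas(df, procs, cat_names, y_names=dep_var, y_block=RegressionBlock(),
                   splits=splits)
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, layers=[10,10], metrics=[msle],
                        loss_func=MSELossFlat())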

I expect integer values between 0 and 100, and my training set contains only a few of those values. Setting dls.c = 100 fixes the error, but I am not sure it is the right approach.

hahmed988 · March 26, 2020, 4:09pm

Setting n_out=1 in tabular_learner also fixes the issue. But why is this necessary when y_block is already set to RegressionBlock?

muellerzr · March 26, 2020, 4:40pm

RegressionBlock only prepares the outputs. IIRC it doesn’t work out how many outputs you have. I’ll try to look into it.

Edit: @hahmed988, you need to pass it in explicitly, i.e. RegressionBlock(c_out=1).
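The number of outputs the model needs depends on the task. A minimal sketch under simple assumptions (`expected_n_out` is hypothetical, not part of fastai): for classification it is the number of distinct labels, for regression it is the number of target columns. That is why `n_out=1` (or `RegressionBlock(c_out=1)`) matches a single 0-100 target, whereas `dls.c = 100` would give the model 100 outputs per row instead of one predicted value.

```python
from typing import Sequence


def expected_n_out(targets: Sequence, task: str, n_target_cols: int = 1) -> int:
    """Hypothetical helper: how many model outputs a task calls for.

    targets: the training labels (only inspected for classification).
    task: "classification" or "regression".
    n_target_cols: number of target columns (regression only).
    """
    if task == "classification":
        # One output (logit) per distinct class label.
        return len(set(targets))
    if task == "regression":
        # One output per predicted value, regardless of the value's range.
        return n_target_cols
    raise ValueError(f"unknown task: {task!r}")


scores = [3, 57, 99, 12, 57]  # integer targets in 0-100, treated as regression
print(expected_n_out(scores, "regression"))                # 1
print(expected_n_out(["a", "b", "a"], "classification"))   # 2
```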

hahmed988 · March 29, 2020, 5:51am

I am somehow getting negative predictions even though I passed y_range as (0, 25). Here is my code:

to = TabularPandas(df, procs, cat_names, cont_names, dep_var, y_block=RegressionBlock(c_out=1), splits=splits)
dls = to.dataloaders(bs=32)
y_range = torch.tensor([0., 25.])

raw_test_preds = learn.get_preds(dl=dl)
raw_test_preds

(tensor([[-0.0510],
[ 0.1744],
[-0.1502],
…,
[ 0.1372],
[-0.0308],
[ 0.2815]]),
None)

What am I doing wrong?

muellerzr · March 29, 2020, 5:55am

@hahmed988 I have a hunch. Can you try passing y_range into a tabular_config and doing it that way? I don’t believe TabularModel picks up the y_range.

I.e.:

config = tabular_config(y_range=tensor(0, 25.))
learn = tabular_learner(…, config=config)

hahmed988 · March 29, 2020, 6:17am

This works, I’m getting positive values now! Thanks

muellerzr · March 29, 2020, 6:22am

@sgugger FYI, tabular_learner should probably be adjusted: y_range is listed as a parameter but never used, so maybe drop it in favor of tabular_config, which I believe is what you intended.

sgugger · March 29, 2020, 3:04pm

No, y_range is always exposed (like in cnn_learner) because it’s an argument you want to change very often. Forgetting to pass it in the model call was my mistake. Will fix that now.

muellerzr · March 29, 2020, 3:05pm

Ahhh got it okay. Good to know the logic and thinking! Thanks!

sgugger · March 29, 2020, 3:10pm

Fixed and made it so that you can pass it both ways:

learn = tabular_learner(dls, y_range=(0,32))

or

learn = tabular_learner(dls, config = tabular_config(y_range=(0,32)))

Just don’t combine the two at once

muellerzr · March 29, 2020, 3:15pm

Thanks! So do we no longer have to explicitly wrap it in a tensor like in the Rossmann example? Is that being done on the back end then?

sgugger · March 29, 2020, 3:18pm

I don’t believe you ever had to do that. It adds a SigmoidRange layer to your model.
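The SigmoidRange layer squashes each raw model output into the requested range, which is why passing y_range got rid of the negative predictions. A minimal pure-Python sketch of the formula, assuming the usual definition sigmoid(x) * (high - low) + low; this `sigmoid_range` is a standalone re-implementation for illustration, not an import from fastai.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # tiny for very negative x, avoids overflow in exp(-x)
    return z / (1.0 + z)


def sigmoid_range(x: float, low: float, high: float) -> float:
    """Squash a raw model output into the range between low and high."""
    return sigmoid(x) * (high - low) + low


# Raw outputs similar to the negative predictions posted above.
raw = [-0.0510, 0.1744, -0.1502]
print([round(sigmoid_range(v, 0, 25), 2) for v in raw])  # all between 0 and 25
print(sigmoid_range(0.0, 0, 25))  # 12.5
```

Because the sigmoid never leaves (0, 1), the rescaled output stays within (low, high) no matter how negative the raw activation is.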

nbobbele · January 18, 2021, 4:14pm

Hi, I am trying to do an image regression task and want to load the data from a DataFrame: one column holds the paths to the images and the other holds the target value for each image. However, I have no idea how to specify that column as the target. I saw something like this elsewhere, but the label_from_df attribute apparently doesn’t exist anymore. Does anyone have a suggestion for creating a DataLoader from such a DataFrame?

train_data = ImageDataLoaders.from_df(train, cols=["file_path"]).label_from_df(label_cls=FloatList)