Can't create tabular_learner

When I run the following code, I get an error:

from fastai2.tabular.all import *
df = pd.read_csv('/content/gdrive/My Drive/myinput.csv')
np.random.seed(88)
dls = TabularDataLoaders.from_df(df, path='.', 
    cat_names=cat_names, cont_names=cont_names, procs=[Categorify, Normalize],
    y_names='target', valid_idx=list( np.random.permutation(len(df))[:int(len(df)*.1)]), bs=64)
learn = tabular_learner(dls, layers=[2000,2000,500,200,50], metrics=accuracy)

error message:


AssertionError Traceback (most recent call last)

  8     cat_names=cat_names, cont_names=cont_names, procs=[Categorify, Normalize],
  9     y_names='prempd', valid_idx=list( np.random.permutation(len(df))[:int(len(df)*.1)]), bs=64)

---> 10 learn = tabular_learner(dls, layers=[2000,2000,500,200,50], metrics=accuracy)
11 learn.fit_one_cycle(3)

/usr/local/lib/python3.6/dist-packages/fastai2/tabular/learner.py in tabular_learner(dls, layers, emb_szs, config, n_out, y_range, ps, embed_p, use_bn, bn_final, **kwargs)
31 emb_szs = get_emb_sz(dls.train_ds, {} if emb_szs is None else emb_szs)
32 if n_out is None: n_out = get_c(dls)
---> 33 assert n_out, "n_out is not defined, and could not be infered from data, set dls.c or pass n_out"
34 model = TabularModel(emb_szs, len(dls.cont_names), n_out, layers, y_range=y_range,
35 ps=ps, embed_p=embed_p, use_bn=use_bn, bn_final=bn_final, **config)

AssertionError: n_out is not defined, and could not be infered from data, set dls.c or pass n_out

How can I fix this?

As the error message tells you: either set dls.c or pass n_out to indicate how many classes you have.

This is the sign that the fastai library did not understand your data. You might want to pass a block_y in your call to DataLoaders to indicate what type of targets you have.

Thank you. What exactly shall I do? What does “classes” mean here, or what is dls.c?

Classes is the cardinality of the outputs you’re expecting (0 and 1 if binary, or the set of labeled classes).
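
To make "number of classes" concrete, here is a minimal plain-Python sketch. It does not use fastai, and the `targets` list is invented for illustration. The count of distinct labels is the kind of value the assertion is asking for through dls.c or n_out in a classification setting.

```python
# Hypothetical target column; the values are invented for illustration.
targets = ["cat", "dog", "cat", "bird", "dog", "cat"]

# The distinct labels, sorted so the order is stable.
classes = sorted(set(targets))

# Number of classes = number of distinct labels.
n_classes = len(classes)

print(classes)
print(n_classes)
```

For a binary target the same count would be 2, regardless of whether the labels are stored as strings or as 0/1.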

What may be happening is that your y’s are already encoded (0, 1, etc.), so fastai is treating the problem as regression. So the second part of the advice means you should do:

TabularDataLoaders.from_df(df, path='.', 
    cat_names=cat_names, cont_names=cont_names, procs=[Categorify, Normalize],
    y_names='target', block_y=CategoryBlock(), valid_idx=list( np.random.permutation(len(df))[:int(len(df)*.1)]), bs=64)

(Notice the block_y parameter here)
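
As a rough illustration of why integer-encoded labels can be mistaken for a regression target, here is a small plain-Python sketch. The function `guess_target_kind` is hypothetical and is not fastai's actual code; it only mimics the behaviour described above, where an all-numeric target column looks like regression unless you say otherwise.

```python
from numbers import Number


def guess_target_kind(values):
    """Hypothetical heuristic: an all-numeric target column looks like regression."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "regression"
    return "classification"


# Class codes stored as integers are indistinguishable from numeric targets.
print(guess_target_kind([0, 1, 1, 0]))
# String labels are unambiguous.
print(guess_target_kind(["no", "yes", "no"]))
```

Because a heuristic like this cannot tell class codes from real quantities, stating the target type explicitly removes the ambiguity.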

Thank you for your explanation! Where can I find such information in the documentation?
My y’s look like the following:
[image]
Clearly they are not categorical. What shall I feed to block_y?
Thanks again!

It’s something I’ve picked up from working with fastai tabular a lot, but there are hints that this is going on too. Pass CategoryBlock as I showed above; this overrides the default and turns it into a classification problem.

Even for a regression problem I get the error below:

AssertionError: n_out is not defined, and could not be infered from data, set dls.c or pass n_out

Code Snippet

from fastai2.metrics import *
to = TabularPandas(df, procs, cat_names, dep_var, y_block = RegressionBlock(),
splits=splits)
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, layers=[10,10], metrics= [msle],
loss_func=MSELossFlat())

I expect integer values between 0 and 100. My training dataset contains a few of those values. Setting dls.c = 100 fixes the issue, but I am not sure if it is the right approach.

Setting n_out = 1 in the learner also fixes the issue. But y_block is already set to RegressionBlock, so why is this necessary?

RegressionBlock only prepares outputs. It doesn’t grab how many outputs you have IIRC. I’ll try to look into it

Edit: @hahmed988, you need to pass it in explicitly, i.e. RegressionBlock(c_out=1)

I am somehow getting negative values at prediction time even though I have passed y_range as [0, 25]. Below is my code:

to = TabularPandas(df, procs, cat_names, cont_names, dep_var, y_block=RegressionBlock(c_out=1), splits=splits)
dls = to.dataloaders(bs=32)
y_range = torch.tensor([0., 25.])

raw_test_preds = learn.get_preds(dl=dl)
raw_test_preds

(tensor([[-0.0510],
[ 0.1744],
[-0.1502],
…,
[ 0.1372],
[-0.0308],
[ 0.2815]]),
None)

What am I doing wrong?

@hahmed988 I have a hunch. Can you try passing y_range into a tabular_config and do it that way? I don’t believe that TabularModel picks up the y_range

i.e.:

config = tabular_config(y_range=tensor(0, 25.))
learn = tabular_learner(…, config=config)

This works, I am getting positive values now! Thanks

@sgugger FYI, tabular_learner probably needs an adjustment: y_range is listed as a parameter but is never used. Perhaps drop it in favor of tabular_config, which I believe is what you intended?

No, y_range is always exposed (like in cnn_learner) because it is an argument you want to change very often. It was a mistake on my part: I forgot to pass it in the model call :slight_smile: Will fix that now.

Ahhh got it okay. Good to know the logic and thinking! Thanks!

Fixed and made it so that you can pass it both ways:

learn = tabular_learner(dls, y_range=(0,32))

or

learn = tabular_learner(dls, config = tabular_config(y_range=(0,32)))

Just don’t combine the two at once :wink:

Thanks! So do we no longer have to explicitly wrap it in a tensor like in the Rossmann example? Is that being done on the back end then?

I don’t believe you ever had to do that. It adds a new SigmoidRange layer to your model.
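
For anyone wondering what that layer does, here is a minimal plain-Python sketch of the sigmoid-range idea: apply a sigmoid, then scale and shift the result into (low, high). It is a standalone illustration written with the math module, not the library's own implementation, and the function name here is just a label for the sketch. The raw values are the first three predictions quoted earlier in the thread, reused only to show that negative inputs end up inside the range.

```python
import math


def sigmoid_range(x, low, high):
    """Squash a raw output x into the open interval (low, high)."""
    return 1.0 / (1.0 + math.exp(-x)) * (high - low) + low


# First three raw predictions quoted earlier in the thread.
raw_outputs = [-0.0510, 0.1744, -0.1502]

# With y_range = (0, 25), every value is mapped strictly between 0 and 25.
scaled = [sigmoid_range(x, 0.0, 25.0) for x in raw_outputs]
print(scaled)
```

A raw output of 0 lands exactly in the middle of the range, so values near 0 map close to 12.5 here. Without such a final layer the model's outputs are unbounded, which is consistent with the negative predictions reported above.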

Hi, I am trying to do an image regression task and want to load the data from a df. One column holds the paths to the images and the other column holds the target value for each image. However, I have no idea how to specify the target column as such. I saw something like this somewhere else, but apparently the label_from_df attribute doesn’t exist anymore. Does someone have a suggestion for how to create a dataloader from such a df?
"""
train_data = ImageDataLoaders.from_df(train, cols=["file_path"]).label_from_df(label_cls=FloatList)
"""