Expected output and target to have the same number of elements

I’m running tabular regression with the following code:

dep_var='change'

df = pd.read_csv('/content/gdrive/My Drive/'+job_name+'.csv')
data = (tabular.TabularList.from_df(df,  path='.', cat_names=cat_names, cont_names=cont_names, procs=[Categorify, Normalize])
    .split_by_rand_pct(valid_pct = 0.2, seed = 8888)
    .label_from_df(cols=[dep_var])
    .databunch())   

learn = tabular_learner(data, layers=[5000,2000,2000,1000,500,500], metrics=exp_rmspe)
learn.fit_one_cycle(20, max_lr=1e-5, callbacks=[SaveModelCallback(learn,
    monitor='valid_loss',
    mode='min',
    name=modelfile)])

It runs fine.
But when I changed the first line to

dep_var='chgcat'

I got the following error:

/usr/local/lib/python3.6/dist-packages/fastai/metrics.py in exp_rmspe(pred, targ)
     69     "Exp RMSE between pred and targ."
---> 70     pred,targ = flatten_check(pred,targ)
     71     pred, targ = torch.exp(pred), torch.exp(targ)

/usr/local/lib/python3.6/dist-packages/fastai/torch_core.py in flatten_check(out, targ)
    378     out,targ = out.contiguous().view(-1),targ.contiguous().view(-1)
--> 379     assert len(out) == len(targ), f"Expected output and target to have the same number of elements but got {len(out)} and {len(targ)}."
    380     return out,targ

AssertionError: Expected output and target to have the same number of elements but got 448 and 64.

Here is what these two columns look like:
[image]
Why can’t I use a different column as the target? How can I fix it?

Given the 64 and 448, it feels like something is going wrong with the batch size, because 64 is a common batch size. But I’m not sure. Could you share the notebook in a gist or Colab?

Can you see it here:


Thanks!

I of course don’t have the same dataset. Could you try running it like:

https://colab.research.google.com/drive/1yH-oU0iLASRlRIPgjOHt9T5Jo1CDTJFV#scrollTo=vqXj_PjMdXxT

with a random array just to see where stuff is going wrong

Thanks for taking a look. I just sent the access request. Do you want to try my data as well? I can upload it somewhere if you like. Thanks again.

I saw what you did.
I uploaded my data to https://github.com/jerronl/colab/blob/master/ts.csv
Can you try it?
Thanks!

I just tried running the code I shared in the Google gist, and it ran fine without the error at all! Then I tried my code locally again, and the error reappeared. I’m not sure why, but at least I can make it work.

What is the actual cause of this error?

I still don’t understand. Actually, I ran into a similar problem again. When I ran the following code, the first learner fit successfully, while the second one failed. The only difference is that the first one uses FloatList while the second one uses CategoryList. I want CategoryList because each row is labeled with a bucket from 1 to 5.

print(df)

data = (tabular.TabularList.from_df(df, path='.', cat_names=cats, cont_names=cnts, procs=[Categorify, Normalize])
    .split_by_rand_pct(valid_pct = 0.3, seed = 8888)
    .label_from_df(cols=[dep_var],label_cls=FloatList)
    .databunch())
learn = tabular_learner(data, layers=[1000,500,50], metrics=exp_rmspe)
#print(learn)
learn.fit_one_cycle(5, max_lr =1e-2)

data2 = (tabular.TabularList.from_df(df, path='.', cat_names=cats, cont_names=cnts, procs=[Categorify, Normalize])
    .split_by_rand_pct(valid_pct = 0.3, seed = 8888)
    .label_from_df(cols=[dep_var],label_cls=CategoryList)
    .databunch())
learn2 = tabular_learner(data2, layers=[1000,500,50], metrics=exp_rmspe)
#print(learn2)
learn2.fit_one_cycle(5, max_lr =1e-2)

Result:

    oed0  oed1  wkd0  wkd1  month0  ...  IX2ln  IX2lnln    IX3  IX3ln  chgcls
 0   1.0   0.0  12.0  89.0    11.0  ...  14.25    15.32  14.85  13.48       2
 1   0.0  -1.0  89.0  90.0    11.0  ...  15.32    14.85  13.48  13.00       2
 2  -1.0  -2.0  90.0  91.0    11.0  ...  14.85    13.48  13.00  13.17       1
 3  -2.0  -3.0  91.0  92.0    11.0  ...  13.48    13.00  13.17  13.10       2
 4  -3.0  -4.0  92.0  93.0    11.0  ...  13.00    13.17  13.10  13.08       2
..   ...   ...   ...   ...     ...  ...    ...      ...    ...    ...     ...
95  -9.0   9.0   3.0   4.0     4.0  ...  12.12    11.79  11.62  11.21       2
96   9.0   8.0   4.0   5.0     4.0  ...  11.79    11.62  11.21  11.17       2
97   8.0   7.0   5.0   6.0     4.0  ...  11.62    11.21  11.17  11.19       4
98   7.0   6.0   6.0   7.0     4.0  ...  11.21    11.17  11.19  11.46       2
99   6.0   5.0   7.0   8.0     4.0  ...  11.17    11.19  11.46  11.58       4

[100 rows x 20 columns]

epoch train_loss valid_loss exp_rmspe time
0 7.263242 7.262573 0.896337 00:00
1 7.079129 6.770688 0.885954 00:00
2 7.164012 6.559047 0.880290 00:00
3 7.008555 6.798931 0.885567 00:00
4 6.918363 6.944957 0.888992 00:00

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:541: UserWarning: You are labelling your items with CategoryList.
Your valid set contained the following unknown labels, the corresponding items have been discarded.
5
if getattr(ds, 'warn', False): warn(ds.warn)

0.00% [0/5 00:00<00:00]
epoch train_loss valid_loss exp_rmspe time

0.00% [0/1 00:00<00:00]


AssertionError Traceback (most recent call last)

in ()
18 learn2 = tabular_learner(data2, layers=[1000,500,50], metrics=exp_rmspe)
19 #print(learn2)
---> 20 learn2.fit_one_cycle(5, max_lr =1e-2)

9 frames

/usr/local/lib/python3.6/dist-packages/fastai/torch_core.py in flatten_check(out, targ)
    377     "Check that out and targ have the same number of elements and flatten them."
    378     out,targ = out.contiguous().view(-1),targ.contiguous().view(-1)
--> 379     assert len(out) == len(targ), f"Expected output and target to have the same number of elements but got {len(out)} and {len(targ)}."
    380     return out,targ
    381

AssertionError: Expected output and target to have the same number of elements but got 116 and 29.

This time I didn’t have the same luck as before: even when I ran it in the gist, the error persisted.

Hi mate,

From the table, I think the chgcat variable has an integer type, which may be interpreted as a categorical target for classification in tabular_learner.
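To make the numbers in the error message concrete, here is a minimal sketch in plain Python (no fastai or torch). It assumes that a classification head emits one score per class for every row, while a regression head emits a single value per row. `flatten_check_sketch` is a hypothetical stand-in that only mirrors the two lines shown in the traceback; it is not the library function.

```python
def flatten_check_sketch(out, targ):
    """Hypothetical stand-in: flatten both inputs and compare element counts."""
    flat_out = [v for row in out for v in row]   # like out.view(-1)
    flat_targ = list(targ)                       # like targ.view(-1)
    assert len(flat_out) == len(flat_targ), (
        "Expected output and target to have the same number of elements "
        f"but got {len(flat_out)} and {len(flat_targ)}."
    )
    return flat_out, flat_targ

batch_size, n_classes = 29, 4                    # assumed: 4 classes seen in training

targ = [1] * batch_size                          # one label per row
reg_out = [[0.0] for _ in range(batch_size)]     # regression head: 1 value per row
class_out = [[0.0] * n_classes for _ in range(batch_size)]  # 1 score per class

flatten_check_sketch(reg_out, targ)              # same element count, passes

try:
    flatten_check_sketch(class_out, targ)        # 29 * 4 = 116 elements vs 29
except AssertionError as err:
    message = str(err)
    print(message)
```

The same arithmetic is consistent with the first error: 448 = 64 × 7, which would match a batch of 64 rows and 7 distinct label values. If the buckets really are meant as classes, a classification metric such as accuracy would be the usual match instead of exp_rmspe, although that is untested on this data.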

For a regression problem (since you use metrics=exp_rmspe), you need to convert the dep_var column (chgcat) to a float type, as follows:

df = df.astype({'chgcat': 'float64'})

After that, it should work fine.

You can also use df.dtypes to check the type of all variables.
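A minimal sketch of that check and cast, using a small made-up DataFrame in place of the real CSV (the column names come from this thread, the values are invented for illustration):

```python
import pandas as pd

# Toy frame standing in for the real data; the values are made up.
df = pd.DataFrame({"change": [0.12, -0.05, 0.30], "chgcat": [2, 1, 4]})

# Inspect the column types: chgcat is inferred as an integer dtype here.
print(df.dtypes)

# Cast only the target column to float so it is treated as a continuous target.
df = df.astype({"chgcat": "float64"})

dtype_after = str(df["chgcat"].dtype)
print(dtype_after)
```

In the second post of this thread, passing label_cls=FloatList to label_from_df also trained without the error, so that looks like another way to force a regression target without changing the DataFrame.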