Getting regression predictions on a test set with a language model classifier

Hi,
As part of my Lesson 4 practice, I’m playing with the “How good is your Medium article” Kaggle competition, which requires predicting an article’s “claps”.
I trained the LM and the classifier on the training set, and I would now like to get predictions for the test set.
I tried to create a TextClasDataBunch that includes the test set so that I can use learn.get_preds(DatasetType.Test), but I keep getting errors that I can’t debug.

When I tried:

data_clas2 = TextClasDataBunch.from_csv(path,'train.csv', valid_pct=0.2,test='test.csv',vocab=data_lm.vocab,text_cols='content',label_cols='target',bs = 32)

I got the error: __init__() got an unexpected keyword argument 'classes'. I guess this is because it expects classes for the target, but I’m doing regression, so there are no classes.

When I tried:

data_clas = (TextList.from_csv(path, 'train.csv', cols='content', vocab=data_lm.vocab)
    .random_split_by_pct(valid_pct=0.2)
    .add_test(TextList.from_csv(path, 'test.csv', cols='content', vocab=data_lm.vocab))
    .label_from_df(cols='target', label_cls=FloatList)
    .databunch(bs=bs))

I got the error:

'TextList' object has no attribute 'add_test'.

I saw from previous posts (like this) that TextList used to have .add_test, but I guess it was removed.

When I create the same data_clas without .add_test, everything works.

I thought about adding fake target data to my test set and then creating a DataBunch from it just for prediction, but that seems like a less elegant solution…
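For reference, here is a minimal sketch of what I mean by fake target data, using only Python’s csv module. The helper name add_placeholder_target and the 0.0 placeholder are made up for illustration, and the in-memory sample stands in for the real test.csv. The placeholder values are not real labels, so any loss or metric computed against them would be meaningless.

```python
import csv
import io


def add_placeholder_target(src, dst, target_col="target", placeholder="0.0"):
    """Copy CSV rows from src to dst, appending a constant placeholder label column.

    Hypothetical helper: the placeholder only lets the test rows pass through
    a pipeline that insists on a label column.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames) + [target_col]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row[target_col] = placeholder
        writer.writerow(row)


# Tiny in-memory stand-in for test.csv (assumed to have a 'content' column).
sample_test = io.StringIO("content\nfirst article\nsecond article\n")
patched = io.StringIO()
add_placeholder_target(sample_test, patched)

# Read the patched CSV back to check that every row now has a target value.
patched_rows = list(csv.DictReader(io.StringIO(patched.getvalue())))
```

With real files, the same function would be called with open('test.csv', newline='') as the source and a writable file as the destination.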

Any help, or a reference to a notebook that got this working, would be great.

Edit: I now see there was already a similar thread, which also pointed out that there is currently no straightforward way to do regression with the LM.
Thanks,
Ran


@Vertigo42 I managed to get it working. Try moving the add_test call to after .label_from_df, and remove the vocab argument from the TextList inside add_test.

e.g.

data_clas = (TextList.from_csv(path, 'train.csv', cols='content', vocab=data_lm.vocab)
    .random_split_by_pct(valid_pct=0.2)
    .label_from_df(cols='target', label_cls=FloatList)
    .add_test(TextList.from_csv(path, 'test.csv', cols='content'))
    .databunch(bs=bs))
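In case it helps explain why the order matters: my understanding (an assumption about the library’s internals, not something confirmed in this thread) is that each step in the chain returns a different kind of object, and add_test only exists on the object you get after labeling. The toy sketch below shows that pattern in plain Python. ToyItemLists and ToyLabelLists are hypothetical classes made up for illustration, not fastai code.

```python
class ToyItemLists:
    """Hypothetical stand-in for the unlabeled stage (after splitting)."""

    def __init__(self, train, valid):
        self.train, self.valid = train, valid

    def label_from(self, label_fn):
        # Labeling moves the pipeline to the next stage, which is a different type.
        return ToyLabelLists(
            [(x, label_fn(x)) for x in self.train],
            [(x, label_fn(x)) for x in self.valid],
        )


class ToyLabelLists:
    """Hypothetical stand-in for the labeled stage; only this stage has add_test."""

    def __init__(self, train, valid):
        self.train, self.valid, self.test = train, valid, None

    def add_test(self, items):
        # Test items have no real labels, so they are stored unlabeled.
        self.test = list(items)
        return self


split = ToyItemLists(["a", "bb"], ["ccc"])

# Calling add_test before labeling fails: the method does not exist on this stage.
try:
    split.add_test(["dddd"])
    early_error = None
except AttributeError as err:
    early_error = str(err)

# Labeling first, then adding the test set, works.
labeled = split.label_from(len).add_test(["dddd"])
```

Here the early call raises an AttributeError naming add_test, which is the same shape of failure as the error in the original post, while the reordered chain goes through.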

Happy to help further if you need it.


Great, thanks!

I would never have guessed that the order of the blocks is critical.
I’ll give it a try.