How to work with text in regression?

juanchoalric · May 14, 2021, 8:03pm

Hey! I just finished Lesson 8 of the first part of the fastai course. I'm thinking of tackling a problem in which I need to predict a number from a piece of text.

So first I decided to create a language model (LM) suitable for this task:

dls = TextDataLoaders.from_df(df, text_col='text', is_lm=True)

Then I created the learner and started training:

learn = language_model_learner(dls, AWD_LSTM, drop_mult=0.3, metrics=[accuracy, Perplexity()]).to_fp16()

learn.fine_tune(1, 2e-2)

This gave me an accuracy of 30% on predicting the following word. Pretty good!

But now I’m quite stuck…

I started looking at the example given in the class:

class_data = DataBlock(blocks=(TextBlock.from_df('text', vocab=dls.vocab), CategoryBlock), get_x=ColReader('text'), get_y=ColReader('target'), splitter=RandomSplitter(0.2))

dls_class = class_data.dataloaders(df)

learn2 = text_classifier_learner(dls_class, AWD_LSTM, drop_mult=0.5, metrics=accuracy).to_fp16()

The problem is that this only works for classification, and I need to modify it to work with regression.

Thanks in advance!

ilovescience · May 16, 2021, 6:36am

I’d look into changing the CategoryBlock to a RegressionBlock.
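
A minimal sketch of that swap, reusing the column names from the first post ('text' and 'target') and assuming 'target' holds numbers. The helper names `target_range` and `build_regression_learner` are made up for illustration. Passing `y_range` is optional: it bounds the predictions, and the range here is padded slightly beyond the observed targets. The `encoder_name` argument assumes the language model's encoder was saved earlier with `learn.save_encoder(...)`, which is not shown in the original post.

```python
def target_range(values, margin=0.1):
    """Return (low, high) bounds slightly wider than the observed targets."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * margin
    return (lo - pad, hi + pad)


def build_regression_learner(df, vocab, encoder_name=None):
    """Build a text regression learner from a DataFrame with 'text' and 'target' columns."""
    # fastai is imported inside the function so target_range() stays usable without it.
    from fastai.text.all import (
        AWD_LSTM, ColReader, DataBlock, RandomSplitter, RegressionBlock,
        TextBlock, rmse, text_classifier_learner,
    )

    # Same DataBlock as the classification example, with RegressionBlock as the target block.
    reg_data = DataBlock(
        blocks=(TextBlock.from_df('text', vocab=vocab), RegressionBlock),
        get_x=ColReader('text'),
        get_y=ColReader('target'),
        splitter=RandomSplitter(0.2),
    )
    dls_reg = reg_data.dataloaders(df)

    # RegressionBlock gives the learner a single output and an MSE loss by default.
    learn = text_classifier_learner(
        dls_reg, AWD_LSTM, drop_mult=0.5,
        y_range=target_range(df['target'].tolist()),
        metrics=rmse,
    ).to_fp16()

    # Optionally reuse the encoder from the fine-tuned language model.
    if encoder_name is not None:
        learn.load_encoder(encoder_name)
    return learn
```

With the objects from the first post this would be called as `learn2 = build_regression_learner(df, dls.vocab)`, followed by the usual `learn2.fine_tune(...)`.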

Conwyn · May 16, 2021, 3:01pm

Hi Juan
You could try treating it as multiple categories, e.g. 0, 1, 2, 3, 4, …, where 0 = zero stars (a bad review) and 4 = four stars (an excellent review).
Regards Conwyn
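
One way to read this suggestion, as a rough sketch: round each numeric target to the nearest whole class so that the original CategoryBlock setup can be kept. The helper name `to_star_class` and the default of five classes are assumptions for illustration, not something from the posts above.

```python
def to_star_class(score, n_classes=5):
    """Map a numeric score to the nearest class index in 0..n_classes-1."""
    nearest = int(round(score))
    # Clamp so out-of-range scores fall into the first or last class.
    return max(0, min(n_classes - 1, nearest))


# Example: turning raw scores into class labels.
scores = [0.2, 1.8, 3.7, 9.2, -1.0]
classes = [to_star_class(s) for s in scores]
```

On a DataFrame this could be applied as `df['target_class'] = df['target'].map(to_star_class)`, with `get_y=ColReader('target_class')` in the DataBlock. Note that a classification loss treats the classes as unordered, so this only makes sense when the target naturally falls into a few discrete levels.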