Question on TextDataBunch creation?

Hey there !

I’m creating a model for text classification and my data is in a pandas DataFrame. It seems that TextDataBunch.from_df requires explicitly setting the validation set, so I split my DataFrame into two.
Each DataFrame has the text in column 0 and the labels in column 1.
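
For reference, the split could look like the minimal sketch below. The toy data, the column names `text` and `label`, and the 80/20 ratio are assumptions for illustration, not the actual dataset.

```python
import pandas as pd

# Hypothetical toy data: text in column 0, label in column 1.
df = pd.DataFrame({
    "text": [f"sample sentence {i}" for i in range(10)],
    "label": [i % 2 for i in range(10)],
})

# Hold out 20% of the rows for validation (fixed seed for reproducibility).
valid_df = df.sample(frac=0.2, random_state=42)

# Everything not sampled for validation becomes the training set.
train_df = df.drop(valid_df.index)
```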

So here’s my line:
data_clas = TextDataBunch.from_df(path, train_df = train_df, valid_df = valid_df, vocab = data_lm.vocab, text_cols = 0, label_cols = 1).databunch(bs = 32)

However, when running this, I get an error saying that my data isn’t labeled.

Could I get some help, please ? What am I doing wrong ?

Thanks a lot (:

I think you want a TextLMDataBunch.

Hey, thanks for answering ! (: Actually, I already fine-tuned the LM model. Now, I’d like to train the classifier. Do I still need TextLMDataBunch ?

It’s going to be TextClasDataBunch for this.

Yes, indeed. And it is quite logical in fact. Do I need to call databunch() at the end ?

Thanks a lot !

Oh, I hadn’t seen that: no, don’t call databunch() at the end. That method belongs to the data block API, which is another way of creating a DataBunch.
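
Putting the replies together, the call might look like the sketch below. It assumes fastai v1, where `from_df` already returns a DataBunch and extra keyword arguments such as `bs` are forwarded to it. The wrapper function `build_clas_databunch` is a hypothetical helper, not part of fastai.

```python
def build_clas_databunch(path, train_df, valid_df, vocab, bs=32):
    """Hypothetical helper: build a classifier DataBunch from two DataFrames."""
    # Imported lazily so this sketch can be defined without fastai installed.
    from fastai.text import TextClasDataBunch  # assumes fastai v1

    # from_df returns the DataBunch directly, so no trailing .databunch() call.
    return TextClasDataBunch.from_df(
        path,
        train_df=train_df,
        valid_df=valid_df,
        vocab=vocab,    # reuse the language model's vocab, e.g. data_lm.vocab
        text_cols=0,    # texts are in column 0
        label_cols=1,   # labels are in column 1
        bs=bs,          # batch size, passed here instead of via .databunch(bs=...)
    )
```

With the names from the original post, this would be called as `data_clas = build_clas_databunch(path, train_df, valid_df, data_lm.vocab)`.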
