Multi-label text classification

neovaldivia · January 23, 2018, 9:07pm

Hey guys,

Do you know if Fast.ai can handle multi-label text data, as we did with CSVs in the ‘Planet: Understanding the Amazon from Space’ Kaggle competition?

In other words, I have multiple chat sessions, each with labels indicating the topics that were discussed. I would like to arrange them in such a way that Fast.ai can load my data with one of its ‘from_XXX’ methods and attempt classification.

Let me know if anything is unclear.

Thank you.
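A minimal sketch of one way such data could be arranged, assuming each chat session comes with a list of topic names. The sample sessions, the topic names, and the helper `multi_hot` are all made up for illustration; this is plain Python and not a fast.ai loader. The idea is to turn each label list into a fixed-length vector of 0s and 1s, one column per topic, similar in spirit to the multi-label CSV used for the Planet competition.

```python
# Hypothetical sample data: (chat text, list of topics discussed in it).
sessions = [
    ("my card was charged twice", ["billing"]),
    ("cannot log in and the invoice is wrong", ["login", "billing"]),
    ("how do I reset my password", ["login", "password"]),
]

# Build a stable, sorted vocabulary of every topic seen in the data.
topics = sorted({topic for _, labels in sessions for topic in labels})
topic_index = {topic: i for i, topic in enumerate(topics)}

def multi_hot(labels):
    """Return a vector with 1.0 at the position of every topic in `labels`."""
    row = [0.0] * len(topics)
    for topic in labels:
        row[topic_index[topic]] = 1.0
    return row

texts = [text for text, _ in sessions]
label_matrix = [multi_hot(labels) for _, labels in sessions]

for text, row in zip(texts, label_matrix):
    print(row, text)
```

Each row of `label_matrix` can carry several 1s at once, which is what distinguishes this layout from the single class index used in ordinary classification.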

yhadi · April 1, 2018, 4:52pm

I’m unsure why this hasn’t gotten any responses. I’d imagine someone used fast.ai to find a solution for the toxic comment Kaggle competition, so I’m going to look there now.

yhadi · April 1, 2018, 4:53pm

Looks like this person did it.

krasin · July 15, 2018, 5:52am

Hello! I have the same question about multi-label text classification, but I would like to use fastai.text.

In the ‘Classifier tokens’ section from Lesson 10, I replaced the number of classes:

# tok_trn, trn_labels = get_all(df_trn, 1)
tok_val, val_labels = get_all(df_val, 166)

and in the ‘Classifier’ section:

#c=int(trn_labels.max())+1
c = int(trn_labels.shape[1])

I get an error:

RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:16

Then I tried to replace the loss function by adding:

learn.crit = F.binary_cross_entropy_with_logits

and I get another error:

RuntimeError: Expected object of type Variable[torch.cuda.FloatTensor] but found type Variable[torch.cuda.LongTensor] for argument #1 'other'

Any ideas?

Haotian · August 29, 2018, 8:17pm

Did you solve your problem? I am having the same problem now.

krasin · September 10, 2018, 8:21pm

No, I switched to another task.

krasin · September 12, 2018, 6:28am

@Haotian - two more changes fix the errors:

Convert the labels to floats to remove the last error:

trn_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'trn_labels.npy')).astype(float)
val_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'val_labels.npy')).astype(float)

Change the metric from accuracy to accuracy_thresh to account for the different format of the labels:

learn.metrics = [accuracy_thresh(0.5)]
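To illustrate why a thresholded metric fits multi-hot labels, here is a rough sketch in plain Python. It is not fastai's implementation of accuracy_thresh; the function name `thresholded_accuracy` and the sample numbers are made up. It also assumes the model outputs raw logits and applies a sigmoid before thresholding, which may or may not match what the library metric does in a given version, so that is worth checking.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def thresholded_accuracy(logits, targets, thresh=0.5):
    """Fraction of (sample, label) cells where the thresholded
    prediction agrees with the 0/1 target."""
    correct = 0
    total = 0
    for logit_row, target_row in zip(logits, targets):
        for logit, target in zip(logit_row, target_row):
            predicted = 1.0 if sigmoid(logit) > thresh else 0.0
            if predicted == target:
                correct += 1
            total += 1
    return correct / total

# Hypothetical model outputs for two samples and three labels.
logits = [[2.0, -1.0, 0.5], [-3.0, 0.2, -0.7]]
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

acc = thresholded_accuracy(logits, targets)
print(acc)
```

Every label is scored on its own here, so a sample with two of its three labels right still contributes partial credit, unlike plain accuracy, which compares a single predicted class with a single target class.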

For further improvement, the model created with get_rnn_classifier can be adjusted, or the loss function F.binary_cross_entropy_with_logits can be swapped for another one.
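For anyone wondering what this loss does with the float labels, here is a small plain-Python sketch of binary cross-entropy on logits. It is an illustration of the formula only, not PyTorch's code; the name `bce_with_logits` and the sample values are made up. Each label gets its own independent yes/no loss, and the targets are floats of the same shape as the outputs, which is why integer labels had to be converted first.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over every (sample, label) cell.

    Uses the numerically stable form
        max(x, 0) - x * y + log(1 + exp(-|x|))
    which equals -(y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))).
    """
    total = 0.0
    count = 0
    for logit_row, target_row in zip(logits, targets):
        for x, y in zip(logit_row, target_row):
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count

# Integer multi-hot labels, converted to floats (mirrors the .astype(float) step).
int_targets = [[1, 0, 1], [0, 1, 0]]
float_targets = [[float(v) for v in row] for row in int_targets]

# Hypothetical raw model outputs, one logit per label.
logits = [[1.5, -2.0, 0.3], [-0.5, 2.2, -1.0]]

loss = bce_with_logits(logits, float_targets)
print(loss)
```

Because each label is treated as a separate binary decision, a sample may have any number of active labels, which a single-target loss such as NLL cannot express.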

Pablo · January 17, 2019, 4:04pm

Hi all!

I wonder if any of you successfully managed multi-label text classification with transfer learning on Fastai v1.

I am currently working on that, and I believe I am close to making it work now. I will be posting updates in this other thread, in case you are still interested.

krasin · January 22, 2019, 7:31am

Yes, I am using multi-label text classification with transfer learning on Fastai v1.

dwcar49us · February 20, 2019, 7:00am

Hi @krasin,

Is there any example notebook you could post?

Thanks,

David

krasin · February 20, 2019, 4:33pm

I could prepare a representative notebook, but I am on holiday right now without access to my code. If it is not too late, I could post it next week.

krasin · February 28, 2019, 10:52pm

Hi @dwcar49us,

Here is a link to an example multi-label text classification notebook.
An example dataset is also uploaded. Hope it works.

Krasin

dwcar49us · March 5, 2019, 7:18am

Thanks so much. This is very helpful.