neovaldivia
(Rafael Valdivia)
January 23, 2018, 9:07pm
1
Hey guys,
Do you know if Fast.ai can handle multi-label text data, as we did with CSVs in the ‘Planet: Understanding the Amazon from Space’ Kaggle competition?
In other words, I have multiple chat sessions with labels indicating the topics that were discussed in each one. I would like to arrange them in such a way that Fast.ai can load my data with one of its ‘from_XXX’ methods and attempt classification.
Let me know if anything is unclear.
thank you.
yhadi
(Yeshar Hadi)
April 1, 2018, 4:52pm
2
I’m unsure why this hasn’t gotten any responses. I’d imagine someone used fast.ai to find a solution for the toxic comment Kaggle competition, so I’m going to look there now.
yhadi
(Yeshar Hadi)
April 1, 2018, 4:53pm
3
If all the labels are processed in the same way (e.g. if they are all labels of 1s and 0s) then you only need to create a single field.
I think it would be easier overall to just write the dataframes to disk as csv files and read them using the TabularDataset.
Do you happen to be working on the toxic comment classification competition for Kaggle? I’ve written a tutorial on using torchtext for text classification here that uses the exact same dataset. I hope it can help!
Looks like this person did it.
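To make the "all labels are 1s and 0s" layout concrete, here is a minimal sketch of how chat sessions with topic labels could be written out as a CSV with one text column and one 0/1 column per topic. The session texts, topic names, and the helper to_multi_hot_rows are made up for illustration, and this uses only the Python standard library rather than any fastai or torchtext loader.

```python
import csv
import io

# Hypothetical example data: each chat session has free text and a set of topics.
sessions = [
    ("my card was charged twice", {"billing"}),
    ("app crashes when I log in", {"bug", "login"}),
    ("how do I reset my password, and can I get a refund", {"login", "billing"}),
]

def to_multi_hot_rows(sessions):
    """Return (header, rows) with one text column and one 0/1 column per topic."""
    # Sort the topics so the column order is stable between runs.
    topics = sorted({t for _, labels in sessions for t in labels})
    header = ["text"] + topics
    rows = [[text] + [int(t in labels) for t in topics] for text, labels in sessions]
    return header, rows

header, rows = to_multi_hot_rows(sessions)

# Write to an in-memory buffer here; swap in open("train.csv", "w", newline="")
# to produce a real file.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
csv_text = buf.getvalue()
```

Each row then carries its labels as a fixed-width vector of 0s and 0/1 flags, which is the shape a multi-label loss expects once the columns are read back as a single label field.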
krasin
July 15, 2018, 5:52am
4
Hello! I have the same question about multi-label text classification, but I would like to use fastai.text.
In the ‘Classifier tokens’ section of the Lesson 10 notebook, I replaced the number of classes:
# tok_trn, trn_labels = get_all(df_trn, 1)
tok_val, val_labels = get_all(df_val, 166)
and in the ‘Classifier’ section:
#c=int(trn_labels.max())+1
c = int(trn_labels.shape[1])
I get an error:
RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:16
Then I tried to replace the loss function by adding:
learn.crit = F.binary_cross_entropy_with_logits
and I get another error:
RuntimeError: Expected object of type Variable[torch.cuda.FloatTensor] but found type Variable[torch.cuda.LongTensor] for argument #1 'other'
Any ideas?
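For context, both errors are consistent with how the two losses treat targets: the NLL-based loss expects a single class index per sample, while binary cross-entropy with logits multiplies each raw score by its target, so the targets need to be floats with the same shape as the model output. The following is a plain-Python sketch of that second loss under those assumptions (the function name bce_with_logits and the sample numbers are illustrative, not taken from fastai or PyTorch).

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over every (sample, label) pair.

    logits  : list of rows of raw scores, one score per label
    targets : list of rows of 0.0/1.0 floats with the same shape
    """
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, targets):
        for x, t in zip(logit_row, target_row):
            # Numerically stable form of
            # -[t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))]
            total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count

# Two samples, three labels each; several labels can be active at once.
logits = [[2.0, -1.0, 0.5], [-3.0, 0.0, 4.0]]
targets = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
loss = bce_with_logits(logits, targets)
```

Because every label gets its own independent sigmoid term, nothing forces exactly one label per sample, which is what makes this loss a natural fit for multi-label targets.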
Did you solve your problem? I am having the same problem now.
krasin
September 10, 2018, 8:21pm
6
No, I switched to another task.
krasin
September 12, 2018, 6:28am
7
@Haotian - two more changes fix the errors:
Convert the labels to floats to remove the last error:
trn_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'trn_labels.npy')).astype(float)
val_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'val_labels.npy')).astype(float)
Change the accuracy metric from accuracy to accuracy_thresh to account for the different format of the labels:
learn.metrics = [accuracy_thresh(0.5)]
For further improvement, the model created with get_rnn_classifier can be adjusted, or the loss function F.binary_cross_entropy_with_logits changed.
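To illustrate why a thresholded metric suits multi-hot labels, here is a small plain-Python sketch of the idea: each score is compared against a cut-off and the result is checked against the 0/1 target, label by label. The function accuracy_at_threshold is a hypothetical stand-in written for this example, not the fastai implementation, and it assumes the scores are already probabilities in [0, 1]; if the model emits raw logits, a sigmoid would need to be applied first (or the threshold moved to 0).

```python
def accuracy_at_threshold(probs, targets, thresh=0.5):
    """Fraction of (sample, label) pairs where (prob > thresh) matches the 0/1 target."""
    correct, count = 0, 0
    for prob_row, target_row in zip(probs, targets):
        for p, t in zip(prob_row, target_row):
            correct += int((p > thresh) == bool(t))
            count += 1
    return correct / count

# Two samples, three labels each (illustrative numbers only).
probs = [[0.9, 0.2, 0.6], [0.1, 0.7, 0.4]]
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
acc = accuracy_at_threshold(probs, targets)
```

The plain accuracy metric picks a single best class per sample, so it cannot be compared against a vector of several active labels; the per-label threshold sidesteps that.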
Pablo
January 17, 2019, 4:04pm
8
Hi all!
I wonder if any of you successfully managed multi-label text classification with transfer learning on Fastai v1.
I am currently working on that, and I believe I am close to making it work. I will be posting updates in this other thread, in case you are still interested.
krasin
January 22, 2019, 7:31am
9
Yes, I am using multi-label text classification with transfer learning on Fastai v1.
dwcar49us
(David Carroll)
February 20, 2019, 7:00am
10
Hi @krasin,
Is there any example notebook you could post?
Thanks,
David
krasin
February 20, 2019, 4:33pm
11
I could prepare a representative notebook, but I am on holiday right now without access to my code. If it is not too late, I could post it next week.
krasin
February 28, 2019, 10:52pm
12
dwcar49us
(David Carroll)
March 5, 2019, 7:18am
13
Thanks so much. This is very helpful.