Do you know if Fast.ai can handle multi-label text data, as we did using CSVs with the ‘Planet: Understanding the Amazon from Space’ Kaggle competition?
In other words, what I have is multiple chat sessions with labels indicating the topics that were discussed there. I would like to arrange them in such a way that Fast.ai can get my data with a ‘from_XXX’ method and attempt classification.
Let me know if anything is unclear.
I’m unsure why this hasn’t gotten any responses. I’d imagine someone used fast.ai to find a solution for the toxic comment Kaggle competition, so I’m going to look there now.
If all the labels are processed in the same way (e.g. if they are all labels of 1s and 0s) then you only need to create a single field.
I think it would be easier overall to just write the dataframes to disk as csv files and read them using the TabularDataset.
Do you happen to be working on the toxic comment classification competition for Kaggle? I’ve written a tutorial on using torchtext for text classification here that uses the exact same dataset. I hope it can help!
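To make the "labels of 1s and 0s written to a CSV" idea concrete, here is a minimal sketch using only the Python standard library. The chat sessions, topic names, and the helper names `to_multi_hot_rows` and `write_csv` are all made up for illustration; the actual column layout your loader expects may differ.

```python
import csv
import io

# Hypothetical chat sessions, each tagged with the topics discussed in it.
sessions = [
    ("how do I reset my password", ["account", "security"]),
    ("my invoice is wrong", ["billing"]),
    ("card declined and I am locked out", ["billing", "account"]),
]

def to_multi_hot_rows(sessions):
    """Return (header, rows) with one 0/1 column per topic."""
    # Collect every topic that appears and fix a stable column order.
    topics = sorted({t for _, labels in sessions for t in labels})
    header = ["text"] + topics
    rows = [[text] + [int(t in labels) for t in topics]
            for text, labels in sessions]
    return header, rows

def write_csv(header, rows):
    """Serialise to CSV text; swap StringIO for an open file to write to disk."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

header, rows = to_multi_hot_rows(sessions)
csv_text = write_csv(header, rows)
print(csv_text)
```

Since every label column has the same 0/1 format, a loader can treat them all the same way, which is the point made above about needing only a single field.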
Looks like this person did it.
Hello! I have the same question for multi-label text classification but I would like to apply
In the section ‘Classifier tokens’ from Lesson 10, I replaced the number of classes:
# tok_trn, trn_labels = get_all(df_trn, 1)
tok_val, val_labels = get_all(df_val, 166)
and in the section
c = int(trn_labels.shape)
I get an error:
RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:16
Then I tried to replace the loss function by adding:
learn.crit = F.binary_cross_entropy_with_logits
and I get another error:
RuntimeError: Expected object of type Variable[torch.cuda.FloatTensor] but found type Variable[torch.cuda.LongTensor] for argument #1 'other'
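For anyone hitting the same two errors, a rough plain-Python sketch of what a binary cross-entropy-with-logits loss computes may help explain them. This is not PyTorch's implementation, only the same formula written out, and `bce_with_logits` is a name chosen for this sketch. The first error appears because an NLL-style criterion expects a single class index per example. This loss instead scores every label independently, and because the targets are multiplied with the float logits, they need to be floats too.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels, from raw logits."""
    assert len(logits) == len(targets)
    total = 0.0
    for x, t in zip(logits, targets):
        # Numerically stable form of
        # -[t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

# One example with three labels; targets are floats (1.0 / 0.0), not class ids.
logits = [2.0, -1.0, 0.0]
targets = [1.0, 0.0, 1.0]
loss = bce_with_logits(logits, targets)
print(loss)
```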
Did you solve your problem? I am also having that problem now.
No, I switched to another task.
@Haotian - two more changes fix the errors:
convert the labels to floats to remove the last error:
trn_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'trn_labels.npy')).astype(float)
val_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'val_labels.npy')).astype(float)
change the accuracy metric to accuracy_thresh to account for the different format of the labels:
learn.metrics = [accuracy_thresh(0.5)]
For further improvement, the model created with get_rnn_classifier or the loss function can be adjusted.
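As a rough illustration of what a thresholded accuracy measures on multi-label targets, here is a plain-Python sketch. It is not the library's code: `thresholded_accuracy` and its `apply_sigmoid` flag are hypothetical names for this example, and whether the real metric applies a sigmoid before thresholding is something to check for your fastai version.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def thresholded_accuracy(outputs, targets, thresh=0.5, apply_sigmoid=True):
    """Fraction of individual label decisions that match the 0/1 targets."""
    correct = 0
    total = 0
    for out_row, targ_row in zip(outputs, targets):
        for o, t in zip(out_row, targ_row):
            # Assumption: model outputs are raw logits unless apply_sigmoid=False.
            p = sigmoid(o) if apply_sigmoid else o
            correct += int((p > thresh) == (t > 0.5))
            total += 1
    return correct / total

# Two examples, three labels each; targets are float 0.0/1.0 as in the fix above.
outputs = [[2.0, -1.0, 0.3], [-0.5, 1.5, -2.0]]
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
acc = thresholded_accuracy(outputs, targets)
print(acc)
```

Each label is counted separately, so an example with one wrong label out of three still contributes two correct decisions.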
I wonder if any of you successfully managed multi-label text classification with transfer learning on Fastai v1.
I am currently working on that, and I believe I am close to making it work now. I will be posting updates in this other thread, in case you are still interested.
Yes, I am using multi-label text classification with transfer learning on Fastai v1
Is there any example notebook you could post?
I could prepare a representative notebook but now I am on holidays without access to my code. If it is not too late, I could post next week.
Thanks so much. This is very helpful.