I am trying to train a multilabel NLP classifier. I could not find many worked examples for fast.ai v2, so I thought posting my question might help others.
I have a dataset as follows:
text | tag_list
I went to the shops and I couldnt find the dog I was looking for | G421 ; Z272 ; - ; -
I am really not a fan. He looked odd to me. | G421 ; Z271 ; Y23 ; -
The original dataset is much larger and contains many more tags. Not every sentence has the same number of labels, as can be seen above.
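To make the label format concrete, here is a minimal sketch of how I would expect one tag_list value to become a list of labels. It assumes that '-' is only a padding placeholder and not a real tag, and that the spaces around ';' are not part of the tag. The helper name split_tags is hypothetical and not part of fastai. Note that a plain Python split on ';' keeps the surrounding spaces, so some stripping seems necessary either way.

```python
def split_tags(tag_list, delim=";", placeholder="-"):
    """Split a delimited tag string into a list of clean labels.

    Assumptions (not confirmed by the dataset owner):
    - whitespace around each tag is not significant
    - `placeholder` marks an empty slot and should be dropped
    """
    labels = []
    for raw in tag_list.split(delim):
        tag = raw.strip()  # "G421 " -> "G421"
        if tag and tag != placeholder:
            labels.append(tag)
    return labels


# The two tag_list values from the sample rows above.
rows = [
    "G421 ; Z272 ; - ; -",
    "G421 ; Z271 ; Y23 ; -",
]
cleaned = [split_tags(r) for r in rows]
print(cleaned)  # [['G421', 'Z272'], ['G421', 'Z271', 'Y23']]
```

With labels in this shape, each row carries a variable-length list of tags, which is what I am hoping the multilabel target will look like.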
I’d like to train a classifier on this dataset. Once the CSV is imported into a DataFrame, I run:
planet = DataBlock(
    blocks=(TextBlock.from_df(df_c_OPCS4), MultiCategoryBlock),
    get_x=ColReader('text'),
    splitter=RandomSplitter(),
    get_y=ColReader('tag_list', label_delim=';'),
)
However, when creating the dataloaders with
dls = planet.dataloaders(df_c_OPCS4,path='/content/gdrive/MyDrive/Colab_data')
I get the error:
ValueError: Index data must be 1-dimensional
What could I be doing wrong?