Multilabel classifier worked example for text classification

sebastianzeki · April 15, 2021, 1:51pm

Hi

I am trying to use a multilabel NLP classifier. I cannot find many worked examples in fast.ai v2 so I thought posting my question might help others.

I have a dataset that looks like this:

text                                                              tag_list
I went to the shops and I couldnt find the dog I was looking for  G421 ; Z272 ; - ; -
I am really not a fan. He looked odd to me.                       G421 ; Z271 ; Y23 ; -

The original dataset is much larger and has many more tags. As can be seen above, not every sentence has the same number of labels.
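To make the label format concrete, here is a minimal plain-Python sketch of how I would expect each tag_list string to be read. It rests on two assumptions: that "-" is only a padding placeholder for an empty slot, and that the spaces around ";" are not part of the tag. parse_tags is a hypothetical helper name used for illustration, not a fastai function, and the multi-hot encoding only shows the idea of a multilabel target, not how the library builds it internally.

```python
def parse_tags(tag_list, delim=";", placeholder="-"):
    """Split a delimited tag string into a clean list of labels.

    Assumption: `placeholder` marks an empty slot and is not a real tag.
    """
    tags = [t.strip() for t in tag_list.split(delim)]
    return [t for t in tags if t and t != placeholder]


# The two example rows from the table above
rows = [
    "G421 ; Z272 ; - ; -",
    "G421 ; Z271 ; Y23 ; -",
]

# One list of labels per sentence; the lists may differ in length
labels = [parse_tags(r) for r in rows]

# Sorted vocabulary of every tag seen in the data
vocab = sorted({t for row in labels for t in row})

# Multi-hot target: one 0/1 entry per vocabulary tag for each sentence
multi_hot = [[int(t in row) for t in vocab] for row in labels]

print(labels)
print(vocab)
print(multi_hot)
```

With the two rows above, the first sentence ends up with two labels and the second with three, which is the variable-length behaviour I need the classifier to handle.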

I’d like to train a classifier on this dataset. Once the CSV is imported, I run:

df_c_OPCS4 = df_c_OPCS4[['text', 'tag_list']]

planet = DataBlock(blocks=(TextBlock.from_df(df_c_OPCS4), MultiCategoryBlock),
                   get_x=ColReader('text'),
                   splitter=RandomSplitter(),
                   get_y=ColReader('tag_list',label_delim=';'))

However, when creating the dataloaders:

dls = planet.dataloaders(df_c_OPCS4, path='/content/gdrive/MyDrive/Colab_data')

I get the error:
ValueError: Index data must be 1-dimensional

What could I be doing wrong?