Multi-label Text Classification Working Example

Hi,

Just wanted to share a working example of multi-label text classification with fastai v1.
I did a quick search and couldn’t find any clear examples of getting a multi-label classifier working.

This is useful when you have a passage of text or a document that can carry several labels or tags at once.

For example, a news article could have the tags world-news, political, election.

Your data should be in the form of a passage of text in one column and a string of labels with a separator in another column. For more details, check the dataset below.

After fine-tuning the language model on our documents, we create a TextDataBunch:

classifier_data = (TextList.from_df(df, path, cols=['words', 'tag_list'], vocab=lm_data.vocab)
                   .split_by_rand_pct(0.2)
                   .label_from_df(cols='tag_list', label_delim='|')
                   .databunch(bs=bs))

The key pieces to get the classifier working are providing the columns to import, using the vocab from your fine-tuned language model, and labelling the data (tags) from a DataFrame column while passing the label delimiter.
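
To make the role of the label delimiter more concrete, here is a minimal plain-Python sketch of the idea: split each delimited tag string, build a class vocabulary, and encode every row as a multi-hot vector. This is not fastai's implementation; the sample rows and the helper names split_tags and multi_hot are made up for illustration.

```python
# Hypothetical sample rows: 'words' holds the text, 'tag_list' holds '|'-separated tags.
rows = [
    {"words": "Election results announced overnight", "tag_list": "world-news|political|election"},
    {"words": "Markets rally after the vote", "tag_list": "business|political"},
]

def split_tags(tag_string, delim="|"):
    """Split a delimited tag string into a list of individual tags."""
    return [t.strip() for t in tag_string.split(delim) if t.strip()]

# Build a sorted vocabulary of every tag seen in the data.
classes = sorted({tag for row in rows for tag in split_tags(row["tag_list"])})

def multi_hot(tag_string, classes):
    """Encode a tag string as a 0/1 vector with one slot per class."""
    present = set(split_tags(tag_string))
    return [1.0 if c in present else 0.0 for c in classes]

for row in rows:
    print(row["tag_list"], "->", multi_hot(row["tag_list"], classes))
```

Each document ends up with one slot per class, and any number of slots can be switched on, which is what separates this from ordinary single-label classification.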

Then you can just create a text classifier as per usual.

learner = text_classifier_learner(classifier_data, AWD_LSTM, drop_mult=0.5, metrics=[fbeta])
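
For anyone wondering what the fbeta metric measures in the multi-label case, here is a rough plain-Python sketch of a per-sample F-beta score averaged over the batch. The defaults below (beta=2, thresh=0.2) are what I recall fastai v1 using, so treat them as assumptions and check your installed version. The function name fbeta_score and the sample numbers are illustrative only, and the inputs are assumed to be probabilities already.

```python
def fbeta_score(probs, targets, thresh=0.2, beta=2.0, eps=1e-9):
    """Mean per-sample F-beta for multi-label predictions.

    probs:   list of per-class probability lists, one list per sample
    targets: list of 0/1 lists with the same shape
    """
    beta2 = beta ** 2
    scores = []
    for p_row, t_row in zip(probs, targets):
        # A class counts as predicted when its probability exceeds the threshold.
        preds = [1.0 if p > thresh else 0.0 for p in p_row]
        tp = sum(p * t for p, t in zip(preds, t_row))
        precision = tp / (sum(preds) + eps)
        recall = tp / (sum(t_row) + eps)
        scores.append((1 + beta2) * precision * recall / (beta2 * precision + recall + eps))
    return sum(scores) / len(scores)

# Made-up probabilities and targets for two samples and three classes.
probs = [[0.9, 0.1, 0.6], [0.05, 0.8, 0.3]]
targets = [[1, 0, 1], [0, 1, 0]]
print(fbeta_score(probs, targets))
```

With beta greater than 1 the score weights recall more heavily than precision, so a low threshold that catches most true tags is rewarded even if it lets a few extra tags through.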

Load your fine-tuned encoder.

learner.load_encoder('fine_tuned_enc')
learner.freeze()

Train the learner, first with only the last layer group unfrozen, then with the last two:

lr = 1e-1
learner.fit_one_cycle(1, slice(lr/(2.6**4), lr), moms=(0.8, 0.7))

learner.freeze_to(-2)
learner.fit_one_cycle(1, slice(lr/(2.6**4), lr), moms=(0.8, 0.7))
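
The slice(lr/(2.6**4), lr) argument gives the last layer group the full learning rate and the earliest group a rate 2.6**4 times smaller. Assuming five layer groups (which I believe is what the AWD_LSTM classifier has in fastai v1) and geometric spacing between them, each group trains 2.6 times slower than the one after it. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def spread_learning_rates(lo, hi, n_groups):
    """Geometrically space n_groups learning rates from lo to hi (inclusive)."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]

lr = 1e-1
# n_groups=5 is an assumption about the model, not something read from the learner.
lrs = spread_learning_rates(lr / (2.6 ** 4), lr, n_groups=5)
for i, group_lr in enumerate(lrs):
    print(f"layer group {i}: {group_lr:.5f}")
```

Frozen layer groups are not updated regardless of the rate assigned to them, so the spread only matters for the groups you have unfrozen.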

Once you have finished fine-tuning the classifier, you can check the predictions against a document from your validation sample.

article = classifier_data.valid_ds[0]

“(Text xxbos xxmaj my friend xxmaj kevin xxmaj taylor ( known to his family as xxmaj gordon )…

he had grown to love . xxmaj he is survived by his brothers , xxmaj ben and xxmaj alf , and three nephews . xxunk - news,
MultiCategory world;us-news)”

pred = learner.predict(article); pred

(MultiCategory us-news;world,
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1.]),
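
The long tensor is a multi-hot vector with one slot per class, so the two 1s correspond to the two predicted tags. Here is a small sketch of mapping such a vector, or a vector of per-class probabilities, back to tag names. The five-class vocabulary and the numbers are invented for illustration; with fastai v1 I believe the real class list is available as classifier_data.classes, but please verify that on your version.

```python
def decode_multi_hot(vector, classes):
    """Return the class names whose slot in the multi-hot vector is set."""
    return [c for c, flag in zip(classes, vector) if flag == 1.0]

def labels_above(probs, classes, thresh=0.5):
    """Return (class, probability) pairs above the threshold, highest first."""
    hits = [(c, p) for c, p in zip(classes, probs) if p > thresh]
    return sorted(hits, key=lambda cp: cp[1], reverse=True)

# Hypothetical vocabulary and model output, for illustration only.
classes = ["business", "sport", "technology", "us-news", "world"]
multi_hot = [0.0, 0.0, 0.0, 1.0, 1.0]
probs = [0.02, 0.01, 0.10, 0.91, 0.77]

print(decode_multi_hot(multi_hot, classes))
print(labels_above(probs, classes, thresh=0.5))
```

Lowering the threshold trades precision for recall, which can be handy when a document should carry several loosely related tags.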

I have popped a rough working example notebook on GitHub.

Workbook:

Dataset:

Hello there,

First of all, thank you for this post. I am working on a master’s thesis on multi-label text classification myself, and this post helped me a lot. My question is about your dataset. How did you come across it? Did you create it yourself? Has it been used in any academic journals? I am asking because I am trying to improve multi-label classification of news articles, so if some papers have used your data, it would be really easy to compare results.

Thank you

Hi Pratik,

I’m really glad that my post helped.
Thanks for the questions about the dataset I’ve used here.
This is just a dataset that I created to help me get an understanding of multi-label classification.
More specifically, I used the free API provided by The Guardian newspaper to download a few years’ worth of articles matched with their tags (see https://open-platform.theguardian.com/documentation for more details on the API).
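
For readers who want to try something similar, here is a rough sketch of how a pull from that API could look. It is not the code used to build this dataset. The endpoint and the parameter names (api-key, show-fields, show-tags, page-size, page, from-date) are written from my recollection of the Guardian Open Platform documentation, so please check them against the link above. The helper names, the date, and the 'words'/'tags' column names are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://content.guardianapis.com/search"

def build_search_url(api_key, page=1, from_date="2018-01-01"):
    """Build a search URL that asks for article body text and keyword tags."""
    params = {
        "api-key": api_key,
        "from-date": from_date,      # arbitrary example date
        "show-fields": "bodyText",
        "show-tags": "keyword",
        "page-size": 50,
        "page": page,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"

def extract_rows(payload, delim="|"):
    """Turn one page of API results into rows of text plus delimited tags."""
    rows = []
    for item in payload["response"]["results"]:
        text = item.get("fields", {}).get("bodyText", "")
        tags = [tag["id"] for tag in item.get("tags", [])]
        if text and tags:  # skip articles with no text or no tags
            rows.append({"words": text, "tags": delim.join(tags)})
    return rows

def fetch_page(api_key, page=1):
    """Download and parse one page of results (needs network access and a key)."""
    with urlopen(build_search_url(api_key, page=page)) as resp:
        return extract_rows(json.load(resp))
```

Calling fetch_page in a loop over page numbers, with a pause between requests to respect the rate limit on your key, would give rows that can be loaded straight into a DataFrame.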

The article tags have two levels of aggregation.
For example, world-news/american-news

For this dataset I aggregate the articles into their most general tag groups.

If you were looking to train a more fine-grained multi-label classifier, you could aggregate at the most specific tag level, which has upwards of 10K categories.

Just let me know if there is anything I can help explain further, or if you would like the Python code I used to pull the articles from the API.

Kind Regards
Cam

Hi Cam,

Thank you for your reply. I did not understand what you meant when you said the article tags have two levels of aggregation. Does it mean that the articles have a two-level hierarchy? So the first level would be a certain news category, and the next would be whether it’s american news or bbc news?

Also, I was thinking of applying a hierarchical attention network to multi-label classification. Have you looked at that approach?

Finally, what do you think about the Yahoo multilabel news corpus? I have submitted a request, but it is still pending. Do you think it would be a good idea to experiment on it? Have you had any chance to play with it?

I am sorry for asking so many questions. As I said earlier, I am trying to do multi-label classification, so any help would be really appreciated.

Thank you,
Pratik

Hi Pratik,

Your understanding of what I meant by two levels of aggregation is correct.

Each tag consists of a general tag (there are about 80 in this dataset) and a more specific tag (10k-20k).

The Yahoo multilabel news corpus looks really good from reading the site. It appears to be similar to our dataset.

The challenging part of our dataset is that it is pulled from a single news source (The Guardian) based in the UK.

The Yahoo dataset might have a broader range of topics that are tagged from a more global perspective.

I’ve played around with the Transformer-XL and BERT architectures on the news dataset.

Both worked well. That said, the AWD-LSTM pre-trained on WikiText performed well too.

I didn’t complete a comparative study, but it would be really interesting to see your results once you have completed your analysis.

Kind Regards
Cam

Hi Cam,
Here is the Yahoo dataset…

https://drive.google.com/open?id=1CixmdtBaPAZ-HO_ZAmDmM0EPqRx-HfX5

Hi Pratik,

Thank you so much for uploading the Yahoo Dataset and for the thoughtful question about hierarchies in the dataset from the example.

I’m not sure whether the dataset that I uploaded earlier had those hierarchies included, so I have re-uploaded a version that does:

https://drive.google.com/open?id=1p6KtKoVhmNnq0fOLf0wvF0fnW6LHCL7b

In this new dataset, the tags keep their two-level hierarchy.

The tag list is saved in the tags column.

For example,

us-news/tamir-rice| … |world/world

The first part of the tag is the more generic tag category and the second part is the more specific tag.
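
As a small illustration of working with this format, the sketch below splits each general/specific tag and collapses a tag list down to its unique general categories, which is roughly the aggregation described earlier in the thread. The helper names are hypothetical, and the middle tag in the sample string is invented to stand in for the elided part of the example above.

```python
def split_tag(tag):
    """Split 'general/specific' into its two levels."""
    general, _, specific = tag.partition("/")
    return general, specific

def general_tags(tag_string, delim="|"):
    """Collapse a delimited list of two-level tags to unique general tags, keeping order."""
    seen = []
    for tag in tag_string.split(delim):
        general, _ = split_tag(tag.strip())
        if general and general not in seen:
            seen.append(general)
    return delim.join(seen)

# 'us-news/ohio' is a made-up tag used only to show de-duplication.
tags = "us-news/tamir-rice|us-news/ohio|world/world"
print(split_tag("us-news/tamir-rice"))
print(general_tags(tags))
```

Keeping the full general/specific string instead gives the fine-grained label set, at the cost of a much larger and sparser set of classes.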

Please let me know if there is anything that could benefit from further explanation.

Kind Regards
Cameron

Hey Cam,
I have put some queries in your inbox. Could you please respond to them? …Thanks