This is more or less a skeleton of working code. The first part you can take from the Jupyter notebook.
Of course, any insight if I am not doing something optimally is most welcome!
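To make the snippets below self-contained, here is the preamble I am assuming (fastai v1 text API; the batch size is just a placeholder value to tune for your GPU):

from pathlib import Path
from fastai.text import *  # brings in TextList, TextLMDataBunch, the learners, fbeta, URLs, etc.

batch_size = 48  # hypothetical value; adjust to your memory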
path = Path('...')  # your path to the folder with your unsupervised documents
model_url = URLs.WT103_1  # your pre-trained language model, e.g. fastai's English WT103
Then try to load your data for the language model:
try:
    data = TextLMDataBunch.load(path, 'tmp_lm', bs=batch_size)
except FileNotFoundError:
    print('Data bunch not found, creating one from data source...')
    data = (TextList.from_folder(path)
            .filter_by_folder(include=())  # list the sub-folders that contain your documents, e.g. include=('train', 'test')
            .random_split_by_pct(0.1)      # hold out 10% for validation
            .label_for_lm()                # for a language model, the labels are the texts themselves
            .databunch(bs=batch_size))
    data.save('tmp_lm')
Now you instantiate a language model learner:
learner = language_model_learner(data, pretrained_model=model_url, drop_mult=0.3)
Learning rate finder:
learner.lr_find()
learner.recorder.plot(skip_end=15)
Then you train the last layers of the language model:
try:
    learner.load('fit_head')
except FileNotFoundError:
    print('\nTraining language model (last layers)...')
    learner.fit_one_cycle(1, 5e-2, moms=(0.8, 0.7))
    learner.save('fit_head')
Then you train the whole thing. In the Jupyter example they use 10 cycles here, but in my case one apparently works better; I'm still figuring out these details.
number_rounds = 1  # the notebook uses 10; one cycle has worked better for me so far
try:
    learner.load('fine_tuned')
except FileNotFoundError:
    print('\nFine-tuning learner...')
    learner.unfreeze()
    learner.fit_one_cycle(number_rounds, 5e-3, moms=(0.8, 0.7))
    learner.save('fine_tuned')
To test the language model (which is fun):
text_prompt = 'I wonder what text comes after this'
n_words = 100
n_sentences = 2
print("\n".join(learner.predict(text_prompt, n_words, temperature=0.75)
for _ in range(n_sentences)))
Save the language model encoder (the part that the classifier will use):
learner.save_encoder('fine_tuned_enc')
Next step: we need a classifier. IMPORTANT: we need the vocabulary from the language model!
vocab = data.vocab
Load its dataset:
try:
    classifier_data = TextDataBunch.load(path,
                                         'tmp_multi_label_data',
                                         bs=batch_size)
except (FileNotFoundError, IndexError):
    print('Classifier data bunch not found, creating one from the CSV file...')
    label_cols = [0, 1, 2, 3]  # the columns from which you take the labels in the CSV file
    classifier_data = (TextList.from_csv(path,
                                         relative_path_to_csv_file_from_path,
                                         cols='text',
                                         vocab=vocab)  # reuse the language model vocabulary!
                       .random_split_by_pct(valid_pct=0.2)
                       .label_from_df(cols=label_cols)
                       .databunch(bs=batch_size))
    classifier_data.save('tmp_multi_label_data')
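For reference, this is the kind of CSV layout the snippet above assumes. The column names and label values are entirely hypothetical; the point is that columns 0-3 hold the (multi-)labels and a column named 'text' holds the documents:

label_a,label_b,label_c,label_d,text
1,0,0,1,"First document goes here."
0,0,1,0,"Second document goes here."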
Then you create the classifier learner:
classifier_learner = text_classifier_learner(classifier_data,
drop_mult=0.5,
metrics=[fbeta])
classifier_learner.load_encoder('fine_tuned_enc')
Finally, just train the learner. I have not been completely successful here yet: it trains, learns, and classifies, just not as well as another classifier I have…
classifier_learner.freeze()  # train only the last layer group first
classifier_learner.lr_find()
classifier_learner.recorder.plot(skip_end=15)
classifier_learner.fit_one_cycle(1, 1e-1, moms=(0.8, 0.7))
classifier_learner.save('first_cycle')
classifier_learner.fit_one_cycle(1, 5e-2, moms=(0.8, 0.7))
classifier_learner.save('second_cycle')
classifier_learner.freeze_to(-2)  # unfreeze the last two layer groups
classifier_learner.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2), moms=(0.8, 0.7))  # discriminative learning rates across layer groups
classifier_learner.save('third_cycle')
classifier_learner.freeze_to(-3)  # unfreeze one more layer group
classifier_learner.fit_one_cycle(1, slice(5e-3 / (2.6 ** 4), 5e-3), moms=(0.8, 0.7))
classifier_learner.save('fourth_cycle')
classifier_learner.unfreeze()  # finally train the whole network
classifier_learner.fit_one_cycle(1, slice(1e-3 / (2.6 ** 4), 1e-3), moms=(0.8, 0.7))
classifier_learner.save('fifth_cycle')
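To compare the cycles on the validation set, you can call validate() between loads; it returns the validation loss followed by the metrics (here just fbeta). A quick sketch:

val_loss, f_beta = classifier_learner.validate()
print(f'validation loss: {float(val_loss):.4f}, f-beta: {float(f_beta):.4f}')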
That’s it! Now you may load a saved classifier:
classifier_learner.load('third_cycle')
And classify stuff:
prediction = classifier_learner.predict(string)
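In fastai v1, predict returns a (prediction, index, probabilities) triple. A minimal sketch of how you might inspect it; the input string and the 0.5 threshold are my own assumptions:

string = 'Some text you would like to classify'  # hypothetical input
pred_class, pred_idx, probs = classifier_learner.predict(string)
print(pred_class)  # the predicted label(s); a MultiCategory for multi-label data
# for multi-label output you can threshold the per-class probabilities yourself:
predicted_labels = [classifier_data.classes[i]
                    for i, p in enumerate(probs) if p > 0.5]
print(predicted_labels)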
This is not exactly a working example, but it comes close. I hope it helps!