Multilabel Classification with ULMFiT

Thank you!

Hi,
I wonder if someone has a complete notebook to share for multi-label classification using fastai.text. I would really appreciate it, as I can’t figure out how to do this: I fix one thing and then get stuck on another. This would be very useful for newbies. Thanks

2 Likes

Hi Everyone,

I am running into an out-of-memory error in multi-label classification using the fastai library. I think I am making some mistake that causes it, because the error appears even if I reduce the batch size to as low as 2. I am running this on an AWS instance whose GPU has 12GB of memory, and the dataset I am using is not big either. I have uploaded the data and notebook to Dropbox, which can be downloaded from this link. I would be very thankful if someone could take a look and point out my mistakes. This would be super useful for the many people using the wonderful fastai library.

Thanks

## Vaccines Classification

import warnings
warnings.filterwarnings("ignore")
from fastai.text import *
import html

BOS = 'xbos'  # beginning-of-sentence tag
FLD = 'xfld'  # data field tag

PATH=Path('data/vaccines/')

torch.cuda.is_available()

CLAS_PATH=Path(PATH/'vaccines_clas')
#CLAS_PATH.mkdir(exist_ok=True)

LM_PATH=Path(PATH/'vaccines_lm')

## Classifier

trn_clas = np.load(CLAS_PATH/'tmp'/'trn_ids.npy')
val_clas = np.load(CLAS_PATH/'tmp'/'val_ids.npy')
itos = pickle.load(open(CLAS_PATH/'tmp'/'itos.pkl', 'rb'))

#These are single-label data, with the labels in 0s and 1s form.
trn_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'trn_labels.npy'))
val_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'val_labels.npy'))


#Creating multi-labels just for testing purposes: repeat the same label column 7 times
trn_labels1 = np.tile(trn_labels[:, None], (1, 7))
val_labels1 = np.tile(val_labels[:, None], (1, 7))

val_labels1.shape

em_sz,nh,nl = 400,1150,3

wd = 1e-7
bptt = 30
bs = 24
opt_fn = partial(optim.Adam, betas=(0.8, 0.99))
vs = 60004  # len(itos) == 60004

min_lbl = trn_labels1.min()
trn_labels1 -= min_lbl
val_labels1 -= min_lbl
c = 7  # number of label columns; for single-label this was int(trn_labels.max())+1

trn_ds = TextDataset(trn_clas, trn_labels1)
val_ds = TextDataset(val_clas, val_labels1)
trn_samp = SortishSampler(trn_clas, key=lambda x: len(trn_clas[x]), bs=bs//2)
val_samp = SortSampler(val_clas, key=lambda x: len(val_clas[x]))
trn_dl = DataLoader(trn_ds, bs//2, transpose=True, num_workers=3, pad_idx=1, sampler=trn_samp)
val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=3, pad_idx=1, sampler=val_samp)
md = ModelData(PATH, trn_dl, val_dl)

dps = np.array([0.4,0.5,0.05,0.3,0.4])*0.5

m = get_rnn_classifer(bptt, 20*70, c, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
      layers=[em_sz*3, 50, c], drops=[dps[4], 0.1],
      dropouti=dps[0], wdrop=dps[1], dropoute=dps[2], dropouth=dps[3])

opt_fn = partial(optim.Adam, betas=(0.7, 0.99))

learn = RNN_Learner(md, TextModel(to_gpu(m)), opt_fn=opt_fn)
learn.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
learn.clip=25.
learn.metrics = [accuracy_thresh(0.5)]

lr=3e-3
lrm = 2.6
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])

#lrs=np.array([1e-4,1e-4,1e-4,1e-3,1e-2])

len(itos)

# Load the encoder from the language model previously fine-tuned on this data
learn.load_encoder('V_lm_enc_last_ft')
wd = 1e-7

#learn.freeze_to(-1)

#Memory error even with very small batch size (bs) and bptt
learn.fit(lrs/10, 1, wds=wd, cycle_len=1, use_clr=(8,3))

I’m unsure how the training data should be stored. Typically, we create a “train” folder, and for each class, we create a folder with its name (e.g. “positive” or “negative”), and that folder is filled with examples of that class. We could continue that structure here, but then you would only ask the model to predict on one label at a time, and it’d never learn to output multiple labels. Even if it tried to output multiple labels, it’d be penalized because the structure is such that it only ever sees one label at a time.

How do we translate this structure of data to the multilabel scenario?

Multi-label classification is different from the traditional approach of keeping each label’s data in its own folder. It is discussed in fastai lesson 3 for images, but I am trying to do it for text. Other people have probably done this already, but I’m stuck. Hopefully someone can help.

As I asked the question, I realized that was probably the case. I’ll go ahead and rewatch the video (it’s been a while) and update you if I have success. :slight_smile:

1 Like

The only difference is that instead of a single label column, you have multiple label columns (where the value is 0 or 1). Other than that, everything is the same.
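
For example, a minimal sketch (with made-up label names) of what such a dataframe might look like:

import pandas as pd

# hypothetical multi-label layout: one 0/1 column per label, plus the text column
df = pd.DataFrame({
    'label_a': [1, 0, 1],
    'label_b': [0, 0, 1],
    'label_c': [0, 1, 0],
    'text':    ['first document', 'second document', 'third document'],
})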

1 Like

I understand that, but my question was specifically directed at how we access our stored training data, not how we structure it once we have it loaded. The storage structure in the ULMFiT lecture doesn’t lend itself to multilabel classification, but @zubair1.shah reminded me that previous lectures might help with finding the appropriate structure. :slight_smile:

It looks like the major difference in data processing comes in at the get_texts method. It assumes that each text only has one associated label, so we need to overhaul that in order to read in multiple labels appropriately. @wgpubs, is this correct? And in a previous reply, you mentioned needing to have a column for each possible label, marked with a 0 or 1. However, in past fastai multi-label case studies, we’ve put all the labels into one column. Clarification here would be appreciated. If this is the case, it’d be great if you could edit your original post with some tips on preprocessing, since the imdb notebook assumes your data comes in a particular format.

You don’t need to overhaul it. I’m seeing two scenarios, based on your questions:

  1. The data are in a single csv with one text column, and one (or more) label columns.
  2. The data are in text files, stored in subfolders based on their class.

The workflow in the imdb notebook moves data from scenario 2 to scenario 1. Scenario 1 is the standard. The get_texts function takes a slice of columns as labels, based on the n_labels parameter, and for each row in the dataframe, loops through that number of columns. What you get in the case of a single label is an n-dimensional column vector with the label, where n is the number of data instances (or the batch size, if you’re in a training loop). In the multi-label case, you get an n x m matrix, where n is, again, either the number of data instances or the batch size, and m is the number of labels. The values in this matrix are usually 1 or 0.
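
A rough sketch of that slicing (not the notebook’s actual get_texts source, just the idea; labels are assumed to be the first n_labels columns):

import numpy as np
import pandas as pd

def label_matrix(df: pd.DataFrame, n_labels: int) -> np.ndarray:
    # first n_labels columns are the 0/1 label columns; the remaining column(s) hold the text
    return df.iloc[:, :n_labels].values.astype(np.int64)  # shape: n_rows x n_labels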

Here’s a snippet that shows how you could move from a dataset where each ‘Label n’ column holds a label-name string (the first screenshot) to one with a separate 0/1 column per label name (the second screenshot):

# The original columns 'Label 1' ... 'Label 12' hold label-name strings (or NaN)
text_labels = [f'Label {n+1}' for n in range(12)]
for index, row in trn_df.iterrows():
    for c in text_labels:
        c_val = row[c]
        # if this cell holds a label name, set that label's own column to 1 for this row
        if isinstance(c_val, str):
            trn_df.loc[index, c_val] = 1

where c_val tracks the column names seen in the second screenshot. Note the values in the row are 0.0 and 1.0. The 1.0 values are set by trn_df.loc[index, c_val] = 1, while the 0.0 values are set by clean_train_df.fillna(0), since the resulting values there are NaN.

Note that, to fit with the standard set in get_texts, the line clean_train_df = trn_df[['Text'] + classes] should instead be clean_train_df = clean_train_df[classes + ['Text']], which follows the convention: ['labels', 'text'].

To use clean_train_df as the classification dataset in the state I show in the second screenshot, I set n_labels to 27 (the number of labels), which got me a 3900x27 trn_labels label matrix.
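
Putting those two fixes together, a short sketch using the names from this post:

# labels first, then the text column, matching the get_texts convention
clean_train_df = trn_df[classes + ['Text']]
# cells that were never set to 1 in the loop above are NaN; make them 0
clean_train_df = clean_train_df.fillna(0)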

FYI this is all much easier in fastai v1. Have a look at the docs and examples and let us know how you go.

2 Likes

Alright, I’ll update this thread with my results later. I’m actually just working on a multilabel classification problem, and will write a new notebook in fastai v1 once I’m done with the current phase of fine-tuning and testing.

2 Likes

I’m trying to do multi-label classification with the pretrained language model approach. These are the important parts of my code:

train_ds = TextDataset.from_csv(DATA_PATH, name='train', 
                            classes=classes, n_labels=len(classes))
valid_ds = TextDataset.from_csv(DATA_PATH, name='valid', 
                            classes=classes, n_labels=len(classes))
data_lm = lm_data([train_ds, valid_ds], DATA_PATH)

train_ds = TextDataset.from_csv(DATA_PATH, name='train', vocab=data_lm.train_ds.vocab, 
                                classes=classes, n_labels=len(classes))
valid_ds = TextDataset.from_csv(DATA_PATH, name='valid', vocab=data_lm.train_ds.vocab, 
                                classes=classes, n_labels=len(classes))

data_clas = classifier_data([train_ds, valid_ds], DATA_PATH)

learn = RNNLearner.language_model(data_lm, pretrained_fnames=['lstm_wt103', 'itos_wt103'], drop_mult=0.5)
learn.fit_one_cycle(1, 1e-2)
learn.save_encoder('ft_enc')

learn = RNNLearner.classifier(data_clas, drop_mult=0.5)
learn.load_encoder('ft_enc')
learn.fit_one_cycle(1, 1e-2)

I have a couple of questions regarding the classification part (language modeling works fine):

  1. Which loss function should we choose, and where do we specify it?
    I have tried setting learn.loss_fn = F.binary_cross_entropy_with_logits, but this function does not like the data type of the targets. I think it wants them as float32, and currently they are ints. Should I change this somewhere? (One possible workaround is sketched after this list.)

  2. How do I get predictions once the model is trained?
    If I do learn.get_preds() I get the following error:

    ~/fastai/fastai/basic_train.py in loss_batch(model, xb, yb, loss_fn, opt, cb_handler, metrics)
    22 out = model(*xb)
    23 out = cb_handler.on_loss_begin(out)
    ---> 24 if not loss_fn: return out.detach(),yb[0].detach()
    25 loss = loss_fn(out, *yb)
    26 mets = [f(out,*yb).detach().cpu() for f in metrics] if metrics is not None else []

    AttributeError: ‘tuple’ object has no attribute ‘detach’
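
A minimal sketch of one possible workaround for the first question (assuming the only issue is that the targets need to be float32) is to wrap the loss function and cast the targets before calling it:

import torch.nn.functional as F

def bce_logits_float(input, target):
    # cast integer 0/1 targets to float32 before applying BCE-with-logits
    return F.binary_cross_entropy_with_logits(input, target.float())

# then: learn.loss_fn = bce_logits_float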

Maybe I’m missing something crucial and these questions aren’t appropriate; in that case I would love for someone to point out what I’m doing wrong. I’ve been looking through the docs and examples but still couldn’t solve it.

Thanks in advance.

I’ve been struggling with adding a multi-label classifier (with around 2000 classes) on top of the language model as well. I’m seeing very diverse recommendations, but nothing seems to work.

Here are my main questions:

  • Does the target value need to be one-hot encoded, or do I just pass the index of the correct target? Different losses seem to require different target representations. My data is not in csv form and can’t be loaded via the prepackaged classes, so I need to understand what formats are expected downstream. (A multi-hot encoding sketch follows this list.)

  • What “criterion” do I use?
    If I leave it as is (“cross_entropy” is used inside RNN_Learner I think) I get a multi-target error:
    RuntimeError: multi-target not supported at /pytorch/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:16
    If I change it to “binary_cross_entropy” I get an error saying my input and target are different lengths:
    ValueError: Target and input must have the same number of elements. target nelement (48) != input nelement (100368). This seems to imply that multi-hot targets are needed, since 100368 = my_batch_size x num_classes.

  • The large input number above makes me think that the PoolingLinearClassifier being used in “get_rnn_classifier” is flattening tensors, but I can’t find documentation that clarifies this. Do I need to switch this out with something else?

  • Are there any complete, working examples of using a multi-target classifier on top of the language model? All I have been able to find is binary classifiers (IMDB sentiment, etc).
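
For what it’s worth, binary_cross_entropy-style losses expect a multi-hot target matrix (one row per example, with a 1.0 in each column whose label applies). A minimal sketch of building one, assuming a label2idx mapping and per-example lists of label strings (hypothetical names here):

import numpy as np

def multi_hot_targets(examples_labels, label2idx):
    # examples_labels: one list of label strings per example
    y = np.zeros((len(examples_labels), len(label2idx)), dtype=np.float32)
    for i, labels in enumerate(examples_labels):
        for lbl in labels:
            y[i, label2idx[lbl]] = 1.0
    return y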

Thanks!

2 Likes

I was eventually able to get it working by using “nn.CrossEntropyLoss()” for the criterion.
Below is my condensed code which exposes the steps that occur in “get_rnn_classifier()”:

bptt = 70
max_seq_len = bptt*20
embedding_dim = 400
num_hid = 1150
num_layers = 3

#label2idx is a mapping of target label string to index
num_classes = len(label2idx)

#idx2tok is a mapping of index to token string (tok2idx is the inverse)
vocab_size = len(idx2tok)
opt_fn = partial(optim.Adam, betas=(0.7, 0.99))
batch_size = 24
dropouts = np.array([0.4,0.5,0.05,0.3,0.4])*0.5

#train_inputs is a 2D np.array of lists of token indices (num_examples x variable_sequence_len)
#train_targets is a 1D np.array of label indices (num_examples)
trn_ds = TextDataset(train_inputs, train_targets)
val_ds = TextDataset(cv_inputs, cv_targets)

#kind of make it so the first things in the list are, on the whole, shorter than the things at the end, 
# but a little bit random as well
trn_samp = SortishSampler(train_inputs, key=lambda x: len(train_inputs[x]), bs=batch_size)
val_samp = SortSampler(cv_inputs, key=lambda x: len(cv_inputs[x]))

pad_idx = tok2idx['_pad_']
#there's a memory leak in DataLoader, use num_workers=0 to avoid ThreadPoolExecutor
trn_dl = DataLoader(trn_ds, batch_size, transpose=True, num_workers=0, pad_idx=pad_idx, sampler=trn_samp)
val_dl = DataLoader(val_ds, batch_size, transpose=True, num_workers=0, pad_idx=pad_idx, sampler=val_samp)

model_data = ModelData(base_path, trn_dl, val_dl)

#stack a head linear classifier on top of the LM
# in notebook, get_rnn_classifier takes num_classes as arg but does nothing with it
rnn_enc = MultiBatchRNN(bptt, max_seq_len, vocab_size, embedding_dim, num_hid, num_layers, pad_token=pad_idx, bidir=False, 
                        dropouti=dropouts[0], wdrop=dropouts[1], dropoute=dropouts[2], dropouth=dropouts[3])

#why is the input emb*3? concat pooling
# 1. we take the average pooling over the sequence of the activations, 
# 2. the max pooling of the sequence over the activations, 
# 3. and the final set of activations, and just concatenate them all together.
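# Rough shape bookkeeping for the concat pooling described above (illustrative only):
#   final-layer outputs: (seq_len, batch, embedding_dim)
#   avg pool over seq_len -> (batch, embedding_dim)
#   max pool over seq_len -> (batch, embedding_dim)
#   last time step        -> (batch, embedding_dim)
#   concatenated          -> (batch, embedding_dim*3), hence embedding_dim*3 as the head input below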
classifier_hid_size = 256
head_layers = [embedding_dim*3, classifier_hid_size, num_classes]
head_dropouts = [dropouts[4], 0.1]
head = PoolingLinearClassifier(head_layers, head_dropouts)

seq_enc = SequentialRNN(rnn_enc, head)
#use gpu if available
seq_enc = to_gpu(seq_enc)

text_model = TextModel(seq_enc)

#create the rnn learner
classifier_learner = RNN_Learner(model_data, text_model, opt_fn=opt_fn)
#note: nn.CrossEntropyLoss is a multi-class (one label per example) criterion;
#it expects a class index (0 to C-1) as the target for each element of a 1D tensor of size minibatch
classifier_learner.crit = nn.CrossEntropyLoss()
classifier_learner.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
classifier_learner.clip=0.25
classifier_learner.metrics = [accuracy]

#load pretrained encoder
classifier_learner.load_encoder(learner_finetune_encoder_path)

lr=3e-3
lrm = 2.6
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
wd = 1e-7

classifier_learner.freeze_to(-1)
classifier_learner.lr_find(lrs/1000)
classifier_learner.save('clas_lr')
classifier_learner.load('clas_lr')
classifier_learner.sched.plot()

#do I need to manually set best lr from plot or is it done automatically in the learner object?
lr = 6e-4 
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])

classifier_learner.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8,3))
classifier_learner.save('clas_0')
classifier_learner.load('clas_0')
classifier_learner.sched.plot_loss()

#etc

I hope this is helpful.

7 Likes

Thanks for sharing this @William.Collins Are your class labels hierarchical? I wonder how treating hierarchical labels as flat would impact the results.

No, my class labels are flat.

Hi William,

Do you have a complete notebook available for your multilabel text classification, @William.Collins?

I would appreciate any help you can provide.

Thanks, David

I have just found this thread. We have discussed some of these issues here, using fastai v1. Maybe you will find something useful there!

1 Like

Thanks for sharing this workaround, @William.Collins!

It also fixes a similar error that I encountered in the 12c_ulmfit.ipynb notebook in the “Deep Learning From the Foundations” course (Part 2 version 3).