Multilabel Classification with ULMFiT


(Brian Muhia) #21

Alright, I’ll update this thread with my results later. I’m actually just working on a multilabel classification problem, and will write a new notebook in fastai v1 once I’m done with the current phase of fine-tuning and testing.


(Jan) #22

I’m trying to do multi-label classification with the pretrained language model approach. These are the important parts of my code:

train_ds = TextDataset.from_csv(DATA_PATH, name='train', 
                            classes=classes, n_labels=len(classes))
valid_ds = TextDataset.from_csv(DATA_PATH, name='valid', 
                            classes=classes, n_labels=len(classes))
data_lm = lm_data([train_ds, valid_ds], DATA_PATH)

train_ds = TextDataset.from_csv(DATA_PATH, name='train', vocab=data_lm.train_ds.vocab, 
                                classes=classes, n_labels=len(classes))
valid_ds = TextDataset.from_csv(DATA_PATH, name='valid', vocab=data_lm.train_ds.vocab, 
                                classes=classes, n_labels=len(classes))

data_clas = classifier_data([train_ds, valid_ds], DATA_PATH)

learn = RNNLearner.language_model(data_lm, pretrained_fnames=['lstm_wt103', 'itos_wt103'], drop_mult=0.5)
learn.fit_one_cycle(1, 1e-2)
learn.save_encoder('ft_enc')

learn = RNNLearner.classifier(data_clas, drop_mult=0.5)
learn.load_encoder('ft_enc')
learn.fit_one_cycle(1, 1e-2)

I have a couple of questions regarding the classification part (language modeling works fine):

  1. Which loss function should we choose and where do we specify it?
    I have tried doing learn.loss_fn = F.binary_cross_entropy_with_logits, but this function does not like the data type of the targets. I think it wants them as float32, and currently they are ints. Should I change this somewhere?

  2. How do I get predictions once the model is trained?
    If I do learn.get_preds() I get the following error:

    ~/fastai/fastai/basic_train.py in loss_batch(model, xb, yb, loss_fn, opt, cb_handler, metrics)
    22 out = model(*xb)
    23 out = cb_handler.on_loss_begin(out)
    ---> 24 if not loss_fn: return out.detach(),yb[0].detach()
    25 loss = loss_fn(out, *yb)
    26 mets = [f(out,*yb).detach().cpu() for f in metrics] if metrics is not None else []

    AttributeError: 'tuple' object has no attribute 'detach'

Maybe I’m missing something crucial, in which case these questions aren’t appropriate, but then I would love for someone to point out what I’m doing wrong. I’ve been looking around the docs and examples but still couldn’t solve it.

Thanks in advance.
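
On question 1: in PyTorch, F.binary_cross_entropy_with_logits expects a floating-point target with the same shape as the logits, so integer multi-hot labels usually have to be cast first (on a tensor, something like targets.float()). Where that cast belongs in the fastai pipeline is not settled here. The following is only a minimal plain-Python sketch of what the loss computes, with made-up logits and labels, to show why float targets of matching shape are needed:

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (example, class) pairs.

    logits  : list of rows of raw scores, shape (batch, num_classes)
    targets : list of rows of 0.0/1.0 floats, same shape as logits
    """
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, targets):
        for x, y in zip(logit_row, target_row):
            # Numerically stable form of
            # -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count

# Hypothetical integer multi-hot labels, as they might come out of a dataset.
int_targets = [[1, 0, 1], [0, 0, 1]]
# Cast to float, mirroring what targets.float() would do on a tensor.
float_targets = [[float(v) for v in row] for row in int_targets]

# Hypothetical raw model outputs, one score per class per example.
logits = [[2.0, -1.0, 0.5], [-3.0, 0.0, 4.0]]
loss = bce_with_logits(logits, float_targets)
print(round(loss, 4))
```

Each class gets its own independent sigmoid, which is what lets several labels be active for one example.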


(William Collins) #23

I’ve been struggling with adding a multi-label classifier (having around 2000 classes) on top of the language model as well. I’m seeing very diverse recommendations, but nothing seems to work.

Here are my main questions:

  • Does the target value need to be one-hot encoded, or do I just pass the index of the correct target? Certain losses seem to require different target representations. My data is not in CSV form and can’t be loaded via prepackaged classes, so I need to understand what formats are expected downstream.

  • What “criterion” do I use?
    If I leave it as is (“cross_entropy” is used inside RNN_Learner I think) I get a multi-target error:
    RuntimeError: multi-target not supported at /pytorch/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:16
    If I change it to “binary_cross_entropy” I get an error saying my input and target are different lengths:
    ValueError: Target and input must have the same number of elements. target nelement (48) != input nelement (100368).
    This seems to imply that one-hot targets are needed, since 100368 = my_batch_size x num_classes.

  • The large input number above makes me think that the PoolingLinearClassifier being used in “get_rnn_classifier” is flattening tensors, but I can’t find documentation that clarifies this. Do I need to switch this out with something else?

  • Are there any complete, working examples of using a multi-target classifier on top of the language model? All I have been able to find are binary classifiers (IMDB sentiment, etc).

Thanks!
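
On the target format question: a loss that compares element by element (such as binary cross-entropy) needs one target value per model output, i.e. a 0/1 vector per example, whereas an index-based loss takes a single class index per example. Below is a minimal sketch of the first representation. The helper name to_multi_hot and the toy labels are hypothetical, and the last lines assume that 48 in the error message is the number of examples in the batch:

```python
def to_multi_hot(label_indices, num_classes):
    """Turn a list of active class indices into a 0.0/1.0 vector."""
    row = [0.0] * num_classes
    for idx in label_indices:
        row[idx] = 1.0
    return row

# Hypothetical toy batch: each example lists the indices of its active labels.
batch_labels = [[0, 3], [2], [1, 2, 4]]
num_classes = 5

targets = [to_multi_hot(labels, num_classes) for labels in batch_labels]
for row in targets:
    print(row)

# An element-wise loss needs batch_size * num_classes target entries,
# the same count as the model output.
n_target = sum(len(row) for row in targets)
assert n_target == len(batch_labels) * num_classes

# The numbers in the error message fit this picture: 48 index targets
# against 100368 outputs leaves 2091 outputs per example, no remainder.
print(100368 // 48, 100368 % 48)  # 2091 0
```

With one index per example instead, the target has only batch_size elements, which matches the mismatch reported above.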


(William Collins) #24

I was eventually able to get it working by using “nn.CrossEntropyLoss()” for the criterion.
Below is my condensed code which exposes the steps that occur in “get_rnn_classifier()”:

bptt = 70
max_seq_len = bptt*20
embedding_dim = 400
num_hid = 1150
num_layers = 3

#label2idx is a mapping of target label string to index
num_classes = len(label2idx)

#tok2idx is a mapping of token string to index; idx2tok is assumed to be its inverse (index to token string)
vocab_size = len(idx2tok)
opt_fn = partial(optim.Adam, betas=(0.7, 0.99))
batch_size = 24
dropouts = np.array([0.4,0.5,0.05,0.3,0.4])*0.5

#train_inputs is a 2D np.array of lists of token indices (num_examples x variable_sequence_len)
#train_targets is a 1D np.array of label indices (num_examples)
trn_ds = TextDataset(train_inputs, train_targets)
val_ds = TextDataset(cv_inputs, cv_targets)

#kind of make it so the first things in the list are, on the whole, shorter than the things at the end, 
# but a little bit random as well
trn_samp = SortishSampler(train_inputs, key=lambda x: len(train_inputs[x]), bs=batch_size)
val_samp = SortSampler(cv_inputs, key=lambda x: len(cv_inputs[x]))

pad_idx = tok2idx['_pad_']
#there's a memory leak in DataLoader, use num_workers=0 to avoid ThreadPoolExecutor
trn_dl = DataLoader(trn_ds, batch_size, transpose=True, num_workers=0, pad_idx=pad_idx, sampler=trn_samp)
val_dl = DataLoader(val_ds, batch_size, transpose=True, num_workers=0, pad_idx=pad_idx, sampler=val_samp)

model_data = ModelData(base_path, trn_dl, val_dl)

#stack a head linear classifier on top of the LM
# in notebook, get_rnn_classifier takes num_classes as arg but does nothing with it
rnn_enc = MultiBatchRNN(bptt, max_seq_len, vocab_size, embedding_dim, num_hid, num_layers, pad_token=pad_idx, bidir=False, 
                        dropouti=dropouts[0], wdrop=dropouts[1], dropoute=dropouts[2], dropouth=dropouts[3])

#why is the input emb*3? concat pooling
# 1. we take the average pooling over the sequence of the activations, 
# 2. the max pooling of the sequence over the activations, 
# 3. and the final set of activations, and just concatenate them all together.
classifier_hid_size = 256
head_layers = [embedding_dim*3, classifier_hid_size, num_classes]
head_dropouts = [dropouts[4], 0.1]
head = PoolingLinearClassifier(head_layers, head_dropouts)

seq_enc = SequentialRNN(rnn_enc, head)
#use gpu if available
seq_enc = to_gpu(seq_enc)

text_model = TextModel(seq_enc)

#create the rnn learner
classifier_learner = RNN_Learner(model_data, text_model, opt_fn=opt_fn)
#multi-label
#This criterion expects a class index (0 to C-1) as the target for each value of a 1D tensor of size minibatch
classifier_learner.crit = nn.CrossEntropyLoss()
classifier_learner.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
classifier_learner.clip=0.25
classifier_learner.metrics = [accuracy]

#load pretrained encoder
classifier_learner.load_encoder(learner_finetune_encoder_path)

lr=3e-3
lrm = 2.6
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
wd = 1e-7

classifier_learner.freeze_to(-1)
classifier_learner.lr_find(lrs/1000)
classifier_learner.save('clas_lr')
classifier_learner.load('clas_lr')
classifier_learner.sched.plot()

#do I need to manually set best lr from plot or is it done automatically in the learner object?
lr = 6e-4 
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])

classifier_learner.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8,3))
classifier_learner.save('clas_0')
classifier_learner.load('clas_0')
classifier_learner.sched.plot_loss()

#etc
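
The concat pooling described in the comments above (which is why the head takes embedding_dim*3 inputs) can be sketched in plain Python. This is only an illustration, not the library's implementation: the function name concat_pool is hypothetical, and the order of the three pieces in the concatenation is an assumption.

```python
def concat_pool(hidden_states):
    """Concatenate last, max-pooled and mean-pooled activations over time.

    hidden_states: list of time steps, each a list of `dim` floats.
    Returns a list of 3 * dim floats.
    """
    seq_len = len(hidden_states)
    dim = len(hidden_states[0])
    # Final set of activations (last time step).
    last = list(hidden_states[-1])
    # Max over the sequence, separately for each activation.
    max_pool = [max(step[d] for step in hidden_states) for d in range(dim)]
    # Average over the sequence, separately for each activation.
    avg_pool = [sum(step[d] for step in hidden_states) / seq_len
                for d in range(dim)]
    # Ordering here is an assumption made for this sketch.
    return last + max_pool + avg_pool

# Toy sequence: 3 time steps, 2 activations per step.
states = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
pooled = concat_pool(states)
print(len(pooled), pooled)
```

With embedding_dim = 400 as in the code above, the same operation yields 1200 values per example, which matches the first entry of head_layers.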

I hope this is helpful.
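
One caveat about the criterion: as the comment in the code says, nn.CrossEntropyLoss takes a single class index per example. It applies a softmax across classes, so it models exactly one correct class per example. If an example can carry several labels at once, a per-class sigmoid loss would be the more usual choice. A minimal plain-Python sketch of the index-based loss, with made-up logits:

```python
import math

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example with one correct class index."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of the target class under the softmax.
    return log_sum_exp - logits[target_index]

# Hypothetical raw scores for a 3-class problem.
logits = [2.0, 0.5, -1.0]
loss_right = cross_entropy(logits, 0)  # target is the highest-scoring class
loss_wrong = cross_entropy(logits, 2)  # target is the lowest-scoring class
print(loss_right < loss_wrong)  # True
```

Because the target is one index rather than a vector, its element count is batch_size, not batch_size x num_classes.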


(Xu Fei) #25

Thanks for sharing this, @William.Collins. Are your class labels hierarchical? I wonder how treating hierarchical labels as flat would impact the results.


(William Collins) #26

No, my class labels are flat.


(David Carroll) #27

Hi William,

Do you have a complete notebook available for your multilabel text classification, @William.Collins?

I would appreciate any help you can provide.

Thanks, David