Multilabel Classification with ULMFiT

I was eventually able to get it working by using “nn.CrossEntropyLoss()” for the criterion.
Below is my condensed code which exposes the steps that occur in “get_rnn_classifier()”:

bptt = 70
max_seq_len = bptt*20
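#MultiBatchRNN walks through each sequence in bptt-sized chunks; only activations from (roughly) the last max_seq_len tokens are kept for the pooling head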
embedding_dim = 400
num_hid = 1150
num_layers = 3

#label2idx is a mapping of target label string to index
num_classes = len(label2idx)

#idx2tok is a mapping of index to token string (tok2idx is its inverse)
vocab_size = len(idx2tok)
opt_fn = partial(optim.Adam, betas=(0.7, 0.99))
batch_size = 24
dropouts = np.array([0.4,0.5,0.05,0.3,0.4])*0.5
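#the five values are used below as dropouti, wdrop, dropoute, dropouth in the encoder and the first dropout in the head; the 0.5 scaling just halves the regularization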

#train_inputs is a 2D np.array of lists of token indices (num_examples x variable_sequence_len)
#train_targets is a 1D np.array of label indices (num_examples)
trn_ds = TextDataset(train_inputs, train_targets)
val_ds = TextDataset(cv_inputs, cv_targets)

#sort the examples so the ones early in an epoch are, on the whole, shorter than the ones at the end,
# but with a bit of randomness, so each batch contains similar-length sequences (less padding)
trn_samp = SortishSampler(train_inputs, key=lambda x: len(train_inputs[x]), bs=batch_size)
val_samp = SortSampler(cv_inputs, key=lambda x: len(cv_inputs[x]))

pad_idx = tok2idx['_pad_']
#there's a memory leak in DataLoader, use num_workers=0 to avoid ThreadPoolExecutor
trn_dl = DataLoader(trn_ds, batch_size, transpose=True, num_workers=0, pad_idx=pad_idx, sampler=trn_samp)
val_dl = DataLoader(val_ds, batch_size, transpose=True, num_workers=0, pad_idx=pad_idx, sampler=val_samp)

model_data = ModelData(base_path, trn_dl, val_dl)

#stack a linear classifier head on top of the LM encoder
# (in the notebook, get_rnn_classifier takes num_classes as an arg but does nothing with it)
rnn_enc = MultiBatchRNN(bptt, max_seq_len, vocab_size, embedding_dim, num_hid, num_layers, pad_token=pad_idx, bidir=False, 
                        dropouti=dropouts[0], wdrop=dropouts[1], dropoute=dropouts[2], dropouth=dropouts[3])

#why is the head input embedding_dim*3? concat pooling:
# 1. the average pooling of the activations over the sequence,
# 2. the max pooling of the activations over the sequence,
# 3. and the final time step's activations, all concatenated together.
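#roughly what PoolingLinearClassifier does to build the head input
#(illustrative sketch, where `output` is the top layer's activations,
# shaped seq_len x batch_size x embedding_dim):
#   last = output[-1]                         # activations at the final time step
#   mx   = output.max(dim=0)[0]               # max pooling over the sequence
#   avg  = output.mean(dim=0)                 # average pooling over the sequence
#   x    = torch.cat([last, mx, avg], dim=1)  # batch_size x (embedding_dim*3)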
classifier_hid_size = 256
head_layers = [embedding_dim*3, classifier_hid_size, num_classes]
head_dropouts = [dropouts[4], 0.1]
head = PoolingLinearClassifier(head_layers, head_dropouts)

seq_enc = SequentialRNN(rnn_enc, head)
#use gpu if available
seq_enc = to_gpu(seq_enc)

text_model = TextModel(seq_enc)

#create the rnn learner
classifier_learner = RNN_Learner(model_data, text_model, opt_fn=opt_fn)
#multi-label
#This criterion expects a class index (0 to C-1) as the target for each value of a 1D tensor of size minibatch
classifier_learner.crit = nn.CrossEntropyLoss()
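#seq2seq_reg adds the AR (alpha) and TAR (beta) activation-regularization penalties from the AWD-LSTM paper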
classifier_learner.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
classifier_learner.clip=0.25
classifier_learner.metrics = [accuracy]

#load pretrained encoder
classifier_learner.load_encoder(learner_finetune_encoder_path)

lr=3e-3
lrm = 2.6
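#discriminative fine-tuning: one learning rate per layer group (embedding, the 3 LSTM layers, the head), each group getting lr/2.6 of the group above it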
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
wd = 1e-7

classifier_learner.freeze_to(-1)
classifier_learner.lr_find(lrs/1000)
classifier_learner.save('clas_lr')
classifier_learner.load('clas_lr')
classifier_learner.sched.plot()

#do I need to manually set best lr from plot or is it done automatically in the learner object?
lr = 6e-4 
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])

classifier_learner.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8,3))
classifier_learner.save('clas_0')
classifier_learner.load('clas_0')
classifier_learner.sched.plot_loss()

#etc
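
If it helps, here's a minimal sketch of pulling predictions off the validation set after training (assuming the fastai 0.7 API; idx2label is a hypothetical helper, just the inverse of label2idx):

#predict_with_targs returns the raw logits and the targets for the validation set
val_preds, val_targs = classifier_learner.predict_with_targs()
predicted_idxs = np.argmax(val_preds, axis=1)
#map predicted indices back to label strings with idx2label (inverse of label2idx)
predicted_labels = [idx2label[i] for i in predicted_idxs]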

I hope this is helpful.
