Alright, I’ll update this thread with my results later. I’m actually just working on a multilabel classification problem, and will write a new notebook in fastai v1 once I’m done with the current phase of fine-tuning and testing.
I’m trying to do multi-label classification with the pretrained language model approach. These are the important parts of my code:
train_ds = TextDataset.from_csv(DATA_PATH, name='train', classes=classes, n_labels=len(classes))
valid_ds = TextDataset.from_csv(DATA_PATH, name='valid', classes=classes, n_labels=len(classes))
data_lm = lm_data([train_ds, valid_ds], DATA_PATH)

train_ds = TextDataset.from_csv(DATA_PATH, name='train', vocab=data_lm.train_ds.vocab, classes=classes, n_labels=len(classes))
valid_ds = TextDataset.from_csv(DATA_PATH, name='valid', vocab=data_lm.train_ds.vocab, classes=classes, n_labels=len(classes))
data_clas = classifier_data([train_ds, valid_ds], DATA_PATH)

learn = RNNLearner.language_model(data_lm, pretrained_fnames=['lstm_wt103', 'itos_wt103'], drop_mult=0.5)
learn.fit_one_cycle(1, 1e-2)
learn.save_encoder('ft_enc')

learn = RNNLearner.classifier(data_clas, drop_mult=0.5)
learn.load_encoder('ft_enc')
learn.fit_one_cycle(1, 1e-2)
I have a couple of questions regarding the classification part (language modeling works fine):
Which loss function should we choose and where do we specify it?
I have tried doing
learn.loss_fn = F.binary_cross_entropy_with_logits
but this function does not like the data type of the targets. I think it wants float32, and currently they are ints. Should I change this somewhere?
How do I get predictions once the model is trained?
If I do
learn.get_preds()
I get the following error:
~/fastai/fastai/basic_train.py in loss_batch(model, xb, yb, loss_fn, opt, cb_handler, metrics)
22 out = model(*xb)
23 out = cb_handler.on_loss_begin(out)
---> 24 if not loss_fn: return out.detach(),yb.detach()
25 loss = loss_fn(out, *yb)
26 mets = [f(out,*yb).detach().cpu() for f in metrics] if metrics is not None else 
AttributeError: 'tuple' object has no attribute 'detach'
Maybe I’m missing something crucial and these questions are beside the point, in which case I would love for someone to point out what I’m doing wrong. I’ve been looking through the docs and examples but still couldn’t solve it.
Thanks in advance.
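On the loss-function question above: a minimal sketch, assuming plain PyTorch and toy tensors rather than the fastai learner from the post, of the target format that `F.binary_cross_entropy_with_logits` works with. The tensor names and sizes here are made up for illustration; the point is only that the targets are cast to float and have the same shape as the logits.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_classes = 4, 3  # toy sizes, not the poster's data

# Raw model outputs (logits), one score per (example, class) pair.
logits = torch.randn(batch_size, num_classes)

# Multi-hot integer targets, as they might come out of a dataset.
int_targets = torch.tensor([[1, 0, 1],
                            [0, 1, 0],
                            [1, 1, 0],
                            [0, 0, 1]])

# Cast the targets to float32 so they match the dtype of the logits.
float_targets = int_targets.float()
loss = F.binary_cross_entropy_with_logits(logits, float_targets)

# For predictions, apply a sigmoid per class and threshold
# (0.5 is an arbitrary choice for this sketch).
probs = torch.sigmoid(logits)
preds = (probs > 0.5).long()
print(loss.item(), preds.shape)
```

Where the cast belongs inside the fastai pipeline (dataset, collate function, or a wrapper around the loss) depends on the library version, so treat this only as a picture of what the loss function itself expects.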
I’ve been struggling with adding a multi-label classifier (having around 2000 classes) on top of the language model as well. I’m seeing very diverse recommendations, but nothing seems to work.
Here are my main questions:
Does the target value need to be one-hot encoded, or do I just pass the index of the correct target? Certain losses seem to require different target representations. My data is not in CSV form and can’t be loaded via prepackaged classes, so I need to understand what formats are expected downstream.
What “criterion” do I use?
If I leave it as is (“cross_entropy” is used inside RNN_Learner I think) I get a multi-target error:
RuntimeError: multi-target not supported at /pytorch/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:16
If I change it to “binary_cross_entropy” I get an error saying my input and target are different lengths:
ValueError: Target and input must have the same number of elements. target nelement (48) != input nelement (100368).
This seems to imply that one-hot targets are needed, since 100368 = my_batch_size x num_classes.
The large input number above makes me think that the PoolingLinearClassifier being used in “get_rnn_classifier” is flattening tensors, but I can’t find documentation that clarifies this. Do I need to switch this out for something else?
Are there any complete, working examples of using a multi-target classifier on top of the language model? All I have been able to find are binary classifiers (IMDB sentiment, etc).
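To make the two target representations in these questions concrete, here is a small pure-Python sketch. The helper `multi_hot` and the toy labels are hypothetical and not part of fastai. It also shows why the element counts differ: index targets have one element per example, while one-hot or multi-hot targets have one element per (example, class) pair. The numbers in the error above fit a batch of 48 examples and 2091 classes (48 x 2091 = 100368), though that split is inferred from the arithmetic, not stated in the post.

```python
def multi_hot(label_ids, num_classes):
    """Return a 0.0/1.0 vector with 1.0 at every index in label_ids."""
    row = [0.0] * num_classes
    for idx in label_ids:
        if not 0 <= idx < num_classes:
            raise ValueError(f"label index {idx} out of range")
        row[idx] = 1.0
    return row


num_classes = 5                        # toy value
batch_labels = [[0, 3], [2], [1, 4]]   # hypothetical label indices per example

# Index targets (single-label losses such as cross entropy):
# one integer per example, so only the first label is kept here.
index_targets = [labels[0] for labels in batch_labels]

# Multi-hot targets (binary cross entropy): one float per (example, class).
multi_hot_targets = [multi_hot(labels, num_classes) for labels in batch_labels]

n_index = len(index_targets)
n_multi_hot = sum(len(row) for row in multi_hot_targets)
print(n_index, n_multi_hot)  # 3 15
```

With index targets a binary cross entropy loss sees 3 target elements against 15 outputs, which is the same kind of mismatch as in the ValueError above.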
I was eventually able to get it working by using “nn.CrossEntropyLoss()” for the criterion.
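For readers wondering why a class index is enough for this criterion: a rough pure-Python sketch of what a softmax cross entropy computes for a single example. This is an illustration of the formula, not the PyTorch implementation, and the function name and logits are made up. Because the loss picks out exactly one target class per example, it fits the case where each example has one label out of many classes, not the case where several labels can be active at once.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-softmax of the target class for one example."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


logits = [2.0, 0.5, -1.0]                # toy scores for three classes
loss_correct = cross_entropy(logits, 0)  # target is the top-scoring class
loss_wrong = cross_entropy(logits, 2)    # target is the lowest-scoring class
print(loss_correct, loss_wrong)
```

The loss is small when the target class has the highest score and grows as its score falls behind the others.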
Below is my condensed code which exposes the steps that occur in “get_rnn_classifier()”:
from fastai.text import *  # assumes fastai 0.7, as in the course IMDB notebook

bptt = 70
max_seq_len = bptt*20
embedding_dim = 400
num_hid = 1150
num_layers = 3
# label2idx is a mapping of target label string to index
num_classes = len(label2idx)
# tok2idx is a mapping of token string to index; idx2tok is its inverse
vocab_size = len(idx2tok)
opt_fn = partial(optim.Adam, betas=(0.7, 0.99))
batch_size = 24
dropouts = np.array([0.4,0.5,0.05,0.3,0.4])*0.5

# train_inputs is an np.array of lists of token indices (num_examples x variable_sequence_len)
# train_targets is a 1D np.array of label indices (num_examples)
trn_ds = TextDataset(train_inputs, train_targets)
val_ds = TextDataset(cv_inputs, cv_targets)

# kind of make it so the first things in the list are, on the whole, shorter than the things at the end,
# but a little bit random as well
trn_samp = SortishSampler(train_inputs, key=lambda x: len(train_inputs[x]), bs=batch_size)
val_samp = SortSampler(cv_inputs, key=lambda x: len(cv_inputs[x]))
pad_idx = tok2idx['_pad_']

# there's a memory leak in DataLoader, use num_workers=0 to avoid ThreadPoolExecutor
trn_dl = DataLoader(trn_ds, batch_size, transpose=True, num_workers=0, pad_idx=pad_idx, sampler=trn_samp)
val_dl = DataLoader(val_ds, batch_size, transpose=True, num_workers=0, pad_idx=pad_idx, sampler=val_samp)
model_data = ModelData(base_path, trn_dl, val_dl)

# stack a head linear classifier on top of the LM
# in notebook, get_rnn_classifier takes num_classes as arg but does nothing with it
# each dropout argument takes one scalar (order as in the course IMDB notebook)
rnn_enc = MultiBatchRNN(bptt, max_seq_len, vocab_size, embedding_dim, num_hid, num_layers,
                        pad_token=pad_idx, bidir=False,
                        dropouti=dropouts[0], wdrop=dropouts[1],
                        dropoute=dropouts[2], dropouth=dropouts[3])

# why is the input emb*3? concat pooling
# 1. we take the average pooling over the sequence of the activations,
# 2. the max pooling of the sequence over the activations,
# 3. and the final set of activations, and just concatenate them all together.
classifier_hid_size = 256
head_layers = [embedding_dim*3, classifier_hid_size, num_classes]
head_dropouts = [dropouts[4], 0.1]
head = PoolingLinearClassifier(head_layers, head_dropouts)
seq_enc = SequentialRNN(rnn_enc, head)

# use gpu if available
seq_enc = to_gpu(seq_enc)
text_model = TextModel(seq_enc)

# create the rnn learner
classifier_learner = RNN_Learner(model_data, text_model, opt_fn=opt_fn)

# multi-label
# This criterion expects a class index (0 to C-1) as the target for each value of a 1D tensor of size minibatch
classifier_learner.crit = nn.CrossEntropyLoss()
classifier_learner.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
classifier_learner.clip = 0.25
classifier_learner.metrics = [accuracy]

# load pretrained encoder
classifier_learner.load_encoder(learner_finetune_encoder_path)

lr = 3e-3
lrm = 2.6
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
wd = 1e-7

classifier_learner.freeze_to(-1)
classifier_learner.lr_find(lrs/1000)
classifier_learner.save('clas_lr')
classifier_learner.load('clas_lr')
classifier_learner.sched.plot()

# do I need to manually set best lr from plot or is it done automatically in the learner object?
lr = 6e-4
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
classifier_learner.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8,3))
classifier_learner.save('clas_0')
classifier_learner.load('clas_0')
classifier_learner.sched.plot_loss()
# etc
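The concat pooling described in the comments of the code above can be sketched in a few lines of pure Python. This is only an illustration of the idea under simplified assumptions (a single example, activations as plain lists, a hypothetical function name, and an arbitrary concatenation order); it is not the fastai PoolingLinearClassifier code. It shows why the first layer of the head takes an input three times the width of the activations.

```python
def concat_pool(hidden_states):
    """Concatenate last, max-pooled and average-pooled activations.

    hidden_states: list of per-timestep activation vectors for one
    example, each a list of floats of the same length.
    """
    if not hidden_states:
        raise ValueError("need at least one timestep")
    n_steps = len(hidden_states)
    dim = len(hidden_states[0])
    last = list(hidden_states[-1])
    max_pool = [max(h[i] for h in hidden_states) for i in range(dim)]
    avg_pool = [sum(h[i] for h in hidden_states) / n_steps for i in range(dim)]
    return last + max_pool + avg_pool


# Two timesteps, activation width 2 (toy values).
states = [[1.0, 4.0], [3.0, 2.0]]
pooled = concat_pool(states)
print(pooled)  # [3.0, 2.0, 3.0, 4.0, 2.0, 3.0]
```

The output length is 3 times the activation width regardless of sequence length, which is how variable-length sequences end up as a fixed-size input to the linear head.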
I hope this is helpful.
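As a side note on the `lrs` arrays in the code above: they follow a simple pattern, where the last layer group gets the base learning rate and each earlier group is divided by a further factor of `lrm`. A small sketch of that pattern, with a hypothetical helper name and assuming five layer groups as in the arrays above:

```python
def layer_group_lrs(lr, lrm=2.6, n_groups=5):
    """Return one learning rate per layer group, smallest first.

    The last group gets lr; each earlier group is divided by
    one more factor of lrm.
    """
    return [lr / lrm ** k for k in range(n_groups - 1, -1, -1)]


lrs = layer_group_lrs(6e-4)
for group, rate in enumerate(lrs):
    print(group, rate)
```

Earlier layer groups therefore train more slowly than the classifier head, which is the intent of passing an array of rates instead of a single value.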
Thanks for sharing this, @William.Collins. Are your class labels hierarchical? I wonder how treating hierarchical labels as flat would impact the results.
No, my class labels are flat.