Does anyone know an easy way to get the encoding of a new document with a Language Model trained in ULMFiT?
learn.predict() outputs the last layer/score, but how do I extract the encoding before that?
I have tried to run input through just the encoder, but it is not working at all. Here is what my code looks like:
    # load data
    val_sent = np.load('match_lm_data/tmp/val_ids.npy')
    val_lbls = np.load('match_lm_data/tmp/lbl_val.npy')
    val_ds = TextDataset(val_sent, val_lbls)
    val_samp = SortSampler(val_sent, key=lambda x: len(val_sent[x]))
    val_lbls_sampled = val_lbls[list(val_samp)]
    val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=1, pad_idx=1, sampler=val_samp)
    md = ModelData('match_lm_data', None, val_dl)

    m = get_rnn_classifer(bptt, 20*70, c, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
                          layers=[em_sz*3, 50, c], drops=[0., 0.])
    m2 = m  # take just encoder
    learn = RNN_Learner(md, TextModel(to_gpu(m2)))
    b.learn.model(next(iter(val_dl)))
When I do this, I get the following error: "AttributeError: 'MultiBatchRNN' object has no attribute 'hidden'".
How do I solve this? Is there any easier/more elegant way to extract the encoding?
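To make the question more concrete, here is a toy sketch of what I suspect might be going on. It is plain Python, not fastai code: `ToyEncoder`, `ToyHead` and `ToySequential` are hypothetical stand-ins I made up for illustration. The assumption (which I have not verified against the library source) is that the encoder only creates its hidden state inside a `reset()` method, so calling the encoder on its own before any `reset()` fails with the same kind of AttributeError.

```python
# Toy stand-ins only: none of these classes are fastai code.

class ToyEncoder:
    """Stateful encoder whose hidden state only exists after reset()."""

    def __init__(self, n_hid):
        self.n_hid = n_hid  # note: self.hidden is NOT created here

    def reset(self):
        # Create a fresh, zeroed hidden state.
        self.hidden = [0.0] * self.n_hid

    def forward(self, token_ids):
        # Reading self.hidden before reset() raises AttributeError.
        for tok in token_ids:
            self.hidden = [0.5 * h + tok for h in self.hidden]
        return list(self.hidden)  # the "encoding" of the sequence

    __call__ = forward


class ToyHead:
    """Maps an encoding to one score (stand-in for the classifier layers)."""

    def __call__(self, encoding):
        return sum(encoding)


class ToySequential:
    """Encoder followed by head, indexable like a sequential container."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def __getitem__(self, i):
        return self.layers[i]

    def reset(self):
        # Forward reset() to every layer that defines it.
        for layer in self.layers:
            if hasattr(layer, "reset"):
                layer.reset()

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


model = ToySequential(ToyEncoder(n_hid=3), ToyHead())
encoder = model[0]  # take just the encoder

try:
    encoder([1, 2])  # no reset() yet, so there is no hidden state
    failed_before_reset = False
except AttributeError:
    failed_before_reset = True

model.reset()               # creates encoder.hidden
encoding = encoder([1, 2])  # output of the encoder, before the head
model.reset()
score = model([1, 2])       # full model: encoder, then head
```

If MultiBatchRNN behaves like this toy (again, an assumption on my part), then calling `reset()` on the model before passing a batch through the encoder part alone might avoid the error, and the encoder's output would be the encoding I am after.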