Does anyone know an easy way to get the encoding of a new document with a Language Model trained in ULMFiT?
learn.predict() outputs the last layer's scores, but how do I extract the encoding before that?
I have tried to run input through just the encoder, but it is not working at all. Here is what my code looks like:
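# fastai 0.7-style imports; bs, bptt, c, vs, em_sz, nh and nl are assumed to be defined elsewhere
import numpy as np
from fastai.text import *

# load data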
val_sent = np.load('match_lm_data/tmp/val_ids.npy')
val_lbls = np.load('match_lm_data/tmp/lbl_val.npy')
val_ds = TextDataset(val_sent, val_lbls)
val_samp = SortSampler(val_sent, key=lambda x: len(val_sent[x]))
val_lbls_sampled = val_lbls[list(val_samp)]
val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=1, pad_idx=1, sampler=val_samp)
md = ModelData('match_lm_data', None, val_dl)
m = get_rnn_classifer(bptt, 20*70, c, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
layers=[em_sz*3, 50, c], drops=[0., 0.])
m2 = m[0]  # take just the encoder
learn = RNN_Learner(md, TextModel(to_gpu(m2)))
learn.model(next(iter(val_dl))[0])
When I do this, I get the following error: "AttributeError: 'MultiBatchRNN' object has no attribute 'hidden'".
How do I solve this? Is there any easier/more elegant way to extract the encoding?
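My current guess, which I have not confirmed, is that the encoder only creates its hidden state when a reset() method is called, and that calling the encoder on its own skips that step. The toy class below is a hypothetical stand-in written in plain Python (ToyEncoder is not fastai code) that reproduces the same kind of AttributeError and shows it going away once reset() is called first:

```python
class ToyEncoder:
    """Hypothetical stand-in for a stateful RNN encoder (not fastai code)."""

    def __init__(self, n_layers):
        self.n_layers = n_layers
        # self.hidden is deliberately NOT created here.

    def reset(self):
        # Create one zeroed hidden value per layer.
        self.hidden = [0.0 for _ in range(self.n_layers)]

    def forward(self, tokens):
        # Reads self.hidden, so it fails if reset() was never called.
        outputs = []
        for layer in range(self.n_layers):
            self.hidden[layer] += sum(tokens)
            outputs.append(self.hidden[layer])
        return outputs


enc = ToyEncoder(n_layers=3)

# Calling forward() before reset() raises AttributeError on 'hidden'.
try:
    enc.forward([1, 2, 3])
    failed_without_reset = False
except AttributeError:
    failed_without_reset = True

# After reset() the hidden state exists and forward() returns an "encoding".
enc.reset()
encoding = enc.forward([1, 2, 3])
```

If MultiBatchRNN behaves the same way, calling m2.reset() before the forward pass might be enough, but that is an assumption on my part and I would appreciate confirmation.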