ULMFiT: last-layer activations before softmax for unsupervised text clustering



I am working on an unsupervised text clustering project (potentially thousands of groups, no labels). For each document, both in the training data and in future new documents, I want the activations of the last layer before the softmax, so I can run clustering algorithms on those vectors.

How can I get those vectors?

I have had some success with this approach using vectors built from word2vec embeddings, but I expect much better results from the fine-tuned ULMFiT language model.
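The encoder produces one hidden vector per token, so each document still has to be reduced to a single vector before clustering. Below is a minimal numpy sketch, assuming you already have one document's last-layer hidden states as an array of shape (seq_len, hidden_size). It uses concat pooling (last time step, max over time, mean over time), which is how the ULMFiT paper's classifier head pools, followed by a bare-bones k-means. The function names are illustrative and not part of fastai.

```python
import numpy as np

def concat_pool(hidden_states):
    """Reduce per-token hidden states (seq_len, hidden_size) to one
    document vector of shape (3 * hidden_size,).

    Concatenates the last time step, the max over time and the mean over
    time (ULMFiT-style concat pooling).
    """
    h = np.asarray(hidden_states, dtype=np.float64)
    return np.concatenate([h[-1], h.max(axis=0), h.mean(axis=0)])

def kmeans(vectors, k, n_iter=50, seed=0):
    """Minimal Lloyd's k-means on row vectors; returns (labels, centroids)."""
    x = np.asarray(vectors, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

With thousands of clusters over many documents, a library implementation such as scikit-learn's MiniBatchKMeans will be far faster than this loop. Plain mean pooling on its own is also a reasonable, simpler choice of document vector.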



You should just call learn.model[0] on the tensors you want the activations for. This returns a tuple of three things; the first one is what you need.
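When several documents of different lengths go through the encoder as one padded batch, the padding positions should be left out of the pooling. Below is a minimal sketch of masked mean pooling, assuming (as in fastai v1; please verify for your version) that the first element of the returned tuple is a list with one (batch, seq_len, hidden) output per layer, so the last layer is element [-1]. numpy arrays stand in for the tensors here, and the function name is hypothetical.

```python
import numpy as np

def last_layer_doc_vectors(encoder_output, pad_mask):
    """Mean-pool the last layer's activations into one vector per document.

    encoder_output: tuple whose first element is ASSUMED to be a list with
        one array of shape (batch, seq_len, hidden) per layer.
    pad_mask: bool array (batch, seq_len), True where the token is padding.
    """
    per_layer = encoder_output[0]
    last = np.asarray(per_layer[-1], dtype=np.float64)  # (batch, seq_len, hidden)
    keep = ~np.asarray(pad_mask, dtype=bool)             # True for real tokens
    summed = (last * keep[:, :, None]).sum(axis=1)       # zero out padding, sum over time
    counts = np.maximum(keep.sum(axis=1, keepdims=True), 1)  # avoid division by zero
    return summed / counts
```

With real PyTorch outputs you would first convert each tensor with .detach().cpu().numpy() and build the mask from your token ids (for example ids == pad_id). It is also worth putting the model in eval mode (learn.model.eval()) so dropout does not make the vectors noisy.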



Thanks for the quick reply Sebastian.

OK, but what do I pass to learn.model[0]?

I am struggling to get the individual items out of the TextLMDataBunch object. How can I access them?

And how can I get the tensors for new sentences/documents?
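For new documents, the general recipe is: tokenize them the same way as the training data, map each token to its id in the training vocabulary (unknown tokens go to the unknown id), and pad the id lists into one rectangular batch. Below is a minimal sketch of the last two steps. The whitespace tokenizer and the token strings "xxunk"/"xxpad" are placeholders; in practice you would reuse the model's own tokenizer and vocabulary (in fastai v1 the vocabulary list is usually reachable as data.vocab.itos, but check your version).

```python
import numpy as np

# Placeholder special tokens; use whatever your trained vocabulary contains.
UNK, PAD = "xxunk", "xxpad"

def numericalize(text, itos):
    """Map a text to token ids using the vocabulary list itos.

    NOTE: lower() + split() is only a stand-in for the real tokenizer.
    """
    stoi = {tok: i for i, tok in enumerate(itos)}
    unk_id = stoi[UNK]
    return [stoi.get(tok, unk_id) for tok in text.lower().split()]

def pad_batch(id_lists, pad_id):
    """Right-pad id lists into a (batch, max_len) array plus a padding mask."""
    max_len = max(len(ids) for ids in id_lists)
    batch = np.full((len(id_lists), max_len), pad_id, dtype=np.int64)
    mask = np.ones((len(id_lists), max_len), dtype=bool)  # True = padding
    for row, ids in enumerate(id_lists):
        batch[row, :len(ids)] = ids
        mask[row, :len(ids)] = False
    return batch, mask
```

torch.from_numpy(batch) then turns the id array into a LongTensor that can be fed to the encoder, and the mask can be reused for masked pooling.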

Thanks a lot!