ULMFiT last-layer activations (before softmax) for unsupervised text clustering

Hi,

I am working on an unsupervised text clustering project (potentially thousands of groups, no labels). For each document, both in the training data and in future new documents, I want the last layer's activations, i.e. the values that feed into the final softmax layer, so that I can run clustering algorithms on those vectors.

How can I get those vectors?

I have had some success with this approach using vectors built from word2vec embeddings, but I expect much better accuracy from the fine-tuned ULMFiT language model.
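Once every document is a fixed-length vector (whether from word2vec embeddings or from the fine-tuned LM), any vector clustering algorithm applies. As a minimal sketch, here is spherical k-means (k-means on L2-normalised vectors, so similarity is cosine) in plain Python. The function names are made up for illustration, and with thousands of groups a library implementation would be the practical choice.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster document vectors by cosine similarity (spherical k-means).

    Centroids start at the first k vectors to keep the sketch deterministic;
    in practice k-means++ or several random restarts would be preferable.
    """
    points = [l2_normalize(v) for v in vectors]
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of documents")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the normalised mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = l2_normalize(mean)
    return labels


docs = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
print(spherical_kmeans(docs, k=2))  # [0, 1, 0, 1]
```

Cosine similarity is used rather than Euclidean distance because the scale of hidden-state vectors tends to vary with document length, while their direction is what carries the content.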

1 Like

You should just call learn.model[0] on the tensors you want predictions for. This returns a tuple of three things; the first one is what you need.
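A sketch of the step after calling the encoder, assuming its output has already been converted to nested lists with one entry per layer, each holding one document's per-token hidden states (seq_len × hidden). Which element of the returned tuple holds these, and whether it is a list of per-layer tensors, can differ between fastai versions, so inspect it first. The pooling shown is the concat pooling that ULMFiT's classifier head uses (last hidden state, max-pool and mean-pool over time); `concat_pool` and `document_vector` are hypothetical helper names, not fastai API.

```python
def concat_pool(token_states):
    """ULMFiT-style concat pooling for ONE document.

    token_states: per-token hidden vectors from the encoder's last layer
    (seq_len x hidden), with padding positions already removed.
    Returns [last hidden state, max over time, mean over time] concatenated.
    """
    if not token_states:
        raise ValueError("document produced no hidden states")
    columns = list(zip(*token_states))  # one tuple per hidden unit
    last = list(token_states[-1])
    max_pool = [max(col) for col in columns]
    mean_pool = [sum(col) / len(token_states) for col in columns]
    return last + max_pool + mean_pool


def document_vector(layer_outputs):
    """Take the LAST layer from a per-layer list of activations and pool it.

    layer_outputs: list indexed by layer, each entry (seq_len x hidden) for one
    document; this layout is an assumption, so check your encoder's actual output.
    """
    return concat_pool(layer_outputs[-1])


# Toy example: two layers, two tokens, hidden size 2.
layers = [
    [[0.0, 0.0], [0.0, 0.0]],  # first layer (ignored)
    [[1.0, 4.0], [3.0, 2.0]],  # last layer
]
print(document_vector(layers))  # [3.0, 2.0, 3.0, 4.0, 2.0, 3.0]
```

Padding positions should be dropped before pooling, otherwise pad tokens leak into the max and mean. It is probably also worth putting the model in eval mode and resetting the encoder's hidden state between unrelated documents, since an RNN encoder carries state from one call to the next.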

Thanks for the quick reply, Sebastian.

OK, but what do I pass to learn.model[0]?

I am struggling to get the individual items out of the TextLMDataBunch object. How can I access them?

And how can I get tensors for new sentences/documents?
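For a new document, the text has to go through the same tokenization and vocabulary as the training data before it can become a tensor for learn.model[0]. The sketch below assumes a plain `stoi` dict (token → id) with `xxunk` at id 0 and `xxpad` at id 1, which I believe mirrors fastai v1's default special-token order but should be checked against your saved vocab. `simple_tokenize` is only a hypothetical stand-in: the real fastai tokenizer adds further special tokens (e.g. `xxmaj`) and must be reused unchanged. The padded batch could then be wrapped with `torch.tensor(...)` and passed to the encoder.

```python
import re


def simple_tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase words and punctuation, led by xxbos.

    The real fastai tokenizer adds more special tokens (e.g. xxmaj) and must be
    reused unchanged for new documents; this only illustrates the shape of the step.
    """
    return ["xxbos"] + re.findall(r"\w+|[^\w\s]", text.lower())


def numericalize(tokens, stoi, unk_id=0):
    """Map tokens to vocabulary ids, using unk_id for out-of-vocabulary tokens."""
    return [stoi.get(tok, unk_id) for tok in tokens]


def pad_batch(id_lists, pad_id=1, pad_first=True):
    """Pad variable-length id lists into a rectangular batch (batch x max_len)."""
    max_len = max(len(ids) for ids in id_lists)
    batch = []
    for ids in id_lists:
        padding = [pad_id] * (max_len - len(ids))
        batch.append(padding + ids if pad_first else ids + padding)
    return batch


# Hypothetical vocabulary; in practice reuse the one saved with the fine-tuned LM.
stoi = {"xxunk": 0, "xxpad": 1, "xxbos": 2, "the": 3, "cat": 4, "sat": 5, ".": 6}
docs = ["The cat sat.", "The dog"]
ids = [numericalize(simple_tokenize(d), stoi) for d in docs]
print(pad_batch(ids))  # [[2, 3, 4, 5, 6], [1, 1, 2, 3, 0]]
```

Padding at the front keeps each document's last real token in the final position, which matters if the document vector uses the last hidden state.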

Thanks a lot!

Hi, were you able to get the encoding of a new sentence?
I'm interested in this possibility too.

Thanks…

Hi, I haven't managed it yet; I'd appreciate any help here.

1 Like

Thanks Aleg…

I'm trying with learn.model[0]… no luck so far.
I'll post if I manage to get anything.

I want to compare ULMFiT with USE (Universal Sentence Encoder)…