ULMFiT last layer activations/values before softmax, for unsupervised text clustering

aleg · June 11, 2019, 5:01pm

Hi,

I am working on an unsupervised text clustering project (potentially thousands of groups, no labels). I want the activations that feed the final softmax layer for each document (both the training data and future new documents), so that I can run clustering algorithms on those vectors.

How can I get those vectors?

I have had some success with this approach using vectors from word2vec embeddings, but I guess I will get much better accuracy using the fine-tuned ULMFiT language model.

sgugger · June 11, 2019, 8:35pm

You should just call learn.model[0] on the tensors you want the predictions for. This will return a tuple of three things; the first one is what you need.
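As a rough illustration of that suggestion, here is a minimal sketch of turning the encoder's output into one fixed-size vector per document. It assumes the first element of the returned tuple is a list of per-layer hidden states shaped (batch, seq_len, hidden); that shape is an assumption, not something confirmed in this thread. `ToyEncoder` and `document_vector` are hypothetical names, and `ToyEncoder` is only a stand-in so the snippet runs without a trained model. With a real learner you would pass `learn.model[0]` instead.

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for learn.model[0]; NOT the fastai encoder."""

    def __init__(self, vocab_size=100, emb_dim=8, hidden_dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        out, _ = self.rnn(self.emb(token_ids))  # (batch, seq_len, hidden_dim)
        # Mimic "a tuple of three things" with per-layer outputs first (assumed layout).
        return [out], [out], None


def document_vector(encoder, token_ids):
    """Pool per-token hidden states into one vector per document."""
    encoder.eval()  # turn off dropout so the vectors are repeatable
    if hasattr(encoder, "reset"):
        encoder.reset()  # clear any carried-over hidden state, if the encoder keeps one
    with torch.no_grad():
        first = encoder(token_ids)[0]  # "the first one is what you need"
    # If the first element is a list of per-layer outputs, keep the last layer.
    hidden = first[-1] if isinstance(first, (list, tuple)) else first
    last = hidden[:, -1]              # hidden state at the final token
    max_pool = hidden.max(dim=1)[0]   # max over the sequence dimension
    mean_pool = hidden.mean(dim=1)    # mean over the sequence dimension
    return torch.cat([last, max_pool, mean_pool], dim=1)


torch.manual_seed(0)
encoder = ToyEncoder()
token_ids = torch.randint(0, 100, (1, 12))  # one "document" of 12 token ids
vec = document_vector(encoder, token_ids)
print(vec.shape)  # torch.Size([1, 48])
```

Concatenating the last state with max and mean pooling is one common way to get a length-independent vector; plain mean pooling would also work as clustering input. The resulting rows can be converted with `vec.numpy()` and handed to a clustering algorithm.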

aleg · June 12, 2019, 1:11pm

Thanks for the quick reply Sebastian.

OK, but what do I pass to learn.model[0]?

I am struggling to get the individual items from the TextLMDataBunch object. How can I access them?

And how can I get a tensor for new sentences/documents?

Thanks a lot!
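On the question of what to pass in: the encoder expects integer token ids, not raw text, so a new sentence has to be tokenized and mapped through the same vocabulary the language model was fine-tuned with. The sketch below shows that idea only. The tiny vocabulary, the whitespace tokenizer, and the `numericalize` helper are all hypothetical simplifications, not the fastai tokenizer or its API.

```python
# Hypothetical vocabulary; in practice this must be the SAME vocabulary
# the language model was fine-tuned with, or the ids will be meaningless.
UNK = "xxunk"
itos = [UNK, "xxpad", "xxbos", "the", "model", "clusters", "documents"]
stoi = {tok: i for i, tok in enumerate(itos)}


def numericalize(text, stoi, unk_id=0, bos="xxbos"):
    """Lowercase, whitespace-tokenize, and map tokens to integer ids."""
    tokens = [bos] + text.lower().split()
    # Tokens missing from the vocabulary fall back to the unknown-token id.
    return [stoi.get(tok, unk_id) for tok in tokens]


ids = numericalize("The model clusters new documents", stoi)
batch = [ids]  # shape (1, seq_len): a batch holding one document
print(batch)  # [[2, 3, 4, 5, 0, 6]]
```

Wrapping `batch` as `torch.tensor(batch)` gives a LongTensor of shape (1, seq_len), which is the kind of input an encoder like learn.model[0] would take. Note that "new" maps to id 0 here because it is not in the toy vocabulary.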

vivek1may · June 28, 2019, 7:41am

Hi, were you able to get the encoding of a new sentence?
I’m interested to know whether this is possible.

Thanks…

aleg · June 28, 2019, 3:57pm

Hi, I haven’t managed to get it yet. I’d appreciate any help here.

vivek1may · June 30, 2019, 3:56pm

Thanks Aleg…

I’m trying to get it with learn.model[0] … no luck so far …
I will post if I manage to get anything…

I want to compare ULMFiT with USE (Universal Sentence Encoder) …