ULMFiT to Generate Embeddings

Can we use ULMFiT to generate vectors from text? I assume we can, but I don’t quite see how one would do it. Any ideas?

I think you can. I’ll have to go through the paper again, but ULMFiT was designed to function differently: it’s basically a language model trained to become a classifier. You first train a generic language model on the text you want to classify, but you train it as a language model. Then you take the last hidden state together with pooled versions (mean and max) of the previous hidden states (lesson 12 of the new deep learning course covers this) and pass that to a classifier. So you’re making use of the internal hidden states of the LSTM rather than taking the word vectors themselves. This is a pretty genius idea in my opinion: LSTM states store a lot of useful information, and the model learns combinations of words rather than just the meanings of the individual words. Hope that answers your question :slight_smile:
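To make the pooling step concrete, here is a minimal numpy sketch of ULMFiT-style "concat pooling": the document vector is the last LSTM hidden state concatenated with the mean-pool and max-pool over all hidden states. The hidden states here are random stand-ins for what a trained encoder would produce; shapes and names are illustrative assumptions, not fastai’s actual API.

```python
import numpy as np

# Hypothetical per-token hidden states from an LSTM encoder:
# shape (seq_len, hidden_dim), random stand-ins for trained outputs.
seq_len, hidden_dim = 10, 400
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((seq_len, hidden_dim))

def concat_pool(states):
    """Build a fixed-size document vector from per-token states,
    in the spirit of ULMFiT's concat pooling:
    [last state; mean-pool; max-pool]."""
    last = states[-1]            # final hidden state
    mean = states.mean(axis=0)   # mean over all time steps
    mx = states.max(axis=0)      # element-wise max over time steps
    return np.concatenate([last, mean, mx])

doc_vector = concat_pool(hidden_states)
print(doc_vector.shape)  # (1200,), i.e. 3 * hidden_dim
```

In the real model this pooled vector is what gets passed to the classifier head, so it also works as a general-purpose text embedding.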


Not necessarily for classification; ULMFiT’s LM can be used for a variety of things.
I know it learns its own word embeddings, but I am struggling to find a way to extract them.

I am following this article

In section 2.2, they go over a pass through the language model. You can extract embeddings before the prediction step.