Using ULMFiT to train on my own text data and extract similar words

sugi.lauw · August 21, 2019, 7:23am

Hi All,

Hope you are all very well.

I have a project to develop a product that targets audiences based on contextual moments, e.g. happy moments, sad moments, confused moments, travel moments, adventure moments, and so on.

To scale this, I am thinking of using ULMFiT to find words similar to happy, adventure, beach, and holiday.

Is there a way to use ULMFiT to train on my own small word corpus and find similar words related to an “adventure” moment?

darek.kleczek · August 21, 2019, 9:19pm

Not sure if I understand the goal, but the easiest solution seems to be using a thesaurus or WordNet. You might also consider using word2vec and finding neighbouring words in a pretrained word2vec model. ULMFiT is a language model, so it’s designed to work with sentences or documents, not individual words. It could be used if you wanted to classify sentences into the categories you mention, but I’m not sure it would help with individual words…
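
To illustrate what "finding neighbouring words" means here, below is a minimal sketch of a nearest-neighbour lookup by cosine similarity. The vectors in `TOY_VECTORS` are invented for illustration and the helper names are hypothetical; with a real pretrained model you would load the vectors from disk instead (for example, gensim's `KeyedVectors` offers a `most_similar` method that does this kind of ranking).

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real pretrained model has hundreds of dimensions and a large vocabulary.
TOY_VECTORS = {
    "travel":    [0.9, 0.1, 0.0],
    "holiday":   [0.8, 0.2, 0.1],
    "adventure": [0.7, 0.1, 0.3],
    "vacation":  [0.85, 0.15, 0.05],
    "sad":       [0.0, 0.9, 0.1],
    "confused":  [0.1, 0.6, 0.7],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Return the topn (word, similarity) pairs closest to `word`."""
    if word not in vectors:
        raise KeyError(f"{word!r} is not in the vocabulary")
    query = vectors[word]
    # Score every other word against the query, then sort best-first.
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


for neighbour, score in most_similar("travel", TOY_VECTORS):
    print(f"{neighbour}: {score:.3f}")
```

With these toy vectors the travel-related words rank above "sad" and "confused". The quality of real results depends entirely on the pretrained vectors used.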

sugi.lauw · August 22, 2019, 6:35am

Yeah, essentially I want to be able to find neighboring words to travel, e.g. adventure, holiday, vacation, if they are available in my corpus of data.

I tried training on my own small corpus (a list of words) using GloVe, word2vec, and FastText, but because I only have a small amount of data, it produced a vocabulary of only 53 words. As a result, I get an error: “travel” is not in vocabulary.
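
One possible cause of a tiny vocabulary (an assumption, not a diagnosis of this exact setup) is a minimum-frequency cutoff: many embedding trainers discard rare words, and gensim's `Word2Vec`, for example, drops words seen fewer than `min_count` times (5 by default). The sketch below mimics that filter with a hypothetical `build_vocab` helper and a made-up corpus.

```python
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Keep only tokens that occur at least `min_count` times."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token for token, n in counts.items() if n >= min_count}


# Made-up corpus: "beach" and "holiday" occur 5 times, "travel" only once.
corpus = [
    ["beach", "holiday", "travel"],
    ["beach", "holiday", "fun"],
    ["beach", "holiday", "sun"],
    ["beach", "holiday", "sea"],
    ["beach", "holiday", "sand"],
]

print("travel" in build_vocab(corpus))               # filtered out as too rare
print("travel" in build_vocab(corpus, min_count=1))  # kept when the cutoff is 1
```

Lowering the cutoff keeps rare words in the vocabulary, but vectors learned from one or two occurrences are unlikely to be meaningful, which is why pretrained vectors are usually the better starting point for small corpora.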

Is it possible to extend the pre-trained model with the new vocabulary from my small corpus?

Daniel.R.Armstrong · August 22, 2019, 12:29pm

Maybe check out the new NLP course from fastai for inspiration.