Conceptual question about ULMFiT (retraining for new medium)


(Kristian Rother) #1

I have a conceptual question about ULMFiT. I really like the modular architecture of training an LM, fine-tuning a model for the desired medium on top of that, and finally training the head. Let’s suppose I have done the entire process in one language (German) and my final “product” is a classifier that can detect sentiment in forum posts. First I train an LM on the German Wikipedia, then I fine-tune it on some scraped German forum posts, and finally I train the classifier on a dataset where I have a sentiment label for each post. So far so good.

How would I go about changing the medium? I want to generalize this to other media, so for example I want to classify sentiment in emails or tweets. The way I see it, these are my options:

  1. Only reuse the LM and restart at step 2 (fine-tune with emails, train the head on top). This is doable, but it feels like some of the knowledge is “wasted”
  2. Just apply the head from the forum posts to the new medium and see how it works (seems reasonable)
  3. Somehow “transfer” the head

So far I’m stuck using 1 or 2, but conceptually I’d like a scenario where I only retrain the classifier on some new unlabeled data for the medium and boom, it works. Is this feasible conceptually?

I’m not sure I explained what I mean well. Interested in hearing all opinions/thoughts :slight_smile:

Edit: fixed typo, was “1 or 3” but I meant “1 or 2”


(Matthew Teschke) #2

If I’m understanding you correctly, you’d want to retrain the LM on your new corpus - that is where a lot of the power of ULMFiT comes from. Language will likely be used differently across different sources, so the model needs to be updated for that before training the classifier.

So I think 1 is your best approach - that’s generally the approach I’ve been taking.
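A toy sketch of why option 1 discards less than it seems (NumPy stand-in matrices only; the names and shapes are made up for illustration, this is not the actual ULMFiT or fastai code): the encoder trained in stages 1 and 2 holds the bulk of the parameters and is kept, while the head that gets retrained for the new medium is comparatively tiny.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the ULMFiT components (arbitrary shapes):
# the encoder is what stages 1-2 produce and is what gets reused.
encoder = rng.standard_normal((300, 400))   # LM pretraining + medium fine-tuning
forum_head = rng.standard_normal((400, 2))  # stage 3: sentiment head for forum posts

# Option 1: keep the encoder, redo the medium fine-tuning on emails,
# and train a fresh head on the new labeled data.
email_head = rng.standard_normal((400, 2))

# Option 2: apply the forum-trained head unchanged to email features.
emails = rng.standard_normal((4, 300))      # 4 toy "email" vectors
features = emails @ encoder                 # shared representation from stages 1-2
logits_opt1 = features @ email_head
logits_opt2 = features @ forum_head

# The head is a tiny fraction of the total parameter count, so
# retraining only the head (option 1) wastes little of what was learned.
print(encoder.size, forum_head.size)        # 120000 800
```

Even in this toy setup the point holds: both options consume the same encoder features, and the part that option 1 throws away (the head) is under 1% of the parameters.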