I have a conceptual question about ULMFiT. I really like the modular architecture: train a LM, fine-tune it for the desired medium, and finally train the head on top. Let’s suppose I have done the entire process in one language (German), and my final “product” is a classifier that detects sentiment in forum posts. First I train an LM on the German Wikipedia, then I fine-tune on some scraped German forum posts, and then I train on a dataset where each post is labeled with its sentiment. So far so good.
How would I go about changing the medium? I want to generalize this to other media, so for example classifying sentiment in emails or tweets. The way I see it, these are my options:
- Only reuse the LM and restart at step 2 (fine-tune on emails, then train the head on top). This is doable, but it feels like some knowledge is “wasted”
- Just apply the classifier trained on forum posts to the new medium as-is and see how well it works (seems reasonable)
- Somehow “transfer” the head
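To make the modularity concrete, here is a toy sketch of the pipeline and of the reuse question. A bag-of-words weighting stands in for the fine-tuned LM encoder and a nearest-centroid rule stands in for the head; none of these names or functions are fastai/ULMFiT API, they are purely illustrative. The last lines show the “keep the head, swap the encoder adaptation” idea:

```python
# Toy stand-ins for the ULMFiT stages (NOT fastai API, just illustration):
#   fit_encoder  ~ stages 1-2 (LM pretraining + medium fine-tuning, unlabeled)
#   train_head   ~ stage 3 (classifier head, needs labels)
from collections import Counter

VOCAB = ["gut", "schlecht", "super", "mies", "forum", "mail"]

def fit_encoder(corpus):
    """Derive token weights from *unlabeled* text of one medium."""
    freq = Counter(tok for doc in corpus for tok in doc.lower().split())
    total = sum(freq.values()) or 1
    return {w: 1.0 + freq[w] / total for w in VOCAB}

def encode(text, weights):
    """Turn a document into a fixed-size feature vector."""
    counts = Counter(text.lower().split())
    return [counts[w] * weights.get(w, 1.0) for w in VOCAB]

def train_head(weights, labelled):
    """Nearest-centroid 'head': one centroid per class over encoded docs."""
    centroids = {}
    for label in {y for _, y in labelled}:
        vecs = [encode(x, weights) for x, y in labelled if y == label]
        centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return centroids

def predict(text, weights, centroids):
    v = encode(text, weights)
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2
                                   for a, b in zip(v, centroids[lbl])))

# Train everything on the forum medium (labels only exist here).
forum_weights = fit_encoder(
    ["das forum ist gut", "so ein mies thread", "super beitrag"])
head = train_head(forum_weights, [("das ist gut", "pos"),
                                  ("super sache", "pos"),
                                  ("echt mies", "neg"),
                                  ("schlecht gemacht", "neg")])

# Option 2: apply forum_weights + head to emails unchanged.
# "Transfer the head": refit only the encoder on unlabeled emails
# and keep the trained head.
email_weights = fit_encoder(["mail ist gut", "schlechte mail heute"])
pred = predict("die mail ist super", email_weights, head)  # → "pos"
```

The point of the sketch is that the head only ever sees feature vectors, so it composes with any encoder that produces the same representation; whether the real ULMFiT encoder stays compatible enough after re-adaptation is exactly the open question.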
So far I’m stuck with 1 or 2, but conceptually I’d like a scenario where I only re-adapt the classifier on some new unlabeled data for the medium (no new labels) and boom, it works. Is this feasible conceptually?
I’m not sure I explained what I mean well. Interested in hearing any opinions/thoughts.
Edit: fixed typo, was “1 or 3” but I meant “1 or 2”