Is Transfer Learning suitable for large datasets?

As far as I understand, the use of Transfer Learning for NLP has two main benefits:

  • general aspects of, e.g., the English language do not have to be learned from scratch
  • far less data is necessary to train a model for a specific task

Especially regarding the second point, I am asking myself whether it makes sense to use something like BERT, ULMFiT, or XLNet if a large dataset is available. I want to perform binary text classification and have about 2 million samples.
If I fine-tune a Language Model (LM) with this high number of samples, does this not destroy the already learned weights of the LM?

Thank you in advance!

I cannot think of any disadvantage to using transfer learning; your alternative is to start with random weights. Starting with pre-trained weights is always better than starting from random ones.

Sure, your dataset might be large enough to train the model completely from scratch, but starting from a pretrained model means you would (most likely) get there faster.

Maybe by using TL you don’t even need to train on your full dataset, possibly saving you a ton of time.
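For reference, here is a minimal sketch of what that fine-tuning could look like with the Hugging Face transformers library. The model name, hyperparameters, and the toy in-memory dataset are all illustrative stand-ins for your actual 2M-sample data, not a definitive recipe:

```python
# Minimal sketch: fine-tune a pretrained BERT for binary text classification.
# Assumes the `transformers` and `datasets` libraries; all names/values are illustrative.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Toy data standing in for the real ~2M-sample dataset.
data = {"text": ["great product", "terrible service"], "label": [1, 0]}
dataset = Dataset.from_dict(data)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# A small learning rate and few epochs help avoid "destroying" the pretrained
# weights (catastrophic forgetting) while still adapting to the new task.
args = TrainingArguments(
    output_dir="bert-binary-clf",
    num_train_epochs=1,              # with millions of samples, 1-3 epochs are often enough
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```

With this setup you could also fine-tune on a subsample first and only scale up if validation performance demands it, which is the time-saving point made above.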
