@jeremy said that there have recently been some papers using pre-trained language models for classification. Does anyone know which papers those are? Can someone post links to them?
I’m guessing the main paper he’s referring to is: https://arxiv.org/pdf/1710.02076v1.pdf
I also saw it recently discussed here: https://arxiv.org/pdf/1711.05732v1.pdf
Most of the recent embedding papers now include some examination of a few NLP tasks to compare their performance.
Here’s another fun paper, where one of the tasks they evaluate on is text classification.
Here’s another, although based on Jeremy’s comment in another thread I’m guessing I might have missed the mark: these papers relate to embeddings, whereas he’s training the full network and using that as the input.
In the notebook I linked to the main paper that uses a full pretrained model: https://arxiv.org/pdf/1708.00107.pdf. There are some more thoughts and links here: http://ruder.io/transfer-learning/index.html
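To make the distinction above concrete, here’s a minimal PyTorch sketch of the idea of reusing a full pretrained network (rather than just embeddings) as the input to a classifier. All names are placeholders, and the “pretrained” weights are random stand-ins; in practice you’d load a checkpoint from a language model trained on a large corpus, as in the papers linked above:

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, HID_DIM, N_CLASSES = 1000, 32, 64, 2

class LMEncoder(nn.Module):
    """Embedding + LSTM, standing in for a pretrained language model."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)

    def forward(self, tokens):
        out, _ = self.lstm(self.emb(tokens))
        return out[:, -1]  # last hidden state as the sequence summary

class Classifier(nn.Module):
    """Frozen pretrained encoder + small trainable classification head."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # reuse the LM's features, don't retrain them
        self.head = nn.Linear(HID_DIM, N_CLASSES)

    def forward(self, tokens):
        return self.head(self.encoder(tokens))

encoder = LMEncoder()  # pretend this was loaded from a pretrained checkpoint
model = Classifier(encoder)
batch = torch.randint(0, VOCAB_SIZE, (4, 10))  # 4 sequences of 10 token ids
logits = model(batch)
print(logits.shape)  # torch.Size([4, 2])
```

The contrast with the embedding papers is that here the whole encoder (not just the embedding layer) carries over from pretraining; only the linear head is trained on the classification task.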