IIUC, embeddings like fastText's can be seen as a dump of a hidden-layer representation from a neural network. Since using these embeddings is just a simple table lookup, the architecture of the network that generated them isn't needed at inference time. VGG's/ULMFiT's hidden layers are more complex, so it makes sense to ship the model that generated them, which also gives you the option of fine-tuning the pre-trained model.
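To make the contrast concrete, here's a toy sketch (all names, shapes, and weights are made up for illustration): a word embedding is a pure lookup into a saved matrix, while a deep feature is input-dependent and requires keeping the weights and the forward computation, including its non-linearity.

```python
import numpy as np

# Toy vocabulary and a random matrix standing in for trained fastText vectors.
vocab = {"cat": 0, "dog": 1, "fish": 2}
embedding_table = np.random.rand(len(vocab), 4)

def lookup_embedding(word):
    # No architecture needed -- "inference" is just indexing a row.
    return embedding_table[vocab[word]]

# Random weights standing in for a pre-trained deep model's layers.
W1 = np.random.rand(4, 8)
W2 = np.random.rand(8, 2)

def deep_features(x):
    # A VGG/ULMFiT-style representation can't be precomputed as a table:
    # it depends on the input flowing through the layers and the ReLU.
    h = np.maximum(0, x @ W1)
    return h @ W2

print(lookup_embedding("cat").shape)                  # (4,)
print(deep_features(lookup_embedding("dog")).shape)   # (2,)
```

The lookup case is why fastText can ship plain vector files, while transfer learning with VGG/ULMFiT ships full model weights.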
So does it make sense to call VGG's/ULMFiT's representations non-linear embeddings? (And then one could say: the prerequisite for transfer learning is having embeddings, whether linear or non-linear.)