The NLP course uses a couple of different word embeddings. The attention model uses pretrained fastText vectors. For the transformer, the course states that we used “traditional PyTorch embeddings”, although pretrained vectors can also be used.
As I branch out to other transformer models and NLP libraries, BPE (byte pair encoding) seems to be commonly regarded as the most effective approach, at least for neural machine translation. Is that right?
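
For context, my understanding is that BPE is a subword tokenization scheme rather than an embedding itself: it decides which units get embedding vectors. Here is a minimal sketch of how I understand the merge-learning step, assuming a toy word-frequency dictionary. The function names and the `</w>` end-of-word marker are my own illustrative choices, not any particular library's API.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} dictionary."""
    # Start from single characters plus a hypothetical end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Greedily pick the most frequent adjacent pair.
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


if __name__ == "__main__":
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(toy_counts, num_merges=3)
    print(merges)
    print(vocab)
```

Each resulting subword would then get its own row in an embedding table, whether that table is trained from scratch or initialized from pretrained vectors.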