Hello @morgan. Well, that was the objective of my work.
Yes, I think that a small generative model (like GPT-2 small) fine-tuned from English to another language such as Portuguese can produce a working model with a relatively small fine-tuning dataset (I used a bit more than 1 GB of text from the Portuguese Wikipedia).
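To make the idea concrete, here is a rough sketch of that kind of fine-tuning using only the Hugging Face transformers library (my actual pipeline runs on top of fastai v2, so the file path, hyperparameters, and output directory below are illustrative assumptions, not my exact setup):

```python
# Minimal sketch: continue training the English GPT-2 small checkpoint
# on Portuguese Wikipedia text with a causal-LM objective.
from transformers import (
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    DataCollatorForLanguageModeling,
    TextDataset,
    Trainer,
    TrainingArguments,
)

# 1. Start from the English GPT-2 small checkpoint.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# 2. Portuguese Wikipedia dump as one plain-text file (hypothetical path).
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="pt_wikipedia.txt",
    block_size=128,  # tokens per training example
)

# 3. Standard causal language modeling (mlm=False: no masking, unlike BERT).
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="gpt2-small-portuguese",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

trainer.train()
trainer.save_model("gpt2-small-portuguese")
```

A real run would also want the tokenizer and embeddings adapted to Portuguese rather than reusing the English vocabulary as-is; this sketch skips that step.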
Well, it would be great if I opened up a new line of research, but I'm sure it already existed.
As for applying the same fine-tuning method to encoder-only Transformer models like BERT (RoBERTa, ALBERT, etc.): I'm currently testing my method on your FastHugs code. It works very well (because you did great work, Morgan!) and I will publish it soon.
By the way, I have 2 questions about your code that I will post in the FastHugs thread. Many thanks!