Hi, I've been experimenting with the transformer-translation example for a while.
It works fine, but the example uses a CSV file, which I think is not a good fit for large datasets.
How can I prepare a dataset with a structure similar to the one used for language model training? For example:
data/en/1.txt, data/en/2.txt, data/en/3.txt
data/fr/1.txt, data/fr/2.txt, data/fr/3.txt
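
To make the layout concrete, here is a rough sketch of how I imagine reading it lazily instead of loading one big CSV. This is plain Python, not part of the transformer-translation example. The function name `iter_parallel_pairs` is made up for illustration, and it assumes that files with the same name in `en/` and `fr/` are line-aligned (line i in one is the translation of line i in the other).

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_parallel_pairs(root: str, src: str = "en", tgt: str = "fr") -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs lazily, one file pair at a time.

    Hypothetical helper: assumes root/<src>/N.txt and root/<tgt>/N.txt exist
    with matching names and are aligned line by line.
    """
    src_dir, tgt_dir = Path(root) / src, Path(root) / tgt
    for src_file in sorted(src_dir.glob("*.txt")):
        tgt_file = tgt_dir / src_file.name  # same file name on the target side
        if not tgt_file.exists():
            continue  # skip source files that have no counterpart
        with src_file.open(encoding="utf-8") as fs, tgt_file.open(encoding="utf-8") as ft:
            for s, t in zip(fs, ft):  # assumption: line i pairs with line i
                s, t = s.strip(), t.strip()
                if s and t:  # drop pairs where either side is empty
                    yield s, t
```

Something like `for en, fr in iter_parallel_pairs("data"): ...` would then stream pairs without holding the whole corpus in memory, but I don't know how to plug this into the example's data pipeline.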
Could you give me a hint?
Thanks in advance.