Hi, I've been experimenting with transformer-translation for a while.
It works fine, but the example uses a CSV file, which I think is not a good fit for large datasets.
How can I prepare a dataset with a structure similar to the one used for language model training, like this:
data/en/1.txt
data/en/2.txt
data/en/3.txt
data/fr/1.txt
data/fr/2.txt
data/fr/3.txt
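
To make the layout concrete, here is a rough sketch of the kind of loader I have in mind. It assumes that files with the same name under data/en and data/fr are line-aligned translations of each other. The function name iter_pairs and its parameters are made up for illustration and are not part of the example I am using.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_pairs(root: str, src: str = "en", tgt: str = "fr") -> Iterator[Tuple[str, str]]:
    """Lazily yield (source, target) sentence pairs from root/<src> and root/<tgt>.

    Assumption: root/<src>/N.txt and root/<tgt>/N.txt are line-aligned.
    """
    src_dir = Path(root) / src
    tgt_dir = Path(root) / tgt
    for src_file in sorted(src_dir.glob("*.txt")):
        tgt_file = tgt_dir / src_file.name
        if not tgt_file.exists():
            # Skip source files that have no counterpart on the target side.
            continue
        with src_file.open(encoding="utf-8") as fs, tgt_file.open(encoding="utf-8") as ft:
            # Read both files line by line so a large file is never fully in memory.
            for s, t in zip(fs, ft):
                yield s.rstrip("\n"), t.rstrip("\n")
```

Something like `for en, fr in iter_pairs("data"):` would then stream pairs one at a time instead of loading a whole CSV file.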
Could you give me a hint?
Thanks in advance.