Training a Bangla LM from Wikipedia data

Hey @abyaadrafid, two quick pointers:

My best guess, from working with the Hindi and Indonesian wiki dumps: while the compressed (*.tar.gz) file might be small, the extracted full dump is often much larger and can go up to 2 to 2.5 GB, as seen in the archive.
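If you want to know how big a dump will get before extracting it, here is a minimal sketch using Python's standard `tarfile` module. It assumes the dump really is a single gzip-compressed tarball as described above (if you have a bare compressed XML file instead, this does not apply). It adds up the sizes of the regular files listed in the archive. Nothing is written to disk, although the whole archive still has to be read and decompressed once. The function names `uncompressed_size` and `human_readable` are just illustrative helpers, not part of any library.

```python
import tarfile


def uncompressed_size(path: str) -> int:
    """Return the total size in bytes of all regular files inside a .tar.gz
    archive, reading the member headers without extracting anything to disk."""
    total = 0
    # "r:gz" opens a gzip-compressed tar archive for reading
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories, symlinks, etc.; only count actual file contents
            if member.isfile():
                total += member.size
    return total


def human_readable(num_bytes: int) -> str:
    """Format a byte count with binary units (here 1 GB = 1024**3 bytes)."""
    size = float(num_bytes)
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        # Stop at the first unit where the value drops below 1024 (cap at TB)
        if size < 1024 or unit == "TB":
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TB"  # not reached; keeps the return type explicit
```

Usage would look like `print(human_readable(uncompressed_size("bnwiki-dump.tar.gz")))`, where the file name is hypothetical. Comparing that number with the size of the compressed file on disk tells you how much free space to reserve before extracting.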
