Hey @abyaadrafid, two quick pointers:
- You can find pretrained models in Bengali/Bangla at iNLTK here: https://github.com/goru001/inltk
- You can find the latest full wiki dump for Bangla using this script: https://github.com/NirantK/bharatNLP/blob/dev/prepare_wiki.sh

My best guess, from working with the Hindi and Indonesian wiki dumps: the compressed (*.tar.gz) file might be small, but the extracted full dumps are often much larger and can go up to 2-2.5 GB, as we see in the archive.
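
If you want to check how large a dump will be before extracting it to disk, here is a minimal Python sketch that streams the compressed file and counts the decompressed bytes. The function name `uncompressed_size` and the file name in the usage line are hypothetical, and it only uses the standard library (`gzip`, `bz2`, `lzma`). For a `.tar.gz`, the number it reports is the size of the tar stream, so it includes a little tar header overhead on top of the file contents.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Map the final file suffix to a standard-library opener.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def uncompressed_size(path, chunk_size=1 << 20):
    """Stream-decompress `path` and return the number of decompressed bytes.

    Nothing is written to disk; the file is read in chunks of `chunk_size`.
    """
    path = Path(path)
    opener = _OPENERS.get(path.suffix)
    if opener is None:
        raise ValueError(f"unsupported compression suffix: {path.suffix!r}")

    total = 0
    with opener(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Usage would look something like `uncompressed_size("bnwiki-dump.tar.gz") / 1e9` to get a rough size in GB (the file name here is just a placeholder). It has to decompress the whole file once, so it can take a while on a full dump.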