ULMFiT - Kannada

disisbig · March 20, 2019, 3:15pm

Starting this thread to share the progress on the Kannada LM and classification results @piotr.czapla @Moody

Dataset

Download the Kannada Wikipedia Articles Dataset (32,997 articles), which I scraped, cleaned, and used to train the language model
Download the Kannada News Classification Dataset, which I scraped and used to train the classifier

Evaluated on a 20% validation set
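The post doesn't say how the 20% hold-out was made, so here is only a minimal sketch of one common approach: a seeded shuffle followed by a slice. The function name `holdout_split`, the seed, and the toy `(headline, label)` pairs are all made up for illustration and are not taken from the actual dataset or notebook.

```python
import random


def holdout_split(items, valid_frac=0.2, seed=42):
    """Shuffle a copy of items and hold out valid_frac of them for validation."""
    shuffled = list(items)
    # A seeded Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical stand-in for (headline, label) pairs from a news dataset.
examples = [(f"headline {i}", i % 3) for i in range(10)]
train, valid = holdout_split(examples)
print(len(train), len(valid))  # 8 2
```

Fixing the seed matters if you want to compare runs, since a different split changes the validation numbers.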

Download pretrained Language Model from here

Download classifier from here

The tokenizer was trained using Google’s SentencePiece

Download the trained model and vocabulary from here
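For anyone who wants to train a similar tokenizer on their own corpus, a rough sketch with the `sentencepiece` Python package could look like the following. The file name, model prefix, vocabulary size, and model type are all assumptions for illustration; the settings actually used for the model above are not stated in this post.

```python
def trainer_options(corpus_path, model_prefix="kannada_sp", vocab_size=8000):
    """Collect SentencePiece training options (every value here is an assumption)."""
    return {
        "input": str(corpus_path),     # plain-text corpus, one sentence per line
        "model_prefix": model_prefix,  # writes <prefix>.model and <prefix>.vocab
        "vocab_size": vocab_size,      # assumed; the post does not give a size
        "model_type": "unigram",       # SentencePiece's default algorithm
        "character_coverage": 1.0,     # keep every character of the script
    }


def train_and_tokenize(corpus_path, sample_text):
    """Train a model on corpus_path, then split sample_text into subword pieces."""
    # Imported here so trainer_options works without the package installed.
    import sentencepiece as spm

    opts = trainer_options(corpus_path)
    spm.SentencePieceTrainer.train(**opts)
    sp = spm.SentencePieceProcessor(model_file=opts["model_prefix"] + ".model")
    return sp.encode(sample_text, out_type=str)
```

Calling `train_and_tokenize("kn_wiki.txt", "some Kannada sentence")` (with a hypothetical corpus file) would write `kannada_sp.model` and `kannada_sp.vocab` to the working directory and return the list of pieces for the sample text.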

krash · July 26, 2019, 5:45am

Sakkath macha! (Awesome, dude!)