BERT + fastai

Hello, I want to use BERT on an Arabic dataset, so I want to train the language model on Arabic from scratch; I don't need it to be pre-trained.
I used this line of code to train a language model:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.2, pretrained=False)

With pretrained=False it gave me good results.
How can I do the same for a BERT model, please?

Training a BERT language model from scratch is unfortunately not so straightforward. With the standard fastai language_model_learner you could use the Transformer or Transformer-XL architectures, though. See the docs:

If you want to train BERT from scratch probably the best starting point would be the huggingface library:
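With the huggingface transformers library, "from scratch" means instantiating a model from a config instead of from a checkpoint. A minimal sketch; all of the sizes below are illustrative assumptions, and vocab_size must match whatever Arabic tokenizer you train:

```python
from transformers import BertConfig, BertForMaskedLM

# Define a (small) BERT architecture; nothing is downloaded or pre-trained.
config = BertConfig(
    vocab_size=32000,        # must match your Arabic tokenizer's vocab
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=1024,
)

# Randomly initialized weights, ready for masked-LM pretraining.
model = BertForMaskedLM(config)
```

You would then pretrain this with a masked-language-modeling objective on your Arabic corpus (e.g. via the library's Trainer) before fine-tuning it.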


The “fasthugs_language_model” notebook here will let you train BERT:

Blurr is also a full library that enables fastai + HuggingFace integration:

It seems someone has uploaded a pre-trained Arabic BERT to HuggingFace’s model repo too: . To start, I’d recommend fine-tuning that model on your dataset rather than trying to train a BERT model from scratch.


Oh, I hadn’t noticed you added a language model to fasthugs. Awesome work! Will try it out myself.


Thank you, that’s really helpful.


Thank you, I will start by testing Transformer-XL instead of AWD_LSTM and see if there is any improvement on the sentiment analysis task.