Hello, I want to use BERT on an Arabic dataset, so I want to train the language model on Arabic from scratch; I don't need it to be pre-trained.
I used this line of code for a language model:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.2, pretrained=False)
With pretrained=False, it gave me good results.
How can I do it for a BERT model, please?
Training a BERT language model from scratch is unfortunately not so straightforward. With the standard fastai language_model_learner you can, however, use the Transformer or Transformer-XL architectures. See the docs: https://docs.fast.ai/text.learner.html#language_model_learner
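For example, swapping in Transformer-XL from scratch is a one-line change from your snippet. A minimal sketch (I'm assuming data_lm is the same DataBunch you used for the AWD_LSTM run, and the fit_one_cycle schedule is just illustrative):

```python
from fastai.text import *  # fastai v1, as in the snippet above

# Same call as before, but with the Transformer-XL architecture,
# still trained from scratch via pretrained=False
learn = language_model_learner(data_lm, TransformerXL, drop_mult=0.2, pretrained=False)
learn.fit_one_cycle(1, 1e-3)  # illustrative schedule; tune for your data
```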
It also looks like someone has uploaded a pre-trained Arabic BERT to HuggingFace's models repo: https://huggingface.co/asafaya/bert-base-arabic . To start, I'd recommend fine-tuning this model on your dataset rather than trying to train a BERT model from scratch.
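If you go that route, here is a minimal sketch of loading that model with HuggingFace's transformers library (assuming transformers is installed; the actual fine-tuning would follow the standard masked-LM recipe, e.g. Trainer with DataCollatorForLanguageModeling):

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Pull the pre-trained Arabic BERT from the model hub
tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic")
model = AutoModelForMaskedLM.from_pretrained("asafaya/bert-base-arabic")

# Quick sanity check: run a masked-LM forward pass on one sentence
# ("Hello, world" in Arabic)
inputs = tokenizer("مرحبا بالعالم", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence length, vocab size)
```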