Are you looking for a pre-trained model, or would you like to train one from scratch?
Pre-trained
I see Hugging Face has a community-submitted Italian BERT model that you could try using: https://huggingface.co/models?search=italian
See below for how to use it with fastai2
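As a rough sketch of loading one of those checkpoints with the transformers library (the model name below is an assumption based on that search page; substitute whichever model you pick):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: "dbmdz/bert-base-italian-cased" is one of the community-submitted
# Italian BERT checkpoints on the model hub; swap in the model you choose.
model_name = "dbmdz/bert-base-italian-cased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=2 for a binary classification head (randomly initialised, to be fine-tuned)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize a sample Italian sentence to check everything loads
inputs = tokenizer("Questo film è fantastico!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, num_labels)
```

From there you can wrap the tokenizer and model for use inside a fastai2 `Learner` (the FastHugs and blurr links below show ways to do that).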
From Scratch
Pre-train data:
You can have a look at the scripts here to download all the Italian Wikipedia articles: https://github.com/fastai/fastai/tree/0a6f3894cd4881c0f4799d8f7533d20c6077a0dc/courses/dl2/imdb_scripts
And then you can consider whether to use the AWD_LSTM model or a transformer:
AWD_LSTM
Fastai's wikitext tutorial shows how to use AWD_LSTM to pre-train a language model and then fine-tune it for classification:
Transformer options
My FastHugs notebooks: https://github.com/morganmcg1/fasthugs
- First use the language model notebook to pre-train, then use the sequence classification model to do classification
@Richard-Wang has also done pre-training and fine-tuning of transformers here: Pretrain MLM and fintune on GLUE with fastai - 1 - Masked laguage model callback and Electra callback
@wgpubs recently released a library for using HuggingFace transformers with fastai; as of writing I don't think you can pre-train with it yet, but the classification element should work: https://ohmeow.github.io/blurr/
Sylvain also released a fastai transformers tutorial; right now it only covers text generation, but it's worth a look to see how he integrates HF and fastai: http://dev.fast.ai/tutorial.transformers
One disadvantage of training transformers from scratch is that their impressive results have come from pre-training on really huge amounts of data over a long time, so I would either start with a pre-trained transformer model or pre-train an AWD_LSTM.
Other Italian models
I found this thread from fastai v1 which is worth a look too: ULMFit - Italian - v1