FastHugs - fastai-v2 and HuggingFace Transformers

I put together a notebook to finetune the BERT, ALBERT, DistilBERT and RoBERTa transformer models from HuggingFace for text classification using fastai-v2.

Code here:

Jupyter-blog post here: (first time using fast_template, I love this jupyter to post feature!)

Things You Might Like (:heart: ?)

FastHugsTokenizer: A tokenizer wrapper that can be used with fastai-v2’s tokenizer.

FastHugsModel: A model wrapper over the HF models, more or less the same as the wrappers from the HF + fastai-v1 articles mentioned below

Vocab: A function to extract the vocab depending on the pre-trained transformer (HF hasn’t standardised this process :cry:).

Padding: Padding settings for the padding token index and for whether the transformer prefers left or right padding

Vocab for Albert-base-v2: a .json file for Albert-base-v2’s vocab; otherwise this has to be extracted from a SentencePiece model file, which isn’t fun

Model Splitters: Functions to split the classification head from the model backbone in line with fastai-v2’s new definition of Learner
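To illustrate the splitter idea (this is a toy sketch, not FastHugs' actual implementation): fastai-v2's Learner takes a `splitter` function that returns parameter groups, so discriminative learning rates can be applied to backbone vs. head. A stand-in `nn.Sequential` plays the role of a transformer classifier here:

```python
import torch.nn as nn

# Toy stand-in for a transformer classifier: layers 0-1 play the
# "backbone", the final Linear plays the classification "head".
model = nn.Sequential(
    nn.Embedding(100, 16),
    nn.Linear(16, 16),
    nn.Linear(16, 2),
)

def splitter(m):
    "Return [backbone params, head params] for discriminative LRs."
    backbone = list(m[0].parameters()) + list(m[1].parameters())
    head = list(m[2].parameters())
    return [backbone, head]

groups = splitter(model)
print(len(groups))  # 2 parameter groups
```

Passing such a function as `splitter=` to `Learner` lets `fit_one_cycle` assign a different learning rate to each group.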

This notebook heavily borrows from this notebook, which in turn is based on this tutorial and accompanying article. Huge thanks to Melissa Rajaram and Maximilien Roberti for these great resources; if you’re not familiar with the HuggingFace library, please give them a read first as they are quite comprehensive.


Now added to fastai’s unofficial extension repository :wink:


Thanks! :smiley:

This is great! Note that I added the rules in the factory methods of Tokenizer so the tempTokenizer class should not be needed anymore.


Great news, I’ll take it out!

One small change needed still I think:

Thanks for creating this notebook, I find it quite helpful!
Based on the course, with an AWD_LSTM model you can achieve around 94% accuracy (albeit with a bit more fine tuning) on the IMDB sentiment analysis task. Any idea why we only get 86% accuracy (in the fasthugs_demo.ipynb notebook)? I thought transformers were supposed to be state of the art so I’m a bit surprised.


They don’t work as well for classification in general, from what we found


To check Sylvain’s comment, I tried to modify Morgan’s excellent notebook.

Instead of using a classifier model, I want to import a BERT LM into fastai2 and do some text predictions.

The tokenization works fine, as does the following command. BTW, this is the only Learner that I managed to get working:
learn = LMLearner(dls, fasthugs_model_lm, loss_func=CrossEntropyLossFlat(), splitter=splitter, cbs=cbs)

However, I get a weird error in the loss about mismatched input/target sizes. I am still struggling with the code, but I would appreciate hearing from anybody who has already succeeded.

@morgan this may clean up the class selection section of your code:

This looks great, huggingface are nailing it these days!

Can you share all your code? What task are you trying to do? text classification? text generation? other?

My goal is to use a trained roberta/huggingface language model for masked language prediction / next word prediction in a fastai environment.

I want to import a model trained on a large corpus, use it as-is in fastai2 as a language model, and make predictions using learn.predict.

As a last step, I would like to modify the model head to do classification on another task. Since I am learning fastai, I assume it would be easier to have all the customization done in the fastai environment.

My notebook can be found at


Is it possible to do multi-label text classification with this project?

Hi, I am planning to start a project on text generation. Is it easy to generate text with the many pre-trained models like GPT-2 or wiki-text?
Or is there an example notebook for text generation?

Hey @JonathanSum, @hello34 , @rubensmau I’m looking at fasthugs over the next few days, will hopefully be able to comment back here in a day or two :slight_smile:


How do you deploy this model? Since you have used transformers models, have you tried to deploy it?

The same way you would deploy any other fastai model :slight_smile: I would say learn.export() would be better, as I think that would export the tokenizer for you too.

I have already tried to export the model. I used a custom transformers model to get logits like this:

class CustomTransformerModel(nn.Module):
    def __init__(self, transformer_model: PreTrainedModel):
        super().__init__()
        self.transformer = transformer_model

    def forward(self, input_ids):
        # Return only the logits from the transformer
        logits = self.transformer(input_ids)[0]
        return logits

and I have defined the learner as follows:

loss_func = nn.BCEWithLogitsLoss()
custom_transformer_model = CustomTransformerModel(transformer_model=bert_model)

from fastai.callbacks import *
learner = Learner(databunch, custom_transformer_model, loss_func=loss_func)
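As a side note on the loss choice (which also touches the earlier multi-label question): BCEWithLogitsLoss expects float multi-hot targets with the same shape as the logits, which is what makes it suitable for multi-label classification. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn as nn

loss_func = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 3)                  # batch of 4, 3 labels each
targets = torch.tensor([[1., 0., 1.]] * 4)  # float multi-hot targets
loss = loss_func(logits, targets)
print(loss.dim())  # 0: a scalar loss
```

Each label is scored independently with a sigmoid, so any number of labels per example can be active at once.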

but when loading the model I get this error:

You need to declare your custom model class before calling it, i.e. in a .py file that you import or in a notebook cell, so it can be referenced.
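The underlying reason is how Python pickling works (which fastai's model loading relies on): a pickle stores a reference to the class, not its code, so the class must be defined or importable wherever you load. A minimal sketch with a hypothetical class name:

```python
import pickle

class CustomModel:
    "Stand-in for a custom model class (hypothetical name)."
    def __init__(self):
        self.ready = True

# Serialising stores only a reference to CustomModel, not its code...
payload = pickle.dumps(CustomModel())

# ...so deserialising only works where CustomModel is defined/importable.
restored = pickle.loads(payload)
print(restored.ready)  # True
```

If the class definition is missing at load time, unpickling raises an AttributeError instead.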
