NLP classification model using fastai and Hugging Face models

Hi all,
I’ve been taking the fastai courses and following the movie review classification example to build my own NLP classification model. Currently, I’m primarily using fastai functions and AWD_LSTM as the underlying architecture to first train my language model. I’m curious if anyone has used other models, such as GPT or LLaMA, instead of AWD_LSTM.

I’ve done some research, and it seems I would need to use the Hugging Face packages. However, I love the conciseness of the fastai package and would like to use fastai functions as much as possible. Any suggestions on how to integrate fastai with Hugging Face models?

Thank you!

Hugging Face has the transformers library. I found an example on the Internet for multi-attribute classification: rather than predicting one class, the model predicts several classes at once, so one sentence might come back as Happy (0.9) and Male (0.95), i.e. a happy male. I was interested in multi-label classification of social media posts to identify common or new complaints. A minimal fastai sketch of that pattern is below.
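For anyone curious, here is roughly what that multi-label setup looks like in plain fastai. This is an untested sketch; the DataFrame `df` and its "text"/"labels" columns are assumptions, not from a real dataset:

```python
from fastai.text.all import *

# Assumed data: a DataFrame `df` with a "text" column and a "labels" column
# holding delimiter-separated attributes, e.g. "happy;male".
dls = TextDataLoaders.from_df(df, text_col="text", label_col="labels",
                              label_delim=";", valid_pct=0.2)

# Setting label_delim makes fastai build a multi-label head (one independent
# probability per label, via BCEWithLogitsLossFlat) instead of a softmax.
learn = text_classifier_learner(dls, AWD_LSTM, metrics=accuracy_multi)
learn.fine_tune(3)

# Returns the labels above threshold plus per-label probabilities,
# e.g. happy 0.9, male 0.95.
learn.predict("Loving the new update!")
```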
Regards Conwyn

Hello,

1. The Basic Integration Pattern:

AutoTokenizer and AutoModelForSequenceClassification: These are the core of the Hugging Face transformers library. AutoTokenizer automatically selects the right tokenizer for your chosen checkpoint, and AutoModelForSequenceClassification adds a classification head on top of the base model. Crucially, you set num_labels to match your classification problem.
Fastai DataLoaders: This is where you bring in the fastai magic. You use from_df (or another appropriate method) to create your DataLoaders. The key point is that the Hugging Face tokenizer does the tokenizing, in practice wrapped in a small fastai Transform, so tokenization stays consistent between the model and your data.
Fastai Learner: You create the Learner as usual, passing in the Hugging Face model and your DataLoaders. Use CrossEntropyLossFlat() (or another loss function appropriate to your task) and accuracy (or other metrics).
Fine-tuning: learn.fine_tune() works seamlessly. Fastai handles the training loop, leveraging its powerful callbacks and training utilities.
Prediction: learn.predict() works as expected. A minimal end-to-end sketch of these steps follows.
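Here is roughly what that could look like. It's an untested sketch: the model name is just an example, the DataFrame `df` with "text"/"label" columns is assumed, and HFTextTransform / HFModelWrapper are small glue classes you'd write yourself (neither library provides them under those names):

```python
from fastai.text.all import *
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"   # example checkpoint, swap in your own
hf_tok = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

class HFTextTransform(Transform):
    "Turn a raw string into a fixed-length tensor of token ids."
    def __init__(self, tok, max_len=128): self.tok, self.max_len = tok, max_len
    def encodes(self, text: str):
        return tensor(self.tok(text, truncation=True, max_length=self.max_len,
                               padding="max_length")["input_ids"])

class HFModelWrapper(Module):
    "Return bare logits so fastai's loss and metrics see a plain tensor."
    def __init__(self, hf_model, pad_id): self.hf_model, self.pad_id = hf_model, pad_id
    def forward(self, input_ids):
        mask = (input_ids != self.pad_id).long()   # don't attend to padding
        return self.hf_model(input_ids=input_ids, attention_mask=mask).logits

# Assumed data: a DataFrame `df` with "text" and "label" columns.
splits = RandomSplitter(valid_pct=0.2)(range_of(df))
dsets = Datasets(df, [[ColReader("text"), HFTextTransform(hf_tok)],
                      [ColReader("label"), Categorize()]], splits=splits)
dls = dsets.dataloaders(bs=16)

learn = Learner(dls, HFModelWrapper(hf_model, hf_tok.pad_token_id),
                loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fine_tune(2, base_lr=2e-5)
learn.predict("What a wonderful film!")
```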
2. Key Considerations and Best Practices:

Model Selection: Choose a pre-trained model appropriate for your task and dataset size. Larger models (like LLaMA) require more resources but can offer better performance. Start with smaller models like BERT or DistilBERT for experimentation.
Tokenization: Consistent tokenization is absolutely essential. Using the model's own Hugging Face tokenizer inside fastai's data pipeline solves this.
Batch Size: Adjust the bs (batch size) parameter in TextDataLoaders based on your GPU memory. Large models may require smaller batch sizes.
Hyperparameter Tuning: Experiment with learning rates, weight decay, and other hyperparameters. Fastai's lr_find() can be very helpful; see the snippet below.
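For instance, continuing from the sketch above (the exact values are illustrative, not recommendations):

```python
learn.lr_find()                            # plots loss vs. learning rate; pick a
                                           # value just before the loss blows up
learn.fine_tune(3, base_lr=2e-5, wd=0.01)  # wd = weight decay
```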

Best Regards

Thank you very much for the detailed explanation! I’ll try them out.