You used the “old” pytorch_pretrained_bert library instead of the new pytorch_transformers one. There is a breaking change, where model outputs are now tuples.
The migration notes give the following instructions:
# Let's load our model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)
# Now just use this line in pytorch-transformers to extract the loss from the output tuple:
outputs = model(input_ids, labels=labels)
loss = outputs[0]
How would you integrate this change with your existing notebook? Would one have to write a custom basic_train.py to achieve it? I can’t figure it out.
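One common way around this, without touching basic_train.py, is to wrap the transformer so its forward pass returns a single tensor instead of a tuple. This is only a minimal sketch of the pattern: in a real notebook the wrapper would subclass torch.nn.Module, and here a dummy FakeBert class (invented for illustration) stands in for BertForSequenceClassification.

```python
class FakeBert:
    """Stand-in for BertForSequenceClassification: mimics the new
    pytorch-transformers behaviour of returning a tuple."""
    def __call__(self, input_ids, labels=None):
        logits = [0.1, 0.9]            # dummy "logits"
        if labels is not None:
            return (0.25, logits)      # (loss, logits) when labels are given
        return (logits,)               # (logits,) otherwise

class TransformerWrapper:
    """Unwrap the output tuple so downstream code (e.g. fastai's Learner)
    sees a single value, as in the old pytorch-pretrained-bert API."""
    def __init__(self, transformer):
        self.transformer = transformer
    def __call__(self, input_ids):
        outputs = self.transformer(input_ids)
        return outputs[0]              # logits only; the Learner computes the loss

model = TransformerWrapper(FakeBert())
print(model([101, 2023, 102]))         # [0.1, 0.9]
```

You would then pass the wrapped model to the Learner as usual; the tuple never reaches fastai.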
Also, from my experiments, I’ve found that unfreezing and training the entire model performs as well as, if not better than, training the head first and then gradually unfreezing the model. It often saves a lot of time to just train the fully unfrozen model.
It’s odd, but it does sometimes give better results.
I didn’t take the time to check whether the tools provided by fastai, like discriminative learning rates, gradual unfreezing, or slanted triangular learning rates, give better results with transformer architectures. So it’s worth experimenting with these parameters!
I included gradual unfreezing so that people have the option to use it. Maybe it gives better performance with other model types or other datasets…
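For anyone who wants to experiment with this, here is a hedged sketch of what gradual unfreezing amounts to under the hood. The group names and the freeze_to helper below are illustrative, mimicking the semantics of fastai's learn.freeze_to(n): all layer groups before index n are frozen, the rest stay trainable.

```python
# Hypothetical split of a transformer into fastai-style layer groups.
layer_groups = ["embeddings", "encoder_lower", "encoder_upper", "head"]

def freeze_to(n, groups):
    """Return a dict of group -> trainable, mimicking learn.freeze_to(n):
    groups before index n are frozen (negative indices count from the end)."""
    start = n if n >= 0 else len(groups) + n
    return {g: i >= start for i, g in enumerate(groups)}

# A gradual unfreezing schedule trains the head first, then moves backwards:
print(freeze_to(-1, layer_groups))  # only "head" trainable
print(freeze_to(-2, layer_groups))  # "encoder_upper" and "head" trainable
print(freeze_to(0, layer_groups))   # everything trainable
```

In a fastai notebook you would call learn.freeze_to(-1), fit for a cycle, then learn.freeze_to(-2), and so on, versus a single learn.unfreeze() for the all-at-once approach described above.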
Thank you very much for all your remarks. If you have other questions, don’t hesitate to ask!
Thanks for your work and the article. Really helpful.
I am currently participating in a Kaggle competition (Google QUEST) in which I would like to use your Transformers integration with fastai. The problem is that it’s an “internet off” competition. Any idea how to use your Kaggle kernel with internet disabled?
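The usual workaround for internet-off kernels is to upload the pretrained weights as a Kaggle Dataset, attach it to the kernel, and point from_pretrained at the local directory instead of a model name. A hedged sketch, assuming a hypothetical dataset path (yours will match whatever you name the attached dataset):

```python
import os

# Hypothetical path of an attached Kaggle Dataset holding the weights.
WEIGHTS_DIR = "/kaggle/input/bert-base-uncased"

# For a BERT checkpoint, from_pretrained expects these files in the directory:
required = ["config.json", "pytorch_model.bin", "vocab.txt"]
missing = [f for f in required
           if not os.path.exists(os.path.join(WEIGHTS_DIR, f))]
print("missing files:", missing)

# With the dataset attached, both calls then work without any download:
# model = BertForSequenceClassification.from_pretrained(WEIGHTS_DIR)
# tokenizer = BertTokenizer.from_pretrained(WEIGHTS_DIR)
```

You can download the three files once on a machine with internet access (or from a kernel with internet enabled), then upload them as a private dataset for the competition kernel.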