Lesson 4: How to train a transformer on my own data?

I just finished Lesson 4 and want to train a simple transformer on my own. I actually want to use an NLP model to predict the next value of a sequence. All of the code is in a Kaggle notebook: nlp try | Kaggle.
My problem is that no matter how much I tweak the hyperparameters, such as the batch size, or change how I clean the data, the model performs badly. I don't know much about transformers. Is it because I chose Pearson correlation as my compute_metrics argument, or is it actually caused by my data?
All of the input sequences contain only integers from 1 to 8, and to predict a sequence's next value, I simply have the model classify each sequence into 8 categories. Is there any way to improve the NLP model's performance?
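Concretely, my metric function looks roughly like this (a simplified sketch rather than my exact notebook code; it assumes the standard Hugging Face Trainer convention where compute_metrics receives the logits and the integer labels):

```python
import numpy as np
from scipy.stats import pearsonr

def compute_metrics(eval_pred):
    # Assumed Trainer convention: eval_pred unpacks into (logits, labels),
    # with logits of shape (n_samples, 8) and labels as integers 0-7.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Pearson correlation between predicted and true class indices --
    # note this treats the 8 classes as ordered values, not categories.
    return {"pearson": pearsonr(preds, labels)[0]}
```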

I'm as much of a learner as everyone else, so take whatever I say with a grain of salt, but if you mean training on your own data as in "from scratch", then it's bound not to give great results, because transformers require lots of data to train on.

If your goal is learning, take that into account and expect lower metrics. If you actually want to use the model, you're probably better off finding a pre-trained model and fine-tuning it.
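To give a rough idea of what I mean (just a sketch, not tested against your notebook; the checkpoint name, column names, and hyperparameters are all placeholders), fine-tuning a small pre-trained model with the Hugging Face Trainer could look something like this:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # placeholder: any small pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 8-way classification head on top of the pre-trained body
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=8)

# Hypothetical toy data: sequences of integers 1-8 written as space-separated
# "words"; the label is the next value, shifted from 1-8 down to 0-7.
train_ds = Dataset.from_dict({
    "text": ["3 1 4 1 5", "2 7 1 8 2"],
    "label": [2, 7],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding=True)

train_ds = train_ds.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=5,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```

One caveat, though: a checkpoint pre-trained on English text knows nothing about your integer "words", so the gain from fine-tuning may be smaller than it would be on real language data.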

Just to be safe, I'd wait for another answer to confirm or refute this, but that would be my guess.

Thanks, your comment makes a good point. After cleaning, my model only has about 800 items to train on. Even though all the sequences contain only eight kinds of "words", that training set size is probably still far from enough.