Lesson 4: How to train a transformer on my own data?

I just finished Lesson 4 and want to train a simple transformer on my own. I actually want to use an NLP model to predict the next value of a sequence. All of the code is in a Kaggle notebook: nlp try | Kaggle.
My problem is that no matter how much I tweak the hyperparameters, such as the batch size, or change how I clean the data, the model performs badly. I don't know much about transformers. Is it because I chose Pearson correlation as my compute_metrics argument, or is it actually caused by my data?
All of the input sequences contain only integers from 1 to 8, and to predict a sequence's next value, I simply have the model classify each sequence into 8 categories. Is there any way to improve the NLP model's performance?
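Concretely, my metric function looks roughly like this (a simplified sketch rather than my exact notebook code; it assumes the standard Hugging Face Trainer convention where compute_metrics receives the logits and the integer labels):

```python
import numpy as np
from scipy.stats import pearsonr

def compute_metrics(eval_pred):
    # Assumed Trainer convention: eval_pred unpacks into (logits, labels),
    # with logits of shape (n_samples, 8) and labels as integers 0-7.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Pearson correlation between predicted and true class indices --
    # note this treats the 8 classes as ordered values, not categories.
    return {"pearson": pearsonr(preds, labels)[0]}
```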

I'm as much of a learner as everyone else, so take whatever I say with a grain of salt, but if you mean training on your own data as in "from scratch", then it's bound not to give great results, because transformers require lots of data to train on.

If your goal is learning, take that into account and expect lower metrics. If you actually want to use the model, you're probably better off finding a pre-trained model and fine-tuning it.
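To give a rough idea of what I mean (just a sketch, not tested against your notebook; the checkpoint name, column names, and hyperparameters are all placeholders), fine-tuning a small pre-trained model with the Hugging Face Trainer could look something like this:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # placeholder: any small pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 8-way classification head on top of the pre-trained body
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=8)

# Hypothetical toy data: sequences of integers 1-8 written as space-separated
# "words"; the label is the next value, shifted from 1-8 down to 0-7.
train_ds = Dataset.from_dict({
    "text": ["3 1 4 1 5", "2 7 1 8 2"],
    "label": [2, 7],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding=True)

train_ds = train_ds.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=5,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```

One caveat, though: a checkpoint pre-trained on English text knows nothing about your integer "words", so the gain from fine-tuning may be smaller than it would be on real language data.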

Just to be safe, I'd wait for another answer to confirm or refute this, but that would be my guess.

Thanks, your comment makes a good point. After cleaning, my model only has about 800 items to train on. Even though all the sequences contain only eight kinds of "words", that training set size is probably still far from enough.