Text classifier overfitting?

I want to train a text classifier on some quite specialized text. The training set has ~2.9 million examples and the test set has 150k.

Previously, I fine-tuned a language model on my whole data set (~5 million data points). Its next-word prediction accuracy after a 95/5 random split is around 50% (so, quite high).

Now, after just one epoch of training the frozen text classifier, I got an accuracy of 99.68% at a threshold of 0.5.
Training loss is 0.0154, test loss is 0.017.
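
One sanity check I could run on these numbers is to compare the accuracy against a trivial majority-class baseline, since a very high accuracy can simply reflect imbalanced labels. This is only a minimal sketch: the helper name `majority_baseline_accuracy` and the toy labels are made up for illustration and are not my real data.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    # most_common(1) returns [(label, count)] for the most frequent label
    most_frequent_count = counts.most_common(1)[0][1]
    return most_frequent_count / len(labels)


# Hypothetical toy labels (assumption: binary labels, heavily imbalanced)
toy_labels = [0] * 97 + [1] * 3
baseline = majority_baseline_accuracy(toy_labels)
print(f"majority-class baseline accuracy: {baseline:.2%}")
```

If the baseline on the real test labels is already close to the classifier's accuracy, then accuracy alone says little, and per-class precision and recall would be more informative.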

Performance seems too high to me, so I’m worried that I might have overfitted the training set in some way.

How can I try to find out? Should I train the language model on fewer pieces of text or even exclude the ones that I plan on classifying?
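
Another check I can think of is to measure how many test texts also appear in the training set, since duplicates across the split would inflate test accuracy. Below is a minimal sketch, assuming the texts fit in memory as Python strings; `normalize` and `overlap_fraction` are hypothetical helper names, and the normalization (lowercasing, collapsing whitespace) is just one possible choice.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def overlap_fraction(train_texts, test_texts):
    """Fraction of test texts whose normalized form also occurs in train."""
    if not test_texts:
        return 0.0
    train_seen = {normalize(t) for t in train_texts}
    hits = sum(1 for t in test_texts if normalize(t) in train_seen)
    return hits / len(test_texts)


# Hypothetical toy data, not my real corpus
train = ["Patient shows  mild symptoms", "No findings"]
test = ["patient shows mild symptoms", "Severe reaction observed"]
print(f"test texts also present in train: {overlap_fraction(train, test):.1%}")
```

This only catches exact matches after normalization. Near-duplicates (for example, texts differing by a few tokens) would need a fuzzier comparison.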

Cheers to this great community!
