Sequence models overfitting. Next steps?

I have been creating models for a Kaggle competition and keep seeing the same pattern of overfitting.

After one or two epochs the training set accuracy is around 20-30% and the validation set accuracy hovers around 50%.

After several more epochs the training set and validation set accuracy both reach roughly 60%.

If I continue training, validation accuracy plateaus slightly above 60% while training set accuracy keeps climbing towards 100%.

This seems to happen regardless of the amount of dropout I’ve added to the model. More dropout has led to longer training times but has not prevented overfitting.

My model architecture has varied somewhat. I’ve tried relatively simple models so far, including an MLP, a model with embeddings plus Conv1D and Dense layers, a multi-size Conv1D model, and an LSTM-based model. I’ve ramped up the dropout on all of them but have yet to try an L2 weight penalty (something like the sketch below).
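To make that concrete, here is roughly what I have in mind for adding an L2 penalty to the embeddings + Conv1D + Dense model. This is only a minimal sketch assuming Keras (the post doesn’t depend on any particular framework), and the vocabulary size, embedding dimension, filter counts, dropout rates and class count below are made-up placeholders, not my actual settings:

```python
# Minimal sketch only: Keras assumed, all sizes are hypothetical placeholders.
from tensorflow.keras import layers, models, regularizers

vocab_size, embed_dim, num_classes = 20000, 64, 10  # placeholder values

model = models.Sequential([
    layers.Embedding(vocab_size, embed_dim),
    layers.Dropout(0.3),
    # kernel_regularizer adds an L2 penalty on this layer's weights to the loss
    layers.Conv1D(64, 5, activation="relu",
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(100, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The penalty here sits on the Conv1D and Dense kernels only, which I understand is the usual place for it (biases and embeddings left alone).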

As this is a Kaggle competition, I can’t get more data. Pseudo-labeling and data augmentation aren’t right for this competition. I’ve tried batch normalization on all models (placement sketched below). In some cases it appears to help slightly; in others the model goes nowhere (i.e. it makes predictions on the test set that are only slightly better than random).
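For reference, this is roughly the sort of placement I mean for the batch norm layers, not my actual code: again a sketch assuming Keras, with a hypothetical input size and layer widths, and batch norm inserted between the linear layer and its activation:

```python
# Sketch of batch norm placement (Keras assumed; all sizes are placeholders).
from tensorflow.keras import Input, layers, models

model = models.Sequential([
    Input(shape=(500,)),                 # hypothetical feature vector length
    layers.Dense(100, use_bias=False),   # bias is redundant when followed by BN
    layers.BatchNormalization(),         # normalize pre-activations
    layers.Activation("relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
```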

Should I keep experimenting with ways to fix overfitting in my current set of models, hoping that my validation accuracy will get better, or is my time better spent looking for new model architectures or interesting external data to incorporate into the model?

Any high level advice would be much appreciated.