Oddly enough, I tried training a convnet on the dataset but failed to outperform my simple two-layer non-convolutional net. Along the way I learned that dropout belongs on the dense layers rather than the convolutional ones, and that if you do use it on conv layers, it should go in the later ones rather than the early ones, since the features in the first layers are more generic. I'm now trying VGG16bn, but even with a learning schedule of over 20 iterations and an ensemble of 4 VGGs, I still can't beat the simple net: my score (where lower is better) is substantially worse, around 1.7 with the ensemble, even though the individual models reach a validation accuracy of 0.916.
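For anyone following along, the dropout trick itself is simple. Here's a minimal numpy sketch of inverted dropout, the kind you'd typically apply to dense-layer activations at train time (the function name and shapes are mine, not from any particular framework):

```python
import numpy as np

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero out a fraction p_drop of units and rescale
    the survivors by 1/(1 - p_drop) so the expected activation is unchanged."""
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
dense_out = np.ones((4, 8))          # stand-in for dense-layer activations
dropped = dropout(dense_out, 0.5, rng)
# Each surviving unit is scaled to 2.0; roughly half are zeroed
print(dropped)
```

Because early conv layers learn generic edge/texture filters, zeroing them randomly hurts more than it regularizes, which is why the advice is to reserve dropout for later conv layers and the dense head.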
What could I try next? With data augmentation I get training accuracies around 0.8 and the validation accuracies described above.
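For concreteness, this is the kind of transform I mean by augmentation; a toy numpy sketch (a real pipeline would use the framework's own generator, with rotations, zooms, etc. — these particular flips and shifts are just illustrative):

```python
import numpy as np

def augment(img, rng):
    """Toy augmentation: random horizontal flip plus a small horizontal shift.
    A stand-in for the flips/shifts a real augmentation pipeline applies."""
    if rng.random() < 0.5:
        img = img[:, ::-1]              # horizontal flip
    shift = int(rng.integers(-2, 3))    # shift by up to 2 pixels
    return np.roll(img, shift, axis=1)

rng = np.random.default_rng(0)
batch = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)  # fake 8x8 images
augmented = np.stack([augment(im, rng) for im in batch])
print(augmented.shape)  # same shape as the input batch
```

Each epoch the model then sees slightly different versions of the same images, which fights overfitting but can slow down fitting the training set — consistent with the ~0.8 training accuracy above.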
I thought that perhaps, given how little data there is in each individual category, data augmentation isn't producing good batches, so to speak. So I disabled augmentation and trained an ensemble of 2 models on the same schedule as above. With this setup the models overfit: training accuracies rise to 0.98 and validation accuracies rise to 0.95. I'm out of submissions for today, so I'll see what happens tomorrow when I submit this attempt.
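On the ensembling itself: the usual approach is to average the models' predicted probabilities, and by convexity of negative log likelihood the ensemble's log loss is never worse than the average of the members' losses. A small numpy sketch with made-up softmax outputs (the numbers are invented for illustration, not from my models):

```python
import numpy as np

def log_loss(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    eps = 1e-15
    p = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return -np.log(p).mean()

# Hypothetical softmax outputs of two models on 3 samples, 2 classes
m1 = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
m2 = np.array([[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]])
labels = np.array([0, 1, 0])

ensemble = (m1 + m2) / 2.0   # average the predicted probabilities
print(log_loss(m1, labels), log_loss(m2, labels), log_loss(ensemble, labels))
```

By Jensen's inequality the ensemble loss is bounded above by the mean of the individual losses, which is why averaging a few VGGs usually helps the score even when each member is similar.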