Hi all, I was able to get a leaderboard (LB) score of around 0.83. Below is my approach:
Precompute vgg16bn features for all sets (initial_train, initial_valid, additional_train, additional_valid, test) at an image size of 448x448. Be sure to remove useless and corrupted images (check the forums for threads identifying them).
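Here is a minimal sketch of the clean-up step in plain Python. It assumes you have collected the names of the bad images from the forum threads into a set. The names in `BAD_IMAGES` below are hypothetical placeholders, not the real list. It also assumes the images for each split or class sit in one folder. The sketch additionally skips zero-byte files as a cheap check for broken downloads, but that will not catch every corrupted JPEG. The surviving paths are what you would then resize to 448x448 and run through vgg16bn.

```python
import os

# Hypothetical placeholder names; replace with the list gathered from the forums.
BAD_IMAGES = {"example_bad_1.jpg", "example_bad_2.jpg"}


def clean_image_list(folder, bad_names=BAD_IMAGES):
    """Return sorted paths of usable images in `folder`.

    Skips files named in `bad_names` and zero-byte files (a cheap check
    for truncated downloads; it does not catch every corrupted JPEG).
    """
    keep = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # ignore sub-directories
        if name in bad_names:
            continue  # known useless/corrupted image
        if os.path.getsize(path) == 0:
            continue  # empty file
        keep.append(path)
    return keep
```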
Use these features as input to a simple model: MaxPool-(Dense(4096)-Dropout(0.6)-BatchNorm)x2-Dense(3, softmax). Train it for 3 epochs on all of the data.
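To make the head's shapes concrete, here is a minimal NumPy sketch of its inference-time forward pass with random, untrained weights. Training itself would be done in a deep-learning framework and is not shown. The helper names are illustrative.

Some details are not stated in the approach above and are assumed here:

* ReLU follows each Dense(4096).
* BatchNorm uses Keras's default epsilon of 1e-3.
* The demo uses toy input sizes.

Each 2x2 pool halves the spatial size, so a 448x448 image comes out of all five VGG pooling stages as 14x14. The actual feature shape depends on where the vgg16bn network was cut.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on one (H, W, C) feature map."""
    h, w, c = x.shape
    x = x[: h - h % 2, : w - w % 2]  # drop an odd trailing row/column
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def batch_norm_inference(x, gamma, beta, mean, var, eps=1e-3):
    """BatchNorm at inference time, using the stored moving statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_head(in_dim, hidden=4096, n_classes=3, seed=0):
    """Random (untrained) weights for two Dense+BatchNorm blocks and the output layer."""
    rng = np.random.default_rng(seed)
    blocks, dim = [], in_dim
    for _ in range(2):
        blocks.append({
            "W": rng.normal(0.0, np.sqrt(2.0 / dim), size=(dim, hidden)),
            "b": np.zeros(hidden),
            "gamma": np.ones(hidden), "beta": np.zeros(hidden),
            "mean": np.zeros(hidden), "var": np.ones(hidden),
        })
        dim = hidden
    out = {"W": rng.normal(0.0, np.sqrt(1.0 / dim), size=(dim, n_classes)),
           "b": np.zeros(n_classes)}
    return blocks, out


def head_forward(features, blocks, out):
    """features: (N, H, W, C) precomputed conv features -> (N, n_classes) probabilities."""
    x = np.stack([max_pool_2x2(f) for f in features])
    x = x.reshape(len(features), -1)  # flatten the pooled maps
    for p in blocks:
        x = np.maximum(x @ p["W"] + p["b"], 0.0)  # Dense + ReLU (activation assumed)
        # Dropout(0.6) only acts during training; at inference it is the identity.
        x = batch_norm_inference(x, p["gamma"], p["beta"], p["mean"], p["var"])
    return softmax(x @ out["W"] + out["b"])


# Toy demo: 4 feature maps of 6x6x8 (real vgg16bn features are far larger).
feats = np.random.default_rng(1).normal(size=(4, 6, 6, 8))
blocks, out = init_head(in_dim=3 * 3 * 8, hidden=16)
probs = head_forward(feats, blocks, out)
print(probs.shape)  # (4, 3)
```

In Keras the same head would be assembled from MaxPooling2D, Flatten, Dense, Dropout and BatchNormalization layers. The Flatten step is implied by going from pooled maps to Dense.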
Build 5 such models and average their predictions.
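Averaging here means taking the per-image mean of the models' class probabilities. Below is a minimal NumPy sketch. The `multiclass_log_loss` helper assumes the LB metric is multi-class log loss; the post only quotes the score.

Because -log is convex, the log loss of the averaged probabilities is never higher than the mean of the individual models' log losses (Jensen's inequality). That is one reason this kind of averaging tends to help. The small clipping added for numerical safety could in principle break this for extreme probabilities.

```python
import numpy as np


def average_predictions(prob_list):
    """Mean of several (N, n_classes) probability arrays, one per model."""
    probs = np.stack(prob_list)  # (n_models, N, n_classes)
    return probs.mean(axis=0)    # each row still sums to 1


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    probs = np.asarray(probs, dtype=float)
    p_true = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))


# Toy predictions from three models for four images (three classes).
y = np.array([0, 1, 2, 1])
preds = [
    np.array([[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]),
    np.array([[0.5, 0.3, 0.2], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.2, 0.7, 0.1]]),
    np.array([[0.9, 0.05, 0.05], [0.4, 0.4, 0.2], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4]]),
]
ensemble = average_predictions(preds)
print(multiclass_log_loss(y, ensemble))                      # ensemble loss
print(np.mean([multiclass_log_loss(y, p) for p in preds]))  # mean single-model loss
```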
I tried data augmentation, but it doesn't seem to converge as fast as the non-augmented approach (maybe I need to experiment with it more).