I entered a Kaggle competition where I’m using RNNs for sequence classification. The first step in my models takes a sentence, converts each word to an integer index, and runs the result through an Embedding layer. The models I create with a trainable, randomly initialized, 75-dimensional Embedding layer achieve a lower validation loss than the models I train with a 300-dimensional Embedding layer initialized with GloVe vector weights. I initially freeze the GloVe-initialized Embedding layer, and when my validation loss plateaus I make this layer trainable. Still, my GloVe models do not perform as well as my lower-capacity models.
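The index-and-embed setup described above can be sketched with NumPy. This is a minimal sketch, not the poster's code. The helper names (`build_word_index`, `sentence_to_indices`, `load_glove`, `build_embedding_matrix`) are hypothetical. The sketch also assumes a whitespace tokenizer, index 0 reserved for padding and unknown words, and small random rows for words that GloVe doesn't cover. It parses GloVe's plain-text format (one word followed by its vector components per line) and aligns those vectors with the word index.

```python
import numpy as np


def build_word_index(sentences):
    """Map each distinct lowercase token to an integer index; 0 is reserved."""
    word_index = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def sentence_to_indices(sentence, word_index):
    """Convert a sentence to index values; unseen words map to the reserved 0."""
    return [word_index.get(word, 0) for word in sentence.lower().split()]


def load_glove(lines):
    """Parse GloVe text-format lines ("word v1 v2 ... vd") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.array([float(x) for x in parts[1:]], dtype=np.float32)
    return vectors


def build_embedding_matrix(word_index, glove, dim, seed=0):
    """Return a (vocab_size + 1, dim) matrix: GloVe rows where available, random rows otherwise."""
    rng = np.random.default_rng(seed)
    # Assumption: words missing from GloVe start from N(0, 0.1^2) noise.
    matrix = rng.normal(scale=0.1, size=(len(word_index) + 1, dim)).astype(np.float32)
    matrix[0] = 0.0  # reserved padding/unknown row
    hits = 0  # how many vocabulary words GloVe actually covers
    for word, idx in word_index.items():
        vec = glove.get(word)
        if vec is not None:
            matrix[idx] = vec
            hits += 1
    return matrix, hits


sentences = ["the cat sat", "the dog ran"]
glove_lines = ["the 0.1 0.2 0.3", "cat 0.4 0.5 0.6", "dog 0.7 0.8 0.9"]  # toy 3-d "GloVe" file
word_index = build_word_index(sentences)
matrix, hits = build_embedding_matrix(word_index, load_glove(glove_lines), dim=3)
print(sentence_to_indices("the dog flew", word_index))  # [1, 4, 0]
print(matrix.shape, hits)  # (6, 3) 3
```

The `hits` count shows how much of the competition vocabulary actually receives a GloVe vector. The post doesn't name a framework. If the model is built in Keras, the matrix could be passed to the Embedding layer via `embeddings_initializer=keras.initializers.Constant(matrix)` with `trainable=False`. The layer could then be unfrozen later by setting `trainable = True` and recompiling the model.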
Is it common for models with a randomly initialized Embedding layer to beat models whose Embedding layer is initialized with GloVe word vectors?
Does it ever make sense to use PCA to reduce the dimensionality of GloVe word vectors and use the reduced vectors to initialize the Embedding layer of a model? Has anyone had success with this approach?
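One way to try the PCA idea is to project the 300-dimensional GloVe matrix onto its top principal components before loading it into the Embedding layer. The sketch below does this with NumPy's SVD. `pca_reduce` is a hypothetical helper. The random matrix is only a stand-in for real GloVe vectors, and k = 75 is chosen only to match the size of the randomly initialized embedding mentioned in the question.

```python
import numpy as np


def pca_reduce(embeddings, k):
    """Project row vectors onto their top-k principal components.

    Returns the (n, k) projection and the fraction of total variance it keeps.
    """
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:k].T
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return reduced.astype(np.float32), float(explained)


rng = np.random.default_rng(0)
glove_matrix = rng.normal(size=(1000, 300)).astype(np.float32)  # stand-in for real 300-d GloVe rows
reduced, explained = pca_reduce(glove_matrix, k=75)
print(reduced.shape, round(explained, 3))
```

The explained-variance ratio shows how much of the original variance the k components retain. Centering shifts every row, so a zero padding row will no longer be zero after projection and may need to be reset. scikit-learn's `sklearn.decomposition.PCA(n_components=75).fit_transform(matrix)` should give the same projection up to the sign of each component.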