I entered a Kaggle competition where I’m using RNNs for sequence classification. The first step in my models takes a sentence, converts each word to an integer index, and runs it through an Embedding layer. The models I build with a trainable, randomly initialized, 75-dimensional Embedding layer achieve a lower validation loss than the models I train with a 300-dimensional Embedding layer initialized with GloVe vector weights. I initially freeze the GloVe-initialized Embedding layer, and once validation loss plateaus I make it trainable. Still, my GloVe models do not perform as well as my lower-capacity models.
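For concreteness, here is a minimal sketch of the kind of setup I mean (Keras-style; `vocab_size` and `embedding_matrix` are placeholders standing in for my actual tokenizer output and the matrix loaded from the GloVe file):

```python
import numpy as np
from tensorflow.keras.layers import Embedding

vocab_size, glove_dim = 20000, 300  # placeholder sizes

# Row i holds the GloVe vector for word index i; zeros here stand in
# for the values actually loaded from the GloVe file.
embedding_matrix = np.zeros((vocab_size, glove_dim))

# Phase 1: GloVe weights, frozen
glove_embedding = Embedding(
    input_dim=vocab_size,
    output_dim=glove_dim,
    weights=[embedding_matrix],
    trainable=False,
)

# Phase 2: once validation loss plateaus, unfreeze and recompile
# glove_embedding.trainable = True
# model.compile(optimizer="adam", loss="binary_crossentropy")
```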
Two questions:
- Is it common for models with an Embedding layer with randomly initialized weights to beat models with an Embedding layer initialized with GloVe word vectors?
- Does it ever make sense to use PCA to reduce the dimensionality of GloVe word vectors and use these reduced vectors to initialize the Embedding layer of your model? Has anyone had success with this approach? (A rough sketch of what I have in mind follows below.)
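Roughly what I had in mind for the second question, again with placeholder names (`embedding_matrix` would be the real 300-dimensional GloVe matrix rather than the random stand-in used here):

```python
import numpy as np
from sklearn.decomposition import PCA
from tensorflow.keras.layers import Embedding

vocab_size, glove_dim, target_dim = 20000, 300, 75  # placeholder sizes

# Stand-in for the real 300-dim GloVe matrix loaded from disk
embedding_matrix = np.random.randn(vocab_size, glove_dim)

# Project the GloVe vectors down to 75 dimensions
pca = PCA(n_components=target_dim)
reduced_matrix = pca.fit_transform(embedding_matrix)  # (vocab_size, 75)
print(f"variance retained: {pca.explained_variance_ratio_.sum():.2%}")

# Initialize a 75-dim Embedding layer with the reduced vectors
reduced_embedding = Embedding(
    input_dim=vocab_size,
    output_dim=target_dim,
    weights=[reduced_matrix],
    trainable=False,
)
```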