Fine-Tuning Pretrained Embeddings

I’m working on the Kaggle Quora competition and have had some good success with Word2Vec embeddings. There are about 60k word tokens in the Quora vocabulary that don’t exist in the Word2Vec vocabulary, and until now I’ve been setting these to zero and making the embedding layer non-trainable.

I’m experimenting with fine-tuning the embedding layer by initialising these 60k words with values drawn from a random normal distribution and then allowing backprop to update the embeddings. This is “working”, but the model now overfits within 1 or 2 epochs, where it used to take 14-15 epochs, and it doesn’t reach a similar validation loss. I’d like to apply some regularization to my embeddings, but how? When I try to apply dropout to the embedding, I get a warning from Keras:
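For reference, here’s roughly how I’m building the embedding matrix now (just a sketch - `word_index` and `w2v` stand in for my tokenizer’s word index and the loaded Word2Vec vectors):

```python
import numpy as np
from keras.layers import Embedding

def build_embedding_layer(word_index, w2v, emb_dim=300, scale=0.1):
    """word_index: token -> integer id; w2v: loaded Word2Vec KeyedVectors."""
    emb_matrix = np.zeros((len(word_index) + 1, emb_dim))
    for word, i in word_index.items():
        if word in w2v:
            emb_matrix[i] = w2v[word]  # copy the pretrained vector
        else:
            # the ~60k tokens missing from Word2Vec get small random-normal vectors
            emb_matrix[i] = np.random.normal(scale=scale, size=emb_dim)
    return Embedding(input_dim=emb_matrix.shape[0],
                     output_dim=emb_dim,
                     weights=[emb_matrix],
                     trainable=True)  # let backprop update the embeddings
```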

The `dropout` argument is no longer support in `Embedding`. You can apply a `keras.layers.SpatialDropout1D` layer right after the `Embedding` layer to get the same behavior.

What is this SpatialDropout1D layer, and how does it work? The Keras docs also suggest an `embeddings_regularizer` parameter, but this is new to me. What type of regularization would help avoid overfitting while I fine-tune the Word2Vec embeddings? Any tips would be appreciated!
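Based on the warning, I’m guessing the replacement looks something like this (a toy model just to show where the layer goes; sizes are arbitrary, not my actual network):

```python
from keras.models import Sequential
from keras.layers import Embedding, SpatialDropout1D, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=100000, output_dim=300))
# per the warning: SpatialDropout1D right after the Embedding layer; it drops
# entire feature channels (whole embedding dimensions across the sequence)
# rather than individual activations
model.add(SpatialDropout1D(0.2))
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
```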


In the course we add l2 regularization to learnt embeddings - that’s a nice, easy way to handle it. It’s often a good idea to train for a few epochs with the embeddings fixed before making them trainable for the last few epochs, BTW.
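A rough sketch of that pattern, assuming Keras since that’s what you’re using (layer sizes, epoch counts, and the l2 weight are just placeholders):

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.regularizers import l2

def fit_with_unfreezing(emb_matrix, X_train, y_train, X_val, y_val,
                        frozen_epochs=3, finetune_epochs=2):
    """emb_matrix: your pretrained + random-init matrix; X_* are padded int sequences."""
    emb = Embedding(emb_matrix.shape[0], emb_matrix.shape[1],
                    weights=[emb_matrix],
                    embeddings_regularizer=l2(1e-6),  # l2 penalty on the embedding weights
                    trainable=False)                  # phase 1: embeddings fixed
    model = Sequential([emb, LSTM(64), Dense(1, activation='sigmoid')])
    model.compile(loss='binary_crossentropy', optimizer='adam')
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=frozen_epochs)

    emb.trainable = True                              # phase 2: fine-tune embeddings
    model.compile(loss='binary_crossentropy', optimizer='adam')  # recompile so the flag takes effect
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=finetune_epochs)
    return model
```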
