Updating part of an embedding matrix (only for out of vocab words)

Hello all,

TL;DR: I would like to update only the rows of an embedding matrix that correspond to out-of-vocab words, and keep the rows frozen for words that already have pre-trained embeddings.

It seems sensible (and is common practice) to initialise out-of-vocab word vectors to something other than random values. A natural way to improve on this is to then train only the rows of the embedding matrix for out-of-vocab words (if it’s too expensive to train the whole embedding matrix). How would this work in PyTorch? I’m not aware that it’s been done before.

i.e. do something like:

self.embedding.weight.requires_grad = False
self.embedding.weight[mask, :].requires_grad = True

where mask is a boolean tensor with one entry per word, indicating whether that word is out of vocab.

For reference, I’ve also asked this question here.

Many thanks in advance and happy new year!


You could write a LearnerCallback and, in on_backward_end, multiply the gradient by the mask so that the gradient is zero for the pre-trained words. Then optimizer.step() will only adjust the out-of-vocab words.
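
For anyone not using fastai callbacks, here is a rough sketch of the same gradient-masking idea in plain PyTorch, using a tensor hook on the embedding weight. The sizes, the `oov_mask` values and the toy loss are made up for illustration. Note that this only holds for a plain optimizer step: with weight decay enabled, the optimizer would still modify the pre-trained rows.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes, chosen only for illustration.
vocab_size, dim = 6, 4
embedding = nn.Embedding(vocab_size, dim)

# Hypothetical mask: True for out-of-vocab rows that should be trained.
oov_mask = torch.tensor([False, False, True, False, True, False])

# Zero the gradient of the pre-trained rows every time it is computed.
grad_mask = oov_mask.unsqueeze(1).to(embedding.weight.dtype)
embedding.weight.register_hook(lambda grad: grad * grad_mask)

# Plain SGD without weight decay, so a zero gradient means no update.
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)
before = embedding.weight.detach().clone()

tokens = torch.tensor([[0, 2, 4, 5]])
loss = embedding(tokens).sum()  # toy loss
loss.backward()
optimizer.step()

# Only the rows flagged in oov_mask should have moved.
after = embedding.weight.detach()
changed_rows = (after != before).any(dim=1)
```

Here `changed_rows` ends up matching `oov_mask`, i.e. only the out-of-vocab rows were updated.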

That is still too expensive, as we are effectively still calculating the gradient for all the words.
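
One possible way around that cost (a sketch, not something proposed earlier in this thread) is to keep two separate tables: a frozen one for the pre-trained words and a small trainable one for the out-of-vocab words, so that a gradient is only ever computed for the small table. `SplitEmbedding` is a hypothetical name, and the sketch assumes the vocabulary is ordered so that all out-of-vocab ids come after the pre-trained ones.

```python
import torch
import torch.nn as nn


class SplitEmbedding(nn.Module):
    """Hypothetical module: frozen pre-trained rows plus trainable OOV rows.

    Assumes ids in [0, n_pretrained) have pre-trained vectors and
    ids >= n_pretrained are out-of-vocab.
    """

    def __init__(self, pretrained, n_oov):
        super().__init__()
        self.n_pretrained = pretrained.size(0)
        # Frozen table: requires_grad is False, so no gradient is computed.
        self.frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)
        # Small trainable table for the out-of-vocab words only.
        self.oov = nn.Embedding(n_oov, pretrained.size(1))

    def forward(self, ids):
        is_oov = ids >= self.n_pretrained
        # Clamp so that each table only ever sees valid indices.
        frozen_ids = ids.clamp(max=self.n_pretrained - 1)
        oov_ids = (ids - self.n_pretrained).clamp(min=0)
        # Pick the OOV vector where the id is out-of-vocab, else the frozen one.
        return torch.where(
            is_oov.unsqueeze(-1), self.oov(oov_ids), self.frozen(frozen_ids)
        )


# Toy usage: 5 pre-trained words (ids 0-4) and 2 out-of-vocab words (ids 5-6).
pretrained = torch.randn(5, 4)
emb = SplitEmbedding(pretrained, n_oov=2)
ids = torch.tensor([[0, 4, 5, 6]])
out = emb(ids)
out.sum().backward()

trainable = [name for name, p in emb.named_parameters() if p.requires_grad]
```

Only `oov.weight` shows up in `trainable`, so passing just the parameters with `requires_grad=True` to the optimizer leaves the pre-trained vectors untouched.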