Is it possible to train a language model with 16-bit precision?

I’m interested in training an AWD-LSTM language model in fp16 with the latest fastai library. Simply calling learner.to_fp16() doesn’t work for the language model (e.g. in the imdb notebook). The error seems to come from torch.nn.functional.embedding being called with a tensor of the wrong type:

```
~/anaconda3/lib/python3.7/site-packages/fastai/text/models.py in forward(self, words, scale)
     69         if scale: masked_embed.mul_(scale)
     70         return F.embedding(words, masked_embed, self.pad_idx, self.emb.max_norm,
---> 71             self.emb.norm_type, self.emb.scale_grad_by_freq, self.emb.sparse)
     72
     73 #def _repackage_var(h:Tensors)->Tensors:

~/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1452         # remove once script supports set_grad_enabled
   1453         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
---> 1454     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1455
   1456

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)
```
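For reference, here is a minimal sketch (outside fastai, with made-up shapes) that seems to reproduce the same failure: F.embedding is perfectly happy with fp16 weights, but rejects the lookup as soon as the indices themselves are cast to half:

```python
import torch
import torch.nn.functional as F

device = 'cuda' if torch.cuda.is_available() else 'cpu'

weight = torch.randn(10, 4, device=device).half()  # fp16 embedding weights are fine
words = torch.tensor([1, 2, 3], device=device)     # indices default to int64 (Long)

out = F.embedding(words, weight)
print(out.dtype)  # torch.float16

# Casting the *indices* to half reproduces the error above:
F.embedding(words.half(), weight)
# RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long
```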

Before I spend too much time on this, has anyone else succeeded in getting it to work? Is there some fundamental reason why half precision wouldn’t be desirable with this model?

Taking a guess here: I think the issue is that the embedding matrix is getting fp16 float values for indexing rather than integers.

torch.embedding only admits Long and Int index tensors, so the half-precision indices are rejected.
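If that’s right, one possible workaround (untested, just a sketch based on the traceback above) is to cast the indices back to Long immediately before the lookup; mixed-precision training only needs the weights and activations in fp16, never the indices:

```python
# Sketch of a one-line patch to the forward() shown in the traceback:
# only the words.long() cast is new; everything else is unchanged from
# fastai/text/models.py.
return F.embedding(words.long(), masked_embed, self.pad_idx, self.emb.max_norm,
                   self.emb.norm_type, self.emb.scale_grad_by_freq, self.emb.sparse)
```

Something in the to_fp16() conversion is evidently casting the input batch to half along with the weights (hence the torch.cuda.HalfTensor indices); the cleaner fix would be to keep the input batch as Long in the first place, but the cast above is the most local change.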