Consider the word ‘cat’. Let’s assume that I have a pretrained language model encoder. I want a ‘representation’ for this word ‘cat’.
- One way is, I extract the embedding corresponding to this word from the embedding layer.
- Another way is, I extract the output of the encoder, and possibly do concat pooling on top — i.e., return the input to the decoder.
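To make the two options concrete, here is a minimal sketch using a toy (untrained, hypothetical) model rather than a real pretrained encoder — the model, token id, and dimensions are all made up for illustration. Option 1 reads the static embedding table; option 2 takes the contextual hidden state from the encoder output, optionally with concat pooling (concatenating mean- and max-pooled hidden states):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model = 100, 16

# Toy stand-ins for a pretrained LM's embedding layer and encoder stack
embedding = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

token_ids = torch.tensor([[42]])  # pretend 42 is the id for "cat"

# Option 1: static representation straight from the embedding table
static_rep = embedding(token_ids)[0, 0]            # shape: (d_model,)

# Option 2: contextual representation from the encoder output
hidden = encoder(embedding(token_ids))             # (1, seq_len, d_model)
contextual_rep = hidden[0, 0]                      # shape: (d_model,)

# Concat pooling over the encoder outputs: mean-pool and max-pool
# over the sequence dimension, then concatenate
pooled = torch.cat([hidden.mean(dim=1),
                    hidden.max(dim=1).values], dim=-1)[0]  # (2 * d_model,)
```

Note that option 1 gives the same vector for "cat" regardless of context, while option 2 (and the pooled variant) depends on the surrounding tokens in the input sequence.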
Does anyone have any thoughts on which representation
should be better - theoretically?