LM Encoder | output vs embeddings

Consider the word ‘cat’. Let’s assume I have a pretrained language model encoder, and I want a ‘representation’ for this word.

  1. One way is to extract the row corresponding to ‘cat’ from the embedding layer. This is a static, context-free representation.
  2. Another way is to run the word through the encoder and take the encoder output, probably with concat pooling on top, i.e. return what would be the input to the decoder. This representation is contextual, since it has passed through the self-attention layers.
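To make the two options concrete, here is a toy NumPy sketch. The encoder is a stand-in (a single fixed linear map instead of a real stack of self-attention layers), and the token ids are hypothetical, but the extraction paths mirror the two options above, with concat pooling done ULMFiT-style (last state + mean pool + max pool):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d = 100, 8
embedding_table = rng.normal(size=(vocab_size, d))  # stand-in for the embedding layer

token_ids = np.array([5, 17, 42])  # hypothetical ids for "the cat sat"; "cat" at index 1

# Option 1: static representation straight from the embedding layer
static_cat = embedding_table[token_ids[1]]          # shape (d,)

# Stand-in for the pretrained encoder: a fixed nonlinear map. A real
# encoder would mix information across positions via self-attention.
W = rng.normal(size=(d, d))
hidden = np.tanh(embedding_table[token_ids] @ W)    # shape (seq_len, d)

# Option 2a: contextual representation of "cat" at its position
contextual_cat = hidden[1]                          # shape (d,)

# Option 2b: concat pooling over the whole sequence
# (last hidden state + mean pool + max pool)
pooled = np.concatenate([hidden[-1], hidden.mean(axis=0), hidden.max(axis=0)])

print(static_cat.shape, contextual_cat.shape, pooled.shape)
```

Note that concat pooling gives one vector for the whole sequence (here 3×d), whereas options 1 and 2a give a per-token vector of size d.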

Does anyone have any thoughts on which representation should be better, theoretically?