Visualising the attention with the Transformer architecture

Hi all,

I’m experimenting with fine-tuning language models on a specific domain and then using them for a text classification task, specifically with the Transformer architecture.

How can I output a nice visualisation of the attention with a Transformer, similar to:

The docs say: “Provides an interpretation of classification based on input sensitivity. This was designed for AWD-LSTM only for the moment, because Transformer already has its own attentional model.”
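For anyone landing here, a rough sketch of what "its own attentional model" could mean for visualisation: the attention weights of a single head form a tokens-by-tokens matrix, and that matrix can be drawn as a heatmap. The toy example below is pure Python with made-up query/key vectors. It is not the library's API, and `attention_weights` and `render_heatmap` are hypothetical helper names used only for illustration.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    # Scaled dot-product attention weights: softmax(q . k / sqrt(d)) per query.
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    return weights


def render_heatmap(tokens, weights, shades=" .:-=+*#%@"):
    # Text heatmap: one row per query token, darker character = larger weight.
    width = max(len(t) for t in tokens)
    lines = []
    for tok, row in zip(tokens, weights):
        cells = "".join(
            shades[min(int(w * len(shades)), len(shades) - 1)] for w in row
        )
        lines.append(f"{tok:>{width}} |{cells}|")
    return "\n".join(lines)


# Assumption: toy 2-dimensional vectors standing in for real query/key states.
tokens = ["the", "movie", "was", "great"]
vectors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.5, 3.0]]
weights = attention_weights(vectors, vectors)
print(render_heatmap(tokens, weights))
```

With a real model the `weights` matrix would have to come from the model's attention layers instead of toy vectors. How to extract it depends on the library and is not covered by this sketch.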



I feel this needs a bit more clarity from the documentation end.

P.S. I am stuck on the same thing. It would be great to have some help.

@klein Were you able to figure it out?

Nope, I’m still learning and hope to work out how to do it. If you come up with a good solution, please do share and tag me.