Text generation with pointer/dynamic cache or pointer sentinel

Have others tried using pointer cache for text generation? I am interested to hear about your experiences.

From my understanding, a pointer cache keeps a bank of (hidden state, target) pairs and adjusts the next-token probabilities using dot products between the current hidden state and the cached hidden states, followed by a softmax. However, when generating text, we do not have targets. Is it appropriate to treat the token that was actually generated as the ground-truth target here? The papers I have been reading show that a pointer cache reduces perplexity, but I haven’t seen an analysis of text generation quality with it.
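
To make the question concrete, here is a minimal NumPy sketch of the mechanism as I understand it. It is not taken from FastAI or from any particular paper's code: the function names, the `theta` sharpness parameter, and the `lam` mixing weight are all my own assumptions, and `generation_step` simply encodes the guess I am asking about, namely that the sampled token is stored in the cache as if it were the target.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def cache_adjusted_probs(model_probs, hidden, cache_hiddens, cache_tokens,
                         theta=1.0, lam=0.1):
    """Mix the model's next-token distribution with a pointer distribution.

    model_probs   : (vocab,) float array from the language model
    hidden        : (dim,) current hidden state
    cache_hiddens : list of (dim,) hidden states stored earlier
    cache_tokens  : list of token ids stored alongside those hidden states
    theta, lam    : hypothetical hyperparameters (sharpness, mixing weight)
    """
    if len(cache_tokens) == 0:
        return model_probs
    # Dot product of the current hidden state with every cached one.
    scores = theta * (np.asarray(cache_hiddens) @ hidden)
    attn = softmax(scores)
    # Scatter the attention mass onto the vocabulary; a token that appears
    # several times in the cache accumulates the mass of all its entries.
    cache_probs = np.zeros_like(model_probs)
    np.add.at(cache_probs, np.asarray(cache_tokens), attn)
    return (1.0 - lam) * model_probs + lam * cache_probs


def generation_step(model_probs, hidden, cache_hiddens, cache_tokens, rng,
                    theta=1.0, lam=0.1):
    """Sample one token, then store (hidden, sampled token) in the cache.

    Assumption under question: with no ground truth available at generation
    time, the sampled token plays the role of the target.
    """
    probs = cache_adjusted_probs(model_probs, hidden, cache_hiddens,
                                 cache_tokens, theta, lam)
    token = int(rng.choice(len(probs), p=probs))
    cache_hiddens.append(hidden)
    cache_tokens.append(token)
    return token
```

If this is roughly right, one thing I wonder about is whether feeding sampled tokens back into the cache reinforces repetition, since every generated token raises its own future probability.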

Could anyone explain the difference between dynamic and pointer cache, or point me to relevant literature or docs?

Last question: have others tried using pointer sentinel with FastAI language models for text generation or next word prediction? I am looking for good resources akin to this one for pointer cache:

The pointer cache for language models blog post is fantastic, but a bit out of date now. Are there any more recent repos, compatible with the newer versions of FastAI, that demonstrate pointer cache?