Using Diffusion Models with Transformers to Generate Text?

I’ve recently stumbled upon GENIE, which mixes diffusion of text with transformer blocks to generate whole paragraphs of text at a time. I’m still reading the paper, but the idea behind it sounds fascinating.

The experimental results only compare against a few models, and the only one I’m remotely familiar with is T5-Base, which I believe is roughly a GPT-2 sized model (it’s an encoder-decoder with about 220M parameters).

It seems that parallel text generation could have a number of benefits, including better GPU utilization, faster text generation, and the ability to revise “earlier” parts of the text, which current decoder-only transformer models can’t do because they generate left to right and never revisit tokens they’ve already emitted.
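To make the sequential-vs-parallel difference concrete, here’s a toy sketch in Python. It is not GENIE’s algorithm: `toy_next_token`, `toy_denoise_step`, and the fixed `TARGET` sentence are hypothetical stand-ins for a trained model, and real text diffusion models use a learned denoiser, often over continuous embeddings rather than discrete tokens. It only shows the shape of the two decoding loops. The autoregressive loop needs one model call per token and never changes a token once it’s emitted. The refinement loop runs a fixed number of steps, and each step can rewrite any position, including earlier ones.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
# Hypothetical "ideal" output that both toy models converge to.
TARGET = ["the", "cat", "sat", "on", "the", "mat"]


def toy_next_token(prefix):
    """Hypothetical stand-in for a decoder-only model: one call -> one new token."""
    return TARGET[len(prefix)]


def toy_denoise_step(tokens, step, num_steps):
    """Hypothetical stand-in for a diffusion denoiser: one call updates ALL positions.

    Each position is snapped to its target with a probability that grows with
    the step index, so later steps fix positions that earlier steps left wrong.
    On the final step the probability is 1.0, so every position is fixed.
    """
    p = (step + 1) / num_steps
    return [target_tok if random.random() < p else tok
            for tok, target_tok in zip(tokens, TARGET)]


def autoregressive():
    """Left-to-right decoding: sequential calls scale with sequence length."""
    tokens, calls = [], 0
    for _ in range(len(TARGET)):
        tokens.append(toy_next_token(tokens))  # earlier tokens are never revisited
        calls += 1
    return tokens, calls


def parallel_refine(num_steps, seed=0):
    """Iterative refinement: sequential calls scale with num_steps, not length."""
    random.seed(seed)
    tokens = [random.choice(VOCAB) for _ in TARGET]  # start from random "noise"
    calls = 0
    for step in range(num_steps):
        tokens = toy_denoise_step(tokens, step, num_steps)  # any position may change
        calls += 1
    return tokens, calls


ar_tokens, ar_calls = autoregressive()
pr_tokens, pr_calls = parallel_refine(num_steps=4)
print(ar_calls, pr_calls)                # 6 4
print(ar_tokens == pr_tokens == TARGET)  # True
```

Whether the parallel loop is actually faster in practice depends on how many denoising steps a real model needs, which can be far more than the four used here.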

Does anyone know of any large-scale efforts to compare it against GPT-3.5/4 or Llama 2?


I’m not aware of any specific large-scale efforts comparing GENIE to GPT-3.5/4 or Llama 2, but it would be interesting to see such comparisons in the future.

Thanks for the reply. I’m still researching it; if I find anything interesting, I’ll update this thread.