I’ve recently stumbled upon GENIE, which mixes text diffusion with transformer blocks to generate whole paragraphs of text at a time. I’m still in the process of reading the paper, but the idea behind it sounds fascinating. The experimental results only compare against a few models, only one of which I’m remotely familiar with, T5-Base, which I believe is roughly GPT-2 sized.
It seems that parallel text generation could have a number of benefits: better GPU utilization, faster text generation, and the ability to correct “earlier” areas of the text, which current decoder-only transformer models can’t do.
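To make that last point concrete, here’s a toy sketch (not GENIE’s actual architecture, and `target` is a hypothetical stand-in for what a trained denoiser would predict): every position starts as noise and is refined in parallel at each step, so a position “decided” early can still be overwritten later, unlike left-to-right decoding.

```python
import random

VOCAB = list("abcdefghij")

def diffusion_generate(target, steps=10, seed=0):
    """Toy diffusion-style generation: start from pure noise and update
    EVERY position simultaneously at each step. Autoregressive decoding,
    by contrast, fixes tokens left to right and never revisits them.
    `target` stands in for the denoiser's prediction at each step."""
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB) for _ in target]  # pure noise
    for step in range(steps):
        keep_noise = 1.0 - (step + 1) / steps     # anneal noise toward 0
        # Parallel update: each position independently either keeps its
        # current (possibly wrong) token or snaps to the denoised one.
        tokens = [cur if rng.random() < keep_noise else tgt
                  for cur, tgt in zip(tokens, target)]
    return "".join(tokens)

print(diffusion_generate("hello"))  # converges to the target string
```

The inner loop is one flat pass over all positions, which is exactly the part that maps well onto a GPU, whereas an autoregressive decoder needs one sequential forward pass per token.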
Does anyone know of any large-scale efforts to compare it against GPT-3.5/4 or Llama 2?