RWKV, the Generative LM that could help RNNs make a comeback!

Sorry for my very late response! RWKV is inspired by Apple's AFT ([2105.14103] An Attention Free Transformer), but it adds a lot of tricks on top of it. There is no published paper as far as I know, but everything is detailed in the repo:
[GitHub - BlinkDL/RWKV-LM](https://github.com/BlinkDL/RWKV-LM): "RWKV is a RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, 'infinite' ctx_len, and free sentence embedding."
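For intuition, the core idea inherited from AFT is to replace pairwise attention with an exp-weighted, normalized average over past values, which can be carried as a fixed-size running state. Here is a minimal, illustrative sketch (not RWKV's actual code; the scalar `decay`, the sigmoid gate, and all names are simplifying assumptions) of how such a layer can process tokens one at a time like an RNN:

```python
import numpy as np

def step(q_t, k_t, v_t, num, den, decay):
    """One recurrent step of an AFT-style causal weighted average.

    num/den carry running exp-weighted sums over past values, so each
    new token updates a fixed-size state instead of re-attending over
    the whole history -- this is what enables RNN-like O(1) inference.
    """
    num = decay * num + np.exp(k_t) * v_t             # exp(k)-weighted value sum
    den = decay * den + np.exp(k_t)                   # running normalizer
    y_t = (1.0 / (1.0 + np.exp(-q_t))) * (num / den)  # sigmoid(q) gates the average
    return y_t, num, den

d = 8                        # channel dimension (illustrative)
decay = 0.9                  # hypothetical scalar decay; RWKV uses learned per-channel decays
num, den = np.zeros(d), np.zeros(d)

rng = np.random.default_rng(0)
for t in range(16):          # stream tokens one at a time, RNN-style
    q_t, k_t, v_t = rng.normal(size=(3, d))
    y_t, num, den = step(q_t, k_t, v_t, num, den, decay)
```

The same computation can also be unrolled over the whole sequence in one parallel pass at training time, which is where the "trained like a GPT (parallelizable)" claim comes from.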

Right now, they are training it on the Pile, and after that they plan to fine-tune it with RLHF using the dataset currently being collected by Yannic Kilcher and the Open Assistant team.

If the performance of this model is on par with something like Flan-T5 or similar models, then we are talking about the "Stable Diffusion" of text generation: something that can run on consumer GPUs with acceptable speed and good quality.
