RWKV, the Generative LM that could help RNNs make a comeback!

Hello everyone!

I’m participating in a LAION project called Open-Assistant, led by Yannick Kilcher, to build an open version of ChatGPT.

The thing is, running these models is very resource-demanding; most of them can’t run on a single GPU. But while deciding which models to use for the Open-Assistant model, this project was presented: RWKV-LM.

It is an RNN trained like a Transformer, and what makes this model so interesting is that, in theory:

  • It can be directly trained like a GPT (parallelizable)
  • Fast training and fast inference
  • Saves VRAM
  • “Infinite” ctx_len
  • Free sentence embeddings
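Very roughly, the “train like a GPT, infer like an RNN” claim can be sketched in a few lines of NumPy. This is my own simplified reading of the WKV recurrence described in the repo (the real model adds token-shift, channel mixing, log-space numerics, etc.), so treat the names here (`w` per-channel decay, `u` current-token bonus) as assumptions, not the actual API:

```python
import numpy as np

def wkv_parallel(w, u, k, v):
    """GPT-style (training) form: each position looks at all previous
    keys/values, weighted by an exponential time decay."""
    T, C = k.shape
    out = np.zeros((T, C))
    for t in range(T):
        num = np.exp(u + k[t]) * v[t]   # current token gets a "bonus" u
        den = np.exp(u + k[t])
        for i in range(t):              # decayed contributions of the past
            decay = np.exp(-(t - 1 - i) * w + k[i])
            num = num + decay * v[i]
            den = den + decay
        out[t] = num / den
    return out

def wkv_recurrent(w, u, k, v):
    """RNN-style (inference) form: the same result computed from a
    fixed-size state (a, b), so per-token cost is O(1) in sequence length."""
    T, C = k.shape
    out = np.zeros((T, C))
    a = np.zeros(C)  # running decayed sum of exp(k_i) * v_i
    b = np.zeros(C)  # running decayed sum of exp(k_i)
    for t in range(T):
        out[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])
    return out
```

Both forms compute the same thing, which is the whole point: training can be parallelized over the time dimension like a Transformer, while generation only carries the small `(a, b)` state forward, hence the “infinite” ctx_len and low inference VRAM.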

I’m still learning about RNNs and LSTMs so I can contribute and experiment with it, but the results presented in the repo are truly impressive, and this forum has lots of folks with plenty of experience working with these architectures, so I wanted to share it with this community so it doesn’t go unnoticed. We may be looking at a future model that could power an LLM revolution like the one Stable Diffusion sparked when it “commoditized” image generation: because that model was accessible for everyone to run on their own devices (or in a Colab environment), it enabled the development of amazing new work and improvements.

It could be used in a new course, probably, haha, like an evolution of ULMFiT with generative and embedding capabilities.

@jeremy @muellerzr @sgugger, sorry for mentioning all of you directly, but you are the first people I could think of who might see the potential (or the flaws and limitations) of this project.


Thanks @sgaseretto. Is there a paper, post, or anything else that describes the algorithm in more detail?


Sorry for my very late response! RWKV is inspired by Apple’s AFT ([2105.14103] An Attention Free Transformer), but it adds lots of tricks on top of it. There is no published paper as far as I know, but everything is detailed in the repo:
GitHub - BlinkDL/RWKV-LM: RWKV is a RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
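For context, the AFT layer that RWKV builds on replaces dot-product attention with element-wise operations. Here is a minimal causal sketch in the spirit of the paper’s “AFT-simple” variant; this is my own simplification (the paper’s AFT-full also learns pairwise position biases, which I drop here), so don’t take it as the exact layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aft_simple_causal(q, k, v):
    """Attention-free 'attention': each output is a gated,
    softmax-like weighted average of the values seen so far.
    No T x T attention matrix is ever materialized."""
    T, C = q.shape
    out = np.zeros((T, C))
    num = np.zeros(C)  # running sum of exp(k_i) * v_i
    den = np.zeros(C)  # running sum of exp(k_i)
    for t in range(T):
        num = num + np.exp(k[t]) * v[t]
        den = den + np.exp(k[t])
        out[t] = sigmoid(q[t]) * (num / den)
    return out
```

Replacing the query-key dot product with these per-channel running sums is what makes a recurrent formulation possible at all; RWKV’s contribution (among other tricks) is adding a learned time decay on top, so older tokens fade gracefully instead of being averaged forever.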

Right now, they are training on the Pile, and after that, they are planning on training it with RLHF using the dataset that is currently being collected by Yannick Kilcher and the Open Assistant team.

If the performance of this model is on par with something like Flan-T5 or similar models, then we are talking about the “Stable Diffusion” of text generation: something that can be run on consumer GPUs with acceptable performance and good quality.
