What I learned over the last four days: how to train the reward model in RLHF, and how I fixed mine
“Hello! I have been enjoying some of your content on Twitch and I was curious if you could share your flashcards with me. I understand that you use RemNote, but is it possible to convert them to the Anki format?”
Hey, thanks for watching! Which subject would you like me to share? I can share them all if you’d like.
Sending all would be great. I’m particularly interested in your implementation of reinforcement learning and robotics papers.
Here you go, in Anki format:
https://drive.google.com/file/d/1OkVm4IavaJ7_270hkFRzbFa1nakBGZew/view?usp=sharing
Thank you. I’ve requested access.
Oh, sorry. I just updated the link above.
Check out my RLHF implementation on GitHub: xrsrke/instructGOOSE, an implementation of Reinforcement Learning from Human Feedback (RLHF). The robotics paper isn’t my top priority right now, so you’ll see that I’m making slow progress on it.