the last two days i learned: implemented sampling of candidate API calls in ToolFormer and read 2/5 of the GPT-3 paper
TIL: wrote half of the API call filtering in ToolFormer, how to generate responses that align with human preferences without human labelling (Constitutional AI paper), how a language model can answer questions that contain images (MCTR paper), global gradient clipping
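a minimal sketch of the global gradient clipping mentioned above, assuming gradients are plain lists of floats (one list per parameter tensor). the function name `clip_by_global_norm` and the `1e-6` epsilon are my own choices for illustration, not taken from any of the papers. the idea: compute one L2 norm over all gradients together, then scale every gradient by the same factor so the direction of the update is preserved.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one shared factor so their joint L2 norm is at most max_norm.

    grads: list of lists of floats (hypothetical stand-in for parameter tensors).
    Returns the clipped gradients and the norm measured before clipping.
    """
    # One norm over every element of every gradient, not a per-tensor norm.
    global_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # The scale is capped at 1.0, so gradients that are already small stay untouched.
    scale = min(1.0, max_norm / (global_norm + 1e-6))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, global_norm


# Two "tensors" whose joint norm is sqrt(3^2 + 4^2) = 5.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```

clipping per tensor instead would change the relative size of the updates between layers, which is why the shared factor matters here.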
the last three days i learned: implemented 90% of the ToolFormer paper (next: refactor the code, add support for custom APIs, and benchmark it), how to evaluate a language model’s behavior, assess dataset quality, and red team LMs
the last two days i learned: wrote 1/5 of the support for batching and executing API calls in parallel for the ToolFormer paper, read 1/7 on the superposition of artificial neurons, context distillation in AI alignment, and some basics of JAX
the last six days i learned: 1/5 of DreamerV3, 1/5 of editing memory in language models, sandwiching experiments in oversight models
the last three days i learned: added inference to ToolFormer (i forgot it last time), what causes catastrophic forgetting in ANNs, how to quantitatively evaluate transfer learning, the basics of GNNs, langchain
the last two days i learned: implemented 70% of the Prioritized Level Replay (PLR) paper, some basics of the pfrl and langchain libs
the last three days i learned: how to quantitatively measure the semantic similarity of different goal-conditioning embeddings, how GATO works, the world model in DreamerV3, how to train RL agents using only video, open-ended task systems in XLAND, hyperparameter tuning using ray, and torch_geometric