Learning fastai part 2

xariusdrake · March 2, 2023, 9:46am

the last two days i learned: implemented sampling of candidate API calls in ToolFormer and read 2/5 of the GPT-3 paper

xariusdrake · March 3, 2023, 8:50am

TIL: wrote half of the API-call filtering in ToolFormer, how to generate responses that align with human preferences without human labelling (Constitutional AI paper), how a language model can answer questions that contain images (MCTR paper), global gradient clipping
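A minimal sketch of global gradient clipping, for reference. It is written with numpy rather than any particular training framework, and the function name `clip_grad_global_norm` and the `eps` value are assumptions made for illustration. The idea is that one shared scale factor is computed from the joint L2 norm of all gradients, so their relative directions are preserved.

```python
import numpy as np


def clip_grad_global_norm(grads, max_norm, eps=1e-6):
    """Scale every gradient by one shared factor so the joint L2 norm is at most max_norm.

    `grads` is a list of arrays (one per parameter). Hypothetical helper, not a library API.
    """
    # Joint norm over all parameters, as if the gradients were one flat vector.
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Only ever shrink: the factor is capped at 1.0.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


grads = [np.array([3.0, 0.0]), np.array([[0.0, 4.0]])]
clipped, norm_before = clip_grad_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))
print(norm_before, norm_after)
```

This differs from per-parameter clipping, where each tensor would be rescaled independently and the overall update direction could change.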

xariusdrake · March 6, 2023, 9:03am

the last three days i learned: implemented 90% of the ToolFormer paper (next: refactor the code, add support for custom APIs, and benchmark it), how to evaluate a language model’s behavior, assess dataset quality, and red team LMs

xariusdrake · March 8, 2023, 9:00am

the last two days i learned: wrote 1/5 of the support for batching and executing API calls in parallel in ToolFormer, read 1/7 on superposition in artificial neurons, context distillation in AI alignment, and some basics of JAX

xariusdrake · March 14, 2023, 9:52am

the last six days i learned: 1/5 of DreamerV3, 1/5 of editing memory in language models, sandwiching experiments in oversight models

xariusdrake · March 15, 2023, 9:20am

TIL: implemented 10/10 of ToolFormer, read 1.5/5 of DreamerV3 and 1/5 of Flamingo

xariusdrake · March 18, 2023, 6:32am

the last three days i learned: added inference to ToolFormer (i forgot it last time), what causes catastrophic forgetting in ANNs, how to quantitatively evaluate transfer learning, basics of GNNs, langchain

xariusdrake · March 20, 2023, 8:51am

the last two days i learned: implemented 70% of the Prioritized Level Replay (PLR) paper, some basics of the pfrl and langchain libs

xariusdrake · March 25, 2023, 9:35am

the last 5 days i learned: more on the REPAIRED paper, the ray framework

xariusdrake · March 28, 2023, 8:35am

the last three days i learned: how to quantitatively measure semantic similarity of different goal-conditioning embeddings, how GATO works, the world model in DreamerV3, how to train RL agents using only video, about open-ended task systems in XLAND, hyperparameter tuning using ray, + torch_geometric

xariusdrake · April 2, 2023, 9:08am

the last five days i learned: reimplemented 0.5/10 of Toy Models of Superposition and 1/10 of FlashAttention, how DreamerV3 represents the latent state of an observation, + the ray framework

xariusdrake · April 8, 2023, 6:33am

the last few days i learned: how to write custom autograd functions, do activation patching, 1/4 of model parallelism, reimplemented 3/10 of FlashAttention, and some basics of JAX

xariusdrake · April 10, 2023, 8:37am

the last two days i learned: how to calculate induction score, split an image into patches in a vision transformer, and some basics of JAX
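For the patch-splitting step, here is a small sketch assuming a channels-last `(H, W, C)` image and non-overlapping square patches. The helper name `image_to_patches` is hypothetical, and numpy is used here instead of a deep learning framework to keep it self-contained. A real vision transformer would follow this with a learned linear projection of each flattened patch.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row. Hypothetical helper for illustration.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # Expose the patch grid: (grid_h, patch_size, grid_w, patch_size, C).
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    # Bring the two grid axes together, then the two within-patch axes.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * c)


image = np.arange(16).reshape(4, 4, 1)  # toy 4x4 single-channel image
patches = image_to_patches(image, patch_size=2)
print(patches.shape)
print(patches[0])  # the top-left 2x2 block, flattened
```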

xariusdrake · April 12, 2023, 8:13am

the last two days i learned: how to store intermediate activations, create GCP resources using terraform, write parallel training scripts using accelerate, and execute tasks in a flow in parallel using prefect and metaflow

xariusdrake · April 16, 2023, 5:03am

the last four days i learned: 1/5 of direct logit attribution, logit lens, how to run a task on AWS Batch, built 1/10 of a data pipeline and a training pipeline

xariusdrake · April 18, 2023, 8:35am

the last two days i learned: 15% of superposition, 20% of how and why model parallelism, pipeline parallelism, DeepSpeed, and mixed-precision training work, wrote code to upload data to a data lake (will add a pipeline), VPC in AWS, and some basics of JAX

xariusdrake · April 21, 2023, 9:25am

the last three days i learned: how superposition relates to adversarial attacks (just learned the surface), some basics of model parallelism and distributed programming (will dig deeper), how and why gradient accumulation and mixed-precision training work, and wrote code to send training data to a data warehouse
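A rough sketch of why gradient accumulation works, covering only that part of the post. It assumes a mean-reduced loss and equal-sized micro-batches; the toy linear regression, the helper `mse_grad`, and all shapes are made up for illustration. Under those assumptions, averaging the micro-batch gradients reproduces the full-batch gradient, so one optimizer step after accumulation matches a step on the large batch.

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of mean((x @ w - y) ** 2) with respect to w."""
    residual = x @ w - y
    return 2.0 * x.T @ residual / len(y)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))  # toy batch of 8 examples, 3 features
y = rng.normal(size=8)
w = np.zeros(3)

# Gradient computed on the whole batch at once.
full_grad = mse_grad(w, x, y)

# Same gradient built up from 4 micro-batches of 2 examples each.
accum_steps = 4
accumulated = np.zeros_like(w)
for x_mb, y_mb in zip(np.split(x, accum_steps), np.split(y, accum_steps)):
    # Dividing by accum_steps turns the sum of micro-batch means into the full-batch mean.
    accumulated += mse_grad(w, x_mb, y_mb) / accum_steps

print(np.allclose(full_grad, accumulated))
```

With unequal micro-batch sizes, the `1 / accum_steps` weight would need to be replaced by each micro-batch's share of the total examples.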

xariusdrake · April 24, 2023, 9:37am

the last three days i learned: how to visualize features using optimization + identify which part of an input activates a neuron (yes, will dig more), wrote the forward pass of data parallelism, and how tensor parallelism works (will put them all together)

xariusdrake · April 28, 2023, 5:38am

the last four days i learned: 1/2 of how to compute interference and polysemanticity of features, wrote code to split a model into partitions that utilize all GPUs, reimplemented 2/5 of parallel MLPs in Megatron-LM, and how to use CUDA streams

xariusdrake · May 1, 2023, 8:42am

the last three days i learned: 2/2 of how to calculate interference and polysemanticity of latent features, reimplemented 2/2 of the forward and 1/2 of the backward pass of ColumnParallelLinear in Megatron-LM, 2/10 of how to implement execution-order scheduling and computation dependencies in TorchPipe, and 1/5 of how to launch a training pipeline on Kubernetes
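A single-process sketch of the idea behind a column-parallel linear layer's forward pass. This is not Megatron-LM's code: the "ranks" are simulated with a Python list, the concatenation stands in for an all-gather, and the weight is stored as `(in_features, out_features)`, which is an assumption made here for readability. The function name `column_parallel_forward` is hypothetical.

```python
import numpy as np


def column_parallel_forward(x, weight, bias, world_size):
    """Simulate a column-parallel linear layer on one process.

    Each simulated rank owns a contiguous slice of the output columns of
    `weight` (shape: in_features x out_features) and the matching slice of `bias`.
    """
    w_shards = np.split(weight, world_size, axis=1)
    b_shards = np.split(bias, world_size)
    # Every rank sees the full input and computes only its own output columns.
    partial_outputs = [x @ w + b for w, b in zip(w_shards, b_shards)]
    # Stand-in for an all-gather along the feature dimension.
    return np.concatenate(partial_outputs, axis=-1)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 6))        # batch of 5, 6 input features
weight = rng.normal(size=(6, 8))   # 8 output features
bias = rng.normal(size=8)

sharded_out = column_parallel_forward(x, weight, bias, world_size=4)
reference_out = x @ weight + bias
print(np.allclose(sharded_out, reference_out))
```

Because each output column depends only on its own weight column, no communication is needed until the outputs are gathered.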