Learning fastai part 2

the last four days i learned:

  • reimplemented the backward pass of ColumnParallelLinear and the forward pass of RowParallelLinear in Megatron-LM, VocabParallelEmbedding in GPT-NeoX, and 1/3 of visualizing attention patterns (see the row-parallel sketch after this list)

  • 3/10 of how to implement schedule execution order and calculation dependencies in TorchGPipe, and 2/5 of how to launch a training pipeline on Kubernetes
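
for the row-parallel part, here is a minimal sketch of the idea (my own toy class, not Megatron-LM's actual code; it assumes `torch.distributed` is already initialized with one process per GPU, and shows the forward pass only, since Megatron wraps the collectives in autograd functions so backward gets the matching communication):

```python
import torch
import torch.distributed as dist

class RowParallelLinearSketch(torch.nn.Module):
    """y = x @ W.T with W split along its input (row) dimension:
    each rank multiplies its input shard by its weight shard, and an
    all-reduce sums the partial outputs into the full result."""
    def __init__(self, in_features, out_features):
        super().__init__()
        world = dist.get_world_size()
        assert in_features % world == 0
        self.weight = torch.nn.Parameter(
            torch.empty(out_features, in_features // world))
        torch.nn.init.xavier_uniform_(self.weight)

    def forward(self, x_shard):
        # x_shard: (..., in_features // world); typically the output of
        # a preceding column-parallel layer, so no scatter is needed here
        partial = x_shard @ self.weight.t()
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)
        return partial
```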

the last three days i learned: 2/5 of how to calculate head attribution and visualize attention patterns (sketch below), reimplemented 1/5 of GPU allocation in Megatron-LM, 3.5/10 of how to implement schedule execution order and backpropagation dependencies in TorchGPipe, and some basics of torch distributed RPC
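
for the attention-pattern part, a tiny sketch using HuggingFace's GPT-2 (the prompt and the layer/head choice are arbitrary):

```python
import torch
import matplotlib.pyplot as plt
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_attentions=True).eval()

text = "When Mary and John went to the store, John gave a drink to"
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

layer, head = 5, 1                         # arbitrary pick, try others
attn = out.attentions[layer][0, head]      # (seq, seq), rows are queries
labels = [tok.decode(t) for t in inputs.input_ids[0]]

plt.imshow(attn, cmap="viridis")
plt.xticks(range(len(labels)), labels, rotation=90)
plt.yticks(range(len(labels)), labels)
plt.title(f"layer {layer}, head {head}")
plt.colorbar()
plt.show()
```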


the last three days i learned: 4.0/10 of how to implement schedule execution order and backpropagation dependencies in TorchGPipe (see the schedule sketch below), reimplemented 2/5 of GPU allocation in Megatron-LM, how to orchestrate an ML workflow, some basics of AWS and memory management, and 2.5/5 of how to calculate head attribution and visualize attention patterns
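
the schedule itself is surprisingly small once you see it: micro-batch i on partition j can run at clock i + j. this is my own re-derivation in the spirit of torchgpipe's `clock_cycles`, not its exact code:

```python
def clock_cycles(num_microbatches: int, num_partitions: int):
    """Yield, per clock tick, the (microbatch, partition) tasks that may
    run concurrently: task (i, j) depends on (i, j - 1), so scheduling
    it at clock i + j respects the stage order and keeps every device
    busy on at most one micro-batch per tick."""
    m, n = num_microbatches, num_partitions
    for k in range(m + n - 1):
        yield [(i, k - i) for i in range(max(0, k - n + 1), min(m, k + 1))]

for tasks in clock_cycles(4, 3):
    print(tasks)
# [(0, 0)]
# [(0, 1), (1, 0)]
# [(0, 2), (1, 1), (2, 0)]
# [(1, 2), (2, 1), (3, 0)]
# [(2, 2), (3, 1)]
# [(3, 2)]
```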

the last four days i learned: reimplemented activation patching (sketch below), 3.5/5 of GPU allocation in Megatron-LM (close :wink:, will share code in 2-3 days), 1/3 of parameter partitioning in the ZeRO optimizer, 5.0/10 of how to implement schedule execution order and backpropagation dependencies in TorchGPipe, and some basics of operating systems and AWS VPC
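
activation patching in a nutshell, as i understand it: cache an activation from a clean run, overwrite it in a corrupted run, and see how much of the behavior comes back. a coarse sketch with HuggingFace GPT-2 that patches one whole block's output at every position (real IOI-style experiments patch per position and per head):

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = tok("When John and Mary went to the store, John gave a drink to",
            return_tensors="pt").input_ids
corrupt = tok("When John and Mary went to the store, Mary gave a drink to",
              return_tensors="pt").input_ids

block, cache = model.transformer.h[8], {}   # layer 8: arbitrary choice

def save_hook(module, args, output):
    cache["resid"] = output[0].detach()     # this block's hidden states

def patch_hook(module, args, output):
    return (cache["resid"],) + output[1:]   # swap in the clean activation

h = block.register_forward_hook(save_hook)
with torch.no_grad():
    model(clean)                            # clean run: fill the cache
h.remove()

h = block.register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(corrupt).logits          # corrupted run, patched
h.remove()

mary, john = tok(" Mary").input_ids[0], tok(" John").input_ids[0]
print("logit diff (Mary - John):",
      (logits[0, -1, mary] - logits[0, -1, john]).item())
```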

the last four days i learned: reimplemented 5/5 of GPU allocation in Megatron-LM, automatic head detection based on target patterns (sketch below), 0.5/5 of reversing an IOI circuit in GPT-2, 1/5 of IndexedCachedDataset and MMapIndexedDataset in fairseq, and scheduled an ML workflow using AWS Step Functions
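
head detection by target pattern, sketched for one simple target (previous-token heads): score every head by how much attention mass it puts on the immediately preceding token, then take the argmax. the scoring function and the threshold-free argmax are my simplifications:

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_attentions=True).eval()

ids = tok("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    attns = model(**ids).attentions          # tuple of (1, heads, q, k)

def prev_token_score(attn):
    # attention mass each query puts on the immediately preceding token
    return attn.diagonal(offset=-1, dim1=-2, dim2=-1).mean(-1)

scores = torch.stack([prev_token_score(a[0]) for a in attns])  # (layers, heads)
layer, head = divmod(scores.argmax().item(), scores.shape[1])
print(f"most previous-token-ish head: L{layer}H{head}, "
      f"score {scores.max():.2f}")
```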

the last three days i learned: reimplemented cached datasets, 1/2 of discovering latent knowledge using CCS (sketch below), 1/5 of input swap graphs (discovering the role of neural network components), 2/5 of ParallelMLP in Megatron-LM, and some basics of batch processing using Apache Spark and of operating systems
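
the CCS objective is small enough to fit in a few lines. a toy sketch on random vectors (the real method from Burns et al. normalizes the hidden states per class and keeps the best of several random restarts):

```python
import torch

def ccs_loss(p_pos, p_neg):
    """CCS objective (Burns et al., 2022): a statement and its negation
    should get consistent (summing to 1) and confident probabilities."""
    consistency = (p_pos - (1 - p_neg)) ** 2
    confidence = torch.min(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

# stand-in data: pretend these are hidden states of "X? yes" / "X? no"
d = 64
x_pos, x_neg = torch.randn(128, d), torch.randn(128, d)
probe = torch.nn.Linear(d, 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    loss = ccs_loss(torch.sigmoid(probe(x_pos)), torch.sigmoid(probe(x_neg)))
    loss.backward()
    opt.step()
print("final CCS loss:", loss.item())
```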

the last three days i learned: reimplemented (3/5 of parameter partitioning and .step() in the ZeRO optimizer; sketch below), learned (how to move data from a data lake to a data catalog, and some basics of batch and stream processing using Apache Spark and Kafka)
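
the partitioning idea in toy form (my own sketch, nothing like DeepSpeed's real flat-buffer implementation, which uses reduce-scatter/all-gather instead of per-tensor broadcasts): each rank owns a slice of the parameters, steps only on its slice, then shares the fresh values:

```python
import torch
import torch.distributed as dist

class ShardedSGDSketch:
    """Rank r owns params[i] where i % world_size == r, keeps optimizer
    state only for those, and broadcasts the fresh values after step()."""
    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr
        self.rank, self.world = dist.get_rank(), dist.get_world_size()
        self.owned = [i for i in range(len(self.params))
                      if i % self.world == self.rank]

    @torch.no_grad()
    def step(self):
        for i in self.owned:                  # update only the owned shard
            p = self.params[i]
            if p.grad is not None:
                p.add_(p.grad, alpha=-self.lr)
        for i, p in enumerate(self.params):   # owners publish new values
            dist.broadcast(p, src=i % self.world)
```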

the last two days i learned: how to version data and the GPU memory hierarchy, built (1/3 of a distribution-drift monitoring and logging service for model inference, 1/3 of a data warehouse for training data using BigQuery); the drift metric is sketched below
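
for the drift part, a self-contained sketch of PSI, one common drift metric (bin edges come from training-time quantiles; the 0.2 threshold is just a rule of thumb, and this is my toy version, not the service's actual code):

```python
import numpy as np

def psi(reference, live, bins=10, eps=1e-6):
    """Population stability index between the training-time distribution
    of a feature and live inference traffic."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    ref = np.histogram(reference, edges)[0] / len(reference) + eps
    cur = np.histogram(np.clip(live, edges[0], edges[-1]), edges)[0] \
        / len(live) + eps
    return float(np.sum((cur - ref) * np.log(cur / ref)))

train_scores = np.random.normal(0.0, 1.0, 10_000)
prod_scores = np.random.normal(0.4, 1.2, 1_000)      # drifted traffic
print(f"PSI = {psi(train_scores, prod_scores):.3f}")  # > 0.2: investigate
```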

the last few days i learned: built (3/3 of the distribution-drift monitoring and logging service for model inference, 1/5 of a raw-data-to-data-warehouse pipeline), and some basics of (operating systems, and C++)

the last three days i learned: reimplemented (isolating the effect of a computational path in a neural network using path patching, 2/5 of synchronous data transfer between CUDA streams on GPUs in torchgpipe; the stream pattern is sketched below), built 2/5 of a raw-data-to-data-warehouse pipeline, and learned some basics of (Kubernetes, operating systems, and C++)
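
the CUDA-stream part boils down to: do the copy on a side stream, and make both sides synchronize with it. a sketch of the pattern (names are mine; torchgpipe's real version also threads this through autograd):

```python
import torch

def copy_on_side_stream(x: torch.Tensor, dst: torch.device) -> torch.Tensor:
    copy_stream = torch.cuda.Stream(device=dst)
    # the copy must not start before the producer has finished writing x
    copy_stream.wait_stream(torch.cuda.current_stream(x.device))
    with torch.cuda.stream(copy_stream):
        y = x.to(dst, non_blocking=True)
    # the consumer must not read y before the copy is done, and x must
    # not be freed or reused while the copy is still in flight
    torch.cuda.current_stream(dst).wait_stream(copy_stream)
    x.record_stream(copy_stream)
    return y

if torch.cuda.device_count() >= 2:
    a = torch.randn(1024, 1024, device="cuda:0")
    b = copy_on_side_stream(a, torch.device("cuda:1"))
    print(b.device)
```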

the last three days i learned: reimplemented [(schedule execution order, 2/5 of gradient checkpointing, 3.5/5 of synchronous data transfer between CUDA streams on GPUs in torchgpipe), 1/5 of auto circuit discovery in a neural network using ACDC, 2/5 of steering a language model at run-time by adding activation vectors (sketch below)], and learned some basics of (C++, Kubernetes, and operating systems)
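
the steering idea, sketched ActAdd-style with HuggingFace GPT-2 (my simplification adds the vector at every position and every decoding step; the paper is more careful about where it injects):

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER, COEFF = 6, 5.0                       # arbitrary choices
block = model.transformer.h[LAYER]

def resid(prompt):
    cache = {}
    h = block.register_forward_hook(
        lambda m, args, out: cache.update(r=out[0].detach()))
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    h.remove()
    return cache["r"][0, -1]                # residual at the last token

# contrast pair -> steering vector
steer = COEFF * (resid(" Love") - resid(" Hate"))

def add_vector(module, args, output):
    return (output[0] + steer,) + output[1:]

h = block.register_forward_hook(add_vector)
ids = tok("I think you are", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=15, do_sample=False,
                     pad_token_id=tok.eos_token_id)
h.remove()
print(tok.decode(out[0]))
```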

the last three days i learned: reimplemented 2.5/5 of gradient checkpointing in torchgpipe (the core trade-off is sketched below), learned some basics of operating systems and C++ (currently stuck on the circuit and the data transfer, but hell no, i’m not gonna give up)
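
the whole point of gradient checkpointing in one runnable picture, using plain `torch.utils.checkpoint` rather than torchgpipe's internal version:

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

model = torch.nn.Sequential(
    *[torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
      for _ in range(8)])

x = torch.randn(64, 1024, requires_grad=True)
# forward keeps activations only at 4 segment boundaries; everything in
# between is recomputed during backward (compute traded for memory)
y = checkpoint_sequential(model, 4, x)
y.sum().backward()
print(x.grad.shape)
```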

the last three days i learned: reimplemented (spawning workers for executing tasks, enforcing virtual dependencies in the backward graph in torchgpipe’s pipeline parallelism; fork/join sketch below), learned some basics of operating systems (yep, still stuck on auto circuit discovery)
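
the virtual-dependency trick, re-sketched from torchgpipe's fork/join idea: split a zero-sized "phony" tensor off one branch and join it into another, so autograd's backward order is pinned even though no real data flows:

```python
import torch

class Fork(torch.autograd.Function):
    """Split a zero-sized 'phony' tensor off x without changing x."""
    @staticmethod
    def forward(ctx, x):
        return x.detach(), x.new_empty(0)
    @staticmethod
    def backward(ctx, grad_x, grad_phony):
        return grad_x

class Join(torch.autograd.Function):
    """Make y depend on phony: the backward graph now has an edge from
    y's branch to phony's producer, although no data flows through it."""
    @staticmethod
    def forward(ctx, y, phony):
        return y.detach()
    @staticmethod
    def backward(ctx, grad_y):
        return grad_y, None

# toy usage: backward below the fork point on a's path cannot proceed
# until backward on b's path has passed through the join
a = torch.randn(3, requires_grad=True) * 2
b = torch.randn(3, requires_grad=True) * 3
a, phony = Fork.apply(a)
b = Join.apply(b, phony)
(a.sum() + b.sum()).backward()
```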

the last four days i learned: reimplemented (3.5/5 of gradient checkpointing in torchgpipe, 4/5 of steering a language model at runtime by adding activation vectors, made a little progress on reversing an IOI circuit in GPT-2), learned some basics of (operating systems, and HPC)

the last three days i learned: reimplemented [0.1/5 of discovery of new nodes at runtime in elastic training and 0.1/5 of fault tolerance in Horovod (yep, from scratch, this is just a very first step), 4.5/5 of logit attribution (filling my gaps)], and some basics of (CUDA programming and torch rpc)

the last three days i learned: reimplemented (2/5 of the notification mechanism and 1.5/5 of fault tolerance in Horovod elastic training, 4.5/5 of gradient checkpointing in torchgpipe, 5/5 of logit attribution, sketched below), and learned some basics of torch rpc
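
direct logit attribution, sketched with HuggingFace GPT-2 and the usual frozen-LayerNorm approximation (i freeze ln_f's statistics from the full residual so each block's write decomposes linearly; the embedding and bias terms are left out, and TransformerLens does all of this more carefully):

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained(
    "gpt2", output_hidden_states=True).eval()

ids = tok("The Eiffel Tower is in the city of", return_tensors="pt")
target = tok(" Paris").input_ids[0]
with torch.no_grad():
    out = model(**ids)

hs = torch.stack([h[0, -1] for h in out.hidden_states])  # (13, d_model)
deltas = hs[1:] - hs[:-1]             # what each block wrote to the stream

final = hs[-1]                        # freeze ln_f's stats on the full sum
scale = (final - final.mean()).pow(2).mean() \
    .add(model.config.layer_norm_epsilon).rsqrt()
contrib = (deltas - deltas.mean(-1, keepdim=True)) * scale \
    * model.transformer.ln_f.weight
attribution = contrib @ model.lm_head.weight[target]     # one value per block
for l, a in enumerate(attribution):
    print(f"block {l:2d}: {a:+.3f}")
```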

the last four days i learned: reimplemented (1.5/5 of transferring data for skip connections in torchgpipe’s pipeline parallelism, 2.8/5 of the notification mechanism and 2.5/5 of discovering new nodes at run-time in Horovod elastic training, 5/5 of the logit lens (sketch below), 0.5/5 of reversing an induction circuit from a transformer’s weights), and some basics of CUDA programming
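
the logit lens fits in a loop: decode every intermediate residual stream with the final LayerNorm + unembedding, as if the model stopped there. sketch with HuggingFace GPT-2:

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained(
    "gpt2", output_hidden_states=True).eval()

ids = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**ids)

# decode every intermediate residual stream at the last position
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    print(f"layer {layer:2d} ->", repr(tok.decode([logits.argmax().item()])))
```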

the last four days i learned: reimplemented (2.5/5 of syncing state across workers and 2.5/5 of the elastic sampler in Horovod elastic training (sampler sketch below), 1/5 of reversing an induction circuit from a transformer’s weights), and some basics of AI alignment
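
the elastic sampler boils down to: remember what has already been consumed, and re-shard only the rest when the worker set changes. my toy version of the idea (Horovod's real ElasticSampler does more bookkeeping):

```python
class ElasticSamplerSketch:
    """When workers join or leave, re-shard the not-yet-processed
    indices across the new (rank, world_size) so nothing is lost or
    duplicated."""
    def __init__(self, num_samples, rank, world_size):
        self.indices = list(range(num_samples))
        self.consumed = 0
        self.set_topology(rank, world_size)

    def set_topology(self, rank, world_size):
        # called again after the worker set changes
        self.rank, self.world_size = rank, world_size

    def record_batch(self, batch_size):
        # all ranks advance together: one step eats world_size * batch
        self.consumed += batch_size * self.world_size

    def __iter__(self):
        remaining = self.indices[self.consumed:]
        return iter(remaining[self.rank::self.world_size])
```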

the last four days i learned: reimplemented (4/5 of reversing an induction circuit from a transformer’s weights, 1.5/5 on superposition in neural networks (digging deeper this time; toy-model sketch below), 3/5 of syncing state across workers in Horovod elastic training)
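
the superposition toy model (in the spirit of Anthropic's toy-models paper; the hyperparameters are mine): squeeze n sparse features through h < n dimensions and look at W^T W:

```python
import torch

n, h, batch, sparsity = 20, 5, 1024, 0.05
W = torch.nn.Parameter(torch.randn(h, n) * 0.1)
b = torch.nn.Parameter(torch.zeros(n))
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(2000):
    # sparse features: each one is active ~5% of the time
    x = torch.rand(batch, n) * (torch.rand(batch, n) < sparsity)
    x_hat = torch.relu(x @ W.t() @ W + b)   # encode to h dims, decode back
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# large off-diagonal entries of W^T W mean features share dimensions,
# i.e. they are stored in superposition
print((W.t() @ W).detach().round(decimals=2))
```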

![SCR-20230709-luap|690x316](upload://qBnAO10M2sZ3yrZTYBeVqMUIMiQ.jpeg)

the last three days i learned: reimplemented [1.5/5 of the elastic driver (this one controls worker nodes, executes jobs, monitors and collects results), 3.5/5 of the notification mechanism, 5/5 of restoring a synchronous state from its last backup state (but not triggering a sync across all workers yet) in Horovod, 5/5 of an induction circuit (detection sketch below), 0.5/5 of a modular arithmetic circuit, and 0.5/5 of a balanced bracket classifier from transformer weights]
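
for the induction circuit, the quickest sanity check i know: feed a repeated random sequence and score every head on how much it attends one step past the current token's first occurrence. sketch with HuggingFace GPT-2:

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2", output_attentions=True).eval()

# repeated random tokens [t1..tT, t1..tT]: an induction head attends
# from each token in the second half to the token right after that
# token's first occurrence
T = 50
seq = torch.randint(100, 20_000, (1, T))
ids = torch.cat([seq, seq], dim=1)
with torch.no_grad():
    attns = model(ids).attentions

scores = torch.stack([
    a[0, :, T:, :].diagonal(offset=1, dim1=-2, dim2=-1).mean(-1)
    for a in attns])                               # (layers, heads)
layer, head = divmod(scores.argmax().item(), scores.shape[1])
print(f"most induction-like head: L{layer}H{head}, score {scores.max():.2f}")
```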