Learning fastai part 2

In the last four days I:

  • reimplemented the backward pass of ColumnParallelLinear and the forward pass of RowParallelLinear in Megatron-LM, plus VocabParallelEmbedding in GPT-NeoX, and got 1/3 of the way through visualizing attention patterns

  • learned 3/10 of how to implement schedule execution order and computation dependencies in TorchPipe, and 2/5 of how to launch a training pipeline on Kubernetes
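For reference, the idea behind a vocab-parallel embedding can be sketched in plain Python. This is only a toy illustration under my own assumptions (an evenly divisible vocabulary, lists instead of tensors, and a fake all-reduce); the helper names `vocab_range`, `local_lookup` and `all_reduce_sum` are made up here and are not the GPT-NeoX or Megatron-LM API.

```python
def vocab_range(global_vocab_size, rank, world_size):
    """Return the [start, end) range of token ids owned by `rank`."""
    assert global_vocab_size % world_size == 0, "sketch assumes an even split"
    per_partition = global_vocab_size // world_size
    start = rank * per_partition
    return start, start + per_partition


def local_lookup(token_ids, table_shard, start, end):
    """Look up rows for tokens this rank owns; emit zero vectors for the rest."""
    dim = len(table_shard[0])
    out = []
    for t in token_ids:
        if start <= t < end:
            out.append(list(table_shard[t - start]))
        else:
            out.append([0.0] * dim)  # masked: another rank owns this token
    return out


def all_reduce_sum(per_rank_outputs):
    """Stand-in for an all-reduce: element-wise sum across ranks."""
    summed = []
    for rows in zip(*per_rank_outputs):
        summed.append([sum(vals) for vals in zip(*rows)])
    return summed


# Toy setup: vocabulary of 8 tokens, embedding dim 2, split over 2 "ranks".
full_table = [[float(i), float(i) * 10] for i in range(8)]
world_size = 2
tokens = [1, 6, 3]

partials = []
for rank in range(world_size):
    start, end = vocab_range(len(full_table), rank, world_size)
    shard = full_table[start:end]
    partials.append(local_lookup(tokens, shard, start, end))

result = all_reduce_sum(partials)
```

Each rank only stores its slice of the table, and because every token is owned by exactly one rank, summing the partial outputs reproduces the full lookup.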







In the last three days I learned 2/5 of how to calculate head attribution and visualize attention patterns, 3.5/10 of how to implement schedule execution order and backpropagation dependencies in TorchGPipe, and some basics of PyTorch distributed RPC; I also reimplemented 1/5 of GPU allocation in Megatron-LM
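A rough sketch of the schedule execution order in a GPipe-style pipeline: with m micro-batches and n partitions, micro-batch i can run on partition j at clock tick i + j, so each tick runs one diagonal of the (micro-batch, partition) grid. The function name and the toy sizes below are my own choices for illustration, not a claim about the library's exact code.

```python
def clock_cycles(m, n):
    """Yield, per clock tick, the (micro_batch, partition) tasks that can run in parallel.

    m: number of micro-batches, n: number of pipeline partitions.
    Task (i, j) depends on (i, j - 1) (previous stage, same micro-batch)
    and (i - 1, j) (same stage, previous micro-batch), so it runs at tick i + j.
    """
    for k in range(m + n - 1):
        yield [(k - j, j) for j in range(max(1 + k - m, 0), min(1 + k, n))]


# 3 micro-batches over 2 partitions -> 4 clock ticks
schedule = list(clock_cycles(3, 2))
for tick, tasks in enumerate(schedule):
    print(tick, tasks)
```

The backward pass has to respect the same dependencies in reverse, which is why the order of micro-batches per partition matters for backpropagation.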








In the last three days I learned 4.0/10 of how to implement schedule execution order and backpropagation dependencies in TorchGPipe, how to orchestrate an ML workflow, some basics of AWS and memory management, and 2.5/5 of how to calculate head attribution and visualize attention patterns; I also reimplemented 2/5 of GPU allocation in Megatron-LM














In the last four days I reimplemented activation patching, 3.5/5 of GPU allocation in Megatron-LM (close :wink:, will share code in 2-3 days), and 1/3 of parameter partitioning in the ZeRO optimizer; I also learned 5.0/10 of how to implement schedule execution order and backpropagation dependencies in TorchGPipe, and some basics of operating systems and AWS VPC













In the last four days I reimplemented 5/5 of GPU allocation in Megatron-LM, automatic head detection based on target patterns, 0.5/5 of reverse-engineering an IOI circuit in GPT-2, and 1/5 of IndexedCachedDataset and MMapIndexedDataset in fairseq, and scheduled an ML workflow using AWS Step Functions











In the last three days I reimplemented cached datasets, 1/2 of discovering latent knowledge using CCS, 1/5 of input swap graphs (discovering the role of neural network components), and 2/5 of ParallelMLP in Megatron-LM; I also learned some basics of batch processing using Apache Spark and of operating systems







In the last three days I reimplemented 3/5 of parameter partitioning and .step() in the ZeRO optimizer, and learned how to move data from a data lake to a data catalog, plus some basics of batch and stream processing using Apache Spark and Kafka
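To make the parameter partitioning and .step() idea concrete, here is a toy sketch in plain Python. It assumes a flat list of parameters, plain SGD, and simulated ranks with a fake all-gather; the function names are hypothetical and this is not how DeepSpeed's ZeRO is actually structured, just the core idea that each rank updates only the shard it owns.

```python
def partition(flat_values, world_size):
    """Pad a flat list to a multiple of world_size and split into equal shards."""
    pad = (-len(flat_values)) % world_size
    padded = flat_values + [0.0] * pad
    shard_size = len(padded) // world_size
    shards = [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]
    return shards, pad


def sharded_sgd_step(param_shards, grad_shards, lr):
    """Each rank applies SGD only to the shard it owns."""
    return [
        [p - lr * g for p, g in zip(params, grads)]
        for params, grads in zip(param_shards, grad_shards)
    ]


def all_gather(shards, pad):
    """Stand-in for an all-gather: concatenate shards and drop the padding."""
    flat = [p for shard in shards for p in shard]
    return flat[:len(flat) - pad] if pad else flat


params = [1.0, 2.0, 3.0, 4.0, 5.0]
grads = [1.0, 1.0, 1.0, 1.0, 1.0]
world_size = 2

param_shards, pad = partition(params, world_size)
grad_shards, _ = partition(grads, world_size)
updated_shards = sharded_sgd_step(param_shards, grad_shards, lr=0.5)
new_params = all_gather(updated_shards, pad)
```

In a real setup the optimizer state (for example Adam moments) is what gets sharded this way, so each rank holds only 1/world_size of it.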





In the last two days I learned how to version data and how the GPU memory hierarchy works, and built 1/3 of a distribution-drift monitoring and logging service for model inference and 1/3 of a data warehouse for training data using BigQuery
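One common way to flag distribution drift is the Population Stability Index (PSI) between a reference sample and live inference inputs. The sketch below is an assumption on my part about what such a check could look like, not necessarily the metric the service uses; the `psi` function, the bin count, and the toy data are all illustrative.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample.

    Bins are equal-width over the range of `expected`; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(xs):
        counts = [0] * bins
        for x in xs:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(xs), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]      # training-time feature values
same = list(reference)                         # no drift
shifted = [x + 0.5 for x in reference]         # inputs moved to the right

psi_same = psi(reference, same)
psi_shifted = psi(reference, shifted)
```

A monitoring service could compute this per feature on a schedule and log or alert when the value crosses a chosen threshold.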