the last four days i learned:
- reimplemented the backward pass of ColumnParallelLinear and the forward pass of RowParallelLinear in Megatron-LM, VocabParallelEmbedding in GPT-NeoX, and 1/3 of visualizing attention patterns
- 3/10 how to implement schedule execution order and calculation dependencies in TorchGPipe, and 2/5 how to launch a training pipeline on Kubernetes
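A single-process numpy sketch of the tensor-parallel split behind those layers (toy shapes, a plain sum standing in for the all-reduce; this is the math, not Megatron-LM's actual code):

```python
import numpy as np

# Single-process simulation: "ranks" are array slices and the all-reduce is a
# plain sum. Shapes are toy; this shows the idea, not Megatron-LM's real code.

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))   # input: batch x d_in
W = rng.standard_normal((8, 6))   # full weight: d_in x d_out
world_size = 2

# ColumnParallelLinear: W split by output columns. The forward pass needs no
# communication; it is the backward pass that all-reduces the input gradient.
Y_col = np.concatenate(
    [X @ W_shard for W_shard in np.split(W, world_size, axis=1)], axis=1)

# RowParallelLinear: W split by input rows (X split to match). Each rank holds
# a partial sum of Y, so the forward pass ends with an all-reduce (here: sum).
Y_row = sum(x_p @ w_p for x_p, w_p in
            zip(np.split(X, world_size, axis=1), np.split(W, world_size, axis=0)))

assert np.allclose(Y_col, X @ W)  # both schemes match the unsharded matmul
assert np.allclose(Y_row, X @ W)
```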
the last three days i learned: 2/5 how to calculate head attribution and visualize attention patterns, 1/5 reimplemented GPU allocation in Megatron-LM, 3.5/10 how to implement schedule execution order and backpropagation dependencies in TorchGPipe, and some basics of torch distributed RPC
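The schedule-execution-order part reduces to the usual pipeline "clock cycle" grid: with m micro-batches and n partitions, task (i, j) runs at clock i + j. A pure-Python sketch (a simplification of torchgpipe's scheduling; the function name is illustrative):

```python
# With m micro-batches and n pipeline partitions, task (i, j) = micro-batch i
# on partition j runs at clock cycle i + j, so each clock tick executes one
# anti-diagonal of the grid; the backward pass later replays this in reverse.

def clock_cycles(m, n):
    """Yield, per clock tick, the list of (micro_batch, partition) tasks."""
    for k in range(m + n - 1):
        yield [(i, k - i) for i in range(max(0, k - n + 1), min(k, m - 1) + 1)]

schedule = list(clock_cycles(m=3, n=2))
# clock 0: [(0, 0)]; clock 1: [(0, 1), (1, 0)];
# clock 2: [(1, 1), (2, 0)]; clock 3: [(2, 1)]
```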
the last three days i learned: 4.0/10 how to implement schedule execution order and backpropagation dependencies in TorchGPipe, 2/5 reimplemented GPU allocation in Megatron-LM, how to orchestrate an ML workflow, some basics of AWS and memory management, and 2.5/5 how to calculate head attribution and visualize attention patterns
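The attention pattern being visualized is just softmax(QKᵀ/√d) under a causal mask; a minimal numpy sketch with toy Q and K (not any library's API):

```python
import numpy as np

# The attention pattern for one head: softmax(Q K^T / sqrt(d)) with a causal
# mask. This is the seq x seq matrix one plots as a heatmap; inputs are toy.

def attention_pattern(Q, K):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)  # hide future tokens
    scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)           # stable softmax
    p = np.exp(scores)
    return p / p.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 16))
K = rng.standard_normal((5, 16))
P = attention_pattern(Q, K)

assert np.allclose(P.sum(axis=-1), 1.0)            # each row is a distribution
assert np.allclose(P[np.triu_indices(5, 1)], 0.0)  # causal: no future attention
```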
the last four days i learned: reimplemented activation patching, 3.5/5 GPU allocation in Megatron-LM (close, will share code in 2-3 days), 1/3 parameter partitioning in the ZeRO optimizer, 5.0/10 how to implement schedule execution order and backpropagation dependencies in TorchGPipe, and some basics of operating systems and AWS VPC
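Activation patching itself is a cache-and-overwrite intervention; a toy numpy sketch on a made-up two-layer model (model, weights, and names are all illustrative):

```python
import numpy as np

# Activation patching: run the "model" on a clean input, cache an intermediate
# activation, then replay the corrupted input with that activation patched in.
# The two-layer model and the inputs here are toy stand-ins.

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((3, 2))

def forward(x, patch_hidden=None):
    h = np.tanh(x @ W1)          # the hidden activation we may intervene on
    if patch_hidden is not None:
        h = patch_hidden         # the patch: overwrite with the cached value
    return h @ W2

x_clean = np.ones(4)
x_corrupt = -np.ones(4)

h_clean = np.tanh(x_clean @ W1)                        # cached from clean run
y_patched = forward(x_corrupt, patch_hidden=h_clean)   # corrupted run, patched

# Patching the only hidden layer restores the clean output, which is the kind
# of evidence used to localize behavior to a component.
assert np.allclose(y_patched, forward(x_clean))
```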
the last four days i learned: reimplemented 5/5 GPU allocation in Megatron-LM, automatic head detection based on target patterns, 0.5/5 reverse-engineering an IOI circuit in GPT-2, 1/5 IndexedCachedDataset and MMapIndexedDataset in fairseq, and scheduled an MLflow job using AWS Step Functions
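The GPU-allocation logic amounts to carving global ranks into tensor- and pipeline-parallel groups; a pure-Python sketch of the layout (a simplification of Megatron-LM's initialize_model_parallel, not its real signature):

```python
# Ranks 0..world_size-1 are laid out so consecutive ranks form a tensor-parallel
# group and pipeline groups stride across the whole range; data parallelism
# takes whatever factor of world_size is left. A simplification of
# Megatron-LM's group-initialization logic, not its actual code.

def parallel_groups(world_size, tp, pp):
    dp = world_size // (tp * pp)
    tensor_groups = [list(range(i * tp, (i + 1) * tp))
                     for i in range(world_size // tp)]
    pipeline_groups = [list(range(start, world_size, world_size // pp))
                       for start in range(world_size // pp)]
    return tensor_groups, pipeline_groups, dp

tg, pg, dp = parallel_groups(world_size=8, tp=2, pp=2)
# tg -> [[0, 1], [2, 3], [4, 5], [6, 7]]
# pg -> [[0, 4], [1, 5], [2, 6], [3, 7]]; dp -> 2
```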
the last three days i learned: reimplemented cached datasets, 1/2 discovering latent knowledge using CCS, 1/5 input swap graphs: discovering the role of neural network components, 2/5 of ParallelMLP in Megatron-LM, and learned some basics of batch processing using Apache Spark and of operating systems
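The CCS objective from the latent-knowledge work fits in a few lines; a toy sketch with scalar probabilities (the loss form follows the paper's consistency + confidence terms; the probe itself is omitted and everything else is illustrative):

```python
# Sketch of the CCS (Contrast-Consistent Search) loss from "Discovering Latent
# Knowledge in Language Models Without Supervision": a probe assigns
# probabilities p_pos and p_neg to the two halves of a contrast pair, and the
# loss asks for consistency (p_pos ~ 1 - p_neg) plus confidence (neither
# probability should sit at 0.5). Only the objective is shown here.

def ccs_loss(p_pos, p_neg):
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = min(p_pos, p_neg) ** 2
    return consistency + confidence

assert ccs_loss(1.0, 0.0) == 0.0   # consistent and confident: zero loss
assert ccs_loss(0.5, 0.5) == 0.25  # degenerate "always 0.5" probe is penalized
```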
the last three days i learned: reimplemented 3/5 of parameter partitioning and .step() in the ZeRO optimizer, learned how to move data from a data lake to a data catalog, and some basics of batch and stream processing using Apache Spark and Kafka
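The ZeRO partitioning plus .step() can be simulated in one process; a numpy sketch with plain SGD standing in for the real optimizer (padding scheme and names are illustrative, not DeepSpeed's code):

```python
import numpy as np

# ZeRO-style optimizer state partitioning: each rank owns one contiguous shard
# of the flattened parameters, runs .step() only on its shard, then an
# all-gather (here just concatenation) rebuilds the full parameter vector.
# Single-process simulation with plain SGD as the update rule.

def partition(flat, world_size):
    pad = (-len(flat)) % world_size          # pad so shards are equal-sized
    padded = np.concatenate([flat, np.zeros(pad)])
    return np.split(padded, world_size)

params = np.arange(10, dtype=float)
grads = np.ones(10)
lr = 0.1

param_shards = partition(params, 4)
grad_shards = partition(grads, 4)

new_shards = [p - lr * g for p, g in zip(param_shards, grad_shards)]  # local .step()
updated = np.concatenate(new_shards)[:len(params)]                    # "all-gather"

assert np.allclose(updated, params - lr * grads)  # matches the unsharded update
```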
the last two days i learned: how to version data and the GPU memory hierarchy, built 1/3 of a distribution-drift monitoring and logging service for model inference, and 1/3 of a data warehouse for training data using BigQuery
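One piece of such a drift-monitoring service is a distribution-distance check between training data and live inference traffic; a numpy sketch using the population stability index on a toy feature (bin count, thresholds, and data are illustrative):

```python
import numpy as np

# Population stability index (PSI) between a training-time ("expected") feature
# distribution and live ("observed") traffic, binned on training quantiles.
# A common rule of thumb treats PSI above ~0.2 as drift worth investigating;
# the toy data and thresholds here are illustrative.

def psi(expected, observed, bins=10, eps=1e-6):
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]  # interior edges
    e = np.bincount(np.digitize(expected, edges), minlength=bins) / len(expected) + eps
    o = np.bincount(np.digitize(observed, edges), minlength=bins) / len(observed) + eps
    return float(np.sum((o - e) * np.log(o / e)))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 5000)
same_psi = psi(train, rng.normal(0, 1, 5000))     # same distribution
drift_psi = psi(train, rng.normal(1.5, 1, 5000))  # mean-shifted traffic

assert same_psi < 0.1    # matching traffic: low PSI
assert drift_psi > 0.2   # shifted traffic: flags drift
```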