Learning fastai part 2

xariusdrake · July 15, 2023, 6:24am

the last three days i learned: reimplemented 50% of the forward pass and backward pass of pipeline parallelism, reversed 20% of a balanced bracket classifier circuit, reversed 10% of the world model of OthelloGPT

xariusdrake · July 18, 2023, 9:07am

the last three days i learned: reimplemented [95% of the backward pass of pipeline parallelism, 100% of data transfer (supports backward pass, but not multi-node yet) in torchgpipe, and made some more progress on the multi-node notification mechanism in horovod], reversed [20% of the world model of OthelloGPT and 30% of a balanced bracket classifier circuit]

xariusdrake · July 22, 2023, 6:34am

the last four days i learned: reimplemented [100% of the forward pass of the pipeline (but not multi-node or 3D parallelism yet), made a little progress on ParallelContext in OSLO and FSDP], reversed [23% of the world model of OthelloGPT and 45% of the balanced bracket classifier circuit]

xariusdrake · July 25, 2023, 9:05am

the last three days i learned: reimplemented (100% of the backward pass of pipeline parallelism, 60% of initializing parallel groups in 3D parallelism, 5% of CPU offload, and 60% of partitioning model states in FSDP) and learned some more about superposition

xariusdrake · July 28, 2023, 9:14am

the last three days i learned: reimplemented (100% of sharding params in FSDP, 5% of rebuilding parameters in the forward and backward pass in FSDP), more on superposition and transformer circuits

MaxMynter · July 29, 2023, 11:09pm

Props for sticking with this since November. Consider me impressed.

xariusdrake · July 29, 2023, 11:14pm

i’m not going to stop

xariusdrake · August 1, 2023, 8:58am

the last four days i learned: reimplemented (70% of communication primitives (supports training) & 70% of initializing parallel groups in 3D parallelism, 100% of ParallelMLP), reversed (80% of balanced bracket classifier circuit, 27% of the world model of OthelloGPT), 50% of why SoLU works

xariusdrake · August 4, 2023, 9:12am

the last three days i learned: 100% of initializing parallel groups in 3D parallelism, 80% of communication primitives (close to fully parallelizing transformer from scratch), 30% of training tensor parallelism with pipeline parallelism (starting with single node first), and 20% of MLP as key-value memories
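
For readers new to the term, here is a minimal sketch of what "initializing parallel groups in 3D parallelism" involves. It is not the code from these posts. The function name `build_3d_groups` and the rank layout (tensor-parallel index varies fastest, then pipeline, then data) are assumptions chosen for illustration; real frameworks may order ranks differently.

```python
from itertools import product


def build_3d_groups(tp: int, pp: int, dp: int) -> dict:
    """Return the rank lists for tensor, pipeline, and data parallel groups.

    Assumed (hypothetical) layout: rank = d * (pp * tp) + p * tp + t,
    where t, p, d are the tensor, pipeline, and data parallel indices.
    """

    def rank(d: int, p: int, t: int) -> int:
        return d * pp * tp + p * tp + t

    # Ranks that share (d, p) split the same layers: one tensor-parallel group.
    tensor = [[rank(d, p, t) for t in range(tp)]
              for d, p in product(range(dp), range(pp))]
    # Ranks that share (d, t) hold consecutive stages: one pipeline group.
    pipeline = [[rank(d, p, t) for p in range(pp)]
                for d, t in product(range(dp), range(tp))]
    # Ranks that share (p, t) hold replicas of the same shard: one data group.
    data = [[rank(d, p, t) for d in range(dp)]
            for p, t in product(range(pp), range(tp))]
    return {"tensor": tensor, "pipeline": pipeline, "data": data}


groups = build_3d_groups(tp=2, pp=2, dp=2)
for kind, rank_lists in groups.items():
    print(kind, rank_lists)
```

In an actual distributed run, each rank list would typically be handed to something like `torch.distributed.new_group(ranks=...)` on every process; that step is left out here so the sketch runs on a single machine.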

xariusdrake · August 11, 2023, 8:42am

This is from 6 days ago; I forgot to post (yes, I’m still extremely consistent).

the last three days i failed: didn’t manage to make progress on ZeRO-Offload and ZeRO-1 + a few other things (what next? try again.)

xariusdrake · August 11, 2023, 8:43am

This is from yesterday; I forgot to post (yes, I’m still extremely consistent).

the last three days i learned: reimplemented (15% of ZeRO-1, ZeRO-Offload scheduler), reversed (30% of IOI circuit, 60% of the world model of OthelloGPT)

xariusdrake · August 14, 2023, 8:33am

the last three days i learned: reimplemented (20% of ZeRO-1, 15% of turning any transformers model to 3D parallelism, 30% of fully parallelizing a transformer model), reversed (40% of IOI circuit), more on transformer circuits

xariusdrake · August 17, 2023, 9:05am

the last three days i learned: reimplemented (60% of fully parallelizing a transformer, 10% of multi-node 3D parallelism (supports training)), reversed (50% of IOI circuit, 1% of addition circuit)

xariusdrake · August 20, 2023, 6:01am

the last three days i learned: reimplemented (70% of fully parallelizing a transformer, 15% of multi-node 3D parallelism), reversed 10% of modular addition circuit

xariusdrake · August 24, 2023, 8:53am

the last four days i learned: wrote 80% of fully parallelizing a transformer, 20% of multi-node 3D parallelism, and reversed 60% of IOI circuit

xariusdrake · August 27, 2023, 8:44am

the last two days i learned: 88% of fully parallelizing a transformer and 23% of multi-node 3D parallelism + other stuff

xariusdrake · August 30, 2023, 8:51am

the last three days I learned: reversed 85% of balanced bracket classifier, wrote 25% of multi-node 3D parallelism, and 90% of fully parallelizing a transformer

xariusdrake · September 2, 2023, 5:53am

the last three days i learned: wrote 50% of turning any transformers model to 3D parallelism, reversed 87% of balanced bracket classifier circuit

xariusdrake · September 5, 2023, 8:49am

the last three days i learned: wrote 80% of turning any transformers model to tensor parallelism, more progress on IOI circuit

xariusdrake · September 8, 2023, 8:50am

the last three days i learned: wrote 100% of automatically parallelizing any transformers model in tensor parallelism, 20% of multi-node pipeline parallelism, and reversed 90% of IOI circuit