the last three days i learned: reimplemented 50% of the forward pass and backward pass of pipeline parallelism, reversed 20% of a balanced bracket classifier circuit, reversed 10% of the world model of OthelloGPT
the last three days i learned: reimplemented [95% of the backward pass of pipeline parallelism, 100% of data transfer (supports backward pass, but not multi-node yet) in torchgpipe, and made some more progress on the multi-node notification mechanism in horovod], reversed [20% of the world model of OthelloGPT and 30% of a balanced bracket classifier circuit]
the last four days i learned: reimplemented [100% of the forward pass of the pipeline (but not multi-node or 3D parallelism yet), made a little progress on ParallelContext in OSLO and FSDP], reversed [23% of the world model of OthelloGPT and 45% of the balanced bracket classifier circuit]
the last three days i learned: reimplemented (100% of the backward pass of pipeline parallelism, 60% of initializing parallel groups in 3D parallelism, 5% of CPU offload, and 60% of partitioning model states in FSDP) and learned some more about superposition
the last three days i learned: reimplemented (100% of sharding params in FSDP, 5% of rebuilding parameters in the forward and backward pass in FSDP), more on superposition and transformer circuits
Props for sticking with this since November. Consider me impressed.
i'm not going to stop
the last four days i learned: reimplemented (70% of communication primitives (support training) & 70% of initializing parallel groups in 3D parallelism, 100% ParallelMLP), reversed (80% of balanced bracket classifier circuit, 27% of the world model of OthelloGPT), 50% of why SoLU works
the last three days i learned: 100% of initializing parallel groups in 3D parallelism, 80% of communication primitives (close to fully parallelizing transformer from scratch), 30% of training tensor parallelism with pipeline parallelism (starting with single node first), and 20% of MLP as key-value memories
This is from 6 days ago; I forgot to post (yes, I'm still extremely consistent).
the last three days i failed: didn't manage to make progress on ZeRO-offload and ZeRO-1 + a few other things (what next? try again.)
This is from yesterday; I forgot to post (yes, I'm still extremely consistent).
the last three days i learned: reimplemented (15% of ZeRO-1, ZeRO-offload scheduler), reversed (30% of IOI circuit, 60% of the world model of OthelloGPT)
the last three days i learned: reimplemented (20% of ZeRO-1, 15% of turning any transformers model to 3D parallelism, 30% of fully parallelizing a transformer model), reversed (40% of IOI circuit), more on transformer circuits
the last three days i learned: reimplemented (60% of fully parallelizing a transformer, 10% of multi-node 3D parallelism (supports training)), reversed (50% of IOI circuit, 1% of addition circuit)
the last three days i learned: reimplemented (70% of fully parallelizing a transformer, 15% of multi-node 3D parallelism), reversed 10% of modular addition circuit
the last four days i learned: wrote 80% of fully parallelizing a transformer, 20% of multi-node 3D parallelism, and reversed 60% of IOI circuit
the last two days i learned: 88% of fully parallelizing a transformer and 23% of multi-node 3D parallelism + other stuff
the last three days I learned: reversed 85% of balanced bracket classifier, wrote 25% of multi-node 3D parallelism, and 90% of fully parallelizing a transformer
the last three days i learned: wrote 50% of turning any transformers model to 3D parallelism, reversed 87% of balanced bracket classifier circuit
the last three days i learned: wrote 80% of turning any transformers model to tensor parallelism, more progress on IOI circuit