the last two days i learned: wrote 100% of data parallelism, 28% of multi-node pipeline parallelism
the last three days i learned: wrote 43% of multi-node pipeline parallelism, some basics of MoE, and more about the IOI circuit
(this is from 6 days ago)
the last three days i learned: 2% of making MoE work in 3D parallelism, and 72% of multi-node pipeline parallelism
the last few days i learned: 78% of multi-node pipeline parallelism, 30% of ZeRO-1 (yes, i’ve been stuck that hard)
(this is from 13 days ago)
the last week i learned: 90% of multi-node pipeline parallelism, and a lot of other stuff
the last three days i learned: implemented 100% of multi-node pipeline parallelism (but there is a catch) + …
the last three days i learned: implemented 2% of sequence parallelism, 1.5% of MoE that works in 3D parallelism, but am still stuck on making ZeRO-1 work with hybrid parallelism
the last three days i learned: fixed a convergence bug in ZeRO-1 (for real this time), 2% of MoE that works in 4D parallelism
(this is from 9 days ago)
the last three days i learned: didn’t manage to make much progress, but will try again