Learning fastai part 2

the last three days I learned: 90% of automatic 3D parallelism for transformers, 65% of MoE that works in 3D parallelism (put sequence parallelism on hold) + …



(this is from yesterday, i forgot to post)

the last three days i learned: implemented 91.5% of automatic 3D parallelism for :hugs: transformers, 2% of end-to-end FP8 mixed precision training + …




the last three days i learned: didn’t manage to make much progress, but will try again

the last three days i learned: implemented 92.5% of automatic 3D parallelism for :hugs: transformers, 75% of MoE that works in 3D parallelism

the last three days i learned:






the last four days i learned: 82% of MoE that works in 3D parallelism (also didn’t make much progress, but will try again)

the last three days i learned: implemented 0.01% of DiLoCo decentralized pre-training







the last few days i learned: implemented 3% of DoReMi, 4% of end-to-end FP8 training in 3D parallelism (except FP8 kernels) + a lot of other stuff


(this is from 6 days ago, there are some notes, but i was too lazy to post :))

the last two days i learned: implemented 20% of DoReMi, 10% of end-to-end FP8 training in 3D parallelism (except FP8 kernels)

(this is from 3 days ago, there are some notes, but i was too lazy to post :))

the last three days i learned: implemented 13% of end-to-end FP8 training in 3D parallelism from scratch (except FP8 kernels) + other stuff

the last three days i learned: implemented 17% of end-to-end FP8 training in 3D parallelism (except FP8 kernels), 24% of DoReMi + other stuff




the last four days i learned: implemented 19% of end-to-end FP8 training in 3D parallelism (except FP8 kernels) + other stuff


the last four days i learned: didn’t manage to get much done, but will try again

the last three days i learned: wrote 90% of DoReMi and 30% of end-to-end FP8 training in 3D parallelism from scratch





i learned: 99% of DoReMi reproduction, 32% of end-to-end FP8 training in 3D parallelism from scratch (except FP8 kernels) + …