Learning fastai part 2

xariusdrake · November 22, 2023, 9:22am

the last three days I learned: 90% of automatic 3D parallelism for transformers, 65% of MoE that works in 3D parallelism (put sequence parallelism on hold) + …

xariusdrake · November 26, 2023, 8:44am

(this is from yesterday, i forgot to post)

the last three days i learned: implemented 91.5% of automatic 3D parallelism for transformers, 2% of end-to-end FP8 mixed precision training + …

xariusdrake · November 28, 2023, 9:16am

the last three days i learned: didn’t manage to make much progress, but will try again

xariusdrake · December 1, 2023, 9:03am

the last three days i learned: implemented 92.5% of automatic 3D parallelism for transformers, 75% of MoE that works in 3D parallelism

xariusdrake · December 4, 2023, 9:31am

the last three days i learned:

xariusdrake · December 8, 2023, 9:29am

the last four days i learned: 82% of MoE that works in 3D parallelism (also didn’t make much progress, but will try again)

xariusdrake · December 11, 2023, 9:25am

the last three days i learned: implemented 0.01% of DiLoCo decentralized pre-training

xariusdrake · December 22, 2023, 3:28pm

the last few days i learned: implemented 3% of DoReMi, 4% of end-to-end FP8 training in 3D parallelism (except FP8 kernels) + a lot of other stuff

xariusdrake · December 30, 2023, 3:06pm

(this is from 6 days ago, there are some notes, but i was too lazy to post :))

the last two days i learned: implemented 20% of DoReMi, 10% of end-to-end FP8 training in 3D parallelism (except FP8 kernels)

xariusdrake · December 30, 2023, 3:06pm

(this is from 3 days ago, there are some notes, but i was too lazy to post :))

the last three days i learned: implemented 13% of end-to-end FP8 training in 3D parallelism from scratch (except FP8 kernels) + other stuff

xariusdrake · December 30, 2023, 3:09pm

the last three days i learned: implemented 17% of end-to-end FP8 training in 3D parallelism (except FP8 kernels), 24% of DoReMi + other stuff

xariusdrake · January 3, 2024, 3:11pm

the last four days i learned: implemented 19% of end-to-end FP8 training in 3D parallelism (except FP8 kernels) + other stuff

xariusdrake · January 7, 2024, 2:50pm

the last four days i learned: didn’t manage to get much done, but will try again

xariusdrake · January 21, 2024, 1:27pm

the last three days i learned: wrote 90% of DoReMi and 30% of end-to-end FP8 training in 3D parallelism from scratch

xariusdrake · February 13, 2024, 2:45pm

i learned: 99% of DoReMi reproduction, 32% of end-to-end FP8 training in 3D parallelism from scratch (except FP8 kernels) + …