the last three days I learned: 90% of automatic 3D parallelism for transformers, 65% of MoE that works in 3D parallelism (put sequence parallelism on hold) + …
(this is from yesterday, i forgot to post)
the last three days i learned: implemented 91.5% of automatic 3D parallelism for transformers, 2% of end-to-end FP8 mixed precision training + …
the last three days i learned: implemented 92.5% of automatic 3D parallelism for transformers, 75% of MoE that works in 3D parallelism
the last four days i learned: 82% of MoE that works in 3D parallelism (also didn’t make much progress, but will try again)
the last few days i learned: implemented 3% of DoReMi, 4% of end-to-end FP8 training in 3D parallelism (except FP8 kernels) + a lot of other stuff
(this is from 6 days ago, there are some notes, but i was too lazy to post :))
the last two days i learned: implemented 20% of DoReMi, 10% of end-to-end FP8 training in 3D parallelism (except FP8 kernels)
(this is from 3 days ago, there are some notes, but i was too lazy to post :))
the last three days i learned: implemented 13% of end-to-end FP8 training in 3D parallelism from scratch (except FP8 kernels) + other stuff
the last three days i learned: implemented 17% of end-to-end FP8 training in 3D parallelism (except FP8 kernels), 24% of DoReMi + other stuff
the last four days i learned: implemented 19% of end-to-end FP8 training in 3D parallelism (except FP8 kernels) + other stuff
the last four days i learned: didn’t manage to get much done in the last four days, but will try again
the last three days i learned: wrote 90% of DoReMi and 30% of end-to-end FP8 training in 3D parallelism from scratch