In chapter 12 of the book, it says:
Another architecture that is very powerful, especially in “sequence-to-sequence” problems (that is, problems where the dependent variable is itself a variable-length sequence, such as language translation), is the Transformers architecture. You can find it in a bonus chapter on the book’s website.
This presumably refers to book.fast.ai, which just redirects to course.fast.ai, so there is no optional material to be found there.
Interestingly, Transformers are covered in Rachel's NLP course, so I assume that replaces the optional material.