Twas the night before part 2, when all thro’ the forum, not a model was stirring except for this seq2seq humdrum

Maybe “humdrum” is not the right word, but at least it rhymes.

Anyways, giving the interest in seq2seq and it’s inclusion in part 2 of this course, I’m making the following notebook available to all in anticipation of things to come in part 2:

seq2seq with attention via the DataBlock API

The notebook illustrates the following:

  • How to use the DataBlock API to prepare your datasets for training
  • How to implement “attention” in a custom PyTorch model
  • How to examine the results of your seq2seq model
  • How to visualize “attention”

It builds heavily upon the 2018 part 2 course and includes a few other tricks I’ve learned along the way. For example, you’ll see the use of masks, PyTorch’s pad_packed_sequence/pack_padded_sequence functions, and a couple of ways to implement “teacher forcing”. You’ll also get a sense of how the DataBlock API can be used for any seq2seq task.

If you see something wrong or that can be improved, please submit a PR!

I’m also making my work available in hopes that folks out there on the forums can help make it better.

I’m sure there are all kinds of improvements that can be made and I wouldn’t be surprised if I’m doing a thing or two or ten wrong. And so … I’m gratefully accepting PRs to this repo and would welcome anyone that would be interested in adding other notebooks demonstrating other ways and best practices for tackling seq2seq problems using PyTorch and


I was looking at the custom callback that handles teacher forcing, and that fascinated (and confused) me for a bit.

So when I first saw ‘last_target’, I felt a little confused, upon further digging I found this (among other things).

Forgive me if this is a silly thing to say but the first thing that came to mind when I saw the phrase ‘last target that got through the model’ is that it was referring to the target vector from the previous batch, since that was the last target that went through the model. Am I wrong to think that? Teacher forcing using the target from the previous batch obviously doesn’t make much sense. Unless, the ‘last_target’ in this case actually refers to the target in the current batch right before it gets fed into learner.model?

last_target is the last target seen, so during that event, it’s the target of the current batch. Denominations might be a bit unclear.

Thanks for the clarification! In that case I really need to applaud the library for the convenience that callback brings!