Help with Vision + NLP Training Loop

Good day everyone!

I recently took on the challenge of building an image captioning model to familiarize myself with fastai. However, I hit a roadblock trying to recreate (in PyTorch) this training loop from the TensorFlow docs.

I have my encoder, decoder and my DataLoader ready. How can I implement the Learner class with this modification? Any help would be appreciated!
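
For anyone landing here with the same question, below is a rough sketch of what such a loop might look like in plain PyTorch. It assumes the loop in question is the usual teacher-forced captioning step: encode the image features once, then feed the decoder the previous ground-truth token at each time step and accumulate a loss that ignores padding. `ToyEncoder`, `ToyDecoder`, `train_step`, and `pad_idx` are hypothetical names made up for illustration, the decoder uses mean pooling where a real model would probably use attention, and none of this is the fastai `Learner` API.

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in: projects precomputed image features."""

    def __init__(self, feat_dim, embed_dim):
        super().__init__()
        self.fc = nn.Linear(feat_dim, embed_dim)

    def forward(self, feats):                  # (batch, regions, feat_dim)
        return torch.relu(self.fc(feats))      # (batch, regions, embed_dim)


class ToyDecoder(nn.Module):
    """Hypothetical stand-in: one GRU step per call, no attention."""

    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRUCell(embed_dim * 2, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token, feats, hidden):
        context = feats.mean(dim=1)            # mean-pool the image regions
        x = torch.cat([self.embed(token), context], dim=1)
        hidden = self.gru(x, hidden)
        return self.out(hidden), hidden        # logits over vocab, new state


def train_step(encoder, decoder, optimizer, feats, captions, pad_idx=0):
    """One teacher-forced update; captions is (batch, seq_len) token ids."""
    encoder.train()
    decoder.train()
    # Sum per-token losses and skip padded targets.
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx, reduction="sum")

    optimizer.zero_grad()
    enc = encoder(feats)
    hidden = enc.new_zeros(feats.size(0), decoder.gru.hidden_size)

    loss = 0.0
    for t in range(1, captions.size(1)):
        # Teacher forcing: the input is the true previous token.
        logits, hidden = decoder(captions[:, t - 1], enc, hidden)
        loss = loss + criterion(logits, captions[:, t])

    # Average over the non-padding target tokens.
    n_tokens = (captions[:, 1:] != pad_idx).sum().clamp(min=1)
    loss = loss / n_tokens
    loss.backward()
    optimizer.step()
    return loss.item()
```

A single optimizer built over `list(encoder.parameters()) + list(decoder.parameters())` is enough to update both models, and `train_step` would be called once per batch from the DataLoader.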

Thank you for these resources! I did look at them before creating my initial post, but I still found myself stuck. I think the better approach for me at this point is to start with something simpler and then come back to this problem. If I do find something (or eventually figure it out myself), I’ll update this thread.

Thank you so much for your reply!