Custom heads, loss functions and datasets/loaders for multi-output networks in fastai

In Lesson 8/9 we are building a network with multiple outputs, and this is the first time I’ve explored something like this. The method used in the pascal notebook makes sense, but I haven’t seen many examples of it and I want to understand whether this is a standard pattern, partly because I noticed that we don’t use the same method on the input side (for example in ColumnarDataset).

It seems like the design pattern for `__getitem__` in the dataset is:

```python
[input1, input2, ..., (output1, output2, ...)]
```

Everything before the last element is passed to the `forward` call, and the last element is passed to the loss function as the target.
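If I’ve read it right, a minimal PyTorch version of that pattern would look something like this (just a sketch; the class and field names are made up):

```python
from torch.utils.data import Dataset

class TwoInputDataset(Dataset):
    """Sketch of the pattern: two inputs, with a tuple of targets last."""
    def __init__(self, x1, x2, y1, y2):
        self.x1, self.x2, self.y1, self.y2 = x1, x2, y1, y2

    def __len__(self):
        return len(self.x1)

    def __getitem__(self, i):
        # everything before the last element is fed to forward();
        # the last element becomes the target handed to the loss function
        return [self.x1[i], self.x2[i], (self.y1[i], self.y2[i])]
```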

The reason I ask is that I’m building an autoencoder with multiple input types, so there are also multiple outputs. I’m happy to do the unrolling of the last element in the loss function; I just want to check that my assumptions about what is passed where are correct, i.e. that everything from `__getitem__` except the last element is passed to `forward` (so `forward` needs to take as many inputs as the dataset provides), and that the last element is passed to the loss function, where, if it has multiple elements, it has to be unrolled.

In the case of my autoencoder that looks like:

```python
[input1, input2, (input1, input2)]
```
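i.e. a dataset roughly like this (a sketch; `cats` and `conts` are hypothetical names for my two input types):

```python
from torch.utils.data import Dataset

class AutoencoderDataset(Dataset):
    """Sketch: the inputs double as the reconstruction targets."""
    def __init__(self, cats, conts):
        self.cats, self.conts = cats, conts

    def __len__(self):
        return len(self.cats)

    def __getitem__(self, i):
        x1, x2 = self.cats[i], self.conts[i]
        # the inputs come back out as the targets to reconstruct
        return [x1, x2, (x1, x2)]
```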

The other aspect I’m curious about is whether this is a fastai-specific decision or common across PyTorch in general.

Different libs have different ways of handling this. I’m hoping that fastai’s is more flexible and useful than others, but I’m still iterating a bit and feedback is most welcome.

Regarding your question, try it out and tell us what you find! You can pop a breakpoint in your loss function and observe what’s passed in.
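e.g. something like this (a throwaway wrapper; `F.mse_loss` is just a stand-in for whatever loss you’re using):

```python
import pdb
import torch.nn.functional as F

def debug_loss(preds, targs):
    pdb.set_trace()  # inspect type(preds), type(targs), shapes, etc.
    return F.mse_loss(preds, targs)  # stand-in; swap in your real loss
```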

Just finished doing that with a big smile on my face when I figured it out, and came to check the forums. Man, I am in love with the Python debugger. It gets a little dicey with CUDA errors, but once those are out of the way it’s amazing to be able to poke around the guts and make sure that the tensors are the right shape, etc.

In terms of the behaviour, that’s exactly what I saw. My target is a list of my continuous and categorical variables, each of which I can index independently. I haven’t checked whether it generalizes further, but at least in this case the two inputs map directly to the `forward` call and the target is a list that has to be unrolled in the loss function.

Thank you so much for this class! I feel like 1000 light bulbs are going off in my head at once right now working on this project. I’ve managed to get the model correctly instantiating and spitting out the expected number of outputs, and now I just need to write the custom loss function that takes all of the categoricals and computes the cross entropy on each. Can’t wait to get this working and then dive into the NLP work from Monday.
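For anyone following along, the loss I’m planning looks roughly like this (only a sketch, assuming the model returns the continuous reconstruction first followed by one logit tensor per categorical, with the target unrolled the same way; all the names are mine):

```python
import torch.nn.functional as F

def autoencoder_loss(preds, targs):
    cont_pred, *cat_preds = preds  # continuous first, then logits per categorical
    cont_targ, *cat_targs = targs  # targets unrolled the same way
    loss = F.mse_loss(cont_pred, cont_targ)
    for p, t in zip(cat_preds, cat_targs):
        loss = loss + F.cross_entropy(p, t.long())  # one cross entropy per categorical
    return loss
```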


That’s awesome. Whilst it’s in your head, you may even want to (only if you feel like it) take the time to document how this part of fastai works in an asciidoc file and drop it in the doc/ directory in a PR! :slight_smile: (i.e. how the x and y parts of a dataset end up looking in the mini-batch from the dataloader, how this gets passed to forward, and then how the result of forward and the y values appear in the loss function)

BTW, to avoid those CUDA errors, call .cpu() on the model and the tensors in your batch. Then it’s easier to debug.
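e.g. (a sketch reusing the hypothetical names from above):

```python
# run one batch on the CPU so errors raise synchronously with readable traces
model = model.cpu()
x1, x2, targs = next(iter(dl))  # dl: your DataLoader
preds = model(x1.cpu(), x2.cpu())
loss = autoencoder_loss(preds, [t.cpu() for t in targs])
```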
