Loss function in recsys from lesson 4, and in fastai code in general

Hello, I have been trying to read through the fastai code in order to better understand how the loss function is being calculated in lesson 4.

In particular, in these two lines:

learn = collab_learner(data, n_factors=20, y_range=y_range)
learn.fit_one_cycle(10, 5e-3)

I was unsure where the loss function is (implicitly?) being defined. To me it would make more sense for the loss to be part of the fitting function, so that we can change it without changing the ‘learner’, although perhaps this is a matter of taste.

The last line of the fit_one_cycle(...) source code calls learn.fit(...). The fit() function in basic_train.py finishes by calling fit() from learner.py… OK. That in turn finishes by calling fit_gen(), and I don’t see a loss function defined there either.
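From what I can tell so far, the training loop only consumes a loss function rather than defining one. Here is my rough read of the call chain in the fastai v1 source (names from memory, so worth double-checking):

# fit_one_cycle(learn, ...)    # train.py: attaches the OneCycleScheduler callback
#   -> learn.fit(...)          # basic_train.py: Learner.fit
#     -> fit(epochs, learn, ...)                      # basic_train.py: training loop
#       -> loss_batch(model, xb, yb, loss_func, ...)  # the loss is applied here

So the loss must be defined somewhere earlier and merely passed down to loss_batch().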

Anyone want to help me walk through this to better understand what’s going on? I know Jeremy mentioned that it’s just a MSE loss being used in this case, but I don’t see where that’s happening.

Hi @Rosst

Remember that collab_learner defines the structure, or architecture, of your model; fit_one_cycle only trains the architecture you defined there. To answer your question: the loss function is an essential part of your model architecture, so it belongs with the collab_learner definition and not with the training process.
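If you do want a different loss for a collab model, the place to set it is when you create the learner. If memory serves, collab_learner forwards extra keyword arguments to the underlying Learner, so something like this should work (an untested sketch against fastai v1, with data being the DataBunch from the lesson):

from fastai.collab import collab_learner
from fastai.layers import MSELossFlat

# Sketch: extra kwargs such as loss_func are forwarded to the Learner,
# so the loss is fixed as part of the model definition, not the training call.
learn = collab_learner(data, n_factors=20, y_range=y_range,
                       loss_func=MSELossFlat())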

From a more holistic viewpoint, remember that the loss function encodes the main objective of the model: it is what tells us how well the model is performing. With this in mind, why would you want the loss function, the central piece of your model, to be changeable from the training call?

I had these same questions! If memory serves, the loss function is chosen automatically when you create the Learner, on the basis of the DataBunch. You can see the chosen loss function afterwards at learn.loss_func. And you can specify it when the Learner is created, or change it by simple assignment afterwards. The same automatic process applies to the final activation function too.
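Concretely, something like this (a sketch against fastai v1; the exact class printed may differ):

from fastai.collab import collab_learner
from fastai.layers import MSELossFlat

learn = collab_learner(data, n_factors=20, y_range=y_range)
print(learn.loss_func)            # the loss chosen automatically from the DataBunch
learn.loss_func = MSELossFlat()   # or override it afterwards by plain assignment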

I think the best way to understand such fastai internals is to trace with a source code debugger: watching variables, deducing from function names, taking notes, and working at it. You won’t find any design overview or “why” explanations in the source code. I admit to some frustration with this style: once you go past the initial stage of simply trusting fastai, the lack of documentation makes it difficult to confirm that fastai is making the right choices for more complex problems. On the other hand, you will learn a tremendous amount about training loops, model architectures, and ML/Python best practices by tracing fastai code.

I use PyCharm, and I hear that VSCode is a good choice too. The “Find usages” feature is also very helpful for finding where specific fields are ultimately set. HTH with getting oriented, M

Aha, I see the relevant lines of code now:

In basic_train.py, the Learner class does this in __post_init__():

self.loss_func = self.loss_func or self.data.loss_func

In the collab_learner case, no self.loss_func is passed in, so it falls back to self.data.loss_func.
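The or here is just Python’s usual fallback idiom; a toy illustration:

# The left operand wins when it is truthy (a loss was passed in explicitly);
# otherwise the right-hand default is used.
def pick_loss(explicit, default):
    return explicit or default

assert pick_loss(None, 'mse_default') == 'mse_default'  # nothing passed in
assert pick_loss('l1', 'mse_default') == 'l1'           # explicit choice wins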

Ok… Now in basic_data.py, the DataBunch class has a loss_func() method decorated with @property, which does this:

return getattr(self.train_ds.y, 'loss_func', F.nll_loss) if hasattr(self.train_ds, 'y') else F.nll_loss

which basically says: if the training set has a y, use self.train_ds.y.loss_func (with F.nll_loss as the default if y doesn’t define one); otherwise fall back to F.nll_loss.
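Unrolled into a plain if/else, that property is doing this (a sketch for readability, not the actual source):

import torch.nn.functional as F

def default_loss(train_ds):
    if hasattr(train_ds, 'y'):
        # use whatever loss the target itemlist declares, if it declares one
        return getattr(train_ds.y, 'loss_func', F.nll_loss)
    return F.nll_loss  # no y at all: fall back to negative log-likelihood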

Ok… train_ds is a DataSet object… and now I’m getting a bit lost, but getting closer!
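If memory serves, the last hop is in data_block.py: for collab data the targets end up as a FloatList, and FloatList sets self.loss_func = MSELossFlat() in its __init__. That would be where the MSE loss Jeremy mentioned comes from. A quick way to check (a sketch, assuming the lesson’s learn object):

print(type(learn.data.train_ds.y))  # expect FloatList, if I recall correctly
print(learn.data.loss_func)         # expect a flattened MSE loss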