Hello, I have been trying to read through the fastai code in order to better understand how the loss function is being calculated in lesson 4.
Particularly, in this line:
learn = collab_learner(data, n_factors=20, y_range=y_range)
I was unsure where the loss function is (implicitly?) being defined. It would make more sense to me for the loss to be part of the fitting function, so that we can change it without changing the 'learner', although perhaps this is a matter of taste.
Tracing through: the last line of code in the fit_one_cycle(...) source calls the fit() function in basic_train.py / learner.py… OK. This in turn finishes by calling fit_gen(), and I don't see any loss function defined there either.
Anyone want to help me walk through this to better understand what’s going on? I know Jeremy mentioned that it’s just a MSE loss being used in this case, but I don’t see where that’s happening.
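Just for reference, MSE is nothing mysterious: the mean of the squared differences between predictions and targets. A minimal plain-Python sketch of the arithmetic (fastai/PyTorch actually use torch.nn.functional.mse_loss; this toy version is only to show what's being computed):

```python
# Plain-Python sketch of mean squared error, the loss Jeremy mentions.
# Not fastai's implementation -- that uses PyTorch's F.mse_loss.

def mse_loss(preds, targets):
    """Mean of squared differences between predictions and targets."""
    assert len(preds) == len(targets) and len(preds) > 0
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

# Example: predicted ratings vs. actual ratings
print(mse_loss([4.0, 3.5, 5.0], [4.0, 3.0, 4.0]))  # (0 + 0.25 + 1.0) / 3
```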
collab_learner defines the structure or architecture of your model. fit_one_cycle is only for training the architecture you have defined in the collab_learner. To answer your question: the loss function is an essential part of your model architecture and should therefore be defined in the collab_learner and not in the training process.
From a more holistic viewpoint, remember that the loss function is the main objective of the model: it tells us about the performance of our model. With this in mind, why would you want to be able to change the loss function, the central part of your model, during the training process?
I had these same questions! If memory serves, the loss function is chosen automatically when you create the Learner, on the basis of the DataBunch. You can see the chosen loss function afterwards at learn.loss_func. And you can specify it when the Learner is created, or change it by simple assignment afterwards. The same automatic process applies to the final activation function too.
I think the best way to understand such fastai internals is to trace with a source code debugger, watching variables, deducing from function names, taking notes, and working at it. You won’t find any design overview or “why” explanations in the source code. I admit to having frustration with this style. Once you go past the initial stage of simply trusting fastai, it makes it difficult to confirm that fastai is making the right choices for more complex problems. On the other hand, you will learn a tremendous amount about training loops, model architectures, and ML/Python best practices by tracing fastai code.
I use PyCharm, and I hear that VSCode is a good choice too. The "Find usages" function is also very helpful for finding where specific fields are ultimately set. HTH with getting oriented, M
Aha, I see the relevant lines of code now. The Learner class does this:
self.loss_func = self.loss_func or self.data.loss_func
In the collab_learner case, there is no self.loss_func defined, so it reverts to self.data.loss_func.
OK… Now the DataBunch class has a loss_func() which does this:
return getattr(self.train_ds.y, 'loss_func', F.nll_loss) if hasattr(self.train_ds, 'y') else F.nll_loss
which basically says: if the data has a y, then use that y's loss_func (falling back to F.nll_loss). train_ds is a DataSet object… and now I'm getting a bit lost, but getting closer!
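The quoted one-liner can be exercised with dummy objects to see all three branches (everything here except the getattr/hasattr shape is invented for the sketch: fallback_loss stands in for F.nll_loss, and the Dummy classes are not fastai's):

```python
# Exercising the shape of the quoted DataBunch.loss_func logic.
# fallback_loss stands in for F.nll_loss; the classes are invented.

def fallback_loss(pred, target):   # stand-in for F.nll_loss
    return 0.0

def custom_loss(pred, target):     # stand-in for a dataset-specific loss
    return 1.0

def pick_loss(train_ds):
    # Same structure as the fastai one-liner:
    return getattr(train_ds.y, 'loss_func', fallback_loss) \
        if hasattr(train_ds, 'y') else fallback_loss

class YWithLoss:
    loss_func = staticmethod(custom_loss)

class YWithoutLoss:
    pass

class DummyDS:
    def __init__(self, y=None):
        if y is not None:
            self.y = y

print(pick_loss(DummyDS(YWithLoss())) is custom_loss)       # True: y provides one
print(pick_loss(DummyDS(YWithoutLoss())) is fallback_loss)  # True: y lacks loss_func
print(pick_loss(DummyDS()) is fallback_loss)                # True: no y at all
```

So the y object attached to the training dataset is what ultimately carries the loss choice, which is why a collab (regression-style) target ends up with MSE rather than the nll default.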