Hi fellow learners. I am going through Chapter 4 of the Fastbook and I got confused by the activation features. It is explicitly stated that the first layer constructs 30 activation features that are passed to the activation function and then to another layer. My question is: how are these 30 activation features computed? I understand that when we specify `out_features` in `torch.nn.Linear` as 1, then `y` is a scalar representing the model's confidence about which label a particular instance belongs to. However, I do not understand how this is calculated when `out_features` is greater than 1. I looked into the PyTorch codebase and found that `torch.nn.Linear` uses a different set of parameters for every output feature. Is it really the case that when I define 30 output features, the model computes the loss and the accompanying gradients over 30 different sets of weights, or am I getting it all wrong?

If my understanding is correct, do these 30 different sets of weights all get applied to the same mini-batch in each epoch, or does each set see a different, randomized batch?
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear
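
To make my question concrete, here is a minimal sketch of what I think is happening (the 784 input features and the batch size of 64 are made-up numbers, just for illustration):

```python
import torch
import torch.nn as nn

# A layer like the one in the chapter: 30 activation features out.
layer = nn.Linear(in_features=784, out_features=30)

# One row of weights (and one bias entry) per output feature.
print(layer.weight.shape)  # torch.Size([30, 784])
print(layer.bias.shape)    # torch.Size([30])

x = torch.randn(64, 784)   # a mini-batch of 64 flattened inputs
y = layer(x)
print(y.shape)             # torch.Size([64, 30]) -- 30 activations per example

# The layer computes y = x @ W.T + b, so each of the 30 output
# features comes from its own "set of weights" (a row of W),
# all applied to the same mini-batch.
y_manual = x @ layer.weight.T + layer.bias
print(torch.allclose(y, y_manual))  # True (up to floating point)
```

Is this the right mental model, i.e. the 30 sets of weights are just the 30 rows of one weight matrix, all seeing the same batch?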