In EmbeddingNet from lesson 5 we don’t use dropout after lin2, but most answers on StackOverflow and many papers suggest applying dropout after every fully connected layer.
def forward(self, cats, conts):
    users, movies = cats[:, 0], cats[:, 1]
    # dropout on the concatenated user/movie embeddings
    x = self.drop1(torch.cat([self.u(users), self.m(movies)], dim=1))
    # dropout after the hidden fully connected layer
    x = self.drop2(F.relu(self.lin1(x)))
    # lin2 produces the prediction; sigmoid squashes to (0, 1), then rescale to the rating range
    return F.sigmoid(self.lin2(x)) * (max_rating - min_rating + 1) + min_rating - 0.5
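Taken literally, the “dropout after every fully connected layer” advice would mean a third dropout applied to lin2’s output as well. A hypothetical sketch of that variant (drop3 is my addition, not in the lesson code, and would need a self.drop3 = nn.Dropout(p) in __init__):

    # Hypothetical: dropout after *every* fully connected layer, lin2 included
    x = self.drop2(F.relu(self.lin1(x)))
    x = self.drop3(self.lin2(x))  # randomly zeroes the raw rating prediction itself
    return F.sigmoid(x) * (max_rating - min_rating + 1) + min_rating - 0.5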
However, my experiments show that adding dropout immediately before the final layer makes the network worse overall.
Which strategy is right, and why?
[EDIT]
I realize now that lin2 is not a hidden layer before the output; it is the output layer itself. Applying dropout to it would randomly zero the prediction we are trying to produce, which is why we don’t need dropout after it.
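To make the takeaway concrete, here is a minimal self-contained sketch of the pattern (dimensions and dropout probabilities are illustrative assumptions, not the lesson’s values): dropout after the embedding concatenation and after the hidden layer, but nothing after the output layer lin2.

import torch
import torch.nn as nn
import torch.nn.functional as F

max_rating, min_rating = 5.0, 0.5  # illustrative rating range

class EmbeddingNetSketch(nn.Module):
    def __init__(self, n_users, n_movies, n_factors=50, n_hidden=10):
        super().__init__()
        self.u = nn.Embedding(n_users, n_factors)
        self.m = nn.Embedding(n_movies, n_factors)
        self.lin1 = nn.Linear(n_factors * 2, n_hidden)  # hidden layer
        self.lin2 = nn.Linear(n_hidden, 1)              # output layer
        self.drop1 = nn.Dropout(0.05)  # on the concatenated embeddings
        self.drop2 = nn.Dropout(0.5)   # after the hidden layer
        # no drop3: lin2 *is* the prediction, so dropping its output
        # would just randomly zero the network's answer

    def forward(self, cats, conts):
        users, movies = cats[:, 0], cats[:, 1]
        x = self.drop1(torch.cat([self.u(users), self.m(movies)], dim=1))
        x = self.drop2(F.relu(self.lin1(x)))
        return torch.sigmoid(self.lin2(x)) * (max_rating - min_rating + 1) + min_rating - 0.5

In this layout dropout regularizes the representations feeding into each weight layer, while the output of lin2, the actual rating prediction, is left untouched.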