In EmbeddingNet from lesson 5 we don't use dropout after lin2, yet most answers on StackOverflow and many papers suggest applying dropout after every fully connected layer.
```python
def forward(self, cats, conts):
    users, movies = cats[:, 0], cats[:, 1]
    x = self.drop1(torch.cat([self.u(users), self.m(movies)], dim=1))
    x = self.drop2(F.relu(self.lin1(x)))
    return F.sigmoid(self.lin2(x)) * (max_rating - min_rating + 1) + min_rating - 0.5
```
However, my experiments show that adding dropout before the final layer makes the network worse overall.
Which strategy is right and why?
I figured it out: lin2 is not a layer before the output, it *is* the output layer. That's why we don't need dropout after it.
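To make the convention concrete, here is a minimal sketch in plain Python (all names hypothetical, not from the lesson code) of inverted dropout applied only to hidden activations, never to the output layer itself:

```python
import random

def dropout(xs, p, training=True):
    # Inverted dropout: zero each activation with probability p and
    # rescale the survivors by 1/(1-p) so the expected value is unchanged.
    # At inference (training=False) it is the identity.
    if not training or p == 0.0:
        return list(xs)
    return [0.0 if random.random() < p else x / (1 - p) for x in xs]

def forward(x, hidden_weights, out_weights, p=0.5, training=True):
    # Hidden layer: linear -> ReLU -> dropout (regularizes hidden units).
    h = [max(0.0, x * w) for w in hidden_weights]
    h = dropout(h, p, training)
    # Output layer: linear only. Zeroing activations *after* this layer
    # would corrupt the prediction itself rather than regularize features,
    # which matches why EmbeddingNet has no dropout after lin2.
    return sum(hi * wo for hi, wo in zip(h, out_weights))
```

The same reasoning applies to the lesson's model: drop1 and drop2 act on intermediate representations (the concatenated embeddings and the lin1 activations), while lin2 already produces the rating prediction.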