Should I use dropout before the output layer? [Solved]

In EmbeddingNet from lesson 5 we don’t use dropout after lin2, but most answers on StackOverflow and many papers suggest using dropout after every fully connected layer.

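def forward(self, cats, conts):
    users,movies = cats[:,0],cats[:,1]
    # concatenate the user and movie embeddings, then apply dropout
    x = self.drop1(torch.cat([self.u(users),self.m(movies)], dim=1))
    # hidden layer followed by dropout
    x = self.drop2(F.relu(self.lin1(x)))
    # lin2 is the output layer: squash to (0, 1) and rescale to the rating range
    return F.sigmoid(self.lin2(x)) * (max_rating-min_rating+1) + min_rating-0.5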

However, my experiments show that adding dropout before the final layer makes the network worse overall.
Which strategy is right and why?

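I now understand that lin2 is not the layer before the output; it is the output layer itself. That’s why we don’t need dropout after it: dropout there would randomly zero the predictions themselves instead of regularizing a hidden representation.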
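
To see why dropout on the output itself hurts, here is a minimal sketch in plain Python (no PyTorch needed). It is not the lesson's code: `inverted_dropout` and `mse` are hypothetical helpers written for this illustration, the rating 3.5 is made up, and the dropout rule assumes the usual training-mode behaviour of zeroing a value with probability p and scaling the survivors by 1/(1-p).

```python
import random

def inverted_dropout(values, p, rng):
    """Zero each value with probability p; scale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]

def mse(preds, targets):
    """Mean squared error between two equal-length lists."""
    return sum((a - b) ** 2 for a, b in zip(preds, targets)) / len(preds)

rng = random.Random(0)
targets = [3.5] * 10_000          # made-up ratings, all equal for simplicity
perfect_preds = list(targets)     # pretend the model is already exactly right

# Dropout applied to the final predictions (i.e. "after lin2")
dropped_preds = inverted_dropout(perfect_preds, p=0.5, rng=rng)

loss_without = mse(perfect_preds, targets)
loss_with = mse(dropped_preds, targets)

print(loss_without)  # 0.0
print(loss_with)     # 12.25
```

With p = 0.5 every prediction becomes either 0 or 7.0, so each one is off by 3.5 even though the underlying model is perfect. Dropout on a hidden layer does not have this problem, because the layers after it can learn to compensate for the missing activations.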