Lesson 6 - Official topic

I have just tested @muellerzr’s notebook, and it works fine. Your error message seems to be coming from fastcore, so most likely your fastcore and/or fastai2 installations are not up to date, or their versions are not compatible with each other. Try upgrading/updating them and running the notebook again.

Yijin

I’m trying to build an image regression model (PointBlock). If I apply aug_transforms, the keypoints sometimes end up outside of the actual image. Is there a way to avoid that, or to discard the augmented image when that happens?

IIRC you need to adjust the padding type and not use a crop (the crop is what’s pushing the points outside).
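Something like this might work as a starting point (just a sketch, not tested on your data; get_keypoints and path_to_images are placeholders for your own setup, and the resize/pad choices are only one option):

from fastai.vision.all import *   # or fastai2.vision.all, depending on your install

# Sketch of a PointBlock DataBlock that squishes instead of cropping,
# and pads with zeros so augmented points stay inside the picture.
dblock = DataBlock(
    blocks=(ImageBlock, PointBlock),
    get_items=get_image_files,
    get_y=get_keypoints,                                 # placeholder: your own keypoint getter
    splitter=RandomSplitter(seed=42),
    item_tfms=Resize(256, method=ResizeMethod.Squish),   # squish rather than crop
    batch_tfms=aug_transforms(size=224, max_warp=0,
                              pad_mode=PadMode.Zeros))   # pad instead of reflect/crop
dls = dblock.dataloaders(path_to_images)                 # placeholder path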


Jeremy says at the end of chapter 5, in the “Further research” section, that we should try to improve the pet breeds model’s accuracy and search the forum for other students’ solutions. I can’t find other students’ solutions or their accuracies.
Can someone help me please? :slight_smile:

Hi all,

In the pet_breeds lesson I noticed this passage about Cross Entropy Loss, which I’m struggling to understand:

We’re only picking the loss from the column containing the correct label. We don’t need to consider the other columns, because by the definition of softmax, they add up to 1 minus the activation corresponding to the correct label. Therefore, making the activation for the correct label as high as possible must mean we’re also decreasing the activations of the remaining columns.

If we are only choosing the column with the correct label, wouldn’t we be maximising the loss and not minimising it?

For example, if we have the following targets:

#6 targets
targ = tensor([0,2,2,1,4,3])

And the following softmax activations, which we index into with the targets:
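These are illustrative numbers rather than my exact notebook output, but they have the same shape (6 predictions x 5 classes) and show what I mean:

from torch import tensor

targ = tensor([0, 2, 2, 1, 4, 3])            # same targets as above

# made-up softmax activations: each row sums to 1
sm_acts = tensor([[0.60, 0.10, 0.10, 0.10, 0.10],
                  [0.05, 0.05, 0.80, 0.05, 0.05],
                  [0.20, 0.20, 0.40, 0.10, 0.10],
                  [0.10, 0.70, 0.10, 0.05, 0.05],
                  [0.25, 0.25, 0.20, 0.15, 0.15],
                  [0.10, 0.10, 0.10, 0.60, 0.10]])

idx = range(6)
sm_acts[idx, targ]   # activation of the correct class for each prediction
# -> tensor([0.6000, 0.8000, 0.4000, 0.7000, 0.1500, 0.6000])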

We picked column 0 for the first prediction, which has the highest activation. Wouldn’t this maximise the loss?

I have a colab notebook for reference.

Bit confused! :sweat_smile:

Any help will be much appreciated!

Cheers,
Adi

Hi Adi!

I think if you stopped right there, yes, it would maximize the loss. But you can just define the loss as the negative values of sm_acts, and now you’re minimizing loss :smiley:

Also, when we go one step further and take the negative log of the softmaxed activations, a confident correct prediction gets a small loss:
-log(0.99) = 0.0101
and a wrong prediction (the correct class has a low probability) gets a high loss:
-log(0.01) = 4.6052

Remember, nll_loss does not calculate the log despite its name.
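You can convince yourself of that with a few lines (just a sketch with random numbers):

import torch
import torch.nn.functional as F

targ = torch.tensor([0, 2, 2, 1, 4, 3])
sm_acts = torch.softmax(torch.randn(6, 5), dim=1)    # made-up softmaxed activations

# nll_loss only picks the target column and negates it -- it never takes the log:
F.nll_loss(sm_acts, targ, reduction='none')
-sm_acts[range(6), targ]                             # exactly the same values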

Cheers :slight_smile:
Hannes


Hey Hannes,

Thanks again mate :smiley:

I ended up punching -log(0.09) vs -log(0.9) in a calculator last night and it made sense.

I’m trying to get an intuition for this, and it makes sense now. Taking the negative is what turns it into something we minimize, and taking the log makes the function sensitive to small differences such as 0.99 vs 0.999 (which is really a 10x improvement).

I get that the step is actually log_softmax followed by nll_loss, despite the name. :crazy_face:
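For anyone else reading along, this is roughly how I checked it (quick sketch):

import math
import torch
import torch.nn.functional as F

# the log makes small probability differences visible:
-math.log(0.99), -math.log(0.999)              # ~0.0101 vs ~0.0010, about a 10x difference

# and cross entropy really is log_softmax followed by nll_loss:
acts = torch.randn(6, 5)
targ = torch.tensor([0, 2, 2, 1, 4, 3])
F.cross_entropy(acts, targ)
F.nll_loss(F.log_softmax(acts, dim=1), targ)   # same number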

Thanks for jumping in and validating!

Adi

I think that sums it up nicely :smiley:

Glad I could help. Your questions always make me dig into the material again, which is great!

Hey guys, I have recently been working my way through the course, and reached the chapter on collaborative filtering. I got kind of stuck here.

Jeremy used an Excel sheet to give an example of what we are trying to achieve when fitting the latent factors.

Here, Jeremy took a batch of 15: 15 movies and 15 users, with 5 latent factors. He then calculated the predictions by taking the dot product of the user and movie factors, yielding 15x15 predictions. I am clear up to this point.
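In code, I believe the spreadsheet version corresponds to something like this (toy random numbers, not the actual sheet values):

import torch

user_factors  = torch.randn(15, 5)   # 15 users x 5 latent factors
movie_factors = torch.randn(15, 5)   # 15 movies x 5 latent factors

# every user against every movie, like the grid in the Excel sheet:
preds = user_factors @ movie_factors.t()
preds.shape                          # torch.Size([15, 15])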

However, while defining the model from scratch:

class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        self.user_factors = Embedding(n_users, n_factors)    # one n_factors vector per user
        self.movie_factors = Embedding(n_movies, n_factors)  # one n_factors vector per movie

    def forward(self, x):
        # x has shape (batch_size, 2): column 0 holds user ids, column 1 holds movie ids
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        # elementwise product summed over the factor dimension -> one prediction per row
        return (users * movies).sum(dim=1)

we are using (users * movies).sum(dim=1), which yields a tensor of shape (batch_size,). So for a batch of 15, it would yield a tensor of 15 predictions. Shouldn’t it be 15 x 15, i.e. a prediction for each combination of user and movie?
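Just to show what I mean with the shapes (toy sizes, plain PyTorch instead of fastai’s Embedding):

import torch
from torch import nn

n_users, n_movies, n_factors, bs = 100, 50, 5, 15
user_factors  = nn.Embedding(n_users, n_factors)
movie_factors = nn.Embedding(n_movies, n_factors)

# a batch of 15 (user id, movie id) pairs, shape (15, 2)
x = torch.stack([torch.randint(0, n_users,  (bs,)),
                 torch.randint(0, n_movies, (bs,))], dim=1)

users  = user_factors(x[:, 0])          # (15, 5)
movies = movie_factors(x[:, 1])         # (15, 5)
(users * movies).sum(dim=1).shape       # torch.Size([15]) -- not 15 x 15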

Thanks!

Hi,
I also didn’t find other students’ solutions. I’m sure there is an official topic somewhere; if someone can point me to the link, it would be very helpful :slight_smile:

For my part, I just experimented along the lines of Jeremy’s suggestions. I did manage to improve the model, but I think it was mostly a “lucky run”, given the stochastic nature of the algorithm.

I summarized my experiments in the Excel table below (and added some suggestions for further experiments based on the results). If anyone is interested, I will clean up the notebook and prepare a more detailed blog post about this task:

![image|690x268](upload://rCsK9Ou5cjIhKMy2yxu83VwqYNO.png)