Lesson 9 Discussion & Wiki (2019)

I was going through lesson 9 and wanted to know how the number of iterations, i.e. (n-1)//bs + 1, was derived. The expression gives the right answer in every case I tried, whether or not n is an exact multiple of bs, but I am trying to understand where it comes from in the first place.

>  for epoch in range(epochs):
>     for i in range((n-1)//bs + 1):
>         start_i = i*bs
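
One way to see where the expression comes from: the loop needs ceil(n/bs) batches, and (n-1)//bs + 1 is a common way to write that ceiling using only integer division. Write n = q*bs + r with 0 <= r < bs. If r is 0, then (n-1)//bs is q-1, so the expression gives q. If r > 0, then (n-1)//bs is q, so the expression gives q+1 and the leftover partial batch is counted. The sketch below checks this numerically; the helper name n_batches and the values of n and bs are made up for illustration and are not from the lesson notebook.

```python
import math

def n_batches(n, bs):
    # Integer-only ceiling division, valid for n >= 1 and bs >= 1
    return (n - 1) // bs + 1

# Brute-force check against math.ceil over a range of sizes
for n in range(1, 200):
    for bs in range(1, 50):
        assert n_batches(n, bs) == math.ceil(n / bs)

# Example: 10 items with batch size 4 gives two full batches and one partial batch
n, bs = 10, 4
batches = [(i * bs, min(n, i * bs + bs)) for i in range(n_batches(n, bs))]
print(batches)  # [(0, 4), (4, 8), (8, 10)]
```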

Can someone please explain why we need super().__setattr__(k,v) in our DummyModule() class? Also, which class’s __setattr__ is it calling? Thanks guys.


I’m having trouble understanding the lines

sm_pred = log_softmax(pred)
def nll(input, target): return -input[range(target.shape[0]), target].mean()
loss = nll(sm_pred, y_train)

I’m used to thinking of “likelihood” as being the probability of the data given the parameters, yet the second line uses the true target for the calculation. Why is this? Can someone help clarify what’s going on here?

Hi @Rosst, that is a great question!

Loss functions for classification problems need the target labels and the predicted probabilities as inputs, because their computation compares the actual vs. predicted distribution of labels.

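Here, the nll (negative log likelihood) loss function takes as inputs sm_pred (the predicted log-probabilities for each class) and y_train (the target labels). The target is used to pick out the log-probability the model assigned to the correct class, so the loss is the negative log of the probability of the observed labels given the model's parameters.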

Hi @cbenett, could you please post a snippet showing the code you are referring to? In general, super() refers to the parent class, so the code is calling the __setattr__ method of whichever class DummyModule() inherits from. But if DummyModule() doesn’t explicitly inherit from another class, I’m as confused as you, and I second your question!

It’s using integer array indexing to get the log-probability for the correct target. I was puzzled by this for some time too when I first encountered it in the tutorial “What is torch.nn really?”
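
To make the indexing concrete, here is a minimal sketch in plain Python (no PyTorch) of what input[range(target.shape[0]), target] selects: for row i it takes the entry in column target[i]. The list comprehension stands in for the tensor indexing, and the numbers in pred and targets are invented for illustration.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax of one row of scores
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def nll(log_probs, targets):
    # Plain-Python stand-in for input[range(n), target]:
    # from row i, pick the log-probability in column targets[i]
    picked = [row[t] for row, t in zip(log_probs, targets)]
    return -sum(picked) / len(picked)

pred = [[2.0, 0.5, -1.0],   # raw scores for sample 0
        [0.1, 0.2, 3.0]]    # raw scores for sample 1
targets = [0, 2]            # true class of each sample

sm_pred = [log_softmax(row) for row in pred]
loss = nll(sm_pred, targets)
print(loss)
```

The loss is small when the model gives high probability to the true classes, which is why the true target appears in the calculation.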

Hi @cbenett. The super().__setattr__(k,v) call is what actually stores each attribute of the DummyModule() object with its corresponding value (as assigned in __init__). If you comment out that line and then try to create a DummyModule() object, it will throw an error!

Here, super() refers to Python’s built-in base class, object. To make it clearer, you can declare the class as class DummyModule(object). This is the same as class DummyModule().
We can’t write self.__setattr__(k,v) instead, since that would call our own method again and lead to infinite recursion.

Now, you may be wondering why __init__ doesn’t do the job of storing the values in their respective attributes by itself. In this case it doesn’t, because whenever a __setattr__ method is explicitly defined in a class, it is called in place of the normal attribute-setting mechanism.
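
A minimal sketch of the behaviour described above, using plain strings as stand-ins for the nn.Linear layers so it runs without PyTorch. The BrokenModule class is hypothetical and only shows what happens when the super().__setattr__ call is left out.

```python
class DummyModule:
    def __init__(self, l1, l2):
        self._modules = {}   # goes through __setattr__ below too
        self.l1 = l1
        self.l2 = l2

    def __setattr__(self, k, v):
        # Register every "public" attribute in the _modules dict
        if not k.startswith("_"):
            self._modules[k] = v
        # Hand over to object.__setattr__, which really stores the attribute
        super().__setattr__(k, v)

class BrokenModule:
    def __init__(self):
        self._modules = {}   # never stored: __setattr__ below drops it
        self.l1 = "layer 1"  # fails when it looks up self._modules

    def __setattr__(self, k, v):
        if not k.startswith("_"):
            self._modules[k] = v
        # super().__setattr__(k, v) deliberately omitted

m = DummyModule("layer 1", "layer 2")
print(m._modules)  # {'l1': 'layer 1', 'l2': 'layer 2'}

try:
    BrokenModule()
    broken_error = None
except AttributeError as e:
    broken_error = e
print(broken_error)
```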

Can someone please explain the purpose of setattr in the Runner __init__?

It adds the callbacks to the runner object as attributes, so you can refer to them like runner.name_of_the_callback(…)
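
A stripped-down sketch of that idea follows. It is a simplified, hypothetical version of the pattern, not the lesson's Runner: the Callback and Recorder classes here only carry a name and a list, and the cb_funcs parameter is assumed to be a list of callback classes.

```python
class Callback:
    name = "callback"

class Recorder(Callback):
    name = "recorder"
    def __init__(self):
        self.losses = []

class Runner:
    def __init__(self, cb_funcs=()):
        self.cbs = []
        for cbf in cb_funcs:
            cb = cbf()
            # Attach the callback under its own name, e.g. self.recorder
            setattr(self, cb.name, cb)
            self.cbs.append(cb)

run = Runner(cb_funcs=[Recorder])
run.recorder.losses.append(0.5)   # reachable by name, no searching run.cbs
print(run.recorder.losses)        # [0.5]
```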

In this lesson we implemented negative log likelihood (nll), but I wonder how backpropagation works through our function. All we did was an array lookup followed by a mean of the selected values, so how is the gradient of this function calculated, and how does PyTorch handle it given that we only gave it an array lookup?
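
As far as I understand it, indexing is just another differentiable operation to autograd: the selected entries pass their gradient through, and every entry that was not selected gets a gradient of zero. For nll with n samples, that means the gradient with respect to the log-probabilities is -1/n at each (i, target[i]) position and 0 elsewhere. The plain-Python sketch below checks this with finite differences; it does not use PyTorch, and the function names and numbers are invented for illustration.

```python
def nll(log_probs, targets):
    n = len(targets)
    return -sum(log_probs[i][t] for i, t in enumerate(targets)) / n

def analytic_grad(log_probs, targets):
    # d(nll)/d(log_probs[i][j]) is -1/n where j is the target of row i, else 0
    n = len(targets)
    return [[-1.0 / n if j == t else 0.0 for j in range(len(row))]
            for row, t in zip(log_probs, targets)]

def numeric_grad(log_probs, targets, eps=1e-6):
    # Forward finite differences: nudge one entry at a time
    base = nll(log_probs, targets)
    grad = []
    for i, row in enumerate(log_probs):
        g_row = []
        for j in range(len(row)):
            bumped = [r[:] for r in log_probs]
            bumped[i][j] += eps
            g_row.append((nll(bumped, targets) - base) / eps)
        grad.append(g_row)
    return grad

log_probs = [[-0.2, -1.8, -3.0],
             [-2.5, -0.1, -4.0]]
targets = [0, 1]

print(analytic_grad(log_probs, targets))
print(numeric_grad(log_probs, targets))
```

The gradient then continues backwards through log_softmax and the model's layers in the usual way.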


How do we find out which portions of the code in the lesson are plain Python and which are PyTorch?

Is anyone else facing the error below in the cross entropy calculation?

I was able to resolve the issue by converting y_train to a long tensor.

I have not seen an error like this from anyone else on the forum. Any idea what is causing this weirdness?

Have you modified something further up the chain? When I look at y_train in my notebook, it comes up as torch.LongTensor, while in your case it is a FloatTensor, which is why you need to convert it to a LongTensor before using it in the loss calculation.

There is nothing that I modified. Even the results I get when loading the dataset show a different type.

Github link

Hi Santosh,

Your previous post had y_train coming up as a FloatTensor, so has something changed between that part of the code and the code shared in your latest post? I cannot see all the code, so I can’t comment accurately.


Thank you!

Could anybody help me with this index error? I copied the source code but …

I ran it and it doesn’t work for me either. It seems someone has already submitted a PR for it;
in the meantime I just plotted without the last element.

I have been trying to find out why a uniform distribution would be better or worse than a normal distribution for initializing weights. I have not been able to find a good explanation on the forums. Can someone please point me to one?

It’s been a while since I watched the video, but as Jeremy explained briefly (course 8), initializing with a normal distribution doesn’t by itself guarantee that the activations keep mean 0 and std 1, and it gets even worse as the network gets deeper.
Xavier Glorot and Bengio pointed this out; their argument might feel more like experimental evidence than a mathematical proof.
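
A rough experiment along those lines, as a sketch under simplifying assumptions: a stack of purely linear layers (no activation functions), square weight matrices, and plain Python instead of PyTorch. It compares weights drawn from a standard normal against normal and uniform weights that are both scaled to variance 1/n. The function name final_std and all sizes are made up for illustration.

```python
import math
import random

def final_std(sample_weight, n=64, layers=10, seed=0):
    # Push a random vector through `layers` random n-by-n linear layers
    # and report the standard deviation of the final activations.
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(layers):
        w = [[sample_weight(rng) for _ in range(n)] for _ in range(n)]
        x = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]
    mean = sum(x) / n
    return math.sqrt(sum((v - mean) ** 2 for v in x) / n)

n = 64
# Normal weights with std 1: activations grow by about sqrt(n) per layer
unscaled_normal = final_std(lambda rng: rng.gauss(0, 1), n=n)
# Normal weights with std 1/sqrt(n), i.e. variance 1/n
scaled_normal = final_std(lambda rng: rng.gauss(0, 1 / math.sqrt(n)), n=n)
# Uniform weights on [-a, a] with a = sqrt(3/n), which also has variance 1/n
bound = math.sqrt(3 / n)
scaled_uniform = final_std(lambda rng: rng.uniform(-bound, bound), n=n)

print(unscaled_normal, scaled_normal, scaled_uniform)
```

In this setting the choice between uniform and normal matters much less than the scale: both scaled versions keep the activations at a similar magnitude, while the unscaled normal version blows up with depth.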