Lesson 3 - Official Topic

NVM I found my mistake!

Type params in a new cell and you’ll probably see it’s None. Perhaps you overwrote it accidentally. Can you try going back to the cell that initializes it and re-running it and the ones below?

2 Likes

Thanks for another great class!

2 Likes

The model in this case is that speed (y) is quadratic in time (t):

y = a*t**2 + b*t + c
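A quick sketch of fitting that model by gradient descent, in plain Python with hand-derived gradients rather than the notebook’s PyTorch code. The data here is synthetic (made up for illustration), generated from known coefficients so we can check the fit recovers them:

```python
# Hypothetical sketch: fit y = a*t**2 + b*t + c by gradient descent
# on mean squared error (the lesson itself uses PyTorch tensors).

def predict(params, t):
    a, b, c = params
    return a * t**2 + b * t + c

def mse_grad(params, ts, ys):
    """Gradient of mean squared error w.r.t. (a, b, c)."""
    n = len(ts)
    ga = gb = gc = 0.0
    for t, y in zip(ts, ys):
        err = predict(params, t) - y
        ga += 2 * err * t**2 / n
        gb += 2 * err * t / n
        gc += 2 * err / n
    return (ga, gb, gc)

# Synthetic data from a known quadratic: y = 1*t^2 + 2*t + 3
ts = [t / 10 for t in range(-10, 11)]
ys = [1 * t**2 + 2 * t + 3 for t in ts]

params = (0.0, 0.0, 0.0)
lr = 0.1
for _ in range(2000):
    g = mse_grad(params, ts, ys)
    params = tuple(p - lr * gi for p, gi in zip(params, g))

print(params)  # close to (1, 2, 3)
```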

Thank you!!

In regards to SGD:

  1. Am I right in understanding that if bs=64, then in one epoch, 64 images are passed in parallel to a GPU to 64 versions of our architecture, and it calculates the loss for that epoch?
  2. Also, I believe that over the entire process SGD equates to calculating loss on the entire image set, but is there a way the 64 images in a batch are chosen to get a better approximation in each epoch? Like, instead of choosing them randomly, maybe equally from each sub-class?

I see various people are working on projects here: https://forums.fast.ai/t/share-your-v2-projects-here/65757/76

Would be good to take a look to get ideas for projects you can work on :slight_smile:

1 Like

No, there is only one model, not 64 versions of it.
As far as strategies go, just looping randomly works great. It’s good to have batches that sometimes have more of one class compared to the others (and obviously, not always the same one).

Thanks for the class, Jeremy, Rachel, and team - especially for highlighting where we can use CPUs vs GPUs in inference. That pragmatism addresses quite a few blockers in taking my ideas to production :smile:

Wow, that was pretty intense, thanks all. Having a case of brain melt because of the math, but that is really exciting and cool!!! Thanks to all, especially Jeremy, Rachel, and team, yay :+1:

So in each epoch, are 64 images taken as input at one instance to the one model, which provides an output based on the number of classes? I’m kind of confused as to how this happens. Is the output ‘y’ after each epoch a vector of length 64, each element corresponding to one image?

You certainly don’t want to compose a batch from images of the same subclass. The weight update is determined from the average of the updates over the batch. What makes batch gradient descent work is that, on average, a batch is composed of images randomly selected from the overall training set, so the distribution of the batch mirrors that of the overall training set. If the batch were composed of images of the same type, that would no longer be true, and each batch would send the weights in wildly different directions.

Or are the 64 images passed one after the other, with the loss calculated overall after the process is complete? This is more intuitive to understand, but is this the case?

64 losses that are averaged (mean’d) to get your reported loss.

Every loss function has a “reduction” argument; you can specify mean, sum, or no reduction.
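A plain-Python sketch of what that reduction does, using squared error per example (this mimics the idea behind PyTorch’s `reduction='mean' | 'sum' | 'none'` keyword, not the actual PyTorch implementation):

```python
# Per-example squared error, then an optional reduction over the batch.
def sq_err(preds, targets):
    return [(p - t) ** 2 for p, t in zip(preds, targets)]

def loss(preds, targets, reduction="mean"):
    per_example = sq_err(preds, targets)
    if reduction == "none":
        return per_example                          # one loss per image
    if reduction == "sum":
        return sum(per_example)                     # total over the batch
    return sum(per_example) / len(per_example)      # "mean": the usual default

preds, targets = [1.0, 2.0, 4.0], [1.0, 3.0, 2.0]
print(loss(preds, targets, "none"))  # [0.0, 1.0, 4.0]
print(loss(preds, targets, "sum"))   # 5.0
print(loss(preds, targets, "mean"))  # ~1.667
```

With `reduction="none"` you get one loss per image (a vector of length bs); "mean" collapses that to the single number reported during training.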

I got that; that’s why I was asking whether choosing equally across all classes would be a better option than choosing randomly. But I guess random sampling would approximate the same thing in the end if the classes are balanced.

1 Like

Yes, that is right. In batch gradient descent (sometimes called minibatch gradient descent) the weight update is computed from the average of the weight updates over the 64 images in the batch, so you have to wait until the whole batch is processed.
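A tiny sketch of that averaging, with a single weight and squared-error loss (the numbers are hypothetical; with a fixed starting weight, stepping once on the averaged gradient gives the same result as averaging the individual one-example steps):

```python
# One minibatch update = average of the per-example gradient steps.
# Model: pred = w * x, loss = (w*x - y)**2, so d(loss)/dw = 2*(w*x - y)*x.
def grad(w, x, y):
    return 2 * (w * x - y) * x

batch = [(1.0, 2.0), (2.0, 3.0), (0.5, 1.0), (1.5, 2.5)]  # (x, y) pairs
w, lr = 0.0, 0.1

# Batch update: average the per-example gradients, then step once.
g = sum(grad(w, x, y) for x, y in batch) / len(batch)
w_batch = w - lr * g

# Equivalently: average the individual one-example steps from the same w.
w_avg = sum(w - lr * grad(w, x, y) for x, y in batch) / len(batch)

print(w_batch, w_avg)  # identical
```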

My main issue in understanding was whether the 64 images are processed simultaneously or one after the other, because to my understanding only 1 image can be fed to a CNN model at one instance of time…

Entire calculation is done in parallel on the GPU. GPUs are specifically good at highly parallel computation.

1 Like

On the other hand, for Stochastic Gradient Descent (SGD), you compute the weight update serially for each individual image; so SGD cannot be parallelized and is therefore slower than batch gradient descent. Also SGD weight updates are noisy compared to batch weight updates, since each is computed from a single example. Batch updates, being averaged over (say) bs=64 examples, are sqrt(64) = 8 times less noisy.
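You can see that sqrt(bs) noise reduction numerically with a toy simulation (made-up Gaussian “gradients” standing in for per-example updates, not a real training run):

```python
import random
import statistics

# Averaging over a batch of 64 noisy estimates shrinks the standard
# deviation by a factor of sqrt(64) = 8.
random.seed(0)

# "Gradients" from single examples: std ~= 1.
singles = [random.gauss(0, 1) for _ in range(5000)]

# "Gradients" averaged over batches of 64: std ~= 1/8.
batch_means = [statistics.mean(random.gauss(0, 1) for _ in range(64))
               for _ in range(5000)]

ratio = statistics.stdev(singles) / statistics.stdev(batch_means)
print(round(ratio, 1))  # close to 8
```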

1 Like

I think Jeremy’s ideas are good. In fact, as this disease spreads by touch, it should be compulsory to wear one when shopping for food, for the shelf life of most products is less than 72 hrs.

Maybe I am a bit dramatic.