Beginner: SGD and Neural Net foundations ✅

We’ll be covering that in detail in a future lesson. For now, the key takeaway is that the basic foundations of a neural net are very simple!

5 Likes

I’ve noticed that whenever you train for a single epoch, several things happen underneath the cell.

  • Training runs for a single epoch to fit our new random head to the body of the pre-trained model, for the purposes of whatever we’re doing
  • Then it trains again for a single epoch (as specified), updating as per our training data.

But for each epoch, it seems like there are two separate stages. One is slow (what I think of in my head as the ‘real training’), where the progress bar completes one cycle of 0-100%, and then it goes through the progress bar from 0-100% again, quite a bit faster.

What are those two separate processes going on underneath? (It’s possible we’ll find out about those at a later stage, in which case feel free to tell me just to wait until a later lesson :slight_smile: ) Should I think of them as two separate processes? Is it some kind of consolidation or calculation that’s being represented there?

2 Likes

I think the first slow step is the ‘real training’ of updating the parameters on the training data, and the second fast step is calculating the metrics on the validation set (with the now updated weights).
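
For what it’s worth, here’s a rough PyTorch sketch of what one epoch does (toy data and a plain loop, not fastai’s actual internals): the slow bar corresponds to the first loop below, the fast bar to the second.

```python
import torch
from torch import nn

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# toy stand-ins for the training and validation DataLoaders
train_dl = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(5)]
valid_dl = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(2)]

# slow bar: forward + backward + weight update on every training batch
model.train()
for xb, yb in train_dl:
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
    opt.zero_grad()

# fast bar: forward pass only, no gradients, just computing validation metrics
model.eval()
with torch.no_grad():
    correct = sum((model(xb).argmax(dim=1) == yb).sum().item() for xb, yb in valid_dl)
print(f"validation accuracy: {correct / 16:.2f}")  # 16 = 2 batches of 8
```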

1 Like

Off the top of my head, I believe you’re talking about the two different progress bars: the training phase, followed by a validation phase (on the separately held-out validation data).
Jeremy will definitely get into more details about these in the upcoming lectures.

2 Likes

I wondered about that too. I’d sort of figured the general idea, as you describe, but it would be good to know more about the two stages during the second epoch. Maybe there’s a way to make the output more verbose (or maybe not).

EDIT: I’m not sure if I’m looking in the right place, but it looks like the progress callback has some details on what might be happening behind the scenes:

From the docstring:

_docs = dict(before_fit="Setup the master bar over the epochs",
                 before_epoch="Update the master bar",
                 before_train="Launch a progress bar over the training dataloader",
                 before_validate="Launch a progress bar over the validation dataloader",
                 after_train="Close the progress bar over the training dataloader",
                 after_validate="Close the progress bar over the validation dataloader",
                 after_batch="Update the current progress bar",
                 after_fit="Close the master bar")

I’m not sure if there is a way to make the progress bars “announce” what they’re doing, so to speak, i.e. print out which callback is being called, etc.
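
Not out of the box as far as I know, but since a fastai callback is just a class with methods named after those events, you could write a tiny one that prints as each event fires. A rough sketch (AnnounceEvents and its messages are made up, and untested):

```python
from fastai.callback.core import Callback

class AnnounceEvents(Callback):
    "Print each training event as it fires (hypothetical example)."
    def before_fit(self):      print("before_fit: setting up the master bar")
    def before_epoch(self):    print(f"before_epoch: starting epoch {self.epoch}")
    def before_train(self):    print("before_train: the slow (training) bar starts")
    def after_train(self):     print("after_train: training pass done")
    def before_validate(self): print("before_validate: the fast (validation) bar starts")
    def after_validate(self):  print("after_validate: metrics computed")
    def after_fit(self):       print("after_fit: closing the master bar")

# then e.g. learn.fit(1, cbs=AnnounceEvents())
```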

1 Like

Why do we choose error rate instead of accuracy when we’re training our model? Is it a tradition or just something that people do? Accuracy seems more intuitive to me somehow…

1 Like

For non-classification tasks, error rate can be slightly more nuanced than just 1-Accuracy. For a good explanation, see:

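For the usual single-label classification case, though, the two carry the same information: fastai’s error_rate is defined as 1 - accuracy, so tracking one or the other is just a lower-is-better vs higher-is-better convention. A quick toy check:

```python
import torch
from fastai.metrics import accuracy, error_rate

preds = torch.tensor([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])  # toy model outputs
targs = torch.tensor([0, 1, 1])                             # true labels

print(accuracy(preds, targs))    # 2 of 3 correct -> ~0.6667
print(error_rate(preds, targs))  # 1 - accuracy   -> ~0.3333
```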

9 Likes

One book that I found helpful was Andrew Glassner’s “Deep Learning: A Visual Approach”. It’s not math-heavy at all, and it’s a beautifully illustrated book. The cover picture is rather drab, which is unfortunate, because the book is really geared towards beginners and tries to explain a lot of concepts visually.

Andrew Glassner also gave a lecture for non-practitioners, which some might find helpful in grokking the basic concepts of SGD and neural networks without all the intimidating math parts.

Book:

“Deep Learning: A Crash Course” SIGGRAPH video:

Free Chapter on Probability:

10 Likes

Wow it looks amazing

2 Likes

Just looked at the PDF and this looks great. I love visual illustrations; when done right, they can make complex concepts seem ‘obvious’ and much easier to explain. I might get a paper copy of this book just for the sake of the illustrations. Thanks for sharing!

1 Like

Yeah, it’s unfortunate the book’s cover is not attractive at all! I almost skipped over it while browsing through books at the library, but when I looked at the illustrations, I checked it out. I’m thinking of getting a copy too because I had to return it and now I’m in the hold queue again :smiley:

It is printed on thick paper so the book is quite hefty, but YMMV.

1 Like

Hi everyone,

Jeremy graciously showed an SGD version via Excel, which helped clear up a few things. Seeing the whole process of multiplying and summing was beneficial. I still have a few questions:

  1. If we were to visualize the NN he showed, it only has an input of (1424,10), a weight matrix of (10,2), and then an output of (1424,2), correct? To be clear - this is a one-layer NN, with no hidden layer: x_1, …, x_m as an input (+ bias), multiplied by the weight matrix, and then we get only z_1, z_2 (for a single passenger – Lin1, Lin2). Is this true?
  2. Why did we add up the two ReLUs? Assuming we applied a nonlinearity to z_1 and z_2, why do we add these two?
  3. When he refers to the GPU and how it’s easier to parallelize these calculations, we can only compute one layer at a time, correct? Because we do need the output of one layer before we continue to the next layer (which we didn’t see in his example, if I’m not mistaken).

Thank you all.

Regarding questions 1 and 2: I was also a bit confused by that part at first. You are right that the input (1424,10) is multiplied by the weight matrix (10,2), which gives an output of shape (1424,2). ReLU is then applied to each of the output columns, which again gives an output of shape (1424,2) where all negative values have been replaced with zeros. This is all pretty standard so far. Now Jeremy adds up the two output columns to get a single prediction per row (which is then compared to the actual target to compute the loss, perform weight updates, etc.). What you would “normally” do is feed the two output units of the first layer as inputs to another layer, which gives your final prediction, but I guess simply adding up the output columns was meant as a simplification, and it seems to work as well :slight_smile:
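
In code terms, the whole forward pass from the spreadsheet looks roughly like this PyTorch sketch (random numbers standing in for the actual Titanic columns):

```python
import torch

x = torch.randn(1424, 10)                   # stand-in for the Titanic feature columns
w = torch.randn(10, 2, requires_grad=True)  # the (10,2) weight matrix

lin  = x @ w             # (1424, 2): the two "Lin" columns from the spreadsheet
act  = torch.relu(lin)   # negatives replaced with zeros, still (1424, 2)
pred = act.sum(dim=1)    # add the two columns -> one prediction per passenger
print(pred.shape)        # torch.Size([1424])
```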

1 Like

This was explained in the previous notebook:

1 Like

That’s right. But we can do lots of rows of data at a time, and lots of groups of coefficients at a time.
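
To make that concrete, here’s a toy two-layer sketch: the second matrix multiply has to wait for the first, but each multiply processes every row and every group of coefficients in one parallel operation (shapes made up to match the spreadsheet example):

```python
import torch

x  = torch.randn(1424, 10)  # all passengers at once
w1 = torch.randn(10, 2)
w2 = torch.randn(2, 1)

h = torch.relu(x @ w1)  # layer 1: all 1424 rows and both columns, in parallel
y = h @ w2              # layer 2: must wait for h, but again one parallel op
```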

1 Like

Super clear now, I had forgotten about this. Thank you!

1 Like

Hi all,
I’m fairly new to neural nets and deep learning in general, and I’m really enjoying / keeping up with the course so far. However, I keep coming across terms such as CNN, RNN, GAN, etc. and can’t quite get a good theoretical hold on those topics. I could quite easily follow the ReLU explanation used in neural nets in lesson 3, and was hoping to find some clarity on those topics in similar terms. Any good resources to look at?
Also, what category of neural net would the very basic Excel implementation of the model from lesson 3 come under?
Cheers!

IMO the best resource is the fast.ai course that you’re doing now, which introduces these things once you’ve got the foundations you need to understand them, and when you need to know them!

If you want to skip ahead, the 2020 recordings have you covered, or you could read ahead in the fastai book.

This week’s recommended reading covers RNNs BTW.

4 Likes

That’s awesome, Jeremy. Looking forward to it! Thanks!

Your question prompted me to look for some of these terms, and I found this glossary by Google. It seems to be pretty good; I’ve bookmarked it:

2 Likes