Linear means, to me, that the parts of an equation or calculation are added or subtracted, and each part is a product of values. (https://en.wikipedia.org/wiki/Linear_algebra)
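In code that is just multiply-and-add. A tiny sketch (made-up numbers) of what a single linear unit computes:

```python
# A linear function: multiply each input by a weight, sum the products, add a bias.
x = [2.0, 3.0]     # inputs (made-up values)
w = [0.5, -1.0]    # weights
b = 0.25           # bias
y = sum(xi * wi for xi, wi in zip(x, w)) + b   # 2*0.5 + 3*(-1.0) + 0.25 = -1.75
print(y)
```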
How do we know how big the weight and bias matrices need to be to approximate a specific problem?
How do you view the learning after each epoch, to visualize the errors?
You really don't know in advance.
In most cases it is just a matter of trial and error, figuring out what works best.
SGD gives a way to refine the parameters, but how do we guess how large a model is needed to solve the problem at hand?
How could we use what we're learning here to get an idea of what the network is learning along the way, like Zeiler and Fergus did, more or less?
Start from the literature to understand whether your problem has already been studied by someone else.
If yes, copy what has already been done; otherwise you just have to experiment yourself.
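As a concrete (if simplistic) way to experiment: train the same architecture a few times with different hidden sizes and compare validation accuracy. This sketch uses synthetic data and made-up sizes purely to illustrate the trial-and-error loop:

```python
import torch
from torch import nn

# Synthetic stand-in data: 20 features, binary labels.
torch.manual_seed(0)
x_train, y_train = torch.randn(800, 20), torch.randint(0, 2, (800, 1)).float()
x_valid, y_valid = torch.randn(200, 20), torch.randint(0, 2, (200, 1)).float()

def make_model(n_hidden):
    # One hidden layer whose width we vary by trial and error.
    return nn.Sequential(nn.Linear(20, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 1))

def fit_and_score(n_hidden, epochs=20, lr=0.1):
    model = make_model(n_hidden)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_func = nn.BCEWithLogitsLoss()   # applies the sigmoid inside the loss
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_func(model(x_train), y_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        preds = (model(x_valid).sigmoid() > 0.5).float()
        return (preds == y_valid).float().mean().item()

for n_hidden in (8, 32, 128):
    print(n_hidden, fit_and_score(n_hidden))
```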
Interestingly, I didn't see any difference in accuracy using nn.LeakyReLU() in place of ReLU.
Did we just replace Sigmoid with ReLU?
Where did this name "itemgot" come from? (I'm having a hard time remembering…)
Yes. Swap it out and see what happens.
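For reference, here's a sketch of a small net in the spirit of the lesson's simple_net (the layer sizes are assumptions, not necessarily the notebook's) and the LeakyReLU swap mentioned above:

```python
from torch import nn

# Two linear layers with a nonlinearity between them.
simple_net = nn.Sequential(
    nn.Linear(28*28, 30),
    nn.ReLU(),            # nonlinearity between the two linear layers
    nn.Linear(30, 1),
)

# To run the experiment above, swap the nonlinearity and retrain:
leaky_net = nn.Sequential(
    nn.Linear(28*28, 30),
    nn.LeakyReLU(),       # same shape, slightly different behavior for negative inputs
    nn.Linear(30, 1),
)
```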
Can you explain again what this line of code is doing? learn.recorder.values[-1][2]
It's a method of L, which returns the i-th value of each element inside.
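A tiny demo with fastcore's L (made-up numbers) showing what itemgot does:

```python
from fastcore.foundation import L

# Each element is a per-epoch row, e.g. (train_loss, valid_loss, accuracy) -- made-up numbers.
vals = L([(0.60, 0.55, 0.81), (0.40, 0.38, 0.90)])
vals.itemgot(2)   # -> (#2) [0.81, 0.90]: the third value of each element, i.e. the accuracy column
```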
In the first week I learned you can copy and paste multiple cells; TIL you can select multiple cells and run them together.
Is there a rule of thumb for which nonlinearity to choose, given there are many?
No. The sigmoid is applied at the end in order to ensure that predictions are between 0 and 1, which is what our binary classification task requires. The sigmoid is used at the end of the network, in the loss function to be more precise, whereas the ReLU is used in between layers.
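A sketch of what that looks like in code, assuming a net like the one sketched earlier (linear, then ReLU, then linear). The loss below is in the spirit of the lesson's binary-classification loss, not the exact notebook code:

```python
import torch

# The sigmoid shows up here, applied to the final activations inside the loss,
# so predictions end up between 0 and 1; the ReLU only ever sits between layers.
def binary_loss(logits, targets):
    preds = torch.sigmoid(logits)                              # squash to (0, 1) at the very end
    return torch.where(targets == 1, 1 - preds, preds).mean()

# nn.BCEWithLogitsLoss works similarly, applying the sigmoid inside the loss for you.
```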
That will give you a good starting point, but won't that search of the model space be directionless? SGD gives a way to search the parameter space while ensuring progress in the locally optimal direction. But for model size and topology, if all we can do is come up with a better guess, then finding an improved model is just pure chance, isn't it?
It is grabbing the accuracy metric of the last epoch: [-1] selects the last epoch, and [2] selects the third item of the row in the table that learn.fit generates.
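Concretely, values holds one row per epoch. A toy illustration with made-up numbers (the column layout depends on which metrics you passed to the Learner):

```python
values = [
    (0.60, 0.55, 0.81),   # epoch 0: (train_loss, valid_loss, accuracy) -- made-up numbers
    (0.40, 0.38, 0.90),   # epoch 1
]
values[-1]      # the last epoch's row
values[-1][2]   # the third column of that row: the final accuracy, 0.90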
I don't think we know that beforehand, but we can take a guess, so we have a prior; we have the data we are trying to fit; and we also have a generative model. All of that checks out, so we could look into Bayesian optimization. Please look into this notebook. Thanks to @muellerzr
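The notebook itself isn't reproduced here, but as a hedged sketch of what Bayesian-style hyperparameter search can look like, here's Optuna (my choice for illustration, not necessarily what that notebook uses); its default TPE sampler is a Bayesian-flavored method:

```python
import optuna

def objective(trial):
    # Hyperparameters the optimizer proposes; the names and ranges are assumptions.
    n_hidden = trial.suggest_int("n_hidden", 16, 512, log=True)
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    # Train your model with these values and return the validation metric here.
    # The line below is a toy stand-in so the sketch runs on its own.
    return 1.0 - abs(n_hidden - 128) / 512 - abs(lr - 0.01)

study = optuna.create_study(direction="maximize")   # the default sampler (TPE) drives the search
study.optimize(objective, n_trials=25)
print(study.best_params)
```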
You wait for Quoc Le to evolve the new state-of-the-art architecture.