Lesson 4 - Official Topic

Why do you stack the 3s and 7s datasets on top of each other again?

That is not entirely true, as we have rewritten the PyTorch DataLoader in fastai.

You want to train your model on 3s and 7s together, or it won’t learn to differentiate between the two of them.
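
A minimal sketch of the idea, using plain Python lists instead of tensors: both classes go into one training set, and a label list records which class each example came from. The toy pixel values and the name `build_training_set` are hypothetical; in PyTorch the concatenation would typically be done with `torch.cat`.

```python
# Hypothetical toy data: each "image" is a short flattened list of pixel values.
threes = [[0.0, 0.9, 0.8], [0.1, 0.7, 0.9]]
sevens = [[0.9, 0.9, 0.1], [0.8, 0.6, 0.0]]

def build_training_set(threes, sevens):
    """Hypothetical helper: put both classes into one dataset with matching labels.

    Label 1 means "this is a 3", label 0 means "this is a 7".
    """
    xs = threes + sevens                        # one combined list of examples
    ys = [1] * len(threes) + [0] * len(sevens)  # labels line up with xs by position
    return xs, ys

train_x, train_y = build_training_set(threes, sevens)
print(len(train_x), train_y)  # 4 [1, 1, 0, 0]
```

Because every batch can now contain both digits, the loss compares 3s against 7s directly, which is what lets the model learn the difference.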

Not yet, no. There was one in fastai v1, and you can probably import it from there while waiting for it to be ported to v2.

I don’t believe so. It doesn’t take much trial and error to figure it out. GPU usage depends on both your dataset items and the model you choose.

I think this was the one used in fastai v1:

Broadcasting was used when adding the bias to the result of the matrix multiplication of the inputs and the weights.

Can you think of other parts of the training process where broadcasting is used?
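
To see what the broadcast bias addition computes, here is a sketch that spells it out with plain Python loops. It assumes the lesson's linear function has the form `xb@weights + bias` with a single bias value; `linear_with_bias` and the toy numbers are hypothetical. In PyTorch the explicit loop is replaced by broadcasting the bias across every row of the product.

```python
def linear_with_bias(xb, weights, bias):
    """Hypothetical helper: compute xb @ weights + bias one row at a time.

    xb is a batch (list of rows), weights is a single weight column given as
    a flat list, and bias is one number added to every row's output.
    """
    out = []
    for row in xb:
        dot = sum(x * w for x, w in zip(row, weights))  # one matrix-multiply output
        out.append(dot + bias)  # the same bias is reused for every row
    return out

xb = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
weights = [0.5, -1.0]
bias = 2.0
print(linear_with_bias(xb, weights, bias))  # [0.5, -0.5, 2.0]
```

Broadcasting does the "reuse the same bias for every row" step without copying the bias into a full column first.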

In the simplest model of SGD (the function called “train_epoch”), the for loop iterates over “dl”, but that is not passed into the function. How does the function get that variable?

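It’s defined in the notebook as a global variable, and Python looks up names that aren’t local to a function in the global scope.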
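
A minimal sketch of that lookup. The `dl` here is a hypothetical stand-in (a list of `(xb, yb)` batches, not a real DataLoader), and `train_epoch_sketch` / `train_epoch_explicit` are hypothetical functions that only count examples; they are not the notebook's `train_epoch`.

```python
# Hypothetical stand-in for the notebook's DataLoader: a list of (xb, yb) batches.
dl = [([1, 2], [0, 1]), ([3, 4], [1, 0])]

def train_epoch_sketch():
    """Hypothetical simplified loop: only counts examples, no real training."""
    seen = 0
    # `dl` is neither a parameter nor a local, so Python falls back to the global scope.
    for xb, yb in dl:
        seen += len(xb)
    return seen

def train_epoch_explicit(dl):
    """Same loop, but the data is passed in, so it no longer depends on a global."""
    seen = 0
    for xb, yb in dl:
        seen += len(xb)
    return seen

print(train_epoch_sketch())      # 4
print(train_epoch_explicit(dl))  # 4
```

Relying on the global is convenient in a notebook, but passing the data in explicitly makes the function reusable with a different loader (for example a validation set) and avoids surprises if `dl` is reassigned in a later cell.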

My maths is a bit rusty… Why the name linear? Doesn’t the bias make the matrix multiplication non-linear?

Think of it as y = mx + b

Technically, the bias makes it affine, but people still often say linear.
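
One way to see why the bias makes the function affine rather than linear: a linear map must satisfy f(x1 + x2) = f(x1) + f(x2). With f(x) = mx + b, the left side is m(x1 + x2) + b while the right side is m(x1 + x2) + 2b, so the two only agree when b = 0. The sketch below checks this numerically; `f`, `is_additive` and the values m = 2, b = 3 are hypothetical choices for illustration.

```python
def f(x, m=2.0, b=3.0):
    """Hypothetical affine function y = m*x + b."""
    return m * x + b

def is_additive(func, x1, x2):
    """Check the additivity property that every linear map must satisfy.

    Exact == is fine here only because the chosen values are exactly
    representable as floats.
    """
    return func(x1 + x2) == func(x1) + func(x2)

print(is_additive(f, 1.0, 4.0))                      # False: 13.0 vs 5.0 + 11.0 (b counted twice)
print(is_additive(lambda x: f(x, b=0.0), 1.0, 4.0))  # True: with b = 0 the map is linear
```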

y = mx + b is still just a linear function: m is the slope, and b just shifts the line up and down.

y = mx^2 + b would be nonlinear because of the ^2

Edit: I never realized that “linear” was technically incorrect here and that it should be “affine”, as Sylvain notes. I feel betrayed by conventional vocabulary.

What’s an affine function?

Just a linear function with an intercept?

Yes, but don’t get too distracted by the names :wink: They are not super important.

When using the non-linearity, won’t a function that sets all negative outputs to zero make many of the gradients in the network zero and stall the learning process?
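
To make the question concrete, here is a sketch of ReLU, `max(x, 0)`, and its gradient for a single unit over a hypothetical batch of inputs. In this simplified picture the gradient is 0 only on the examples where the unit's input is negative (and at exactly 0, following a common convention), and 1 everywhere else, so the unit still learns from the examples where it is active. It only stops learning entirely if its input is negative for every example. The names `relu`, `relu_grad` and the numbers are illustrative assumptions.

```python
def relu(x):
    """ReLU: negative inputs become 0, positive inputs pass through unchanged."""
    return max(x, 0.0)

def relu_grad(x):
    """Derivative of max(x, 0); at exactly 0 we use the common convention of 0."""
    return 1.0 if x > 0 else 0.0

# Hypothetical inputs to one unit across a batch of five examples.
pre_activations = [-2.0, -0.5, 0.0, 0.3, 1.5]

outputs = [relu(x) for x in pre_activations]
grads = [relu_grad(x) for x in pre_activations]
frac = sum(g > 0 for g in grads) / len(grads)  # share of examples that pass gradient

print(outputs)  # [0.0, 0.0, 0.0, 0.3, 1.5]
print(grads)    # [0.0, 0.0, 0.0, 1.0, 1.0]
print(frac)     # 0.4
```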

To me, linear means that the parts of an equation or calculation are only added or subtracted, where each part is a variable multiplied by a constant. (https://en.wikipedia.org/wiki/Linear_algebra)

How do we know how big the weight and bias matrices need to be to approximate a specific problem?

How do you view the learning progress after each epoch, for example to visualize the errors?

You really don’t know in advance.
In most cases it is just a matter of trial and error, figuring out what works best.
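
Part of the size is fixed by the data, though: the first weight matrix needs one row per input and the last layer needs one column per output, so the trial and error is mostly about the hidden size. The sketch below assumes 28*28 images flattened to 784 inputs, a single output (3 vs 7), and a hidden size of 30 chosen purely for illustration; `layer_shapes` and `n_params` are hypothetical helpers.

```python
def layer_shapes(n_in, n_hidden, n_out):
    """Hypothetical helper: (weight shape, bias shape) for linear -> ReLU -> linear."""
    return [
        ((n_in, n_hidden), (n_hidden,)),  # first layer: one weight column per hidden unit
        ((n_hidden, n_out), (n_out,)),    # second layer: maps hidden units to outputs
    ]

def n_params(shapes):
    """Count weights plus biases across all layers."""
    total = 0
    for (rows, cols), (b,) in shapes:
        total += rows * cols + b
    return total

shapes = layer_shapes(28 * 28, 30, 1)
print(shapes)            # [((784, 30), (30,)), ((30, 1), (1,))]
print(n_params(shapes))  # 23581
```

Changing the hidden size is then the knob to experiment with: larger values give the model more capacity but more parameters to train.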