Could you please clarify the difference between DataLoader and DataLoaders?
Is there a fastai function that figures this out from the dataset, i.e. maximizes the batch size given GPU memory?
As the s implies, DataLoaders is an object holding several DataLoader objects: one for training and one for validation.
Simply put, DataLoaders is a wrapper for multiple DataLoader objects (e.g. a train and a validation DataLoader).
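A minimal sketch of that wrapping, assuming fastai v2's `DataLoaders` from `fastai.data.core` (the tensors below are just dummy placeholders for real data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from fastai.data.core import DataLoaders

# Dummy datasets standing in for real training and validation data
train_ds = TensorDataset(torch.randn(100, 2), torch.randint(0, 2, (100,)))
valid_ds = TensorDataset(torch.randn(20, 2), torch.randint(0, 2, (20,)))

train_dl = DataLoader(train_ds, batch_size=16, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=16)

# DataLoaders just bundles the individual DataLoader objects together
dls = DataLoaders(train_dl, valid_dl)
xb, yb = next(iter(dls.train))   # dls.train is the first DataLoader
```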
Why do you stack the 3s and 7s datasets on top of each other again?
That is not entirely true, as we have rewritten the PyTorch DataLoader in fastai.
You want to train your model on 3s and 7s together, or it won't learn to differentiate between the two of them.
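For instance, a minimal sketch in the spirit of the lesson (the tensors here are hypothetical stand-ins for the stacked, flattened 3 and 7 image tensors):

```python
import torch

# Stand-ins for the stacked, flattened 3 and 7 images
threes = torch.randn(6131, 28 * 28)
sevens = torch.randn(6265, 28 * 28)

# Stack both classes into one training set: 1 = "is a 3", 0 = "is a 7"
train_x = torch.cat([threes, sevens])
train_y = torch.tensor([1] * len(threes) + [0] * len(sevens)).unsqueeze(1)
```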
Not yet, no. There was one in fastai v1, and you can probably import it while waiting for it to be ported to v2.
I don't believe so. It doesn't take too much trial and error to figure it out. GPU usage depends on both your dataset items and the model you choose.
I think this was the one used in fastai v1:
Broadcasting was used when adding the bias vector to the result of the matrix multiplication.
Can you think of other parts of the training process where broadcasting is used?
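As one concrete spot, here is a minimal sketch of that bias broadcast in a simple linear model (the shapes are assumptions based on the MNIST example):

```python
import torch

xb = torch.randn(4, 28 * 28)      # a hypothetical mini-batch of 4 images
weights = torch.randn(28 * 28, 1)
bias = torch.randn(1)

# bias has shape (1,) but is broadcast across all 4 rows of the activations
preds = xb @ weights + bias
print(preds.shape)                # torch.Size([4, 1])
```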
In the simplest SGD example (the function called `train_epoch`), the for loop iterates over `dl`, but `dl` is not passed into the function. How does the function get that variable?
It's defined in the notebook, so the function picks it up as a global variable.
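A minimal sketch of that pattern (the names here are hypothetical, not the actual notebook code): since `dl` is not a parameter, Python resolves it from the surrounding notebook scope when the function runs.

```python
# dl is defined at notebook (module) level...
dl = [(1, 2), (3, 4)]   # hypothetical stand-in for a DataLoader

def train_epoch():
    # ...so this loop finds dl as a global variable, not an argument
    for xb, yb in dl:
        print(xb, yb)

train_epoch()
```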
My maths are a bit rusty… Why the name linear? Doesn't the bias make the matrix multiplication non-linear?
Think of it as y = mx + b
Technically, the bias makes it affine, but people still often say linear.
y = mx + b is still just a linear function: m is the slope, and b just shifts the line up and down.
y = mx^2 + b would be nonlinear because of the ^2
Edit: I never realized that "linear" was incorrect and should be "affine", as Sylvain notes. I feel betrayed by conventional vocabulary.
What's an affine function?
Just a linear one with an intercept?
Yes, but don't get too distracted by the names. They are not super important.
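If it helps, a quick numeric check (with made-up values for m and b): a strictly linear map must satisfy f(0) = 0 and f(x + y) = f(x) + f(y), and y = mx + b fails both whenever b != 0, which is why "affine" is the precise term.

```python
m, b = 3.0, 2.0
f = lambda x: m * x + b

print(f(0))                    # 2.0, not 0: f is affine, not strictly linear
print(f(1 + 1), f(1) + f(1))   # 8.0 vs 10.0: additivity fails too
```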
Regarding the non-linearity: won't using a function that sets all negative outputs to zero make many of the gradients in the network zero and stop the learning process?
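To see the concern in the question concretely, a minimal sketch (not from the lesson) of how ReLU's gradient behaves:

```python
import torch

# ReLU's gradient is 0 wherever the input is negative, so those units
# contribute no gradient for this batch -- the situation the question raises
x = torch.tensor([-2.0, -0.5, 1.0, 3.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)   # tensor([0., 0., 1., 1.])
```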