Lesson 2 In-Class Discussion ✅

Can someone explain how the activation function is related to gradient descent? I don't think I have a complete understanding of the activation function, and when I look it up, it muddles my understanding of gradient descent. THIS IS MY UNDERSTANDING OF SGD: a model's goal is to find the parameters that give the lowest cost. To do this, it starts with a random set of weights and then updates them based on the slope of the loss function (the gradient) at that point (to see which direction moves it closer to the minimum) and the learning rate (how far to move in that direction). Eventually, the model converges on the set of weights that gives roughly the lowest cost. When I look online, people seem to say that the slope of the activation function is used instead of that of the loss function, and I just don't understand how that is true.

I guess my question can be simplified to: are gradient descent and activation functions independent of each other? If they are, what are activation functions used for, and is my understanding of GD correct? If they aren't independent, how should my explanation of GD change to incorporate activation functions?
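In code, my mental model of the update rule is something like this toy example (made-up data, a one-parameter model, and squared-error loss; nothing here is fastai code):

```python
import numpy as np

# Toy data: the true relationship is y = 2x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w = np.random.randn()   # start from a random weight
lr = 0.05               # learning rate

for step in range(100):
    grad = np.mean(2 * (x * w - y) * x)  # slope of the loss with respect to w
    w -= lr * grad                       # step downhill, scaled by the learning rate

print(w)  # converges to roughly 2.0
```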

I am getting the same error with a SageMaker notebook instance; !pip install ipywidgets did not help. Any ideas?

Error:


```
ModuleNotFoundError                       Traceback (most recent call last)
in <module>
----> 1 from fastai.widgets import *

~/SageMaker/envs/fastai/lib/python3.7/site-packages/fastai/widgets/__init__.py in <module>
----> 1 from .image_cleaner import *
      2 from .image_downloader import *

~/SageMaker/envs/fastai/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in <module>
      7 from ...callbacks.hooks import *
      8 from ...layers import *
----> 9 from ipywidgets import widgets, Layout
     10 from IPython.display import clear_output, display
     11

ModuleNotFoundError: No module named 'ipywidgets'
```

I was choosing the generic Python 3 kernel, but for a SageMaker instance we need to choose the conda_fastai kernel.
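If you'd rather stay on the generic kernel, a common Jupyter workaround (generic advice, not SageMaker-specific) is to install into the interpreter that is actually backing the running kernel:

```python
import sys
# Install ipywidgets into the Python running this notebook kernel,
# rather than whatever "pip" happens to be first on the PATH.
!{sys.executable} -m pip install ipywidgets
```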

@jeremy is it possible to update the instructions, or let me know where to submit a PR or something?

Any idea how I can upload my image URLs in a Colab notebook? I do not see the upload button or the notebook directory structure which I usually see locally…

Got the answer from here: Platform: Colab ✅


Wait, wait, wait, how is this :thinking::thinking::thinking: ???

So learner.fit_one_cycle(2, slice(lr)) is different from running learner.fit_one_cycle(1, slice(lr)) twice???

In lesson 2, Jeremy says that when you train your model and your training loss is higher than your validation loss, you have to train longer or increase the learning rate (at minute 49:10). However, I have trained a resnet34 on MNIST_TINY and found that with a learning rate of 3e-02 I get 100% accuracy after around 15 epochs. Yet at this point the loss on my training set is still higher than on my validation set.

What’s the meaning of that?

The activation function determines how the nodes in a layer transform that layer's inputs into its outputs. The output of that layer is then generally used as the input to the next layer, which again has an activation function to transform its input into an output.

For example, in the following image you see the inputs of a single neuron (a single element of a layer), depicted as x1, x2, …, xm (plus a constant: 1). They get multiplied by their weights (w0, …, wm) and summed up (depicted by the summation sign). This scalar is then put through the activation function, which can for example be a linear function, the logistic function, ReLU (not displayed), or the hyperbolic tangent. The output of that function (again just a single number) is then the input (together with the outputs of all the other neurons in that layer) for the next layer.
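In code, that single neuron is just a weighted sum passed through the activation. A minimal sketch, with ReLU chosen arbitrarily as the activation:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def neuron(x, w, b):
    # Weighted sum of the inputs plus the bias (the weight on the constant 1),
    # passed through the activation function.
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs x1..xm
w = np.array([0.1, 0.4, -0.2])   # weights w1..wm
b = 0.3                          # w0, the weight on the constant input 1

print(neuron(x, w, b))           # a single number, fed to the next layer
```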

Now let's say we have a certain set of weights for our model and we want to improve them with gradient descent. We do a forward pass on some training data (an amount corresponding to the batch size) and see what the model predicts for each of the training items. From these predictions (and their true values) we compute the (training) loss. To update the weights we need to know the derivative of the loss function with respect to each weight. Because the weights are related to the loss function through (a series of) activation functions, by the chain rule this derivative will also involve derivatives of the activation functions.
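You can watch the chain rule happen in a tiny example. Here PyTorch's autograd computes dloss/dw, and the manual formula shows that the derivative of the activation (tanh, chosen arbitrarily) appears in it:

```python
import torch

x = torch.tensor(0.5)
y = torch.tensor(1.0)                      # target
w = torch.tensor(2.0, requires_grad=True)  # the weight we want to update

a = torch.tanh(w * x)                      # activation applied to the weighted input
loss = (a - y) ** 2                        # squared-error loss
loss.backward()

# Chain rule: dloss/dw = 2*(a - y) * (1 - tanh(w*x)^2) * x
manual = (2 * (a - y) * (1 - a ** 2) * x).item()
print(w.grad.item(), manual)               # the two values match
```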

Hope this helps!


That makes so much more sense. That also explains where the non-linearity in the model comes from. Thank you so much, Lucas!

Hi,

I am getting the below error from the Lesson 2 script.

Any idea?

Hi,

I am still having trouble understanding the distinction between an asynchronous web framework such as Starlette and a synchronous one such as Flask. Can someone explain in layman's terms why asynchronous frameworks work well for model inference? Thanks in advance!

```python
@app.route('/analyze', methods=['POST'])
async def analyze(request):
    data = await request.form()
    img_bytes = await (data['file'].read())
    img = open_image(BytesIO(img_bytes))
    prediction = learn.predict(img)[0]
    return JSONResponse({'result': str(prediction)})
```

How to choose the best learning rate?

Hi there, I have a little puzzle.
When I run the 'too few epochs' part:

```python
learn = create_cnn(data, models.resnet34, metrics=error_rate, pretrained=False)
learn.fit_one_cycle(1)
```

It is supposed to show train_loss > valid_loss, but somehow I got this:

[screenshot of training results]
What am I doing wrong?

Did anyone try to train another model with the cleaned dataset?
I am struggling to create an ImageDataBunch with the constructor ImageDataBunch.from_csv(). The file cleaned.csv generated by DatasetFormatter and saved in the 'data/bears' directory doesn't work with the from_csv() method. How should one use from_csv() to get the cleaned dataset?
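For context, the shape of the call I've been attempting, with argument values guessed from the notebook defaults:

```python
from fastai.vision import *

path = Path('data/bears')
np.random.seed(42)
# Point csv_labels at the cleaned.csv written out by ImageCleaner.
data = ImageDataBunch.from_csv(path, folder='.', valid_pct=0.2,
                               csv_labels='cleaned.csv',
                               ds_tfms=get_transforms(), size=224,
                               num_workers=4).normalize(imagenet_stats)
```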

Moreover, after deleting the files with the DatasetFormatter and running ImageDataBunch.from_folder() like at the beginning of the notebook, I got the same statistics for the number of examples in the training and validation datasets.

From the Lesson 2 notebook (clearly speaking about deletion of files):

Flag photos for deletion by clicking 'Delete'. Then click 'Next Batch' to delete flagged photos and keep the rest in that row. ImageCleaner will show you a new row of images until there are no more to show. In this case, the widget will show you images until there are none left from top_losses.

ImageCleaner(ds, idxs)

Thank you for your help!

Hey Jeff

Really simply: the code will run in the order it is written; however, if some part of the code is taking time, in this case the request.form(), the code that follows could continue to run before the request has 100% completed.

This generally results in errors, because the following code depends on the response of request.form().

Asynchronous, or async, is a way of telling the computer to wait for a result without blocking everything else. In this case, in data = await request.form(), the await call halts this function until request.form() is fully complete, while the server stays free to handle other requests in the meantime.

It can definitely be tricky, but keep at it and it becomes pretty simple. Check this out for some more async/await info: https://www.youtube.com/watch?v=XO77Fib9tSI
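Here is a stripped-down illustration with plain asyncio (not Starlette, just the same await idea): while one request is paused at an await, the event loop is free to work on another.

```python
import asyncio

async def handle(name):
    print(f"{name}: start")
    await asyncio.sleep(1)   # stands in for slow I/O like request.form()
    print(f"{name}: done")

async def main():
    # Both "requests" run concurrently: total time is about 1s, not 2s.
    await asyncio.gather(handle("req1"), handle("req2"))

asyncio.run(main())
```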

That is all exactly as it is written in the notebook.

My guess would be that it's the data variable that is wrong. It may have gotten messed up somewhere along the way. I would try re-running the notebook after a reset, making sure you do the image-download folder and file steps correctly. See the "Create directory and upload urls file into your server." section.

To be sure your data set is correct, look in your path (it should be 'data/bears'); there should be 3 folders: 'black', 'teddys' and 'grizzly'.

If you just run the code from top to bottom you will probably only have a ‘grizzly’ folder.
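A quick way to check from the notebook (the path assumes the notebook's defaults):

```python
from pathlib import Path

path = Path('data/bears')
# List the class folders; expect ['black', 'grizzly', 'teddys']
# once all three download cells have been run.
print(sorted(d.name for d in path.iterdir() if d.is_dir()))
```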

Hope that helps…

Hey Preka

Running learner.fit_one_cycle(2, slice(lr)) is different from running learner.fit_one_cycle(1, slice(lr)) twice.

fit_one_cycle refers to the one-cycle policy, which ramps the learning rate up and then back down over the whole run of mini-batches, and the first argument is the number of epochs that one cycle spans. So a single cycle over 2 epochs follows a different learning-rate schedule than two separate 1-epoch cycles. Does that make sense?
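Here's a simplified sketch (a plain triangular schedule, not fastai's exact curve) showing why the two runs see different learning rates:

```python
import numpy as np

def one_cycle(n_steps, lr_max, pct_start=0.3):
    # Simplified one-cycle schedule: linear warm-up, then linear anneal.
    up = int(n_steps * pct_start)
    warm = np.linspace(lr_max / 10, lr_max, up)
    cool = np.linspace(lr_max, lr_max / 10, n_steps - up)
    return np.concatenate([warm, cool])

steps = 100                                                  # batches per epoch
two_epoch_cycle = one_cycle(2 * steps, 3e-3)                 # fit_one_cycle(2)
two_separate = np.concatenate([one_cycle(steps, 3e-3)] * 2)  # fit_one_cycle(1) twice

# Around the epoch boundary the single cycle is still at a high learning rate,
# while the repeated cycle has annealed down and jumps back up.
print(two_epoch_cycle[95:105])
print(two_separate[95:105])
```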

Check this for more details. https://medium.com/@nachiket.tanksale/finding-good-learning-rate-and-the-one-cycle-policy-7159fe1db5d6


Thanks! Actually, I wrote a loop to load the data, and I double-checked it; that's probably not the cause. To figure it out, I tried different models with and without pretraining, and I got this.


The pretrained models seem more reasonable, but I still don’t know why :confused:


What happens if you run more epochs on it?
Are you using your own data set or are you using the bears?

Confusing!!