Can someone explain how the activation function is related to gradient descent? I guess I don’t have a complete understanding of the activation function, and when I look it up, it messes with my understanding of gradient descent. THIS IS MY UNDERSTANDING OF SGD: a model’s goal is to find the parameters that give the lowest cost; to do this it starts with a random set of weights, then updates them based on the slope of the loss function (the gradient) at that point (seeing which direction will move it closer to the minimum) and the learning rate (how much to move in that direction). Eventually, the model converges on the set of weights that give it roughly the lowest cost. When I look online, people seem to say that the slope of the activation function is used instead of that of the loss function, and I just don’t understand how that is true.
I guess my question can be simplified to: are gradient descent and activation functions independent of each other? If they are, what are activation functions used for, and is my understanding of GD correct? If they aren’t independent, how should my explanation of GD change to incorporate activation functions?
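To make my understanding concrete, here is a minimal toy sketch of plain gradient descent on a one-dimensional loss (my own made-up example, not from the lesson), with no activation function involved at all:

```python
# Toy 1-D gradient descent: minimize L(w) = (w - 3)^2.
# The gradient is dL/dw = 2 * (w - 3).

def loss(w):
    return (w - 3) ** 2

def grad(w):
    return 2 * (w - 3)

w = 0.0           # starting weight
lr = 0.1          # learning rate
for _ in range(100):
    w -= lr * grad(w)   # step against the slope

print(w)   # converges toward the minimum at w = 3
```

This is the picture I have in my head: only the loss and its slope appear anywhere.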
In lesson 2, Jeremy says that when you train your model and your training loss is higher than your validation loss, you have to train longer or increase the learning rate (minute 49:10). However, I have trained a resnet34 on MNIST_TINY and find that with a learning rate of 3e-02 I get 100% accuracy after around 15 epochs, yet at this point the loss on my training set is still higher than on my validation set.
The activation function is a function which determines how nodes in a layer transform that layer’s inputs into its outputs. The output of that layer is then generally used again as input for the next layer, which again has an activation function to transform it into an output.
For example, in the following image you see the inputs of a single neuron (a single element of a layer), depicted as x1, x2, … xm (plus a constant: 1). They get multiplied by their weights (w0, … wm) and summed up (depicted by the summation sign). This scalar is then put through the activation function, which can for example be a linear function, logistic function, ReLU (not displayed), or hyperbolic tangent. The output of that function (again just a single number) is then the input (together with the outputs of all the other neurons in that layer) for the next layer.
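That picture can be sketched in a few lines of plain Python (the inputs, weights, and bias below are made up for illustration):

```python
import math

def sigmoid(z):
    # logistic activation: squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # ReLU activation: passes positives through, zeroes out negatives
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    # weighted sum of the inputs plus the bias (the constant-1 input
    # times w0), then passed through the activation function
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

x = [0.5, -1.0, 2.0]      # x1, x2, x3
w = [0.4, 0.3, -0.2]      # w1, w2, w3
b = 0.1                   # w0, the bias

print(neuron(x, w, b, sigmoid))
print(neuron(x, w, b, relu))
```

The same weighted sum goes in; which activation you choose decides what single number comes out and gets fed to the next layer.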
Now let’s say we have a certain set of weights for our model and we want to improve them with gradient descent. Then we do a forward pass on some training data (an amount corresponding to the batch size) and see what the model predicts for each of the training items. From these predictions (and their true values) we compute the (training) loss. To update the weights we need to know the derivatives of the loss function with respect to each weight. Because the weights are related to the loss function through (a series of) activation functions, this derivative will also involve derivatives of the activation function, by the chain rule.
I am still having trouble understanding the distinction between a synchronous and an asynchronous web framework, such as Flask and Starlette respectively. Can someone explain in layman’s terms why asynchronous frameworks work well for model inference? Thanks in advance!
Hi there, I have a little puzzle.
When I run the ‘too few epochs’ part:

learn = create_cnn(data, models.resnet34, metrics=error_rate, pretrained=False)
learn.fit_one_cycle(1)
It is supposed to be train_loss > valid_loss, but somehow I got this:
Did anyone try to train another model with cleaned dataset?
I am struggling with creating an ImageDataBunch with the constructor ImageDataBunch.from_csv(). The file cleaned.csv generated by DatasetFormatter and saved in the ‘data/bears’ directory doesn’t work with the from_csv() method. How should one use from_csv() to get the cleaned dataset created?
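For reference, this is the shape of the call I would expect to work, assuming fastai v1’s from_csv() signature; the folder and csv_labels arguments here are my guesses, not something I have verified against the notebook:

```python
# Hypothetical sketch, assuming fastai v1's ImageDataBunch.from_csv API.
# cleaned.csv has a filename column and a label column, as written by
# ImageCleaner / DatasetFormatter.
from fastai.vision import ImageDataBunch, get_transforms, imagenet_stats

data = ImageDataBunch.from_csv(
    'data/bears',              # path containing cleaned.csv and the images
    folder='.',                # filenames in the csv are relative to path
    csv_labels='cleaned.csv',  # use the cleaned csv instead of labels.csv
    valid_pct=0.2,
    ds_tfms=get_transforms(),
    size=224,
).normalize(imagenet_stats)
```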
Moreover, after deleting the files with the DatasetFormatter and running ImageDataBunch.from_folder() like at the beginning of the notebook, I got the same number of examples in the training and validation datasets as before.
From Lesson-2 notebook (clearly speaking about deletion of files):
Flag photos for deletion by clicking ‘Delete’. Then click ‘Next Batch’ to delete flagged photos and keep the rest in that row. ImageCleaner will show you a new row of images until there are no more to show. In this case, the widget will show you images until there are none left from top_losses.

ImageCleaner(ds, idxs)
Really simply: the code runs in the order it is written; however, if some part of the code takes time, in this case request.form(), then the code after it will continue to run before the request has fully completed.
This generally results in errors, because the following code depends on the response of request.form().
Asynchronous (or async) is a way of telling the computer to wait. In this case:

data = await request.form()

…the await call halts the code from running until request.form() is fully complete.
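Here is a minimal, self-contained sketch of the same idea using Python’s asyncio, with request.form() stood in by a made-up slow call:

```python
import asyncio

async def fake_form():
    # stands in for request.form(): a call that takes time to complete
    await asyncio.sleep(0.01)
    return {"name": "teddy"}

async def handler():
    # Without await we would only have a pending coroutine, not the data.
    # await pauses handler() here until fake_form() has fully finished,
    # while the event loop stays free to serve other requests meanwhile.
    data = await fake_form()
    return data["name"]

result = asyncio.run(handler())
print(result)
```

That last point is why async frameworks like Starlette suit inference serving: while one request is waiting on I/O, the server can be handling others.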
That is all exactly as it is written in the notebook.
My guess would be that it’s the data variable that is wrong. It may have gotten messed up somewhere along the way. I would try re-running the notebook after a reset, making sure you do the image download folder and file bit correctly. See the “Create directory and upload urls file into your server.” section.
To be sure your data set is correct look in your path (should be ‘data/bears’) and there should be 3 folders ‘black’, ‘teddys’ and ‘grizzly’.
If you just run the code from top to bottom you will probably only have a ‘grizzly’ folder.
Running learner.fit_one_cycle(2, slice(lr)) is different from running learner.fit_one_cycle(1, slice(lr)) twice.
fit_one_cycle refers to how the learning rate is scheduled across the mini-batches over the whole call (one cycle), and the first argument of the function is the number of epochs that cycle spans. Does that make sense?
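A rough sketch of why the two differ: in one cycle the learning rate warms up and then anneals back down across all the epochs of that call, so a single 2-epoch cycle traces a different schedule than two back-to-back 1-epoch cycles. This is a simplified linear up/down shape of my own, not fastai’s exact schedule:

```python
def one_cycle_lrs(n_steps, lr_max, pct_warmup=0.3):
    # simplified one-cycle shape: linear warmup to lr_max,
    # then linear decay back down over the remaining steps
    warm = max(1, int(n_steps * pct_warmup))
    lrs = []
    for step in range(n_steps):
        if step < warm:
            lrs.append(lr_max * (step + 1) / warm)
        else:
            frac = (step - warm) / max(1, n_steps - warm)
            lrs.append(lr_max * (1 - frac))
    return lrs

# one cycle spanning 2 "epochs" of 10 steps each
two_epoch_cycle = one_cycle_lrs(20, 0.01)
# versus two separate 1-epoch cycles run back to back
two_single_cycles = one_cycle_lrs(10, 0.01) + one_cycle_lrs(10, 0.01)

# same number of steps, different learning-rate schedules:
# the single long cycle rises and falls once, the repeated
# short cycles rise and fall twice
```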
Thanks. Actually I wrote a loop to load the data, and I double-checked it, so that’s probably not the case. To figure it out, I tried different models with and without pretraining, and I got this.