I am going to add to the other answers (e.g. @ram_cse), catering perhaps to the more mathematically inclined, but I hope this will be helpful to others.
I am splitting my answer into two parts. This first part is on notation.
Let’s forget about neural nets for a moment. The conventional notation in math textbooks is to write a function as
y=f(x)
where x is the variable with respect to which the typical math exercise asks you to optimize. See for example the cell in the fastbook notebook 04_mnist_basics.ipynb with:
```python
def f(x): return x**2
plot_function(f, 'x', 'x**2')
plt.scatter(-1.5, f(-1.5), color='red');
```
(The function plot_function is defined in fastbook/utils.py.)
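If you want to run these snippets outside the full fastbook environment, a minimal stand-in for plot_function along the following lines should suffice. This is only a sketch: I am assuming the second and third arguments are the axis labels, which is consistent with the calls in this answer, but the real helper in fastbook/utils.py has more options.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_function(f, xlabel=None, ylabel=None, lo=-2.1, hi=2.1):
    "Minimal stand-in for fastbook's plot_function: plot f over [lo, hi]."
    x = np.linspace(lo, hi, 100)
    plt.plot(x, f(x))
    if xlabel is not None: plt.xlabel(xlabel)
    if ylabel is not None: plt.ylabel(ylabel)
```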
Note that the diagrams in subsequent cells show a graph with the parameter along the horizontal axis, rather than x.
In training a neural net, on the other hand, the (loss) function to optimize involves two very different types of variables:
- weights (and biases): let’s collectively denote them with w;
- data samples: let’s collectively denote them with x.
Thus, writing out explicitly the dependence of the loss function on the weights and on the data gives
y=f(w,x)
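For concreteness, here is a toy loss of this form: a mean squared error for a one-parameter linear model. The model and the data below are made up for illustration, not taken from the notebook; the point is only that the value depends on both w and the data.

```python
import torch

xs = torch.tensor([1.0, 2.0, 3.0])   # toy inputs  (made up for illustration)
ys = torch.tensor([2.0, 4.0, 6.0])   # toy targets (the pair xs, ys plays the role of x)

def loss(w, xs, ys):
    "y = f(w, x): the value depends on both the parameter w and the data."
    preds = w * xs                    # a one-parameter 'model'
    return ((preds - ys)**2).mean()   # mean squared error

loss(torch.tensor(1.5), xs, ys)       # evaluate at a particular w, for this fixed data
```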
But in the training phase, the optimization is with respect to w (the weights), not x (the data). That is, x remains fixed. (Alternatively, we could decide to incorporate the dependence on x in the symbol f, so that the function we are interested in optimizing would simply be written y=f(w), but it is helpful to keep track of the data that we are using, so we shall keep the explicit dependence on x in the notation as well.)
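In code, this asymmetry shows up in which tensor we ask for gradients on. Here is a minimal sketch using PyTorch autograd, continuing the toy loss above (the numbers are illustrative):

```python
import torch

xs = torch.tensor([1.0, 2.0, 3.0])           # data: held fixed, no gradient requested
ys = torch.tensor([2.0, 4.0, 6.0])
w  = torch.tensor(1.5, requires_grad=True)   # parameter: this is what we optimize

l = ((w * xs - ys)**2).mean()   # y = f(w, x), evaluated at the current w
l.backward()                    # differentiate with respect to w, with the data held fixed
print(w.grad)                   # a gradient exists for w only; xs and ys have none
```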
Revisiting the earlier example in the fastbook notebook 04_mnist_basics.ipynb, we now obtain:
```python
def f(w): return w**2
plot_function(f, 'w', 'y')
plt.scatter(-1.5, f(-1.5), color='red');
```
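To tie this back to training: the red point at w = -1.5 plays the role of an initial parameter value, and a gradient step nudges w toward the minimum of the loss. Here is a minimal sketch of one such step with PyTorch (the learning rate 0.1 is just an illustrative value, not one from the notebook):

```python
import torch

def f(w): return w**2                          # the loss, as a function of the weight

w = torch.tensor(-1.5, requires_grad=True)     # initial parameter value
l = f(w)                                       # evaluate the loss at the current w
l.backward()                                   # dl/dw = 2*w = -3.0
with torch.no_grad():
    w -= 0.1 * w.grad                          # one gradient-descent step on w
print(w)                                       # w has moved from -1.5 to -1.2
```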