Hey thanks for the answer. I hope to post more about my projects in the forums soon.

Talking about NNs as universal function approximators is a good idea. I haven’t incorporated it into the text below yet, but it’s in the back of my mind.

One issue is that I essentially need to explain how a neural network works (at least the forward pass) in one paragraph. The other issue is that I kind of want to convey a modern view (i.e. Jeremy’s view) of NNs rather than the old-fashioned view that you necessarily need a lot of data and that they are not interpretable. In fact, in one of my projects we have very little data, and I am using permutation feature importance to gain some insight into it.

In any case here is my shot at it. This will eventually end up in a publication in Monthly Weather Review… About the most boring sounding journal in the world, but it somehow morphed into the journal to publish the latest research on numerical weather prediction and statistical methods. Please feel free to critique!

This section gives a very brief introduction to neural networks. For readers unfamiliar with neural networks, we strongly recommend Nielsen (2015). For a more advanced treatment of the subject, Goodfellow et al. (2016) is a comprehensive resource.

Neural networks are composed of several layers of interconnected nodes. The first and last layers represent the inputs and outputs, respectively. Additional layers in between are called hidden layers. The activation of a node, i.e. the value it holds, is a weighted sum of the activations x_j of all nodes j in the previous layer plus a bias term b:

z = \sum_j w_j x_j + b

Additionally, each hidden-layer activation is passed through a non-linear function g(z). For all neural networks in this study, we use the Rectified Linear Unit (ReLU):

g(z) = \max(0, z)

We do not apply an activation function to the final layer, so the outputs are a linear combination of the last hidden layer's activations. The weights and biases of the network are trained using the backpropagation algorithm in combination with stochastic gradient descent (SGD). Specifically, we use a variant of SGD called Adam (Kingma and Ba 2014).
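(Not for the paper, but in case it helps the discussion here: the forward pass described above can be sketched in a few lines of NumPy. The layer sizes and random weights are purely hypothetical; a trained network would have learned values instead.)

```python
import numpy as np

def relu(z):
    # ReLU activation: g(z) = max(0, z), applied element-wise
    return np.maximum(0, z)

def forward(x, weights, biases):
    """Forward pass: each hidden layer computes z = W a + b followed by
    ReLU; the final layer is linear (no activation)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)          # hidden layer: weighted sum + bias, then g(z)
    return weights[-1] @ a + biases[-1]  # linear output layer

# Hypothetical tiny network: 3 inputs -> 4 hidden nodes -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
y = forward(np.array([1.0, -0.5, 2.0]), weights, biases)
```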