Brief and clear NN intro for non-ML audience (for meteorology papers)


(Stephan Rasp) #1

Hi everyone,

I am currently writing two papers which use neural networks for improving weather forecasts and climate models. Both papers will be submitted to atmospheric science journals. People there have some mathematical background but not necessarily any background in machine learning. I am struggling to write the description of the neural network.

Essentially, I would love to communicate the general idea of what a neural network is and how it works in a paragraph or two.

Have any of you come across a good example of this or have any good ideas how to approach this? I will soon post my attempt.

Thanks!


#2

Sounds like you have something very neat in the works!

Two paragraphs is not much. One approach might be to present NNs as universal function approximators: if we make them big enough, they can approximate essentially any continuous function. That flexibility doesn’t mean they have no drawbacks, though (architecture matters, they need a substantial amount of data, etc.).
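To make the "big enough" point concrete, here is a minimal sketch in plain Python. It is only an illustration: the helper names `relu_interpolant` and `max_error` are made up for this post, and the weights are constructed by hand (piecewise-linear interpolation) rather than trained. It shows that a one-hidden-layer ReLU network gets closer to sin(x) as hidden nodes are added.

```python
import math


def relu_interpolant(f, a, b, n_hidden):
    """Hand-build a one-hidden-layer ReLU net that interpolates f on [a, b].

    Hidden unit i computes max(0, x - knot_i); its output weight is the
    change in slope of the piecewise-linear interpolant at that knot.
    """
    knots = [a + (b - a) * i / n_hidden for i in range(n_hidden + 1)]
    values = [f(k) for k in knots]
    slopes = [
        (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
        for i in range(n_hidden)
    ]
    out_w = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]

    def net(x):
        # Output bias is f(a); zip stops at the last hidden unit's knot.
        return values[0] + sum(w * max(0.0, x - k) for w, k in zip(out_w, knots))

    return net


def max_error(f, net, a, b, n_points=1000):
    """Largest absolute error between f and the network on a fine grid."""
    grid = [a + (b - a) * i / n_points for i in range(n_points + 1)]
    return max(abs(f(x) - net(x)) for x in grid)


# Approximation error for sin on [0, pi] with 2, 8, and 32 hidden nodes.
errors = {
    n: max_error(math.sin, relu_interpolant(math.sin, 0.0, math.pi, n), 0.0, math.pi)
    for n in (2, 8, 32)
}
for n, err in errors.items():
    print(f"{n:2d} hidden nodes: max error = {err:.4f}")
```

The error shrinks as the hidden layer grows. A trained network would find its own weights rather than these hand-built ones, which is where the need for data comes in.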

Not sure if this is helpful, so please take it with a grain of salt. If I were in your shoes, I might look at Hinton’s old papers and articles on backprop for inspiration. For instance, maybe this could be helpful.


(Stephan Rasp) #3

Hey thanks for the answer. I hope to post more about my projects in the forums soon.

Talking about NNs as universal function approximators is a good idea. I haven’t worked it into the text below yet, but it’s in the back of my mind.

One issue is that I essentially need to explain how a neural network works (at least the forward pass) in one paragraph. The other issue is that I want to convey a modern view (i.e. Jeremy’s view) of NNs rather than the old-fashioned view that you necessarily need a lot of data and that they are not interpretable. In fact, in one of my projects we have very little data, and I am using permutation feature importance to gain some insight into my data.
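For anyone who has not come across permutation feature importance, the idea can be sketched in a few lines of plain Python. This is a generic illustration, not the code from my project: `permutation_importance`, `mse`, and `toy_model` are hypothetical names, and the data are random numbers. Each input feature is shuffled in turn, and the increase in error tells you how much the model relies on that feature.

```python
import random


def mse(model, X, y):
    """Mean squared error of model predictions over all samples."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Stand-in for a trained network: it only uses feature 0."""
    return 2.0 * row[0]


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(50)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
print(scores)  # feature 0 gets a positive score, feature 1 scores zero
```

Because the toy model ignores feature 1, shuffling it changes nothing, whereas shuffling feature 0 increases the error.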

In any case, here is my shot at it. This will eventually end up in a publication in Monthly Weather Review, which is about the most boring-sounding journal in the world but has somehow morphed into the place to publish the latest research on numerical weather prediction and statistical methods. Please feel free to critique!

This section gives a very brief introduction to neural networks. For readers unfamiliar with neural networks, we strongly recommend Nielsen (2015). For a more advanced treatment of the subject, Goodfellow et al. (2016) is a comprehensive resource.

Neural networks are composed of several layers of interconnected nodes. The first and last layers represent the inputs and outputs, respectively. Additional layers in between are called hidden layers. The activation of a node, i.e. the value it holds, is a weighted sum of the activations x_j of all nodes j in the previous layer plus a bias term b:
z = \sum_j w_j x_j + b
Additionally, each hidden layer activation is modified by a non-linear function g(z). For all the neural networks in this study, we use a Rectified Linear Unit, ReLU:
g(z) = \mathrm{max}(0, z)
The final layer has no activation function. The weights and biases of the network are trained using the backpropagation algorithm in combination with stochastic gradient descent (SGD). Specifically, we use a variant of SGD called Adam (Kingma and Ba 2014).
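As a sanity check on the description above, the forward pass can be written out in a few lines of plain Python. This is a minimal sketch and not the code used in the papers: the function names and the weights below are made up for illustration, with two inputs, one hidden layer of two nodes, and a single output.

```python
def relu(z):
    """Rectified Linear Unit: g(z) = max(0, z)."""
    return max(0.0, z)


def dense(x, weights, biases, activation=None):
    """One layer: each node computes sum_j w_j * x_j + b, then optionally g."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x_j for w, x_j in zip(w_row, x)) + b
        outputs.append(activation(z) if activation is not None else z)
    return outputs


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: ReLU hidden layer followed by a linear output layer."""
    h = dense(x, hidden_w, hidden_b, activation=relu)
    return dense(h, out_w, out_b)  # no activation on the final layer


# Made-up weights: one row of weights per node in the layer.
hidden_w = [[1.0, -1.0], [0.5, 0.5]]
hidden_b = [0.0, -1.0]
out_w = [[2.0, 1.0]]
out_b = [0.5]

y = forward([3.0, 1.0], hidden_w, hidden_b, out_w, out_b)
print(y)  # [5.5]
```

For the input (3, 1), the hidden nodes hold 2 and 1, so the output is 2*2 + 1*1 + 0.5 = 5.5. Training would adjust the weights and biases; only the forward pass is shown here.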