Conv2d ni nf

yrodriguezmd · August 10, 2021, 8:17pm

Lesson 13 Convolutions contains this def:

Screen Shot 2021-08-10 at 4.11.59 PM

and the subsequent use:

Screen Shot 2021-08-10 at 4.10.34 PM

I have looked for docs and examples that would enlighten me as to how the ni and nf are determined, but couldn’t find one.

Docs mention of inputs and outputs and channels, but these terms are so commonly used for a wide variety of items, that it’s hard to figure out what they pertain to with regards to this conv2d doc.

I would appreciate if somebody can give a simple explanation, especially on nf.

Thank you!

Maria

dhruv.metha · August 10, 2021, 8:42pm

Hey @yrodriguezmd

nf - is the number of filters that layer of conv2d uses.

Each filter can be thought of as a neuron looking at the image.

Let’s examine one filter. The size of the filter is determined by the kernel size. Here we see for layer 1 the ks is 5. So the filter looks at the top left part of the image 5x5 (width x height) pixels. And then the filter strides (or moves) 2 units to the right (and downwards as well later) and looks at this 5x5 part of the image. Now this is done for the whole of the image by this single filter and we get a 2d array as the output of this filter.

Now, this is just 1 filter. We have 8 filters in total. So 8 2d arrays as output which feed in as input to the next conv layer.

You can check out Andrej Karpathy’s cs231n on YouTube. I think in the second or third lecture he explains this really well.

yrodriguezmd · August 10, 2021, 9:44pm

@dhruv.metha Thank you for the reply. But my question remains: How is the nf determined?

Thank you for sharing the resource - I will check it out!

Maria

dhruv.metha · August 11, 2021, 5:42am

@yrodriguezmd

It’s a hyper parameter that determines how wide you want each layer of the network to be.

An increasing number of filters in each subsequent layer (or set of layers) is likely to see better representations in latter layers of the network. As seen in the Zeiler & Fergus paper the first few layers learn very simple patterns (like lines or a set of parallel lines) and the latter layers learn more abstract features (like eyes, wheels, etc).

These varierty of abstract patterns in the latter layers use combinations of the simple structures from the earlier layers. The number of filters depends on the classification task at hand and experimentation with different numbers on the bear/mnist data would give you more insights.

If you see the resnet or vgg architecture they have increasing number of filters through their layers

yrodriguezmd · August 11, 2021, 10:24pm

@dhruv.metha Thank you for the response! Your third statement helped confirm my growing impression that it is an arbitrary number tailored to the objective of the user.

Maria