Dense vs convolutional vs fully connected layers


(Tom Elliot) #1

Hi there,

I’m a little fuzzy on what is meant by the different layer types. I’ve seen a few different words used to describe layers:

  • Dense
  • Convolutional
  • Fully connected
  • Pooling layer
  • Normalisation

There’s some good info on this page, but I haven’t been able to parse it fully yet. Some sources suggest a dense layer is the same as a fully-connected layer, but others tell me that a dense layer performs a linear operation from the input to the output and a fully connected layer doesn’t, so I’m kinda confused.

Thanks,
Tom


(Jeremy Howard) #2

Dense and fully connected are two names for the same thing.

Did you have any questions or want any clarification about any of the other types of layer?


(Tom Elliot) #3

I’d love some clarification on all of the different layer types. Here’s my understanding so far:

Dense/fully connected layer: A linear operation on the layer’s input vector.
Convolutional layer: A layer that consists of a set of “filters”. The filters take a subset of the input data at a time, but are applied across the full input (by sweeping over the input). The operations performed by this layer are still linear/matrix multiplications, but they go through an activation function at the output, which is usually a non-linear operation.
Pooling layer: We utilise the fact that consecutive layers of the network are activated by “higher” or more complex features that are exhibited by a larger area of the network’s input data. A pooling layer effectively downsamples the output of the prior layer, reducing the number of operations required for all the following layers, but still passing on the valid information from the previous layer.
Normalisation layer: Used at the input for feature scaling, and in batch normalisation at hidden layers.


(Jeremy Howard) #4

Those are pretty good definitions. Here’s my own version:

Dense layer: A linear operation in which every input is connected to every output by a weight (so there are n_inputs * n_outputs weights, which can be a lot!). Generally followed by a non-linear activation function.
Convolutional layer: A linear operation using a subset of the weights of a dense layer. Nearby inputs are connected to nearby outputs (specifically, a convolution). The weights for the convolutions at each location are shared. Due to the weight sharing, and the use of a subset of the weights of a dense layer, there are far fewer weights than in a dense layer. Generally followed by a non-linear activation function.
Pooling layer: Replace each patch in the input with a single output, which is the maximum (can also be the average) of the input patch.
Normalisation layer: Scale the input so that the output has close to a zero mean and unit standard deviation, to allow for faster and more resilient training.
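
To make these concrete, here’s a minimal sketch (assuming the Keras 2 API; the layer sizes and input shape are just illustrative) that stacks all four layer types:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     BatchNormalization, Flatten, Dense)

model = Sequential([
    # Convolutional layer: 8 filters of 3x3 shared weights, swept over the
    # image, followed by a non-linear activation (ReLU)
    Conv2D(8, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # Normalisation layer: rescale activations towards zero mean, unit std
    BatchNormalization(),
    # Pooling layer: replace each 2x2 patch with its maximum
    MaxPooling2D((2, 2)),
    Flatten(),
    # Dense (fully connected) layer: every input connected to every output
    Dense(10, activation='softmax'),
])
model.summary()
```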


(rlisam) #5

Regarding the convolutional layer - there is frequently the usage of the term “filters”. Is the goal of the neural network to compute the correct value for the filter, and thus the term “filter” can be replaced with the term “weights”?


(Jeremy Howard) #6

Yes, although ‘filter’ refers to a set of weights for a single convolution operation. For example, in the convolution intro notebook I showed 8 filters (an edge detector for each of the vertical, horizontal, and diagonal directions).
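
To illustrate (a hand-rolled sketch, not the exact filters from the notebook; the toy image and filter values are made up), a vertical edge detector is just a small matrix of weights that responds strongly where intensity changes from left to right:

```python
import numpy as np
from scipy.signal import correlate2d

# One possible 3x3 vertical edge detector: negative weights on the left,
# positive weights on the right
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]])

# A toy image: dark on the left half, bright on the right half
image = np.zeros((5, 5))
image[:, 3:] = 1.0

# Sweep the filter across the image: one output value per location.
# The response is large exactly where the vertical edge sits.
print(correlate2d(image, vertical_edge, mode='valid'))
```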


(rlisam) #7

So if I call each filter a neuron in the network, would one neuron be initialized with weights to make it a vertical edge detector, and another neuron initialized as a horizontal detector, etc.? And then these weights are adjusted during training?


(Jeremy Howard) #8

There is one activation per filter, for each location in the input grid. So for an input of height 224 and width 224 with 64 filters, that’s 224 * 224 * 64 = 3,211,264 activations in the next layer.
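
A quick sanity check of that arithmetic:

```python
height, width, n_filters = 224, 224, 64
# one activation per filter, per location in the input grid
print(height * width * n_filters)  # 3211264
```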


(rlisam) #9

So are the weights of each filter fixed? Versus the weights in a linear regression, where the objective is to adjust the weights until the right function approximation is achieved. Or is there something I’m missing about the activation function? Is there something being adjusted in the activation function?


(Jeremy Howard) #10

No, all of the weights in each filter are optimized using SGD.
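
As a sketch of that point (toy random data and hypothetical shapes; the idea is just to show the filter weights moving), you can compare a conv layer’s weights before and after a bit of SGD training:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

# Toy data: 100 random 8x8 single-channel images with random binary labels
x = np.random.rand(100, 8, 8, 1)
y = np.random.randint(0, 2, size=(100, 1))

model = Sequential([
    Conv2D(4, (3, 3), activation='relu', input_shape=(8, 8, 1)),
    Flatten(),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer=SGD(learning_rate=0.1), loss='binary_crossentropy')

before = model.layers[0].get_weights()[0].copy()  # kernel, shape (3, 3, 1, 4)
model.fit(x, y, epochs=1, verbose=0)
after = model.layers[0].get_weights()[0]

# The filters are not fixed: SGD has nudged every weight
print(np.abs(after - before).mean() > 0)  # True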


(rlisam) #11

In the spreadsheet created for lesson 4, on convolutions, why does the second layer have 2 filter matrices for each input matrix?


(Jeremy Howard) #12

For the convs from column AH, there are two matrices here since we’re creating 2 filters (there’s no particular reason we chose 2 - the first parameter to keras’ Convolution2D layer is the number of filters you want, so this example assumes we had asked for 2 of them). For the convs from column BM, each of the 2 filters we’ve created (and we could have chosen a different # of filters here too) has a 3x3x2 input, since the previous layer has 2 filters. So each filter is shown as 2 matrices, although it’s better to think of it as a single 3x3x2 3-dimensional array.
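
The same structure is easy to see in code (a sketch using the Keras 2 name Conv2D rather than Keras 1’s Convolution2D; the input size is arbitrary):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential([
    # First conv layer: 2 filters, each 3x3 over a 1-channel input
    Conv2D(2, (3, 3), input_shape=(9, 9, 1)),
    # Second conv layer: 2 filters, but each now spans all 2 channels of
    # the previous layer's output, i.e. each filter is a 3x3x2 array
    Conv2D(2, (3, 3)),
])

for layer in model.layers:
    print(layer.get_weights()[0].shape)
# (3, 3, 1, 2) - 2 filters of shape 3x3x1
# (3, 3, 2, 2) - 2 filters of shape 3x3x2
```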


(Jeremy Howard) #13

Here’s a wonderful in-depth look at convolutions, as they apply to deep learning: http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html