GlobalAveragePooling2D Use

Hi everyone,
Why do we use GlobalAveragePooling2D before the Dense layer in so many models? Is it that each filter is represented by a single average value, and the averages of all the filters are then fed to the next (Dense) layer so it can learn the correlations among them? (For example, one filter might learn the ear, another the eye, and so on, and the Dense layer can then learn the relations between them.) If I skip global average pooling and go directly to the FC layer, or add Dropout for regularization, the model does not converge: it always gives 50-51% accuracy. Why is that?

Dataset Used: DogsvsCats

My input is (8,8,2048). GlobalAveragePooling2D reduces it to 2048 values, one average per channel, which then go (through Flatten) to the Dense layer to learn the representations. If instead I pass the (8,8,2048) tensor directly to a Flatten layer, giving 131072 values, and then to the Dense layer, the accuracy is very low (50%). Is this overfitting caused by feeding such a large feature representation into the Dense layer? Dropout of 0.5 or 0.6 does not help either.
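To make the two shapes concrete, here is a small NumPy sketch (NumPy rather than Keras, so it runs without a framework). It assumes a single channels-last feature map of shape (8, 8, 2048) filled with random numbers standing in for real activations; `features`, `gap`, and `flat` are illustrative names.

```python
import numpy as np

# Assumed example: one channels-last feature map (H, W, C) = (8, 8, 2048).
# Random values stand in for the real activations of the conv base.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 2048)).astype(np.float32)

# Global average pooling: average over height and width, one value per channel.
gap = features.mean(axis=(0, 1))

# Flatten: keep every activation, laid out as one long vector.
flat = features.reshape(-1)

print(gap.shape)   # (2048,)
print(flat.shape)  # (131072,)
```

The pooled vector is 64 times shorter, because each channel's 8 x 8 = 64 spatial values collapse into one average.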

@jeremy If you have time, please look into it. I could not find a good explanation for my specific problem.

Some links I visited:


You’re using the Keras name. We’re using PyTorch in this course, so what you’re referring to is adaptive pooling (adaptive average pooling with an output size of 1).
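PyTorch provides this as `nn.AdaptiveAvgPool2d`, which takes the desired output size instead of a kernel size; with an output size of 1 it reduces to global average pooling. The NumPy function below is a hypothetical re-implementation for illustration only, not PyTorch's code. It assumes a channels-first (C, H, W) input, as PyTorch uses, and picks each window with start = floor(i * H / out) and end = ceil((i + 1) * H / out), which to my understanding is the rule PyTorch's adaptive pooling follows.

```python
import numpy as np

def adaptive_avg_pool2d(x, out_h, out_w):
    """Average-pool a channels-first (C, H, W) array to (C, out_h, out_w).

    Window i along an axis of length n spans [floor(i * n / out), ceil((i + 1) * n / out)),
    so windows may overlap when n is not a multiple of out.
    """
    c, h, w = x.shape
    out = np.empty((c, out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        h0 = (i * h) // out_h                    # floor
        h1 = ((i + 1) * h + out_h - 1) // out_h  # ceil
        for j in range(out_w):
            w0 = (j * w) // out_w
            w1 = ((j + 1) * w + out_w - 1) // out_w
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

# Usage: two channels of 8 x 8 activations (made-up values).
x = np.arange(2 * 8 * 8, dtype=np.float64).reshape(2, 8, 8)
pooled = adaptive_avg_pool2d(x, 1, 1)
print(pooled.shape)  # (2, 1, 1)
```

With `out_h = out_w = 1` there is a single window covering the whole map, so each output is simply that channel's mean; larger output sizes split the map into (possibly overlapping) bins instead.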

A layer of 131,072 activations feeding into a fully connected layer of 1,000 activations means 131,072,000 weights, which is an awful lot! So we first average each channel's activations across all positions in the image.
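The arithmetic can be checked in a few lines of Python. `dense_weight_count` is a made-up helper; the 1000-unit width is the example's, and biases are left out to match the weight count quoted.

```python
def dense_weight_count(n_in, n_out, bias=True):
    """Trainable parameters of a fully connected layer: n_in * n_out weights (+ n_out biases)."""
    return n_in * n_out + (n_out if bias else 0)

flatten_inputs = 8 * 8 * 2048  # Flatten of an (8, 8, 2048) feature map
gap_inputs = 2048              # global average pooling: one value per channel
hidden = 1000                  # width of the fully connected layer in the example

print(dense_weight_count(flatten_inputs, hidden, bias=False))  # 131072000
print(dense_weight_count(gap_inputs, hidden, bias=False))      # 2048000
```

Pooling first cuts the weight count by a factor of 64, the 8 x 8 spatial positions that get averaged away.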

What exactly is GlobalAveragePooling2D doing? When the model is fed that many activations and fails to learn, is the only reason the large number of parameters in the Dense layer? If I use convolutions to reduce the size instead, will that work together with dropout?

What does “Average Pool” do? It takes a kernel size, outputs the average value within that window, then moves by the stride. It is similar to the Max Pool we have used a few times, which takes the maximum value within the kernel window.
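A minimal NumPy sketch of both operations on a single 2D map, assuming no padding; `pool2d` is an illustrative helper, not a library function. Passing `np.mean` gives average pooling and `np.max` gives max pooling.

```python
import numpy as np

def pool2d(x, k, stride, reduce):
    """Slide a k x k window over a 2D map with the given stride (no padding)
    and summarize each window with `reduce` (e.g. np.mean or np.max)."""
    h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = reduce(window)
    return out

x = np.arange(16, dtype=np.float64).reshape(4, 4)
print(pool2d(x, k=2, stride=2, reduce=np.mean))  # average of each 2 x 2 block
print(pool2d(x, k=2, stride=2, reduce=np.max))   # largest value in each 2 x 2 block
```

For the 4 x 4 map holding 0..15, the top-left 2 x 2 block is {0, 1, 4, 5}, so average pooling gives 2.5 there and max pooling gives 5.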

So what does Global Average Pool do? Its kernel size is the full H x W, so it takes the global average across height and width and gives a tensor of dimensions 1 x 1 x C (or simply C) for an input of H x W x C.
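The "kernel size equals H x W" view can be checked directly: an ordinary average pool whose window covers the whole map leaves one value per channel. The sketch below assumes channels-last (H, W, C) input, as in Keras, and uses a tiny made-up 2 x 2 x 3 tensor so the means are easy to verify by hand; `avg_pool_hwc` is a hypothetical helper.

```python
import numpy as np

def avg_pool_hwc(x, kh, kw, stride=1):
    """Average pooling on a channels-last (H, W, C) array, no padding."""
    h, w, c = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.empty((out_h, out_w, c), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            win = x[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            out[i, j, :] = win.mean(axis=(0, 1))  # one mean per channel
    return out

# Made-up 2 x 2 x 3 input: channel 0 holds 1, 3, 5, 7; channels 1 and 2 are 10x and 100x that.
x = np.array([[[1, 10, 100], [3, 30, 300]],
              [[5, 50, 500], [7, 70, 700]]], dtype=np.float64)

h, w, _ = x.shape
pooled = avg_pool_hwc(x, kh=h, kw=w)  # kernel covers the whole H x W map
print(pooled.shape)                  # (1, 1, 3)
print(pooled.reshape(-1))            # per-channel means 4, 40, 400
```

Dropping the two singleton axes gives the length-C vector, which is what `x.mean(axis=(0, 1))` returns in one step.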


@ramesh So basically the main purpose of GlobalAveragePooling is to reduce the size by averaging each filter's feature map. We can reduce the size with other methods too, such as convolution. So if I don’t take the average and send all the activations to the Dense layer, with a dropout layer before it to prevent overfitting, it should work too, right? (But it is not working at all.)

What works for your problem depends on many factors:

  1. Data (is there enough signal to learn)
  2. Network Architecture (what we are discussing here) or Finetuning
  3. Training (Fit) - Learning Rate / Epochs

If the network is stuck at 50% accuracy (chance level on a balanced two-class dataset like Dogs vs Cats), there’s no reason to add dropout. Dropout is a regularization technique to avoid overfitting, but your problem is underfitting.

It’s really hard to comment without seeing your code / Jupyter Notebook. If you can put it in a git repository and share it here, one of us can try to reproduce the problem or suggest specific steps.
