Output dimensions for Dense layers?


(Matthew Kleinsmith) #1

What are the reasons for choosing one output dimension for a Dense layer over another?

It seems we’ve been setting the output dimensions of the Dense layers by taking the number of filters in the last convolutional layer and multiplying by 8. Is this true? If so, why the number 8?

I understand that the output dimension of the last Dense layer is the number of classes for the classification task.

Here’s some code I’ve found involving Dense layers:

512 * 8 == 4096

def FCBlock(self):
    model = self.model
    model.add(Dense(4096, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
[...]
self.ConvBlock(3, 512)
model.add(Flatten())
self.FCBlock()
[...]

64 * 8 == 512

def get_model():
    model = Sequential([
        Lambda(norm_input, input_shape=(1,28,28)),
        Convolution2D(32,3,3, activation='relu'),
        Convolution2D(32,3,3, activation='relu'),
        MaxPooling2D(),
        Convolution2D(64,3,3, activation='relu'),
        Convolution2D(64,3,3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        Dense(512, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model
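To see where the 512 sits, it helps to trace the shapes through the model above. A quick sketch (assuming 'valid' padding and stride 1, the Keras 1 defaults for `Convolution2D`) of what `Flatten()` actually hands to `Dense(512)`:

```python
# Trace spatial dimensions through the MNIST model above.
# 'valid' 3x3 convolutions shrink each side by 2; MaxPooling2D halves it.

def conv_out(size, kernel=3):
    return size - kernel + 1  # 'valid' convolution output size

size = 28             # input is 1x28x28
size = conv_out(size) # 26 after first Convolution2D(32,3,3)
size = conv_out(size) # 24 after second
size //= 2            # 12 after MaxPooling2D()
size = conv_out(size) # 10 after Convolution2D(64,3,3)
size = conv_out(size) # 8 after second
size //= 2            # 4 after MaxPooling2D()

flattened = 64 * size * size  # 64 filters * 4 * 4 = 1024
print(flattened)              # 1024 features go into Dense(512)
```

So `Dense(512)` here roughly halves the 1024 flattened features, rather than following any fixed multiple of the last conv layer's filter count.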

Thank you.


(Jeremy Howard) #2

We talk about that a lot during the lessons - the short answer is that it’s an art, not a science…


(kab) #3

But is there a rule of thumb in general? I don’t believe it’s always conv_output * 8, and it feels wrong to say we can choose it “randomly”. When I try to explain it to my colleagues, I don’t have a concrete reason why Dense layers sometimes have an output of 4096 or 512 or 128 or 256, etc. Can anybody shed some light on this?
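One concrete constraint worth keeping in mind is parameter count: a fully connected layer has (n_in + 1) * n_out weights, so the width choice can dominate the model’s size. A quick sketch (the helper function is mine, and I’m assuming VGG16’s standard 512 × 7 × 7 flattened feature map as the input):

```python
# Compare the parameter cost of different Dense widths after a
# 512-filter, 7x7 final conv block (25088 flattened features).

def dense_params(n_in, n_out):
    return (n_in + 1) * n_out  # weights plus one bias per output unit

n_in = 512 * 7 * 7  # 25088
for width in (128, 512, 4096):
    print(width, dense_params(n_in, width))
# Dense(4096) alone costs 102,764,544 parameters here
```

That doesn’t tell you the “right” width, but it does show why 4096 is affordable on ImageNet-scale models and overkill for MNIST, where the flattened input is far smaller.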