Lesson 4 In-Class Discussion

Is dropout a good idea for convolutional layers? I get the idea for fully connected layers: we don’t want a single weight to have too much importance. But if we apply that to conv layers, don’t we end up damaging the filters…

Found this thread on Reddit.

2 Likes

Validation set.

1 Like

Validation accuracy

1 Like

How do we add xtra_fc layers?

Do we pass the sizes in as a list?

Why do we remove dropout for inference, rather than averaging over the predictions for a bunch of random dropouts? It seems the latter would be better (though more expensive), because the model was trained to predict well with the dropout.

1 Like
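
What you’re describing is sometimes called Monte Carlo dropout: leave dropout switched on at test time and average several stochastic forward passes. The usual practice is to turn it off, but here is a minimal PyTorch sketch of the averaging idea (the model and sizes are made up just for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical little classifier head with dropout, purely for illustration.
model = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

def mc_dropout_predict(model, x, n_samples=20):
    """Average predictions over several forward passes with dropout left on."""
    model.train()                      # train() mode keeps nn.Dropout active
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1)
                             for _ in range(n_samples)])
    return probs.mean(dim=0)           # averaged class probabilities

x = torch.randn(4, 512)                # dummy batch of four feature vectors
avg_probs = mc_dropout_predict(model, x)
```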

What’s the intuition for why dropout works? Is it basically like sleep, doing neuron pruning? Are there other techniques that are more similar to “sleep” than random pruning, e.g. short-term and long-term reinforced activations, similar to short-term and long-term memory?

Based on the fastai lib:

xtra_fc (list of ints): list of hidden layers with # hidden neurons

Edit:
Locations of the implementation if you wanted to dive in further.
File: conv_learner.py
Class: ConvnetBuilder

6 Likes
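
For example, something along these lines (going from memory of the fastai API, so treat the path, size, and exact arguments as assumptions):

```python
from fastai.conv_learner import *

PATH = 'data/dogscats/'   # hypothetical dataset location
sz = 224
arch = resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))

# xtra_fc=[512, 256] asks ConvnetBuilder for two extra fully connected
# hidden layers, of 512 and 256 neurons, in the custom head.
learn = ConvLearner.pretrained(arch, data, precompute=True, xtra_fc=[512, 256])
```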

Anywhere

Some people explain it as training an ensemble instead of a single network.

Dropout ensures that all your neurons get trained. If you don’t have dropout, it is quite possible that only the dominant ones do most of the learning. Hence it’s a training-time thing only.

2 Likes
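
You can see the “training thing only” part directly in PyTorch: nn.Dropout is active in train() mode and becomes a no-op in eval() mode. A tiny sketch:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the values zeroed, the rest scaled by 1/(1-p) = 2

drop.eval()
print(drop(x))   # identity: all ones, dropout is disabled for inference
```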

@yinterian Are we able to set custom dropout values for each layer (as opposed to a single dropout value)?

1 Like
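
In plain PyTorch you can certainly give each layer its own dropout probability; here is a minimal sketch of a custom head with made-up sizes. (If I remember the fastai lib correctly, the ps argument to ConvLearner.pretrained accepts a list for the same purpose, but check ConvnetBuilder to confirm.)

```python
import torch.nn as nn

# Custom classifier head with a different dropout probability per layer.
head = nn.Sequential(
    nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(p=0.25),
    nn.Linear(512, 256),  nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),   nn.LogSoftmax(dim=1),
)
```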

It’s not so much that the model is trained to predict well with dropout; rather, we use dropout to help keep the model from memorizing the specific training examples too well.

2 Likes

During an animal’s sleep, connections deemed not valuable are pruned permanently. Dropout is different: it is temporary (re-sampled for each training batch) and is not used during prediction. It makes sure the model doesn’t rely too much on any one neuron’s activation, which aids generalization.

1 Like

Why do we monitor the loss going down instead of the accuracy going up?

1 Like

Is there a reason you use log_softmax rather than softmax? I thought softmax already incorporated the “information” and so doesn’t need another log.
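
My understanding (worth double checking): log_softmax is just log(softmax(x)) computed in a numerically stabler way, and it pairs with NLLLoss so that the two together equal cross-entropy on the raw outputs. A small PyTorch check:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)               # raw, unnormalised model outputs
targets = torch.randint(0, 10, (4,))

log_probs = F.log_softmax(logits, dim=1)   # log(softmax(logits)), computed stably
loss_a = F.nll_loss(log_probs, targets)    # NLLLoss expects log-probabilities
loss_b = F.cross_entropy(logits, targets)  # same result straight from the logits

print(torch.allclose(loss_a, loss_b))      # True
```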

Can dropout be applied to the linear or fully connected layers? If it can, what value does it add?

Because you always need to finish with a probability; the linear layer does that for you.

Because as the accuracy gets higher it becomes harder to correctly classify additional examples, but the loss can still keep improving.

3 Likes
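
A tiny numerical illustration of that point, assuming a cross-entropy style loss: two models can classify the same examples correctly (identical accuracy) while one is far more confident and therefore has a lower loss.

```python
import torch
import torch.nn.functional as F

targets = torch.tensor([0, 1])

# Both models pick the right class for both examples (accuracy = 100%)...
barely_right = torch.log(torch.tensor([[0.6, 0.4], [0.4, 0.6]]))
very_right   = torch.log(torch.tensor([[0.9, 0.1], [0.1, 0.9]]))

# ...but the loss still tells them apart.
print(F.nll_loss(barely_right, targets))   # ~0.51
print(F.nll_loss(very_right, targets))     # ~0.11
```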

One intuition is that modifying the network by dropping out random neurons is like training a different network at each step. At inference time, using all the neurons is then like averaging an ensemble of those networks.

3 Likes