Is dropout a good idea for convolutional layers? I get the idea for fully connected layers: we don’t want a single weight to have too much importance. But if we apply that to conv layers, don’t we end up damaging the filters?
Validation set.
Validation accuracy
How do we add extra fully connected (xtra_fc) layers? Do we pass the sizes in as a list?
Why do we remove dropout for inference, rather than averaging over the predictions for a bunch of random dropouts? It seems the latter would be better (though more expensive), because the model was trained to predict well with the dropout.
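Averaging predictions over many random dropout masks at test time is known as Monte Carlo dropout. A minimal NumPy sketch (names and sizes are illustrative, not fastai code) of why plain inference with dropout switched off already matches that average, at least for a linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, w, p_drop, train, rng):
    """One linear layer with inverted dropout applied to its input."""
    if train:
        mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
        x = x * mask  # randomly zero units, rescale the survivors
    return x @ w

x = rng.standard_normal(64)
w = rng.standard_normal((64, 10))
p = 0.5

# Standard inference: dropout removed entirely.
deterministic = dropout_forward(x, w, p, train=False, rng=rng)

# Monte Carlo dropout: keep dropout active and average many passes.
mc = np.mean([dropout_forward(x, w, p, train=True, rng=rng)
              for _ in range(10_000)], axis=0)

# The average converges to the deterministic pass.
print(np.max(np.abs(mc - deterministic)))
```

For a single linear layer the expectation over masks equals the no-dropout pass exactly, so inverted dropout at train time recovers it for free. In a deep nonlinear network the equivalence is only approximate, which is why averaging could in principle do slightly better, at the cost of many forward passes.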
What’s the intuition for why dropout works? It’s basically like sleep, doing neuron pruning? Are there other techniques that are more similar to “sleep” than random pruning, kind of like short-term and long-term reinforced activations, similar to short-term and long-term memory?
Based on the fastai lib:
xtra_fc (list of ints): sizes of the extra hidden fully connected layers (the number of neurons in each)
Edit:
Locations of the implementation if you wanted to dive in further.
File: conv_learner.py
Class: ConvnetBuilder
Some people explain it as training an ensemble instead of a single network.
Dropout ensures that all your neurons get trained. If you don’t have dropout, it is quite possible that only the dominant ones get the training. Hence it’s a training-time thing only.
@yinterian Are we able to set custom dropout values for each layer (as opposed to a single dropout value)?
It’s not so much that the model is trained to predict well with dropout; rather, we use dropout to keep the model from memorizing the specific training examples too well.
During an animal’s sleep, connections deemed not valuable are pruned permanently. Dropout is different: it is temporary (a fresh random mask for each training batch) and is not used during prediction. It makes sure the model doesn’t rely too much on any one neuron’s activation, which aids generalization.
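The “temporary” part is easy to see directly: with inverted dropout, training draws a fresh random mask on every call, while at prediction time the layer is just the identity. A toy sketch (not the fastai implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, p_drop=0.5, train=True, rng=rng):
    """Inverted dropout: zero units at random while training, identity at eval."""
    if not train:
        return x          # prediction: no neurons are dropped
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask       # an independent fresh mask on every call, i.e. every batch

x = np.ones(8)
print(dropout(x, train=True))   # some entries zeroed, survivors scaled to 2.0
print(dropout(x, train=True))   # an independent fresh mask this time
print(dropout(x, train=False))  # unchanged at inference
```

The 1/(1 - p_drop) rescaling keeps the expected activation the same with or without the mask, which is why nothing special is needed at prediction time.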
Why do we monitor the loss to go down instead of the accuracy go up?
Is there a reason you use logsoftmax rather than softmax? I thought softmax already incorporated “information” inside and it doesn’t need another log.
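One standard reason (this is the general numerical argument, not anything fastai-specific) is that computing softmax first and taking the log afterwards can overflow or underflow, while log-softmax can be computed stably in one step, and the negative log-likelihood loss wants log-probabilities anyway. A NumPy illustration:

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax via the max-subtraction trick."""
    z = z - np.max(z)                      # softmax is shift-invariant
    return z - np.log(np.sum(np.exp(z)))

logits = np.array([1000.0, 0.0, -1000.0])

# Naive route: softmax overflows, then the log produces nan/-inf.
with np.errstate(all="ignore"):
    naive = np.log(np.exp(logits) / np.sum(np.exp(logits)))

stable = log_softmax(logits)
print(naive)   # contains nan: exp(1000) overflowed
print(stable)  # finite log-probabilities: [0., -1000., -2000.]
```

In PyTorch, ending the model with `F.log_softmax` and training with `nn.NLLLoss` is equivalent to feeding raw logits to `nn.CrossEntropyLoss`, which is why the log version is the conventional final layer.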
Can dropout be applied to the linear (fully connected) layers? If so, what value does it add?
Because you always need to finish with a probability; the final layer does that for you.
Because as accuracy gets higher, it becomes harder to classify additional examples correctly, but the loss can still improve (the model becomes more confident in the predictions it already gets right).
One intuition is that modifying the network by dropping out random neurons is like training a different network at each step. At inference time, using all the neurons is then like averaging an ensemble of those networks.