Lesson 6 In-Class Discussion ✅

Does it make sense to use a CNN on such data, by stacking the embedding matrix and continuous variables somehow into a 2D input?

1 Like

Import fastai instead; that’s a legacy command, and we’ll remove it soon.

What’s the best technique to use to specify how many linear layers you need and what size they should be?

2 Likes

We put in defaults we found made sense in a lot of cases, but the general answer is ‘try different things’. We don’t have a magic answer, sadly.

Why is dropping out the input a bad idea, when we do something similar in random forests?

2 Likes

I found this article which looks interesting: https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

1 Like

Deep (machine) learning is still a very heuristic, rule-of-thumb area of knowledge, I guess =)

3 Likes

Can you use Bayesian optimization to find the best values for dropout with fastai?

I didn’t realize the dropout idea only came about in 2013/2014! (based on the paper publication date)

1 Like

My intuition for regularisation: don’t depend on any one sense too much to make decisions. For instance, rely on eyes/ears/skin etc. to understand what something is, since some information might not be available at all times.

2 Likes

Why would Bernoulli(1 − p) be 1 or zero? I didn’t get that part. If p is .5, wouldn’t it be 1 − .5 = .5?

1 Like

Here’s a link to a reddit AMA by Geoffrey Hinton (from one of the quotes on the slides Jeremy put up). It’s a good read on AI in general. https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/

3 Likes

Yes, it exists in Keras and TensorFlow as well.
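
Assuming “it” here means dropout, a minimal Keras sketch (the layer sizes are arbitrary):

```python
from tensorflow import keras

# Dropout in Keras: the argument is the fraction of units to drop,
# applied during training and skipped automatically at inference.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1),
])
```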

A problem arises if you are doing online learning: categories that are present in the test data but not in the training data are “unknown unknowns”. You can’t know them from only looking at the training data.
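
One common workaround, sketched below with a hypothetical encoder (not fastai’s actual internals): reserve one index for “unknown” and map anything unseen to it at inference time.

```python
# Hypothetical encoder: index 0 is reserved as the catch-all "unknown" class.
class CategoryEncoder:
    def __init__(self, training_values):
        # known categories get indices 1..n; 0 stays free for unknowns
        self.idx = {v: i + 1 for i, v in enumerate(sorted(set(training_values)))}

    def encode(self, value):
        # anything never seen in training falls back to the unknown index
        return self.idx.get(value, 0)

enc = CategoryEncoder(["red", "green", "blue"])
print(enc.encode("green"))   # known category -> its own index
print(enc.encode("purple"))  # unseen at test time -> 0
```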

Interesting. Another way of thinking about it is: if you depend on a handful of activations to distinguish some feature then it is probably not sufficiently important/relevant to generalize.

It will be 1 with probability .5 and 0 with probability .5 (if p = .5).
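
A quick sketch with torch.bernoulli makes that concrete: 1 − p is the probability of drawing a 1, not the sampled value itself.

```python
import torch

p = 0.5
# Each draw is 0 or 1; a 1 comes up with probability 1 - p.
draws = torch.bernoulli(torch.full((10000,), 1 - p))
print(draws.unique())  # tensor([0., 1.])
print(draws.mean())    # ~0.5, i.e. about 1 - p of the draws are 1
```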

Why is multiplying at training time useful at all? The point Jeremy made before is that you’d need to multiply by p at test time, so why does PyTorch do that at both training time and test time?

2 Likes

PyTorch multiplies at training time INSTEAD of multiplying at test time.

4 Likes
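
A minimal sketch of that “inverted dropout” trick, hand-rolled next to nn.Dropout for comparison:

```python
import torch

p = 0.5                # dropout probability
x = torch.randn(4, 3)  # a batch of activations

# Training time: zero each activation with probability p, then scale the
# survivors by 1/(1-p) so the expected activation stays the same.
mask = torch.bernoulli(torch.full_like(x, 1 - p))
train_out = x * mask / (1 - p)

# Test time: nothing to do -- the rescaling already happened during training.
test_out = x

# nn.Dropout behaves the same way:
drop = torch.nn.Dropout(p)
drop.train()
print(drop(x))                  # masked and rescaled by 1/(1-p)
drop.eval()
print(torch.equal(drop(x), x))  # True: identity at eval time
```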

Imagine you have a dropout of 0.5: half of your activations disappear, which means your output is probably going to be half as much as before (remember we remove the negatives with ReLU). That’s why.

2 Likes
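
A quick numeric check of that intuition (a sketch; 0.5 is the dropout probability from the post above):

```python
import torch

x = torch.relu(torch.randn(100_000))  # non-negative activations, as after ReLU
mask = torch.bernoulli(torch.full_like(x, 0.5))

print(x.mean())                 # baseline mean activation
print((x * mask).mean())        # roughly half the baseline
print((x * mask / 0.5).mean())  # rescaled back to roughly the baseline
```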

How does Jeremy jump into full screen in tmux when using VIM (and then back again to his normal panes)?