Lesson 6 In-Class Discussion ✅

Does it make sense to use a CNN on such data, by stacking the embedding matrix and continuous variables somehow into a 2D input?

1 Like

Import fastai instead; that’s a legacy command, and we’ll remove it soon.

What’s the best technique to use to specify how many linear layers you need and what size they should be?

2 Likes

We put in defaults we found made sense in a lot of cases, but the general answer is ‘try different things’. We don’t have a magic answer, sadly.

Why is dropping out the input a bad idea, when we do something similar in random forests?

2 Likes

I found this article which looks interesting: https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

1 Like

Deep (machine) learning is still a very heuristic, rule-of-thumb area of knowledge, I guess =)

3 Likes

Can you use Bayesian optimization to find the best values for dropout with fastai?

I didn’t realize the dropout idea only came about in 2013/2014! (based on the paper publication date)

1 Like

My intuition for regularisation: don’t depend on any one sense too much to make decisions. For instance, rely on eyes/ears/skin etc. to understand what something is, since some information might not be available at all times.

2 Likes

Why would Bernoulli(1 − p) be 1 or zero? I didn’t get that part. If p is .5, wouldn’t it be 1 − .5 = .5?

1 Like

Here’s a link to a reddit AMA by Geoffrey Hinton (from one of the quotes on the slides Jeremy put up). It’s a good read on AI in general. https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/

3 Likes

Yes, it exists in Keras and TensorFlow as well.
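
Assuming “it” here means dropout, a minimal Keras sketch (the layer sizes are arbitrary):

```python
from tensorflow import keras

# Dropout in Keras: the argument is the fraction of units to drop,
# applied during training and skipped automatically at inference.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1),
])
```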

A problem arises if you are doing online learning: categories that are present in the test data but not in the training data are “unknown unknowns”. You can’t know them from only looking at the training data.
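
One common workaround, sketched below with a hypothetical encoder (not fastai’s actual internals): reserve one index for “unknown” and map anything unseen to it at inference time.

```python
# Hypothetical encoder: index 0 is reserved as the catch-all "unknown" class.
class CategoryEncoder:
    def __init__(self, training_values):
        # known categories get indices 1..n; 0 stays free for unknowns
        self.idx = {v: i + 1 for i, v in enumerate(sorted(set(training_values)))}

    def encode(self, value):
        # anything never seen in training falls back to the unknown index
        return self.idx.get(value, 0)

enc = CategoryEncoder(["red", "green", "blue"])
print(enc.encode("green"))   # known category -> its own index
print(enc.encode("purple"))  # unseen at test time -> 0
```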

Interesting. Another way of thinking about it is: if you depend on a handful of activations to distinguish some feature then it is probably not sufficiently important/relevant to generalize.

It will be 1 with probability .5 and 0 with probability .5 (if p = .5).
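
A quick sketch with torch.bernoulli makes that concrete: 1 − p is the probability of drawing a 1, not the sampled value itself.

```python
import torch

p = 0.5
# Each draw is 0 or 1; a 1 comes up with probability 1 - p.
draws = torch.bernoulli(torch.full((10000,), 1 - p))
print(draws.unique())  # tensor([0., 1.])
print(draws.mean())    # ~0.5, i.e. about 1 - p of the draws are 1
```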

Why is multiplying at training time useful at all? The point Jeremy made before is that you’d need to multiply by p at test time, so why does PyTorch do that at both training time and test time?

2 Likes

PyTorch multiplies at training time INSTEAD of multiplying at test time.

4 Likes
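
A minimal sketch of that “inverted dropout” trick, hand-rolled next to nn.Dropout for comparison:

```python
import torch

p = 0.5                # dropout probability
x = torch.randn(4, 3)  # a batch of activations

# Training time: zero each activation with probability p, then scale the
# survivors by 1/(1-p) so the expected activation stays the same.
mask = torch.bernoulli(torch.full_like(x, 1 - p))
train_out = x * mask / (1 - p)

# Test time: nothing to do -- the rescaling already happened during training.
test_out = x

# nn.Dropout behaves the same way:
drop = torch.nn.Dropout(p)
drop.train()
print(drop(x))                  # masked and rescaled by 1/(1-p)
drop.eval()
print(torch.equal(drop(x), x))  # True: identity at eval time
```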

Imagine you have a dropout of 0.5: half of your activations disappear, which means your output is probably going to be half as much as before (remember we remove the negatives with ReLU). That’s why.

2 Likes
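
A quick numeric check of that intuition (a sketch; 0.5 is the dropout probability from the post above):

```python
import torch

x = torch.relu(torch.randn(100_000))  # non-negative activations, as after ReLU
mask = torch.bernoulli(torch.full_like(x, 0.5))

print(x.mean())                 # baseline mean activation
print((x * mask).mean())        # roughly half the baseline
print((x * mask / 0.5).mean())  # rescaled back to roughly the baseline
```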

How does Jeremy jump into full screen in tmux when using VIM (and then back again to his normal panes)?