Lesson 5 In-Class Discussion ✅

With a larger batch size you’d very probably need more iterations, but it’s an area of open research.

How do we do “feature engineering”? We’ve explored latent features, but what if we have some intuitions we want to verify? (e.g., we should make more movies about kids in the ’80s fighting demons)

Just try them and see what happens. Feature engineering is like hyperparameter tuning (or even worse).

What does that mean in practice, though? How do you engineer a feature, so to speak?

What is an “entity matrix”?

Looking at the upper limit of that: if you can fit the whole dataset in memory, what will setting the batch size to the size of the entire dataset do to convergence?

By “iterations”, do you mean epochs?

I was messing around while burning through my AWS credits >.< and really upped the batch size, thinking I could speed up training… but the time seemed to be more or less the same, or longer.

What neural network method was used for the Rossmann competition before using entity embeddings?

In this case, since you have only one iteration (one minibatch forward and backward pass) per epoch, they mean the same thing.
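To make that concrete, here’s a quick sketch of the arithmetic (the dataset size and batch sizes below are made up):

```python
import math

n_examples = 50_000  # hypothetical dataset size

# One "iteration" = one minibatch forward + backward pass (one weight update);
# an epoch is one full pass over the data.
for batch_size in (64, 512, n_examples):
    iterations_per_epoch = math.ceil(n_examples / batch_size)
    print(f"batch_size={batch_size:>6} -> {iterations_per_epoch} iterations per epoch")

# With batch_size == n_examples you get exactly 1 iteration per epoch,
# which is why "iteration" and "epoch" coincide in that case.
```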

It is harder to get convergence with a larger batch size in the same number of epochs. There are other techniques, but often you’ll have to do more epochs to get more iterations.

For example, you can create (or “engineer”) a new feature that is a linear combination of other features, or a part of a timestamp, such as week of year or day of week.
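As a minimal sketch in pandas (the dataframe and column names here are made up), that looks like:

```python
import pandas as pd

# Hypothetical dataframe: a raw timestamp plus two numeric columns
df = pd.DataFrame({
    "date": pd.to_datetime(["2015-07-01", "2015-07-08", "2015-07-15"]),
    "sales": [120.0, 95.0, 143.0],
    "customers": [40, 31, 52],
})

# Date-part features engineered from the timestamp
df["week_of_year"] = df["date"].dt.isocalendar().week
df["day_of_week"] = df["date"].dt.dayofweek

# A new feature combined from existing columns
df["sales_per_customer"] = df["sales"] / df["customers"]
```

The fastai library’s add_datepart helper automates this kind of date expansion.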

When we train entity embeddings on a dataset using a neural net, and then use those embeddings in a random forest trained on the same dataset, do we need to worry about “leakage”? Do we need to split the dataset: one split to train the entity embeddings, and the other to train the random forest?
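To be concrete, I mean a pipeline roughly like this (all sizes, codes, and targets below are made up for illustration):

```python
import numpy as np
import torch
from sklearn.ensemble import RandomForestRegressor

# Stand-in for an embedding layer that was trained inside a neural net
store_emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)

# Replace each categorical code with its learned embedding vector
store_codes = np.array([0, 3, 7, 3, 1])         # made-up category codes
emb_matrix = store_emb.weight.detach().numpy()  # (10, 4) lookup table
X = emb_matrix[store_codes]                     # dense features for the forest

y = np.array([1.0, 0.5, 0.8, 0.4, 0.9])         # made-up targets
rf = RandomForestRegressor(n_estimators=50).fit(X, y)
```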

Thanks. Maybe I missed how to do this in the code example. I didn’t see how you “create a feature” in the code so far.

@rachel So no curse of dimensionality?

So do you get better accuracy when using a bigger batch size and doing more epochs? My intuition says yes, since SGD will be less dependent on the specific data in each mini-batch; is that right?

Jeremy will give examples as he goes into more detail on Rossmann.

Jeremy, what are your top 3 recommendations for becoming a wizard like you?

I downloaded it myself, using wget and unzip.

In the graphs plotting embeddings in 2D space, Jeremy says there is a path through them (i.e. a pattern exists). But I could technically redraw the curves starting from the left-most point, and that would give a different path/function. I guess what I’m hinting at is: could there be an aspect of confirmation bias in interpreting such graphs?

Jeremy is explaining why it’s okay to have tons of parameters now.
