Lesson 5 In-Class Discussion ✅

With a larger batch size you’d very probably need more iterations, but it’s an area of open research.

How do we do “feature engineering”? We’ve explored latent features, but what if we have some intuitions we want to verify? (e.g., we should make more movies about kids in the ’80s fighting demons)

Just try them and see what happens. Feature engineering is like hyperparameter tuning (or even worse).

What does that mean in practice, though? How do you engineer a feature, so to speak?

What is an “entity matrix”?

Looking at the upper limit of that: if you can fit the whole dataset in memory, what will setting the batch size to the size of the entire dataset do to convergence?

By “iterations”, do you mean epochs?

I was messing around while burning through my AWS credits >.< and really upped the batch size, thinking I could speed up training… but the time seemed to be more or less the same, or longer.

What neural network method was used for the Rossmann competition before using entity embeddings?

In this case, since you have only one iteration (one minibatch forward and backward pass) per epoch, they mean the same thing.
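To make that concrete, here’s a quick sketch of the arithmetic (the dataset size and batch sizes below are made up):

```python
import math

n_examples = 50_000  # hypothetical dataset size

# One "iteration" = one minibatch forward + backward pass (one weight update);
# an epoch is one full pass over the data.
for batch_size in (64, 512, n_examples):
    iterations_per_epoch = math.ceil(n_examples / batch_size)
    print(f"batch_size={batch_size:>6} -> {iterations_per_epoch} iterations per epoch")

# With batch_size == n_examples you get exactly 1 iteration per epoch,
# which is why "iteration" and "epoch" coincide in that case.
```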

It is harder to get convergence with a larger batch size in the same number of epochs. There are other techniques, but often you’ll have to do more epochs to get more iterations.

For example, you can create (or “engineer”) a new feature that is a linear combination of other features, or a part of a timestamp, such as week of year or day of week.
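As a minimal sketch in pandas (the dataframe and column names here are made up), that looks like:

```python
import pandas as pd

# Hypothetical dataframe: a raw timestamp plus two numeric columns
df = pd.DataFrame({
    "date": pd.to_datetime(["2015-07-01", "2015-07-08", "2015-07-15"]),
    "sales": [120.0, 95.0, 143.0],
    "customers": [40, 31, 52],
})

# Date-part features engineered from the timestamp
df["week_of_year"] = df["date"].dt.isocalendar().week
df["day_of_week"] = df["date"].dt.dayofweek

# A new feature combined from existing columns
df["sales_per_customer"] = df["sales"] / df["customers"]
```

The fastai library’s add_datepart helper automates this kind of date expansion.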

When we train entity embeddings on a dataset using a neural net, and then use those embeddings in a random forest trained on the same dataset, do we need to worry about “leakage”? Do we need to split the dataset: one split to train the entity embeddings, and the other to train the random forest?
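To be concrete, I mean a pipeline roughly like this (all sizes, codes, and targets below are made up for illustration):

```python
import numpy as np
import torch
from sklearn.ensemble import RandomForestRegressor

# Stand-in for an embedding layer that was trained inside a neural net
store_emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)

# Replace each categorical code with its learned embedding vector
store_codes = np.array([0, 3, 7, 3, 1])         # made-up category codes
emb_matrix = store_emb.weight.detach().numpy()  # (10, 4) lookup table
X = emb_matrix[store_codes]                     # dense features for the forest

y = np.array([1.0, 0.5, 0.8, 0.4, 0.9])         # made-up targets
rf = RandomForestRegressor(n_estimators=50).fit(X, y)
```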

Thanks. Maybe I missed how to do this in the code example. I didn’t see how you “create a feature” in the code so far.

@rachel So no curse of dimensionality?

So do you get better accuracy when using a bigger batch size and doing more epochs? My intuition says yes, since SGD will be less dependent on the specific data in each mini-batch; is that right?

Jeremy will give examples as he goes into more detail on Rossmann.

Jeremy, what are your top 3 recommendations for becoming a wizard like you?

I downloaded it myself, using wget and unzip.

In the graphs plotting embeddings in 2D space, Jeremy says there is a path through them (i.e. a pattern exists). But I could technically redraw the curves starting from the left-most point, and that would give a different path/function. I guess what I’m hinting at is: could there be an aspect of confirmation bias in interpreting such graphs?

Jeremy is explaining why it’s okay to have tons of parameters now.
