Lesson 4 In-Class Discussion

rajath · November 21, 2017, 4:03am

The difference I feel between one-hot encoding and embedding is similar to the difference between Jaccard similarity and cosine similarity.
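
One way to make this analogy concrete, as a minimal sketch: one-hot codes of two different categories never share an active position, so their Jaccard similarity is always 0, while dense embedding vectors can be anywhere from unrelated to nearly parallel. The vectors and function names below are made up for illustration and are not taken from the lesson.

```python
import math

def jaccard(a, b):
    """Jaccard similarity of two binary vectors, treated as sets of active indices."""
    set_a = {i for i, v in enumerate(a) if v}
    set_b = {i for i, v in enumerate(b) if v}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

def cosine(a, b):
    """Cosine similarity of two real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# One-hot codes for Saturday and Sunday (7 days of the week).
sat_onehot = [0, 0, 0, 0, 0, 1, 0]
sun_onehot = [0, 0, 0, 0, 0, 0, 1]

# Hypothetical learned embeddings; the numbers are invented.
sat_emb = [0.9, 0.1, -0.3]
sun_emb = [0.8, 0.2, -0.4]

onehot_sim = jaccard(sat_onehot, sun_onehot)  # distinct categories never overlap
emb_sim = cosine(sat_emb, sun_emb)            # dense vectors can be close
print(onehot_sim, round(emb_sim, 3))
```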

narvind2003 · November 21, 2017, 4:03am

Say for the categorical var “day of week”: all 7 values will have the same number of dims. Maybe Sat and Sun will have similar floats in some of those dims after the embeddings are learnt, since they are both weekend days.
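
A rough sketch of what "same number of dims" means in practice: an embedding is a lookup table with one row per category and a fixed number of columns. The width of 4 and the random initial values below are assumptions for illustration; real values would only become meaningful after training.

```python
import numpy as np

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
emb_dim = 4  # assumed embedding width, chosen only for this example

# Untrained embedding matrix: one row per day, emb_dim floats per row.
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(days), emb_dim))

def lookup(day):
    """Return the embedding row for a day name (a plain table lookup)."""
    return emb[days.index(day)]

print(emb.shape)            # one row per day, same width for every day
print(lookup("Sat").shape)  # a single day's vector
```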

claytonjy · November 21, 2017, 4:04am

Is there a clear reason why no lagging of (continuous) features was needed? Does something about the model do that already, or were lags borrowed from the Kaggle winner’s code?
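
For readers unfamiliar with the term, lagging a feature means copying its value from an earlier time step onto the current row. A minimal pandas sketch, assuming hypothetical `store`, `date` and `sales` columns (not the actual Rossmann schema):

```python
import pandas as pd

df = pd.DataFrame({
    "store": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-01", "2015-01-02"]
    ),
    "sales": [10, 12, 11, 7, 9],
})

# Sort so that "previous row" means "previous day" within each store.
df = df.sort_values(["store", "date"]).reset_index(drop=True)

# Lag by one step per store; the first row of each store has no history.
df["sales_lag1"] = df.groupby("store")["sales"].shift(1)
print(df)
```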

A_TF57 · November 21, 2017, 4:04am

If you think of the embedded vectors as being in a 3D space, vectors that are closer to one another are semantically similar. For example, cat may be close to dog but far away from a group such as Sunday and Saturday.

memetzgz · November 21, 2017, 4:04am

@gerardo, it’s df.describe(), actually

Martins · November 21, 2017, 4:05am

print(df.describe(include=[object]))

This is what I use for categorical variable description
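
A small sketch of what such a summary looks like, using a toy DataFrame with a column explicitly cast to the `category` dtype. The column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["CA", "CA", "NY", "CA"],
    "sales": [10.0, 12.5, 7.0, 9.5],
})
df["state"] = df["state"].astype("category")

# For categorical columns, describe() reports count, unique, top and freq
# instead of mean/std/quantiles.
summary = df.describe(include=["category"])
print(summary)
```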

rajath · November 21, 2017, 4:05am

Yes, but my question is that people have already implemented it. Why did @Jeremy say it hasn’t been implemented yet if they are fundamentally the same?

pete.condon · November 21, 2017, 4:07am

A time series problem is just structured data when a time value makes up part of the unique identifier for each row (it helps if the time values are evenly spaced, e.g. yearly, hourly, etc).

dydt · November 21, 2017, 4:07am

Why isn’t dropout applicable to the continuous variables? Do they get fed directly into a linear layer?

ezequiel · November 21, 2017, 4:07am

Most of the time I only use the label of the column. If it says gender, for example, it’s clearly categorical. If you don’t have that info because the column names don’t have a meaning, I would rule out as categorical any column that has floating-point numbers, or values like -2, -1, -4, etc.
A priori you could treat any integer column as categorical, even if there are a lot of levels, but most of the time you know the meaning of the column, so I don’t understand how the problem could arise.
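
The rule of thumb above could be sketched roughly as follows. `guess_categorical` is a hypothetical helper written for this example, not a pandas or fastai function, and it only encodes the heuristic described here: floats and columns with negative values are treated as continuous, everything else is a candidate categorical.

```python
import pandas as pd
from pandas.api import types as ptypes

def guess_categorical(df):
    """Return column names that look categorical under a simple heuristic."""
    candidates = []
    for name in df.columns:
        col = df[name]
        if ptypes.is_float_dtype(col):
            continue  # floating-point values: treat as continuous
        if ptypes.is_numeric_dtype(col) and (col < 0).any():
            continue  # negative values such as -2, -1, -4: treat as continuous
        candidates.append(name)  # strings and non-negative integers
    return candidates

example = pd.DataFrame({
    "gender": ["m", "f", "f"],
    "store": [1, 2, 3],
    "temp": [1.5, 2.0, 3.2],
    "delta": [-2, -1, -4],
})
guessed = guess_categorical(example)
print(guessed)
```

Column names that carry meaning should still override whatever this guesses.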

wgpubs · November 21, 2017, 4:08am

Would the amount of data inform you as to whether to use RandomForest or a NN?

It seems like RF would be a better option if you don’t have a big dataset, whereas the NN approach works great for things like Rossmann where there are some 900k rows.

pete.condon · November 21, 2017, 4:08am

Because it’s a massive assumption that changing the data doesn’t fundamentally change the result; many of the results in structured problems are highly non-linear.

claytonjy · November 21, 2017, 4:10am

This matches my experience; I’ve been cursed with small datasets, where RFs and GBMs (and smart ensembles of both) tend to be tough to beat.

zaoyang · November 21, 2017, 4:10am

The add_datepart function is domain-specific feature engineering, right? It just expands one feature into many features.
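
To illustrate the idea of expanding one date column into several features, here is a minimal pandas sketch. `expand_date` is a hypothetical stand-in written for this example; it is not the fastai `add_datepart` implementation, and the set of date parts is an assumption.

```python
import pandas as pd

def expand_date(df, col):
    """Replace a date column with several derived columns (illustrative only)."""
    out = df.copy()
    dates = pd.to_datetime(out[col])
    # Integer parts exposed by the pandas .dt accessor.
    for part in ("year", "month", "day", "dayofweek", "dayofyear"):
        out[f"{col}_{part}"] = getattr(dates.dt, part)
    # Boolean flag for the last day of the month.
    out[f"{col}_is_month_end"] = dates.dt.is_month_end
    return out.drop(columns=[col])

raw = pd.DataFrame({"Date": ["2015-07-31", "2015-08-01"], "Sales": [100, 80]})
expanded = expand_date(raw, "Date")
print(expanded.columns.tolist())
```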

aloisius · November 21, 2017, 4:11am

There are ensembles of NNs. I wonder if it would be useful to create effectively a NN random forest.

Especially given the order of items learned affects performance and augmentations are random… seems like you could effectively apply the same idea.

pete.condon · November 21, 2017, 4:12am

ResNet’s architecture effectively mimics gradient boosting, because it passes the results from one block on to the next.

PranY · November 21, 2017, 4:12am

I think he meant he hasn’t seen augmentation/bootstrapping used in quite the same way as embeddings. Still, I think he will be better able to answer your question.

KevinB · November 21, 2017, 4:13am

Is it possible that this works because it has overfit to a single paper? How do you know that hasn’t happened?

anamariapopescug · November 21, 2017, 4:15am

It also works because abstracts are highly formulaic :).

anandsaha · November 21, 2017, 4:16am

Language modeling: The next best paper on neural networks will be written by a neural network