Lesson 4 In-Class Discussion

See post from @neovaldivia above. I think that’s the right way to think about it.

1 Like

How does using dropout on the embedding matrix help? Can anyone please elaborate?

@boxreb14, it helps keep the embedding matrix from overfitting to this particular training data.
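Roughly, it looks like this. A minimal PyTorch sketch (layer names and sizes here are illustrative, not the fastai source): dropout is applied to the concatenated embedding outputs, so the model can't lean too hard on any one embedding dimension.

```python
import torch
import torch.nn as nn

class TabularNet(nn.Module):
    def __init__(self, cardinalities, emb_szs, emb_drop=0.04):
        super().__init__()
        # one embedding matrix per categorical variable
        self.embs = nn.ModuleList(
            [nn.Embedding(c, s) for c, s in zip(cardinalities, emb_szs)]
        )
        self.emb_drop = nn.Dropout(emb_drop)  # dropout on the embedding activations
        self.head = nn.Linear(sum(emb_szs), 1)

    def forward(self, x_cat):
        # x_cat: LongTensor of shape (batch, n_categorical_features)
        x = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embs)], dim=1)
        x = self.emb_drop(x)  # randomly zeroes embedding units -> regularisation
        return self.head(x)
```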

3 Likes

What other techniques can be used here? Similar to how we dealt with images, where we had some ability to further improve the accuracy of the model (through augmentation, batch norm, etc.)…

Are there other techniques for structured / time series data that can take this model further, aside from maybe additional feature engineering?

Is there a way to get the individual relative importances of the variables when using the neural network approach for regression, similar to how we can get the relative feature importances in http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html#feature_importances_ ?
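For reference, that sklearn attribute looks like this on a toy problem (the data here is synthetic, just to show the call):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingRegressor().fit(X, y)

# one importance score per input column, summing to 1
print(model.feature_importances_)
```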

Is there any way to use data augmentation or TTA with structured data?

1 Like

You could try some classic ML techniques like a random forest for feature selection.

Edit: That being said, I thought the idea was that a NN would learn those concepts for you and apply them as is.
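Something like this sketch, assuming your data frame is already fully numeric/encoded and has a 'target' column (both of those are assumptions, adjust to your data):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def rf_feature_ranking(df, target_col='target', n_estimators=100):
    X = df.drop(columns=[target_col])
    y = df[target_col]
    rf = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
    rf.fit(X, y)
    # rank columns by impurity-based importance, highest first
    return (pd.Series(rf.feature_importances_, index=X.columns)
              .sort_values(ascending=False))
```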

Why would you use 4 for the embeddings, or say 7 or 8?
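If I remember the course notebooks right, the rule of thumb used there was roughly half the cardinality, capped at 50 (treat this as one heuristic, not a law):

```python
def emb_size(cardinality):
    # roughly half the number of distinct categories, capped at 50
    return min(50, (cardinality + 1) // 2)

print(emb_size(7))   # -> 4, e.g. a day-of-week column
print(emb_size(8))   # -> 4
```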

Is there an easy way to identify categorical vs. non-categorical data?
By "easy way" I mean with Python.
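With pandas you can get a rough first pass, e.g. (the cardinality threshold here is an arbitrary assumption):

```python
import pandas as pd

def guess_categoricals(df, max_cardinality=50):
    # object/category dtypes are almost certainly categorical;
    # low-cardinality numeric columns (zipcode-like codes) may be too
    cats = list(df.select_dtypes(include=['object', 'category']).columns)
    for col in df.select_dtypes(include='number').columns:
        if df[col].nunique() <= max_cardinality:
            cats.append(col)
    return cats
```

You still have to eyeball the result; something like a year column could go either way.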

For data augmentation, could we "jitter" the observations (add random error to them)?
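As a sketch of what that could look like (column names and noise scale are made up), the noise would only make sense on the continuous columns:

```python
import numpy as np

def jitter(df, cont_cols, scale=0.01, seed=None):
    # add small Gaussian noise to continuous columns only,
    # scaled by each column's standard deviation; categoricals untouched
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in cont_cols:
        out[col] = df[col] + rng.normal(0.0, scale * df[col].std(), size=len(df))
    return out
```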

@yinterian This is a very particular question around zipcodes. They are categorical by construct even though they are numbers. So @jeremy suggested using embeddings, but at the same time, aren't these zipcodes a function of (locality, area, city, county, state)? What I'm trying to say is, it's already a one-dimensional embedding pre-designed by whoever came up with the zipcode scheme.

Given that, do we still learn embeddings for zipcodes?

3 Likes

Are data augmentation and bootstrapping (SMOTE & ROSE) the same in this context?

Is there a heuristic for when to re-use other people's embeddings? It seems that in image analysis this happens fairly often, using vgg16 or resnet34.

@gerardo, data.summarize or something like that gives you basic descriptive statistics for a data frame
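If that method isn't around, plain pandas does the same job; `describe(include='all')` covers both numeric and object columns (tiny made-up frame just to show the output):

```python
import pandas as pd

df = pd.DataFrame({'store': ['a', 'b', 'a'], 'sales': [10.0, 12.5, 9.0]})
print(df.describe(include='all'))   # count/unique/top/freq plus mean/std/min/max
```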

Use pre-trained always…if it’s available.
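Mechanically, reusing someone else's embedding weights is just loading them into the layer. A hedged sketch (the `pretrained` tensor stands in for whatever weights you are reusing; the shapes are illustrative):

```python
import torch
import torch.nn as nn

pretrained = torch.randn(1000, 50)   # (num_categories, embedding_dim) from elsewhere
emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
# freeze=False keeps fine-tuning the reused embedding on the new task,
# analogous to fine-tuning a pre-trained vgg16/resnet backbone on images
```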

1 Like

How do embeddings have "relationships" with each other? It implies that vectors of different dimensionality are being compared against each other and trained in relation to each other. Is this an accurate way of thinking about it?
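One way to see it concretely: rows of a single embedding matrix all share the same dimensionality, so after training you can measure how "related" two categories are. A toy sketch (with freshly initialised weights the score is just noise; the structure only appears after training):

```python
import torch.nn as nn
import torch.nn.functional as F

day_emb = nn.Embedding(7, 4)                  # day-of-week, 4-dim (illustrative)
sat, sun = day_emb.weight[5], day_emb.weight[6]
print(F.cosine_similarity(sat, sun, dim=0))   # after training, similar days score higher
```

As far as I understand, embeddings for different variables (which can have different dimensionalities) aren't compared to each other directly; they are just concatenated and fed into the network together.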

1 Like

I think that is one reason why using embeddings is good. Those locality similarities will be captured by the embeddings.
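A hedged sketch of what "zipcode as a categorical with an embedding" means in code (the column values and the 4-dim size are made up):

```python
import pandas as pd
import torch.nn as nn

df = pd.DataFrame({'zipcode': ['94103', '94105', '10001']})
df['zip_idx'] = df['zipcode'].astype('category').cat.codes   # a code, not a number to do math on
zip_emb = nn.Embedding(num_embeddings=df['zip_idx'].nunique(), embedding_dim=4)
# zipcodes that behave alike (same locality/area) can end up with nearby vectors after training
```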

2 Likes

[Random question] So we've seen how time series data can be used as input for a neural net, and structured data looks exactly like rows from a relational DB. What is a time series database, and what is it good for?

Fundamentally, yes, they are the same in this context, i.e. creating new data (due to insufficiency) either by selection or by replication. SMOTE oversamples the rare class not by simply replicating examples but by interpolating between nearest neighbours, so it augments the data in a new way. I think you understand the idea behind what I'm saying.
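For anyone who wants to try it, the imbalanced-learn package has an implementation; a small synthetic example (the data is made up just to show the call):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), Counter(y_res))   # minority class is interpolated up to balance
```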

Re: the question just asked about data augmentation for structured datasets, wouldn't data augmentation be useful for solving the class imbalance problem, similar to how we use SMOTE in machine learning?