Lesson 4 In-Class Discussion

See post from @neovaldivia above. I think that’s the right way to think about it.

1 Like

How does using dropout on the embedding matrix help? Can anyone please elaborate?

@boxreb14, it helps keep the embedding matrix from overfitting to this particular training data.
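Roughly, it looks like this. A minimal PyTorch sketch (layer names and sizes here are illustrative, not the fastai source): dropout is applied to the concatenated embedding outputs, so the model can't lean too hard on any one embedding dimension.

```python
import torch
import torch.nn as nn

class TabularNet(nn.Module):
    def __init__(self, cardinalities, emb_szs, emb_drop=0.04):
        super().__init__()
        # one embedding matrix per categorical variable
        self.embs = nn.ModuleList(
            [nn.Embedding(c, s) for c, s in zip(cardinalities, emb_szs)]
        )
        self.emb_drop = nn.Dropout(emb_drop)  # dropout on the embedding activations
        self.head = nn.Linear(sum(emb_szs), 1)

    def forward(self, x_cat):
        # x_cat: LongTensor of shape (batch, n_categorical_features)
        x = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embs)], dim=1)
        x = self.emb_drop(x)  # randomly zeroes embedding units -> regularisation
        return self.head(x)
```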

3 Likes

What other techniques can be used here? Similar to how we dealt with images, where we had some ability to further improve the accuracy of the model (through augmentation, batch norm, etc.)…

Are there other techniques for structured / time series data that can take this model further, aside from maybe additional feature engineering?

Is there a way to get the individual relative importances of the variables when using the neural network approach for regression, similar to how we can get the relative feature importances in http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html#feature_importances_ ?
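For reference, that sklearn attribute looks like this on a toy problem (the data here is synthetic, just to show the call):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingRegressor().fit(X, y)

# one importance score per input column, summing to 1
print(model.feature_importances_)
```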

Is there any way to use data augmentation or TTA with structured data?

1 Like

You could try some classic ML techniques like a random forest for feature selection.

Edit: That being said, I thought the idea was that a NN would learn those concepts for you and apply them as is.
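Something like this sketch, assuming your data frame is already fully numeric/encoded and has a 'target' column (both of those are assumptions, adjust to your data):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def rf_feature_ranking(df, target_col='target', n_estimators=100):
    X = df.drop(columns=[target_col])
    y = df[target_col]
    rf = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
    rf.fit(X, y)
    # rank columns by impurity-based importance, highest first
    return (pd.Series(rf.feature_importances_, index=X.columns)
              .sort_values(ascending=False))
```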

Why would you use 4 for the embeddings, or say 7 or 8?
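If I remember the course notebooks right, the rule of thumb used there was roughly half the cardinality, capped at 50 (treat this as one heuristic, not a law):

```python
def emb_size(cardinality):
    # roughly half the number of distinct categories, capped at 50
    return min(50, (cardinality + 1) // 2)

print(emb_size(7))   # -> 4, e.g. a day-of-week column
print(emb_size(8))   # -> 4
```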

Is there an easy way to identify categorical vs. non-categorical data?
By "easy way" I mean with Python.
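With pandas you can get a rough first pass, e.g. (the cardinality threshold here is an arbitrary assumption):

```python
import pandas as pd

def guess_categoricals(df, max_cardinality=50):
    # object/category dtypes are almost certainly categorical;
    # low-cardinality numeric columns (zipcode-like codes) may be too
    cats = list(df.select_dtypes(include=['object', 'category']).columns)
    for col in df.select_dtypes(include='number').columns:
        if df[col].nunique() <= max_cardinality:
            cats.append(col)
    return cats
```

You still have to eyeball the result; something like a year column could go either way.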

For data augmentation, could we "jitter" the observations (add random error to them)?
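As a sketch of what that could look like (column names and noise scale are made up), the noise would only make sense on the continuous columns:

```python
import numpy as np

def jitter(df, cont_cols, scale=0.01, seed=None):
    # add small Gaussian noise to continuous columns only,
    # scaled by each column's standard deviation; categoricals untouched
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in cont_cols:
        out[col] = df[col] + rng.normal(0.0, scale * df[col].std(), size=len(df))
    return out
```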

@yinterian This is a very particular question around zipcodes. They are categorical by construct even though they are numbers. So @jeremy suggested using embeddings, but at the same time, aren't these zipcodes a function of (locality, area, city, county, state)? What I'm trying to say is, it's already a one-dimensional embedding pre-designed by whoever came up with the zipcode scheme.

Given that, do we still learn embeddings for zipcodes?

3 Likes

Are data augmentation and bootstrapping (SMOTE & ROSE) the same in this context?

Is there a heuristic for when to re-use other people's embeddings? It seems that in image analysis this happens fairly often, using vgg16 or resnet34.

@gerardo, data.summarize or something like that gives you basic descriptive statistics for a data frame
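If that method isn't around, plain pandas does the same job; `describe(include='all')` covers both numeric and object columns (tiny made-up frame just to show the output):

```python
import pandas as pd

df = pd.DataFrame({'store': ['a', 'b', 'a'], 'sales': [10.0, 12.5, 9.0]})
print(df.describe(include='all'))   # count/unique/top/freq plus mean/std/min/max
```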

Use pre-trained always…if it’s available.
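Mechanically, reusing someone else's embedding weights is just loading them into the layer. A hedged sketch (the `pretrained` tensor stands in for whatever weights you are reusing; the shapes are illustrative):

```python
import torch
import torch.nn as nn

pretrained = torch.randn(1000, 50)   # (num_categories, embedding_dim) from elsewhere
emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
# freeze=False keeps fine-tuning the reused embedding on the new task,
# analogous to fine-tuning a pre-trained vgg16/resnet backbone on images
```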

1 Like

How do embeddings have "relationships" with each other? It implies that vectors of different dimensionality are being compared against each other and trained in relation to each other. Is this an accurate way of thinking about it?
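One way to see it concretely: rows of a single embedding matrix all share the same dimensionality, so after training you can measure how "related" two categories are. A toy sketch (with freshly initialised weights the score is just noise; the structure only appears after training):

```python
import torch.nn as nn
import torch.nn.functional as F

day_emb = nn.Embedding(7, 4)                  # day-of-week, 4-dim (illustrative)
sat, sun = day_emb.weight[5], day_emb.weight[6]
print(F.cosine_similarity(sat, sun, dim=0))   # after training, similar days score higher
```

As far as I understand, embeddings for different variables (which can have different dimensionalities) aren't compared to each other directly; they are just concatenated and fed into the network together.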

1 Like

I think that is one reason why using embeddings is good. Those locality similarities will be captured by the embeddings.
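A hedged sketch of what "zipcode as a categorical with an embedding" means in code (the column values and the 4-dim size are made up):

```python
import pandas as pd
import torch.nn as nn

df = pd.DataFrame({'zipcode': ['94103', '94105', '10001']})
df['zip_idx'] = df['zipcode'].astype('category').cat.codes   # a code, not a number to do math on
zip_emb = nn.Embedding(num_embeddings=df['zip_idx'].nunique(), embedding_dim=4)
# zipcodes that behave alike (same locality/area) can end up with nearby vectors after training
```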

2 Likes

[Random question] So we've seen how time series data can be used as input for a neural net, and structured data looks exactly like rows from a relational DB. What is a time series database, and what is it good for?

Fundamentally, yes, they are the same in this context, i.e. creating new data (due to insufficiency) either by selection or by replication. SMOTE oversamples the rare class not by simply replicating examples but by interpolating between nearest neighbours, so it augments the data in a new way. I think you understand the idea behind what I'm saying.
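For anyone who wants to try it, the imbalanced-learn package has an implementation; a small synthetic example (the data is made up just to show the call):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), Counter(y_res))   # minority class is interpolated up to balance
```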

Re: the question just asked about data augmentation for structured datasets, wouldn't data augmentation be useful for solving the class imbalance problem, similar to how we use SMOTE in machine learning?