Lesson 14 in-class

rachel · April 11, 2017, 12:48am

Looking forward to your questions!

rachel · April 11, 2017, 1:26am

Remote participants, are you able to see and hear the live stream?

samwit · April 11, 2017, 1:28am

Yes, it's good for me.

benediktschifferer · April 11, 2017, 1:29am

for me, too

renjithmadhavan · April 11, 2017, 1:35am

It's good.

benediktschifferer · April 11, 2017, 1:42am

What is his (the guy from the SF Data Institute) name/email?

rachel · April 11, 2017, 1:44am

@benediktschifferer David Uminsky duminsky@usfca.edu
Mindi Mysliwiec mmysliwiec@usfca.edu is also a useful contact

cpotluri · April 11, 2017, 2:03am

Would you liken the use of embeddings (from a neural network) to extraction of implicit features? Or can we think of it more like what a PCA would do (i.e. dimensionality reduction)?

garima.agarwal · April 11, 2017, 2:12am

In this particular example, do you think the granularity of the data matters, as in per day, per week, or per month? Is one better than the other?

thunderingtyphoons · April 11, 2017, 2:12am

What is the test set? Is it from some time after the training data?

cody · April 11, 2017, 2:13am

Do you know if there’s any work that compares (for structured data) supervised embeddings like these ones to embeddings that come from an unsupervised paradigm (e.g. autoencoder)? It seems like you’d get more useful-for-prediction embeddings with the former case, but if you wanted “general purpose” embeddings, you might prefer the latter.

rachel · April 11, 2017, 2:13am

@thunderingtyphoons yes, the test set is from after the training set

anson · April 11, 2017, 2:13am

.ix is deprecated; use .loc
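
A minimal sketch of the replacement, assuming a hypothetical toy DataFrame (the column names and values here are illustrative, not taken from the lesson notebook). `.loc` selects by label and `.iloc` by integer position, which between them cover what `.ix` used to do; `.ix` was removed entirely in pandas 1.0.

```python
import pandas as pd

# Hypothetical toy frame; names and values are for illustration only.
df = pd.DataFrame(
    {"Store": [1, 2, 3], "Sales": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Label-based selection: row by index label, column by name.
by_label = df.loc["b", "Sales"]

# Position-based selection: row and column by integer position.
by_position = df.iloc[1, 1]

# Boolean mask plus a column subset, a common former use of .ix.
high_sales = df.loc[df["Sales"] > 15, ["Store", "Sales"]]
```

Choosing `.loc` or `.iloc` explicitly avoids the ambiguity `.ix` had when the index itself held integers.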

hamelsmu · April 11, 2017, 2:16am

When you use embeddings from a supervised model in another model, do you have to worry about data leakage?

renjithmadhavan · April 11, 2017, 2:20am

How are the googletrend and weather datasets obtained? I don't see them in the Kaggle data.

renjithmadhavan · April 11, 2017, 2:22am

OK, I got it from one of the discussion posts: https://www.kaggle.com/c/rossmann-store-sales/discussion/17229

rachel · April 11, 2017, 2:23am

@renjithmadhavan I was just searching for that link but you beat me to it!

taposh · April 11, 2017, 2:29am

Is this similar to a windowing function?

anson · April 11, 2017, 2:30am

Is there a reason to think that the current approach would be problematic with sparse data?

hamelsmu · April 11, 2017, 2:31am

For the features that are “time until” an event, how do you deal with the fact that you might not know when the next event is once you are past the last event in the data?
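
To make the question concrete, here is one possible sketch of a “time until” feature. It is not necessarily how the lesson notebook handles it. The function name `days_until_next_event` and the toy dates are hypothetical, and the rows are assumed to be sorted by date. Rows after the last known event get `None` rather than a number, which is exactly the case being asked about.

```python
from datetime import date, timedelta

def days_until_next_event(dates, is_event):
    """Days from each row to the next event on or after that date.

    Assumes `dates` is sorted ascending. Rows with no later event in
    the data get None, because the true value is unknown, not zero.
    """
    result = [None] * len(dates)
    next_event = None
    # Walk backward so the last event seen is the next one in time.
    for i in range(len(dates) - 1, -1, -1):
        if is_event[i]:
            next_event = dates[i]
        if next_event is not None:
            result[i] = (next_event - dates[i]).days
    return result

# Hypothetical toy data: six consecutive days with two events.
start = date(2015, 1, 1)
dates = [start + timedelta(days=k) for k in range(6)]
is_event = [False, False, True, False, True, False]
until = days_until_next_event(dates, is_event)
print(until)  # [2, 1, 0, 1, 0, None]
```

How to fill the trailing `None` values (a sentinel, a cap, or a separate "unknown" indicator) is a modeling choice this sketch leaves open.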