Sorry I didn’t realize that. I’ve made it into a wiki post - can you edit it now?
Hi all - I have a question about tree interpreter. My understanding of it, and its name, both suggest that it gives the contribution of each feature to the prediction for one row, for one tree. How do we summarize these across trees for the whole TreeEnsemble, for that same row? I guess if the ensemble prediction is the mean of the tree predictions for row i, then each feature’s contribution for the ensemble will be the mean across trees of the contributions from that feature for row i?
[But if feature f is not in every tree, should we take the mean only across the trees where f is present, or across all trees, counting the contribution as zero for trees where f is not present?]
Apologies if this was covered and I missed it (and thanks for letting me know, if that’s the case)
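A minimal sketch of the averaging described above, assuming each tree's prediction for the row decomposes as bias plus the sum of per-feature contributions. The numbers are made up for illustration and no tree interpreter library is called. Because the ensemble prediction is a plain mean of the tree predictions, the decomposition only adds up if the mean is taken over all trees, with a contribution of zero wherever a feature is not used on the row's path:

```python
import numpy as np

# Hypothetical decompositions for ONE row: 3 trees, 4 features.
# contributions[t, f] is feature f's contribution in tree t;
# 0.0 means the feature is not used on this row's path in that tree.
biases = np.array([10.0, 9.5, 10.5])
contributions = np.array([
    [1.0, 0.0, -0.5, 0.0],
    [0.5, 2.0,  0.0, 0.0],
    [0.0, 1.0, -1.0, 0.5],
])

# Each tree's prediction is its bias plus its contributions.
tree_preds = biases + contributions.sum(axis=1)

# The ensemble prediction is the mean of the tree predictions.
ensemble_pred = tree_preds.mean()

# Average over ALL trees, zeros included, so the pieces still add up.
ensemble_bias = biases.mean()
ensemble_contribs = contributions.mean(axis=0)

assert np.isclose(ensemble_pred, ensemble_bias + ensemble_contribs.sum())
print(ensemble_contribs)
```

Averaging only over the trees where a feature appears would break the final identity, since the per-feature values would no longer sum to the ensemble prediction.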
Here’s the last batch for Lesson 12, I’ll update the wiki post later.
ML1 lesson 12 timeline
00:00:01 Final lesson program !
00:01:01 Review of Rossmann Kaggle competition with ‘lesson3-rossman.ipynb’
Using “df.apply(lambda x:…)” and “create_promo2since(x)”
00:04:30 Durations function “get_elapsed(fld, pre):” using “zip()”
Check the notebook for detailed explanations.
00:16:10 Rolling function (or windowing function) for moving-average
Hint: learn the Pandas API for Time-Series, it’s extremely diverse and powerful
00:21:40 Create Features, assign to ‘cat_vars’ and ‘contin_vars’
‘joined_samp’, ‘do_scale=True’, ‘mapper’,
‘yl = np.log(y)’ for RMSPE (Root Mean Squared Percent Error)
Selecting a most recent Validation set in Time-Series, if possible of the exact same length as Test set.
Then dropping the Validation set with ‘val_idx = ’ for final training of the model.
00:32:30 How to create our Deep Learning algorithm (or model), using ‘ColumnarModelData.from_data_frame()’
Use the cardinality of each variable to decide how large to make its embeddings.
Jeremy’s Golden Rule on difference between modern ML and old ML:
“In old ML, we controlled complexity by reducing the number of parameters.
In modern ML, we control it by regularization. We are not much concerned about Overfitting because we use increasing Dropout or Weight-Decay to avoid it”
00:39:20 Checking our submission vs Kaggle Public Leaderboard (not great), then Private Leaderboard (great!).
Why the Kaggle Public LB (LeaderBoard) is NOT a good replacement for your own Validation set.
What is the relation between Kaggle Public LB and Private LB ?
00:44:15 Course review (lessons 1 to 12)
Two ways to train a model: one by building a tree, one with SGD (Stochastic Gradient Descent)
Reminder: Tree-building can be combined with Bagging (Random Forests) or Boosting (GBM)
00:46:15 How to represent Categorical variables with Decision Trees
One-hot encoding a vector and its relation with embedding
00:55:50 Interpreting Decision Trees, Random Forests in particular, with Feature Importance.
Use the same techniques to interpret Neural Networks, shuffling Features.
00:59:00 Why Jeremy usually doesn’t care about ‘Statistical Significance’ in ML, due to data volume, but cares more about ‘Practical Significance’.
01:03:10 Jeremy talks about “The most important part in this course: Ethics and Data Science, it matters.”
How does Machine Learning influence people’s behavior, and the responsibility that comes with it ?
As an ML practitioner, you should care about the ethics and think about them BEFORE you are involved in such a situation.
BTW, you can end up in jail/prison as a techie doing “his job”.
01:08:15 IBM and the “Death’s Calculator” used in gas chamber by the Nazis.
Facebook data science algorithm and the ethnic cleansing in Myanmar’s Rohingya crisis: the Myth of Neutral Platforms.
Facebook lets advertisers exclude users by race; Facebook enabled advertisers to reach “Jew Haters”.
Your algorithm/model could be exploited by trolls, harassers, or authoritarian governments for surveillance, propaganda or disinformation.
01:16:45 Runaway feedback loops: when Recommendation Systems go bad.
Social Network algorithms are distorting reality by boosting conspiracy theories.
Runaway feedback loops in Predictive Policing: an algorithm biased by race and impacting Justice.
01:21:45 Bias in Image Software (Computer Vision), an example with Faceapp or Google Photos. The first International Beauty Contest judged by A.I.
01:25:15 Bias in Natural Language Processing (NLP)
Another example with an A.I. built to help US Judicial system.
Taser invests in A.I. and body-cameras to “anticipate criminal activity”.
01:34:30 Questions you should ask yourself when you work on A.I.
You have options !
I just finished the final version of ML1 Complete Video Collections, from Lesson 1 to 12.
Most of the lessons are pretty straightforward, but I would personally suggest that all students, beginners as well as advanced, pay special attention to the 2nd part of Lesson 12:
@Jeremy brings up some serious issues about “A.I. and Ethics”, with real-life examples (and unfortunate consequences for some of the practitioners involved).
This goes beyond what most A.I. classes cover.
These videos are gold! Thank you so much, @jeremy. I’ve been taking a lot of MOOCs about machine/deep learning, even the https://www.deeplearning.ai/ classes, but with these videos I have to say I love this style of teaching. You ask questions to a room full of students, and even if one person answers in a slightly incorrect way, you try to get more information out of them. Hearing everyone’s responses clearly, and you responding to them to the best of your ability, is by far one of the better ways to understand the concepts being presented. More MOOCs should take note of this.
Me too, on Windows.
Here’s a related forum entry:
And this might be a solution:
But I haven’t tried it out because, as for @Brad_S, the “slow” method works for me in this case.
Where can I find the notebooks for lessons 6-12? I only see 1-5 on Github. Thanks for putting this together @jeremy !
Just getting into the first video… I’m not able to unzip train.zip on Google Colab. I get this error (I believe I got the same error trying to do the exact same thing on Paperspace a few weeks ago):
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of train.zip or
train.zip.zip, and cannot find train.zip.ZIP, period.
I also tried downloading the train.zip file to my local machine and unzipping it, but then I’m not sure how to use something like scp to upload it (what are my username and password on Google Colab?). Any help is appreciated.
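One possible cause of this error (an assumption, not something confirmed in this thread) is that the download saved something other than a zip archive, for example an HTML login page. A small sketch to check what the file actually is before unzipping; `describe_download` is a hypothetical helper name, and only the standard library is used:

```python
import zipfile
from pathlib import Path

def describe_download(path):
    """Return a rough label for what a downloaded file really contains."""
    p = Path(path)
    if not p.exists():
        return "missing"
    # is_zipfile looks for the zip end-of-central-directory record.
    if zipfile.is_zipfile(str(p)):
        return "zip"
    # Peek at the first bytes to spot an HTML page saved under a .zip name.
    with open(p, "rb") as f:
        head = f.read(200).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"

print(describe_download("train.zip"))
```

If this prints "html" or "unknown", the file was probably not downloaded correctly, and fetching it again is likely to help more than retrying unzip.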
There isn’t a separate notebook for every lesson. Jeremy instead works through these 5 notebooks and discusses a lot more topics throughout the 12 lessons. Hope that helps.
I also got a similar error the first time I tried downloading the zip file to Crestle. It’s easier if you use the kaggle-api to download the data.
Here’s a good example of using the api for downloading in Colab.
After following those steps, you can download the bulldozers dataset using:
!kaggle competitions download -c bluebook-for-bulldozers
Thanks… I don’t know where the kaggle competitions command above put the files, but then I looked at the API docs and found how to tell it explicitly to put the goods in the ~/data/bulldozers directory.
And now I know about the Kaggle API too
How does L2 regularization improve accuracy?
My understanding of L2 regularization is that it penalizes weights that are far from 0. We add a term to the loss function that grows with the square of each weight’s value, so minimizing the loss pushes the weights closer to 0.
Why does that improve accuracy?
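To make the penalty described above concrete, here is a minimal sketch using linear regression on synthetic data (the data, the `lam` values and the function names are all assumptions for illustration). It shows only the mechanism: as the penalty weight grows, the fitted weights shrink toward 0. Whether that helps accuracy on held-out data depends on the problem, and is best checked on a validation set.

```python
import numpy as np

def l2_loss(X, y, w, lam):
    """Squared error plus the L2 penalty lam * sum(w**2)."""
    return np.sum((X @ w - y) ** 2) + lam * np.sum(w ** 2)

def ridge_fit(X, y, lam):
    """Closed-form minimizer of l2_loss for a linear model."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data: 30 rows, 5 features, two of which are irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_w = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_w + rng.normal(scale=0.5, size=30)

lams = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_fit(X, y, lam)) for lam in lams]
for lam, norm in zip(lams, norms):
    print(f"lam={lam:6.1f}  ||w||={norm:.3f}")
```

The printed norms decrease as `lam` increases, which is the "weights closer to 0" effect; a smaller, smoother set of weights is less able to fit noise in the training rows.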
I have one question from the MOOC ML course, and would really appreciate your help. In Jeremy’s notebook, he uses the split_vals function to create the validation set (which is the last 12k rows of the data). That being said, when we call set_rf_samples(5000), it takes random samples from the entire training set for building the individual trees of the RF. Isn’t it possible (though unlikely) that some of the samples that went into the validation set also get picked while building some of the trees in the RF?
Isn’t the general rule of thumb that the training and validation samples are kept separate? Here is the sample code I am referring to:
def split_vals(a,n): return a[:n], a[n:]
n_valid = 12000
n_trn = len(df_trn)-n_valid
X_train, X_valid = split_vals(df_trn, n_trn)
y_train, y_valid = split_vals(y_trn, n_trn)
raw_train, raw_valid = split_vals(df_raw, n_trn)
I really appreciate your help. If this is answered in another section or in one of the following sessions, I apologize and can read or listen to it. Thanks for your help.
We only pass the training set to the RF, not the validation set.
Thanks Jeremy, I think I misunderstood. So does it mean that set_rf_samples, called after these lines of code (which use split_vals to create the train and valid sets), only samples from the X_train data set? If that’s the case, it makes sense to me.
df_trn, y_trn, nas = proc_df(df_raw, 'SalePrice')
X_train, X_valid = split_vals(df_trn, n_trn)
y_train, y_valid = split_vals(y_trn, n_trn)
Thanks for taking the time to reply, appreciate it.
It samples from whatever dataset you provide to the RF.
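A small sketch of the point made in the replies above, using row indices instead of the real data. `rng.choice` here is only a stand-in for the per-tree subsampling (it is not the fastai or scikit-learn implementation), and the sizes are made up. Since the subsample is drawn from the training part only, it cannot contain validation rows:

```python
import numpy as np

def split_vals(a, n):
    # Same helper as in the notebook: first n rows, then the rest.
    return a[:n], a[n:]

# Stand-in for the data: 100 row indices, the last 12 kept for validation.
rows = np.arange(100)
n_valid = 12
n_trn = len(rows) - n_valid
train_rows, valid_rows = split_vals(rows, n_trn)

# Stand-in for per-tree subsampling: draw 20 rows from the TRAINING rows only.
rng = np.random.default_rng(42)
sampled = rng.choice(train_rows, size=20, replace=True)

# No sampled row can come from the validation set.
assert set(sampled.tolist()).isdisjoint(valid_rows.tolist())
print(sorted(set(sampled.tolist())))
```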
this material is awesome, thanks!!!
Hello @jeremy! I have two questions.
I’m almost done with the DL course and was wondering what benefits there are to learning typical machine learning. Are there many types of datasets/problems where classic machine learning will outperform deep learning? If so, what are some examples? Basically, I’m trying to decide if I should take this course after I complete the DL one, or just continue studying DL.
What are your thoughts on reinforcement learning? Do you have much experience with it? Any plans on teaching a course about it?
Thanks for all you do, love the teaching style!
Hi @krehbiel21, I will try to answer your first point with an example.
This is a snippet from the FaceNet paper: https://arxiv.org/pdf/1503.03832.pdf
Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches
Once this embedding has been produced, then the aforementioned tasks become straight-forward: face verification simply involves thresholding the distance between the two embeddings; recognition becomes a k-NN classification problem; and clustering can be achieved using off-the-shelf techniques such as k-means or agglomerative clustering.
Here DL and ML are used together, serving different purposes, to produce a working solution.
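A rough sketch of the classic-ML half of that pipeline, assuming the deep network has already produced the embeddings. The 2-D vectors, the labels and the threshold are made up for illustration, and `verify` and `knn_identify` are hypothetical helper names, not functions from the paper:

```python
import numpy as np

def verify(emb_a, emb_b, threshold):
    """Verification: same person if the squared L2 distance is under a threshold."""
    return float(np.sum((emb_a - emb_b) ** 2)) < threshold

def knn_identify(query, gallery, labels, k=3):
    """Recognition: majority label among the k nearest gallery embeddings."""
    dists = np.sum((gallery - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(np.asarray(labels)[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Toy embeddings: in practice these would come from the trained network.
gallery = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.0, 0.1]])
labels = ["a", "a", "b", "b", "a"]

print(verify(gallery[0], gallery[1], threshold=0.5))
print(verify(gallery[0], gallery[2], threshold=0.5))
print(knn_identify(np.array([0.05, 0.05]), gallery, labels, k=3))
```

The learned part is only the embedding; the decisions on top of it are plain distance thresholding and k-NN.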