Sorry I didn’t realize that. I’ve made it into a wiki post - can you edit it now?

(Dan Goldner) #528

Hi all - I have a question about tree interpreter. My understanding of it, and its name, both suggest that it gives the contribution of each feature to the prediction for one row, for one tree. How do we summarize these across trees for the whole TreeEnsemble, for that same row? I guess if the ensemble prediction is the mean of the tree predictions for row i, then each feature’s contribution for the ensemble will be the mean across trees of the contributions from that feature for row i?
[But if feature f is not in every tree, should we take the mean only across the trees where f is present, or across all trees, counting the contribution as zero for trees where f is not present?]
Apologies if this was covered and I missed it (and thanks for letting me know, if that’s the case)
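
One way to check the reasoning above is a small sketch. It assumes that each tree's prediction for a row decomposes as a bias plus the sum of per-feature contributions, and that the ensemble prediction is the mean of the tree predictions. The numbers and feature names below are made up for illustration and do not come from any real model. Under those assumptions, averaging over all trees (counting zero where a tree never uses the feature) is the choice that keeps the decomposition adding up to the ensemble prediction.

```python
# Hypothetical per-tree decompositions for ONE row:
#   tree prediction = bias + sum(contributions)
# A feature that a tree never splits on has no entry, i.e. a contribution of 0.
trees = [
    {"bias": 10.0, "contribs": {"YearMade": 1.5, "Enclosure": -0.5}},
    {"bias": 10.2, "contribs": {"YearMade": 0.9}},
    {"bias": 9.8, "contribs": {"Enclosure": -0.3, "ProductSize": 0.6}},
]


def tree_prediction(tree):
    """Prediction of a single tree for this row."""
    return tree["bias"] + sum(tree["contribs"].values())


def ensemble_contributions(trees):
    """Mean over ALL trees, counting a missing feature as a contribution of 0."""
    n = len(trees)
    features = sorted({f for t in trees for f in t["contribs"]})
    return {f: sum(t["contribs"].get(f, 0.0) for t in trees) / n for f in features}


# Ensemble prediction as the mean of the tree predictions (assumption stated above).
ensemble_pred = sum(tree_prediction(t) for t in trees) / len(trees)

mean_bias = sum(t["bias"] for t in trees) / len(trees)
contribs = ensemble_contributions(trees)

# Mean bias plus the averaged contributions should reproduce the ensemble prediction.
reconstructed = mean_bias + sum(contribs.values())
```

Averaging only over the trees where a feature is present would generally break this equality, because each feature would then be divided by a different number of trees.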

(Eric Perbos-Brinck) #529

Here’s the last batch for Lesson 12, I’ll update the wiki post later.

ML1 lesson 12 timeline

• 00:00:01 Final lesson program !

• 00:01:01 Review of Rossmann Kaggle competition with ‘lesson3-rossman.ipynb’
Using “df.apply(lambda x:…)” and “create_promo2since(x)”

• 00:04:30 Durations function “get_elapsed(fld, pre):” using “zip()”
Check the notebook for detailed explanations.

• 00:16:10 Rolling function (or windowing function) for moving-average
Hint: learn the Pandas API for Time-Series, it’s extremely diverse and powerful

• 00:21:40 Create Features, assign to ‘cat_vars’ and ‘contin_vars’
‘joined_samp’, ‘do_scale=True’, ‘mapper’,
‘yl = np.log(y)’ for RMSPE (Root Mean Squared Percent Error)
Selecting a most recent Validation set in Time-Series, if possible of the exact same length as Test set.
Then dropping the Validation set with ‘val_idx = [0]’ for final training of the model.

• 00:32:30 How to create our Deep Learning algorithm (or model), using ‘ColumnarModelData.from_data_frame()’
Use the cardinality of each variable to decide how large to make its embeddings.
Jeremy’s Golden Rule on difference between modern ML and old ML:
“In old ML, we controlled complexity by reducing the number of parameters.
In modern ML, we control it by regularization. We are not much concerned about Overfitting because we use increasing Dropout or Weight-Decay to avoid it”

• 00:39:20 Checking our submission vs Kaggle Public Leaderboard (not great), then Private Leaderboard (great!).
Why Kaggle Public LB (LeaderBoard) is NOT a good replacement for your own Validation set.
What is the relation between Kaggle Public LB and Private LB ?

• 00:44:15 Course review (lessons 1 to 12)
Two ways to train a model: one by building a tree, one with SGD (Stochastic Gradient Descent)
Reminder: Tree-building can be combined with Bagging (Random Forests) or Boosting (GBM)

• 00:46:15 How to represent Categorical variables with Decision Trees
One-hot encoding a vector and its relation with embedding

• 00:55:50 Interpreting Decision Trees, Random Forests in particular, with Feature Importance.
Use the same techniques to interpret Neural Networks, shuffling Features.

• 00:59:00 Why Jeremy usually doesn’t care about ‘Statistical Significance’ in ML, due to Data volume, but more about ‘Practical Significance’.

• 01:03:10 Jeremy talks about “The most important part in this course: Ethics and Data Science, it matters.”
How does Machine Learning influence people’s behavior, and the responsibility that comes with it ?
As an ML practitioner, you should care about the ethics and think about them BEFORE you are involved in such a situation.
BTW, you can end up in jail/prison as a techie doing “his job”.

• 01:08:15 IBM and the “Death’s Calculator” used in gas chamber by the Nazis.
Facebook data science algorithm and the ethnic cleansing in Myanmar’s Rohingya crisis: the Myth of Neutral Platforms.
Your algorithm/model could be exploited by trolls, harassers, authoritarian governments for surveillance, for propaganda or disinformation.

• 01:16:45 Runaway feedback loops: when Recommendation Systems go bad.
Social Network algorithms are distorting reality by boosting conspiracy theories.
Runaway feedback loops in Predictive Policing: an algorithm biased by race and impacting Justice.

• 01:21:45 Bias in Image Software (Computer Vision), an example with Faceapp or Google Photos. The first International Beauty Contest judged by A.I.

• 01:25:15 Bias in Natural Language Processing (NLP)
Another example with an A.I. built to help US Judicial system.
Taser invests in A.I. and body-cameras to “anticipate criminal activity”.

• 01:34:30 Questions you should ask yourself when you work on A.I.
You have options !
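
As a side note on the 00:21:40 entry in the timeline above (‘yl = np.log(y)’ for RMSPE), here is a minimal sketch of why training on the log of the target fits a percentage-based metric. The helper names and the numbers are illustrative assumptions, not code from the course notebook. The idea is that a difference in log space is roughly a relative (percentage) difference, so RMSE on log values tracks RMSPE when errors are small.

```python
import math


def rmspe(y_true, y_pred):
    """Root Mean Squared Percent Error on the original scale."""
    return math.sqrt(
        sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def rmse(a, b):
    """Plain Root Mean Squared Error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Made-up targets on very different scales; every prediction is 5% too high.
y_true = [100.0, 1000.0, 10000.0]
y_pred = [105.0, 1050.0, 10500.0]

# RMSE computed on the log of the values, as when training on yl = log(y).
log_rmse = rmse([math.log(v) for v in y_true], [math.log(v) for v in y_pred])

# RMSPE computed on the original values.
pct_rmse = rmspe(y_true, y_pred)
```

Here both numbers come out close to 0.05 even though the raw errors (5, 50, 500) differ by two orders of magnitude.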

(Eric Perbos-Brinck) #530

I just finished the final version of ML1 Complete Video Collections, from Lesson 1 to 12.

Most of the lessons are pretty straightforward, but I would personally suggest that all students, beginners as well as advanced, pay special attention to the 2nd part of Lesson 12:
@Jeremy brings up some serious issues about “A.I. and Ethics”, with real-life examples (and unfortunate consequences for some of the practitioners involved).
This goes beyond what most A.I. classes cover.

(Mathias) #531

These videos are gold! Thank you so much @jeremy. I’ve been taking a lot of MOOCs about machine/deep learning, even the https://www.deeplearning.ai/ classes, but I have to say I love the style of teaching in these videos. You ask questions to a room full of students, and even if one person answers in a slightly incorrect way, you try to get more information out of them. Hearing everyone’s responses clearly, and you responding to them to the best of your ability, is by far one of the better ways to understand the concepts being presented. More MOOCs should take note of this.

Me too, on Windows.
Here is a related forum entry:

And this might be a solution:

But I haven’t tried it out because, as for @Brad_S, the “slow” method works for me in this case.

(Matthew Krehbiel) #533

Where can I find the notebooks for lessons 6-12? I only see 1-5 on GitHub. Thanks for putting this together @jeremy!

(Afshin Mokhtari) #534

Just getting into the first video… I’m not able to unzip train.zip on Google Colab. I get this error (I believe I got the same error trying to do the exact same thing on Paperspace a few weeks ago):
Archive: train.zip
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of train.zip or
train.zip.zip, and cannot find train.zip.ZIP, period.

I also tried downloading the train.zip file to my local machine and unzipping it there, but then I’m not sure how to use something like scp (what are my username and password on Google Colab?). Any help is appreciated.
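
This unzip error generally means the file is not a valid zip archive. One possible cause, which is an assumption here and not something confirmed in this thread, is that the download saved an HTML login or error page under the name train.zip. A minimal sketch for checking what was actually downloaded before extracting, using only the Python standard library (the helper names are hypothetical):

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Report whether a downloaded file is really a zip archive."""
    path = Path(path)
    if not path.exists():
        return "missing"
    if zipfile.is_zipfile(str(path)):
        return "zip"
    # Not a zip: peek at the first bytes to see what was actually saved.
    head = path.read_bytes()[:200].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


def extract_if_valid(path, dest):
    """Extract the archive into dest only if it passes the zip check."""
    if describe_download(path) != "zip":
        return False
    with zipfile.ZipFile(str(path)) as zf:
        zf.extractall(str(dest))
    return True
```

In a notebook, calling `describe_download("train.zip")` would show whether the file needs to be downloaded again rather than unzipped.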

(Ankit Goila) #535

There isn’t a separate notebook for every lesson. Jeremy instead works through these 5 notebooks and discusses a lot more topics throughout the 12 lessons. Hope that helps.

(Ankit Goila) #536

I also got a similar error the first time I tried downloading the zip file to Crestle. It’s easier if you use the kaggle-api to download the data.

`!kaggle competitions download -c bluebook-for-bulldozers`

(Afshin Mokhtari) #537

Thanks… I don’t know where the kaggle competitions command above put the files, but then I looked at the API docs and found how to tell it explicitly to put them in the ~/data/bulldozers directory.

And now I know about the Kaggle API too

Thanks again

(Matthew Rosenthal) #541

How does L2 regularization improve accuracy?

My understanding of L2 regularization is that it penalizes weights that are far from 0. We add a term to the loss function that grows with the square of each weight, so minimizing the loss favors weights that are closer to 0.

Why does that improve accuracy?
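
To make the mechanism described above concrete, here is a minimal sketch in plain Python, with made-up data and a hypothetical helper name. It fits a single weight by gradient descent on mean squared error plus an L2 penalty, and shows only that the penalty pulls the weight toward 0. It does not by itself show an accuracy gain; the usual argument is that smaller weights reduce overfitting, which can help on validation data, and whether that happens depends on the data and the penalty strength.

```python
def fit_weight(xs, ys, l2=0.0, lr=0.01, steps=2000):
    """Fit y ~ w * x by gradient descent on mean squared error + l2 * w**2."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad_mse = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Gradient of the penalty term l2 * w**2.
        grad_penalty = 2 * l2 * w
        w -= lr * (grad_mse + grad_penalty)
    return w


# Made-up data that roughly follows y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_plain = fit_weight(xs, ys)        # no penalty
w_l2 = fit_weight(xs, ys, l2=1.0)   # with L2 penalty: smaller weight
```

With the penalty, the fitted weight is smaller in magnitude than without it, which is the "closer to 0" effect the question describes.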

(abhik) #542

I have one question from the MOOC ML course and would really appreciate your help. In Jeremy’s notebook, he uses the split_vals function to create the validation set (the last 12k rows of the data). That being said, when we call set_rf_samples(50000), it takes random samples from the entire training set to build the individual trees of the RF. Isn’t it possible (though unlikely) that some of the rows that went into the validation set also get picked when building some of the trees in the RF?

Isn’t the general rule of thumb that the training and validation samples are kept separate? Here is the sample code I am referring to:

def split_vals(a,n): return a[:n], a[n:]
n_valid = 12000
n_trn = len(df_trn)-n_valid
X_train, X_valid = split_vals(df_trn, n_trn)
y_train, y_valid = split_vals(y_trn, n_trn)
raw_train, raw_valid = split_vals(df_raw, n_trn)

set_rf_samples(50000)

I really appreciate your help. If this is answered in another section or in a later session, I apologize and can read or listen to it. Thanks for your help.

We only pass the training set to the RF, not the validation set.

(abhik) #545

Thanks Jeremy, I think I misunderstood. So does this mean that set_rf_samples, when it follows these lines of code (which use split_vals to create the training and validation sets), only samples from the X_train data set? If that’s the case, it makes sense to me.

df_trn, y_trn, nas = proc_df(df_raw, 'SalePrice')
X_train, X_valid = split_vals(df_trn, n_trn)
y_train, y_valid = split_vals(y_trn, n_trn)

Thanks for taking the time to reply, appreciate it.

It samples from whatever dataset you provide to the RF.
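
The exchange above can be illustrated with a small sketch. It uses a plain list of row ids as a stand-in for the data frame and `random.sample` as a stand-in for the per-tree subsampling; it does not reproduce the internals of set_rf_samples, and the sizes are made up. The point is only that rows drawn from the training slice can never come from the validation slice.

```python
import random


def split_vals(a, n):
    """Same helper as in the post: first n items for training, the rest for validation."""
    return a[:n], a[n:]


rows = list(range(100))            # stand-in for the row ids of df_trn
n_valid = 12
n_trn = len(rows) - n_valid
train_rows, valid_rows = split_vals(rows, n_trn)

# Stand-in for one tree's subsample: drawn only from what is passed in.
rng = random.Random(0)
tree_sample = rng.sample(train_rows, 50)

# Rows shared between the tree's sample and the validation set.
overlap = set(tree_sample) & set(valid_rows)
```

Because only `train_rows` is handed to the sampler, `overlap` is empty no matter how the random draw turns out.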

(sleepy) #547

this material is awesome, thanks!!!

(DILIP S) #548

Nice and subtle explanations. Thanks @ramesh.

(Matthew Krehbiel) #549

Hello @jeremy! I have two questions.

1. I’m almost done with the DL course and was wondering what benefits there are to learning typical machine learning. Are there many types of datasets/problems where classic machine learning will outperform deep learning? If so, what are some examples? Basically, I’m trying to decide if I should take this course after I complete the DL one, or just continue studying DL.

2. What are your thoughts on reinforcement learning? Do you have much experience with it? Any plans on teaching a course about it?

Thanks for all you do, love the teaching style!

(Rahul Pathak) #550

Hi @krehbiel21, I will try to answer your first point with an example.

This is a snippet from the FaceNet paper: https://arxiv.org/pdf/1503.03832.pdf

``````Our method uses a deep convolutional network trained
to directly optimize the embedding itself, rather than an intermediate
bottleneck layer as in previous deep learning
approaches
``````
``````Once this embedding has been produced, then the aforementioned