Another treat! Early access to Intro To Machine Learning videos

(Eric Perbos-Brinck) #521

I can’t edit the post Another treat! Early access to Intro To Machine Learning videos to update the Collection of Video Timelines withe Lesson 10, it seems there’s a timer count-down on author privileges cc @jeremy

So here’s the timelines for Lesson 10.

ML1 lesson 10 timeline

  • 00:00:01 is now available on PIP !
    And more USF students publications: class-wise Processing in NLP, Class-wise Regex Functions
    . Porto Seguro’s Safe Driver Prediction (Kaggle): 1st place solution with zero feature engineering !
    Dealing with semi-supervised-learning (ie. labeled and unlabeled data)
    Data augmentation to create new data examples by creating slightly different versions of data you already have.
    In this case, he used Data Augmentation by creating new rows with 15% randomly selected data.
    Also used “auto-encoder”: the independant variable is the same as the dependant variable, as in “try to predict your input” !

  • 00:08:30 Back to a simple Logistic Regression with MNIST summary
    ’lesson4-mnist_sgd.ipynb’ notebook

  • 00:11:30 PyTorch tutorial on Autograd

  • 00:15:30 “Stream Processing” and “Generator Python”
    . “l.backward()”
    . “net2 = LogReg().cuda()”

  • 00:32:30 Building a complete Neural Net, from scratch, for Logistic Regression in PyTorch, with “nn.Sequential()”

  • 00:58:00 Fitting the model in ‘lesson4-mnist_sgd.ipynb’ notebook
    The secret in modern ML (as covered in the Deep Learning course): massively over-paramaterized the solution to your problem, then use Regularization.

  • 01:02:10 Starting NLP with IMDB dataset and the sentiment classification task
    NLP = Natural Language Processing

  • 01:03:10 Tokenizing and ‘term-document matrix’ & “Bag-of-Words’ creation
    "trn, trn_y = texts_from_folders(f’{PATH}train’, names)” from Fastai library to build arrays of reviews and labels
    Throwing the order of words with Bag-of-Words !

  • 01:08:50 sklearn “CountVectorizer()”
    “fit_transform(trn)” to find the vocabulary in the training set and build a term-document matrix.
    “transform(val)” to apply the same transformation to the validation set.

  • 01:12:30 What is a ‘sparse matrix’ to store only key info and save memory.
    More details in Rachel’s “Computational Algebra” course on Fastai

  • 01:16:40 Using “Naive Bayes” for “Bag-of-Words” approaches.
    Transforming words into features, and dealing with the bias/risk of “zero probabilities” from the data.
    Some demo/discussion about calculating the probabilities of classes.

  • 01:25:00 Why is it called “Naive Bayes”

  • 01:30:00 The difference between theory and practice for "Naive Bayes"
    Using Logistic regression where the features are the unigrams

  • 01:35:40 Using Bigram & Trigram with Naive Bayes (NB) features

(Kimmo Ojala) #522

Thanks so much for this course!

I implemented some Tree Interpreter statistics as suggested by Jeremy in lesson 6. You can find the statistics at the end of this notebook:

The statistics cover feature importance, most common consecutive splits and most common splits (with split variable history) as an attempt to provide some light on feature interaction.

I implemented the statistics on top of the Decision Tree Ensemble created by Jeremy. Comments welcome :-).

(Aditya) #523

Thanks a lot

(Aditya) #524

Will verify on Titanic and update

(Eric Perbos-Brinck) #525

I still can’t edit my main post for Intro to ML1 Video Timelines cc @jeremy.

So here’s the ML1 Lesson 11 Video Timeline.

  • 00:00:01 Review of optimizing multi-layer functions with SGD
    “d(h(g(f(x)))) / dw = 0,6”

  • 00:09:45 Review of Naive Bayes & Logistic Regression for NLP with lesson5-nlp.ipynb notebook

  • 00:16:30 Cross-Entropy as a popular Loss Function for Classification (vs RMSE for Regression)

  • 00:21:30 Creating more NLP features with Ngrams (bigrams, trigrams)

  • 00:23:01 Going back to Naive Bayes and Logistic Regression,
    then ‘We do something weird but actually not that weird’ with “x_nb = x.multiply®”
    Note: watch the whole 15 mins segment for full understanding.

  • 00:39:45 ‘Baselines and Bigrams: Simple, Good Sentiment and Topic Classification’ paper by Sida Wang and Christopher Manning, Stanford U.

  • 00:43:31 Improving it with PyTorch and GPU, with Fastai Naive Bayes or ‘Fastai NBSVM++’ and “class DotProdNB(nn.Module):”
    Note: this long section includes lots of mathematical demonstration and explanation.

  • 01:17:30 Deep Learning: Structured and Time-Series data with Rossmann Kaggle competition, with the 3rd winning solution ‘Entity Embeddings of Categorical Variables’ by Guo/Berkhahn.

  • 01:21:30 Rossmann Kaggle: data cleaning & feature engineering.
    Using Pandas to join tables with ‘Left join’

I’ll be working on Lesson 12 this week-end, which includes some really interesting ethical thoughts by Jeremy on the use (and mis-use) of Machine Learning in the past (IBM and Nazi’s gas chambers) and today (Facebook and Myanmar’s Rohingya crisis).

(Cedric Chee) #526

Thanks for the really useful collection of video timelines.

According to the following thread, I think the cause to why you can’t edit your post was due to your original post going beyond the time limit on how long it can be edited.

(Jeremy Howard) #527

Sorry I didn’t realize that. I’ve made it into a wiki post - can you edit it now?

(Dan Goldner) #528

Hi all - I have a question about tree interpreter. My understanding of it, and its name, both suggest that it gives the contribution of each feature to the prediction for one row, for one tree. How do we summarize these across trees for the whole TreeEnsemble, for that same row? I guess if the ensemble prediction is the mean of the tree predictions for row i, then each feature’s contribution for the ensemble will be the mean across trees of the contributions from that feature for row i?
[But if feature f is not in every tree, should we take the mean only across the trees where f is present, or across all trees, counting the contribution as zero for trees where f is not present?]
Apologies if this was covered and I missed it (and thanks for letting me know, if that’s the case)

(Eric Perbos-Brinck) #529

Here’s the last batch for Lesson 12, I’ll update the wiki post later.

ML1 lesson 12 timeline

  • 00:00:01 Final lesson program !

  • 00:01:01 Review of Rossmann Kaggle competition with ‘lesson3-rossman.ipynb’
    Using “df.apply(lambda x:…)” and “create_promo2since(x)”

  • 00:04:30 Durations function “get_elapsed(fld, pre):” using “zip()”
    Check the notebook for detailed explanations.

  • 00:16:10 Rolling function (or windowing function) for moving-average
    Hint: learn the Pandas API for Time-Series, it’s extremely diverse and powerful

  • 00:21:40 Create Features, assign to ‘cat_vars’ and ‘contin_vars’
    ‘joined_samp’, ‘do_scale=True’, ‘mapper’,
    ‘yl = np.log(y)’ for RMSPE (Root Mean Squared Percent Error)
    Selecting a most recent Validation set in Time-Series, if possible of the exact same length as Test set.
    Then dropping the Validation set with ‘val_idx = [0]’ for final training of the model.

  • 00:32:30 How to create our Deep Learning algorithm (or model), using ‘ColumnarModelData.from_data_frame()’
    Use the cardinality of each variable to decide how large to make its embeddings.
    Jeremy’s Golden Rule on difference between modern ML and old ML:
    “In old ML, we controlled complexity by reducing the number of parameters.
    In modern ML, we control it by regularization. We are not much concerned about Overfitting because we use increasing Dropout or Weight-Decay to avoid it”

  • 00:39:20 Checking our submission vs Kaggle Public Leaderboard (not great), then Private Leaderboard (great!).
    Why Kaggle Public LB (LeaderBoard) is NOT a good replacement to your own Validation set.
    What is the relation between Kaggle Public LB and Private LB ?

  • 00:44:15 Course review (lessons 1 to 12)
    Two ways to train a model: one by building a tree, one with SGD (Stochastic Gradient Descent)
    Reminder: Tree-building can be combined with Bagging (Random Forests) or Boosting (GBM)

  • 00:46:15 How to represent Categorical variables with Decision Trees
    One-hot encoding a vector and its relation with embedding

  • 00:55:50 Interpreting Decision Trees, Random Forests in particular, with Feature Importance.
    Use the same techniques to interpret Neural Networks, shuffling Features.

  • 00:59:00 Why Jeremy usually doesn’t care about ‘Statistical Significant’ in ML, due to Data volume, but more about ‘Practical Significance’.

  • 01:03:10 Jeremy talks about “The most important part in this course: Ethics and Data Science, it matters.”
    How does Machine Learning influence people’s behavior, and the responsibility that comes with it ?
    As a ML practicioner, you should care about the ethics and think about them BEFORE you are involved in one situation.
    BTW, you can end up in jail/prison as a techie doing “his job”.

  • 01:08:15 IBM and the “Death’s Calculator” used in gas chamber by the Nazis.
    Facebook data science algorithm and the ethnic cleansing in Myanmar’s Rohingya crisis: the Myth of Neutral Platforms.
    Facebook lets advertisers exclude users by race enabled advertisers to reach “Jew Haters”.
    Your algorithm/model could be exploited by trolls, harassers, authoritarian governements for surveillance, for propaganda or disinformation.

  • 01:16:45 Runaway feedback loops: when Recommendation Systems go bad.
    Social Network algorithms are distorting reality by boosting conspiracy theories.
    Runaway feedback loops in Predictive Policing: an algorithm biased by race and impacting Justice.

  • 01:21:45 Bias in Image Software (Computer Vision), an example with Faceapp or Google Photos. The first International Beauty Contest judged by A.I.

  • 01:25:15 Bias in Natural Language Processing (NLP)
    Another example with an A.I. built to help US Judicial system.
    Taser invests in A.I. and body-cameras to “anticipate criminal activity”.

  • 01:34:30 Questions you should ask yourself when you work on A.I.
    You have options !

(Eric Perbos-Brinck) #530

I just finished the final version of ML1 Complete Video Collections, from Lesson 1 to 12.

Most of the lessons are pretty straightforward but I would personally suggest to all students, beginners as well as advanced, to pay special attention at the 2nd part of Lesson 12:
@Jeremy brings up some serious issues about “A.I. and Ethics”, with real-life examples -and unfortunate consequences for some practitioners involved-.
This goes beyond what most A.I. classes cover.

(Mathias) #531

These videos are gold! Thank you so much @jeremy been taking a lot of MOOCs about machine/deep learning. Even the classes, but with these videos I have to say I love this style of teaching. You ask questions to a room full of students and even if one person answers you in a slightly incorrect way you try to get more information out of them, hearing everyones response clearly and you responding to them to the best of your ability is by far one of the better methods to understanding the concepts being presented. More MOOCs should take note of this.

(Jonas G F Pettersson) #532

Me too, on Windows.
Here a related forum entry:

And this might be a solution:

But I haven’t tried it out because, as for @Brad_S , the “slow” method works for me in this case

(Matthew Krehbiel) #533

Where can I find the notebooks for lessons 6-12? I only see 1-5 on Github. Thanks for putting this together @jeremy !

(Afshin Mokhtari) #534

Just getting in to the first video… I’m not able to unzip on google colab, I get this error ( I believe I got the same error trying to do the exact same thing on Paperspace a few weeks ago) :
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of or, and cannot find, period.

I also tried just downloading the file to my local machine, unzipped it, and then not sure how to use something like scp (what is my username & pw at google colab??) Any help is appreciated.

(Ankit Goila) #535

There isn’t a separate notebook for every lesson. Jeremy instead works through these 5 notebooks and discusses a lot more topics throughout the 12 lessons. Hope that helps. :slight_smile:

(Ankit Goila) #536

I also got a similar error the first time I tried downloading the zip file to Crestle. It’s easier if you use the kaggle-api to download the data.

Here’s a good example of using the api for downloading in Colab.

After following those steps, you can download the bulldozers dataset using:

!kaggle competitions download -c bluebook-for-bulldozers

(Afshin Mokhtari) #537

Thanks… I dont know where the kaggle competitions command above put the files but then I looked at the api docs and found how to tell it explicitly to put the goods in the ~/data/bulldozers directory.

And now I know about the Kaggle API too :slight_smile:

Thanks again

(Matthew Rosenthal) #541

How does l2 regularization improve accuracy?

My understanding of l2 regularization is that it penalizes weights that are far from 0. We are adding a term to the loss function which increases squared with the weight value. So this will force the minimization of the loss function to look for weights that are closer to 0.

Why does that improve accuracy?

(abhik) #542

I have one question, from the MOOC ML course, and really appreciate your help. In Jeremy’s notebook - he uses split_val function to create the validation set (which is the last 12k rows from the data). That being set however, when we are doing set_rf_examples(5000) , it takes random samples from the entire training set for building individual trees for the RF. Isn’t is possible (though the chances are less likely), that some of the samples that went into the validation set may also get picked while building some of the trees in the RF.

Isn’t the general rule of thumb that the train and validation sample kept separate ?. here is the sample code I am referring to -

def split_vals(a,n): return a[:n], a[n:]
n_valid = 12000
n_trn = len(df_trn)-n_valid
X_train, X_valid = split_vals(df_trn, n_trn)
y_train, y_valid = split_vals(y_trn, n_trn)
raw_train, raw_valid = split_vals(df_raw, n_trn)


Really appreciate your help. If this is answered in other section or other following sessions, I apologize and can read / listen to it, Thanks for your help.

(Jeremy Howard) #544

We only pass the training set to the RF, not the validation set.