00:00:01 Fast.ai is now available on pip!
And more USF student publications: class-wise Processing in NLP, Class-wise Regex Functions
Porto Seguro's Safe Driver Prediction (Kaggle): 1st place solution with zero feature engineering!
Dealing with semi-supervised learning (i.e. labeled and unlabeled data).
Data augmentation: creating new training examples by making slightly different versions of the data you already have.
In this case, he used data augmentation by creating new rows in which 15% of the values were randomly replaced.
Also used an "autoencoder": the independent variable is the same as the dependent variable, as in "try to predict your input"!
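A minimal sketch of the idea (often called "swap noise"), not the actual winning code: the `add_swap_noise` helper and the layer sizes are just mine, with made-up stand-in data. Corrupt ~15% of each row using values from other rows, then train a network whose target is the original, uncorrupted row.

```python
import numpy as np
import torch
import torch.nn as nn

def add_swap_noise(x, p=0.15):
    """Return a copy of x where ~15% of the values in each row are
    replaced by values taken from other (random) rows of x."""
    noisy = x.copy()
    n_rows, n_cols = x.shape
    mask = np.random.rand(n_rows, n_cols) < p               # which cells to corrupt
    rand_rows = np.random.randint(0, n_rows, size=(n_rows, n_cols))
    noisy[mask] = x[rand_rows[mask], np.where(mask)[1]]
    return noisy

# A tiny autoencoder: the target is the *uncorrupted* input itself.
model = nn.Sequential(nn.Linear(57, 128), nn.ReLU(), nn.Linear(128, 57))
loss_fn = nn.MSELoss()

x = np.random.rand(1024, 57).astype(np.float32)             # stand-in tabular data (57 is a placeholder feature count)
x_noisy = add_swap_noise(x)
pred = model(torch.from_numpy(x_noisy))
loss = loss_fn(pred, torch.from_numpy(x))                   # "try to predict your input"
```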
00:08:30 Back to a simple logistic regression on MNIST (summary)
’lesson4-mnist_sgd.ipynb’ notebook
00:32:30 Building a complete Neural Net, from scratch, for Logistic Regression in PyTorch, with “nn.Sequential()”
00:58:00 Fitting the model in ‘lesson4-mnist_sgd.ipynb’ notebook
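For reference, the model built in the notebook is essentially a single linear layer plus (log-)softmax. Here is a self-contained sketch with made-up stand-in data (the real notebook loads MNIST and uses the fastai fit helpers):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Logistic regression = a single linear layer followed by (log-)softmax.
net = nn.Sequential(nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
loss_fn = nn.NLLLoss()
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

# x_train: (N, 784) float tensor, y_train: (N,) long tensor of digit labels.
x_train = torch.rand(1000, 28 * 28)              # stand-in data
y_train = torch.randint(0, 10, (1000,))
dl = DataLoader(TensorDataset(x_train, y_train), batch_size=64, shuffle=True)

for epoch in range(3):
    for xb, yb in dl:
        loss = loss_fn(net(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
```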
The secret of modern ML (as covered in the Deep Learning course): massively over-parameterize the solution to your problem, then use regularization.
01:02:10 Starting NLP with IMDB dataset and the sentiment classification task
NLP = Natural Language Processing
01:03:10 Tokenizing, and creating the 'term-document matrix' / 'Bag-of-Words' representation
"trn, trn_y = texts_from_folders(f’{PATH}train’, names)” from Fastai library to build arrays of reviews and labels
Throwing away the order of words with Bag-of-Words!
01:08:50 sklearn “CountVectorizer()”
“fit_transform(trn)” to find the vocabulary in the training set and build a term-document matrix.
“transform(val)” to apply the same transformation to the validation set.
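A small illustration of that fit/transform split (toy reviews of my own, not the IMDB data):

```python
from sklearn.feature_extraction.text import CountVectorizer

trn = ["this movie is good", "this movie is bad"]   # stand-ins for the IMDB reviews
val = ["good fun movie"]

veczr = CountVectorizer()
trn_term_doc = veczr.fit_transform(trn)   # learns the vocabulary AND builds the term-document matrix
val_term_doc = veczr.transform(val)       # reuses the training vocabulary; "fun" is simply ignored

print(veczr.get_feature_names_out())      # ['bad' 'good' 'is' 'movie' 'this']  (get_feature_names on older sklearn)
print(trn_term_doc.toarray())
print(val_term_doc.toarray())
```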
01:12:30 What is a 'sparse matrix'? Storing only the non-zero entries to save memory.
More details in Rachel's "Computational Linear Algebra" course on Fastai.
01:16:40 Using “Naive Bayes” for “Bag-of-Words” approaches.
Transforming words into features, and dealing with the bias/risk of “zero probabilities” from the data.
Some demo/discussion about calculating the probabilities of classes.
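Roughly what the notebook does, with names adjusted so this runs on its own (toy reviews are mine): the "+1" is the add-one smoothing that deals with the zero-probability problem, and the per-word log-count ratio plus the class prior gives the prediction.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

trn = ["good movie", "great fun movie", "bad movie", "awful boring movie"]
y_trn = np.array([1, 1, 0, 0])
val = ["great movie", "boring movie"]

veczr = CountVectorizer()
x_trn = veczr.fit_transform(trn)
x_val = veczr.transform(val)

def pr(y_i):
    # Word counts within class y_i; the "+1" (add-one smoothing) is what
    # prevents a word unseen in one class from producing a zero probability.
    p = x_trn[y_trn == y_i].sum(0)
    return (p + 1) / ((y_trn == y_i).sum() + 1)

r = np.log(pr(1) / pr(0))                                 # per-word log-count ratio
b = np.log((y_trn == 1).mean() / (y_trn == 0).mean())     # log of the class prior ratio
preds = (x_val @ r.T + b) > 0                             # predict positive if the log-odds are > 0
print(preds)
```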
The statistics cover feature importance, most common consecutive splits, and most common splits (with split variable history), as an attempt to shed some light on feature interactions.
I implemented the statistics on top of the Decision Tree Ensemble created by Jeremy. Comments welcome :-).
00:00:01 Review of optimizing multi-layer functions with SGD
"d(h(g(f(x)))) / dw = 0.6"
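The 0.6 is just the numeric value from the whiteboard example; the general rule being reviewed is the chain rule for a composition of layers, writing w for the quantity we differentiate with respect to:

```latex
\frac{d}{dw}\,h\big(g(f(w))\big) \;=\; h'\big(g(f(w))\big)\cdot g'\big(f(w)\big)\cdot f'(w)
```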
00:09:45 Review of Naive Bayes & Logistic Regression for NLP with lesson5-nlp.ipynb notebook
00:16:30 Cross-Entropy as a popular Loss Function for Classification (vs RMSE for Regression)
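A quick worked version of binary cross-entropy (NumPy, toy numbers), to make the contrast with RMSE concrete:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # -[y*log(p) + (1-y)*log(1-p)], averaged over samples
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.4])
print(binary_cross_entropy(y_true, y_pred))    # confident & correct -> low loss; 0.4 on a true 1 -> higher loss
```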
00:21:30 Creating more NLP features with Ngrams (bigrams, trigrams)
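For example, with sklearn's CountVectorizer (the max_features cap is only illustrative, a large value like the one used in the notebook):

```python
from sklearn.feature_extraction.text import CountVectorizer

# ngram_range=(1, 3) keeps unigrams, bigrams and trigrams as features, so some
# word-order information ("is not good") survives the bag-of-words step.
veczr = CountVectorizer(ngram_range=(1, 3), max_features=800000)
trn_term_doc = veczr.fit_transform(["this movie is not good", "this movie is good"])
print(veczr.get_feature_names_out()[:8])
```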
00:23:01 Going back to Naive Bayes and Logistic Regression,
then 'We do something weird but actually not that weird' with "x_nb = x.multiply(r)"
Note: watch the whole 15 mins segment for full understanding.
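A hedged, self-contained sketch of that step (toy data again; r is the Naive Bayes log-count ratio as above):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

trn = ["good movie", "great fun movie", "bad movie", "awful boring movie"]
y_trn = np.array([1, 1, 0, 0])
veczr = CountVectorizer()
x = veczr.fit_transform(trn)

# Naive Bayes log-count ratio per word (with add-one smoothing), as before.
def pr(y_i):
    return (x[y_trn == y_i].sum(0) + 1) / ((y_trn == y_i).sum() + 1)
r = np.log(pr(1) / pr(0))

# The "weird but not that weird" step: scale every column of the term-document
# matrix by r, so the Naive Bayes prior is baked into the features, then fit a
# regularized logistic regression on top (the NB-SVM idea from the Wang & Manning paper).
x_nb = x.multiply(r)
m = LogisticRegression(C=0.1, max_iter=1000).fit(x_nb, y_trn)
print(m.predict(veczr.transform(["great movie"]).multiply(r)))
```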
00:39:45 ‘Baselines and Bigrams: Simple, Good Sentiment and Topic Classification’ paper by Sida Wang and Christopher Manning, Stanford U.
00:43:31 Improving it with PyTorch and GPU, with Fastai Naive Bayes or ‘Fastai NBSVM++’ and “class DotProdNB(nn.Module):”
Note: this long section includes lots of mathematical demonstration and explanation.
01:17:30 Deep Learning for structured and time-series data with the Rossmann Kaggle competition, and the 3rd-place solution 'Entity Embeddings of Categorical Variables' by Guo/Berkhahn.
01:21:30 Rossmann Kaggle: data cleaning & feature engineering.
Using Pandas to join tables with ‘Left join’
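A toy pandas example of a left join (made-up store/sales frames):

```python
import pandas as pd

store = pd.DataFrame({"Store": [1, 2], "StoreType": ["a", "c"]})
sales = pd.DataFrame({"Store": [1, 1, 2, 3], "Sales": [5263, 6064, 8314, 13995]})

# A left join keeps every row of `sales`; stores with no match get NaN,
# which makes missing reference data easy to spot (unlike an inner join).
joined = sales.merge(store, how="left", on="Store")
print(joined)
```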
I'll be working on Lesson 12 this weekend, which includes some really interesting ethical thoughts by Jeremy on the use (and misuse) of Machine Learning in the past (IBM and the Nazi gas chambers) and today (Facebook and Myanmar's Rohingya crisis).
Thanks for the really useful collection of video timelines.
According to the following thread, I think the reason you can't edit your post is that it went beyond the time limit on how long a post can be edited.
Hi all - I have a question about tree interpreter. My understanding of it, and its name, both suggest that it gives the contribution of each feature to the prediction for one row, for one tree. How do we summarize these across trees for the whole TreeEnsemble, for that same row? I guess if the ensemble prediction is the mean of the tree predictions for row i, then each feature’s contribution for the ensemble will be the mean across trees of the contributions from that feature for row i?
[But if feature f is not in every tree, should we take the mean only across the trees where f is present, or across all trees, counting the contribution as zero for trees where f is not present?]
Apologies if this was covered and I missed it (and thanks for letting me know, if that’s the case)
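For what it's worth, the treeinterpreter package used in the course already does this aggregation at the forest level: as I understand it, a feature that isn't on a row's decision path in a given tree simply contributes 0 for that tree, and contributions are averaged over all trees. A quick sanity check (diabetes data just as a stand-in):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from treeinterpreter import treeinterpreter as ti

X, y = load_diabetes(return_X_y=True)
rf = RandomForestRegressor(n_estimators=20, n_jobs=-1).fit(X, y)

row = X[:1]
prediction, bias, contributions = ti.predict(rf, row)
# prediction == bias + sum of per-feature contributions for this row,
# where the contributions are already averaged over all the trees.
print(np.allclose(prediction, bias + contributions.sum(axis=1)))
```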
00:01:01 Review of Rossmann Kaggle competition with ‘lesson3-rossman.ipynb’
Using “df.apply(lambda x:…)” and “create_promo2since(x)”
00:04:30 Durations function “get_elapsed(fld, pre):” using “zip()”
Check the notebook for detailed explanations.
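A simplified, hedged reconstruction of the idea (the notebook's version differs in its details): walk the rows with zip() and track, per store, the date on which the flag field was last active, so you can compute the elapsed days.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2],
    "Date":  pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03",
                             "2015-01-01", "2015-01-02"]),
    "Promo": [1, 0, 0, 0, 1],
})

def get_elapsed(df, fld, pre):
    """Days since `fld` was last non-zero, computed per store by walking the
    rows in order with zip() (rows must be sorted by Store, then Date)."""
    day1 = np.timedelta64(1, "D")
    last_date = np.datetime64("NaT", "ns")
    last_store = None
    res = []
    for s, v, d in zip(df.Store.values, df[fld].values, df.Date.values):
        if s != last_store:                 # new store: reset the running state
            last_date, last_store = np.datetime64("NaT", "ns"), s
        if v:
            last_date = d
        res.append((d - last_date).astype("timedelta64[D]") / day1)
    df[pre + fld] = res

get_elapsed(df, "Promo", "After")
print(df)
```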
00:16:10 Rolling function (or windowing function) for moving-average
Hint: learn the Pandas API for Time-Series, it’s extremely diverse and powerful
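For example, a per-store moving average with pandas' rolling (toy data; a 2-row window standing in for the longer one used in the notebook):

```python
import pandas as pd

sales = pd.DataFrame({
    "Store": [1, 1, 1, 1, 2, 2, 2, 2],
    "Sales": [10, 12, 11, 15, 20, 18, 22, 25],
})

# Moving average per store; min_periods=1 avoids NaN at the start of each group.
sales["SalesMA"] = (sales.groupby("Store")["Sales"]
                          .rolling(window=2, min_periods=1).mean()
                          .reset_index(level=0, drop=True))
print(sales)
```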
00:21:40 Create Features, assign to ‘cat_vars’ and ‘contin_vars’
‘joined_samp’, ‘do_scale=True’, ‘mapper’,
‘yl = np.log(y)’ for RMSPE (Root Mean Squared Percent Error)
Selecting the most recent data as the Validation set for Time-Series, ideally of exactly the same length as the Test set.
Then dropping the Validation set with ‘val_idx = [0]’ for final training of the model.
00:32:30 How to create our Deep Learning algorithm (or model), using ‘ColumnarModelData.from_data_frame()’
Use the cardinality of each variable to decide how large to make its embeddings.
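The rule of thumb from the lesson, as I recall it: cap the embedding width at 50 and otherwise use roughly half the cardinality. The cardinalities below are made up for illustration:

```python
# Rule of thumb: embedding width = min(50, (cardinality + 1) // 2).
# cat_sz would normally be built from the training DataFrame's categorical columns;
# these values are placeholders.
cat_sz = [("Store", 1116), ("DayOfWeek", 8), ("StateHoliday", 4)]
emb_szs = [(c, min(50, (c + 1) // 2)) for _, c in cat_sz]
print(emb_szs)   # [(1116, 50), (8, 4), (4, 2)]
```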
Jeremy's Golden Rule on the difference between modern ML and old ML:
"In old ML, we controlled complexity by reducing the number of parameters.
In modern ML, we control it with regularization: we are not so concerned about overfitting, because we can increase Dropout or Weight-Decay to avoid it."
00:39:20 Checking our submission vs Kaggle Public Leaderboard (not great), then Private Leaderboard (great!).
Why the Kaggle Public LB (LeaderBoard) is NOT a good replacement for your own Validation set.
What is the relation between the Kaggle Public LB and the Private LB?
00:44:15 Course review (lessons 1 to 12)
Two ways to train a model: one by building a tree, one with SGD (Stochastic Gradient Descent)
Reminder: Tree-building can be combined with Bagging (Random Forests) or Boosting (GBM)
00:46:15 How to represent Categorical variables with Decision Trees
One-hot encoding a vector and its relation to embeddings
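A tiny PyTorch check of that relation: an embedding lookup is just a one-hot vector times the embedding weight matrix, implemented as an array lookup so it is much cheaper.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=7, embedding_dim=4)   # e.g. day-of-week -> 4-dim vector

idx = torch.tensor([2])                                  # category index lookup
one_hot = torch.zeros(1, 7)
one_hot[0, 2] = 1.0

# Embedding lookup == one-hot vector @ weight matrix.
print(torch.allclose(emb(idx), one_hot @ emb.weight))    # True
```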
00:55:50 Interpreting Decision Trees, Random Forests in particular, with Feature Importance.
The same technique (shuffling a feature's values) can be used to interpret Neural Networks.
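A hedged sketch of that shuffling trick (model-agnostic, so it works for a neural net as well as a random forest; it assumes X_valid is a DataFrame and uses r2_score as just one possible metric):

```python
import numpy as np
from sklearn.metrics import r2_score

def permutation_importance(model, X_valid, y_valid, col):
    """Score drop when one column's values are shuffled: the bigger the drop,
    the more the model relied on that feature."""
    base = r2_score(y_valid, model.predict(X_valid))
    X_shuf = X_valid.copy()
    X_shuf[col] = np.random.permutation(X_shuf[col].values)
    return base - r2_score(y_valid, model.predict(X_shuf))
```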
00:59:00 Why Jeremy usually doesn't care about 'statistical significance' in ML (due to data volume), but rather about 'practical significance'.
01:03:10 Jeremy talks about “The most important part in this course: Ethics and Data Science, it matters.”
How does Machine Learning influence people’s behavior, and the responsibility that comes with it ?
As an ML practitioner, you should care about ethics and think about it BEFORE you find yourself in such a situation.
BTW, you can end up in jail/prison as a techie just "doing your job".
01:08:15 IBM and the "Death's Calculator" used in the gas chambers by the Nazis.
Facebook data science algorithm and the ethnic cleansing in Myanmar’s Rohingya crisis: the Myth of Neutral Platforms.
Facebook let advertisers exclude users by race, and enabled advertisers to reach "Jew Haters".
Your algorithm/model could be exploited by trolls, harassers, or authoritarian governments for surveillance, propaganda or disinformation.
01:16:45 Runaway feedback loops: when Recommendation Systems go bad.
Social Network algorithms are distorting reality by boosting conspiracy theories.
Runaway feedback loops in Predictive Policing: an algorithm biased by race and impacting Justice.
01:21:45 Bias in image software (Computer Vision), with examples from FaceApp and Google Photos. The first international beauty contest judged by A.I.
01:25:15 Bias in Natural Language Processing (NLP)
Another example with an A.I. built to help US Judicial system.
Taser invests in A.I. and body-cameras to “anticipate criminal activity”.
01:34:30 Questions you should ask yourself when you work on A.I.
You have options !
I just finished the final version of ML1 Complete Video Collections, from Lesson 1 to 12.
Most of the lessons are pretty straightforward, but I would personally suggest that all students, beginners as well as advanced, pay special attention to the 2nd part of Lesson 12: @Jeremy brings up some serious issues about "A.I. and Ethics", with real-life examples (and unfortunate consequences for some of the practitioners involved).
This goes beyond what most A.I. classes cover.
These videos are gold! Thank you so much @jeremy. I've been taking a lot of MOOCs about machine/deep learning, even the https://www.deeplearning.ai/ classes, but with these videos I have to say I love this style of teaching. You ask questions to a room full of students, and even if someone answers in a slightly incorrect way, you try to get more information out of them; hearing everyone's response clearly, and you responding to them to the best of your ability, is by far one of the better ways to understand the concepts being presented. More MOOCs should take note of this.
Just getting into the first video… I'm not able to unzip train.zip on Google Colab; I get this error (I believe I got the same error trying to do the exact same thing on Paperspace a few weeks ago):
Archive: train.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of train.zip or
train.zip.zip, and cannot find train.zip.ZIP, period.
I also tried just downloading the train.zip file to my local machine and unzipping it, but then I'm not sure how to use something like scp (what are my username & password at Google Colab??). Any help is appreciated.
There isn’t a separate notebook for every lesson. Jeremy instead works through these 5 notebooks and discusses a lot more topics throughout the 12 lessons. Hope that helps.
Thanks… I don't know where the kaggle competitions command above put the files, but then I looked at the API docs and found how to tell it explicitly to put the goods in the ~/data/bulldozers directory.
My understanding of L2 regularization is that it penalizes weights that are far from 0. We are adding a term to the loss function which grows with the square of each weight value, so minimizing the loss function will favor weights that are closer to 0.
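In code, this is literally adding lambda * sum(w^2) to the loss. A minimal PyTorch sketch (wd plays the role of lambda; the model and data are stand-ins):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
wd = 1e-3                                    # regularization strength (lambda)

x, y = torch.randn(32, 10), torch.randn(32, 1)

# L2 regularization: add lambda * sum(w^2) to the loss, so large weights are
# penalized and the optimizer is pushed toward weights closer to 0.
l2 = sum((p ** 2).sum() for p in model.parameters())
loss = loss_fn(model(x), y) + wd * l2
loss.backward()
```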
I have one question, from the MOOC ML course, and would really appreciate your help. In Jeremy's notebook, he uses the split_vals function to create the validation set (which is the last 12k rows of the data). That being set, however, when we do set_rf_samples(5000), it takes random samples from the entire training set to build the individual trees of the RF. Isn't it possible (though the chances are less likely) that some of the samples that went into the validation set also get picked while building some of the trees in the RF?
Isn't the general rule of thumb that the train and validation samples are kept separate? Here is the sample code I am referring to -
I would really appreciate your help. If this is answered in another section or in a later session, I apologize and can read / listen to it. Thanks for your help.