Another treat! Early access to Intro To Machine Learning videos

(Zach Landes) #579

No problem. Another idea would be to add features. You can get feature ideas from here:

For example: temperature, holidays, Google Trends.


Hi All, I am a backend coder by profession. Looking at the advances in AI and ML, I am really intrigued. Someone suggested that I look at the fast.ai lessons to get started. Looking at the courses, I found this one and “Deep Learning for Coders”. What is the best way to start? Should I go for this set of lessons or DL1 instead?


(Dan Goldner) #581

You can do one or both, in either order or simultaneously – but if you are brand new to machine learning in general and deep learning in particular, my personal recommendation is to take the ML course first, as it provides some basic grounding in training vs. validation and other universal concepts.


Hi All,
In lesson 3, for the grocery competition, Jeremy turns the sales into:

df_all.unit_sales = np.log1p(np.clip(df_all.unit_sales, 0, None))

The np.clip is supposed to remove negative sales and treat them as zero, as per the competition. On checking the competition data description, it says:

Negative values of unit_sales represent returns of that particular item.

But it doesn’t ask us to change the negative sale values to zeroes.

And wouldn’t doing this change our prediction too?
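To make the transform concrete, here is a tiny sketch of what that line does (the sales values below are made up; in the lesson they come from df_all.unit_sales):

```python
import numpy as np

# Hypothetical unit_sales values; negatives represent returns
sales = np.array([-3.0, 0.0, 1.0, 9.0])

# Clip negatives to zero, then take log(1 + x), as in the lesson
transformed = np.log1p(np.clip(sales, 0, None))

print(transformed)  # [0.  0.  0.69314718  2.30258509]
```

So every return is collapsed to a zero sale before the log transform, which is presumably why it could shift predictions for items with many returns.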


(kelvin chan) #583

When trying to draw the decision tree:

draw_tree(m.estimators_[0], df_trn)

I got an error: CalledProcessError: Command '['dot', '-Tsvg']' returned non-zero exit status 1

I ran brew install graphviz to get the latest, and my “dot” is:

dot - graphviz version 2.40.1 (20161225.0304)
Does anyone know what I can do to solve this?
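One thing worth checking (a guess, since that error can have several causes) is whether the freshly-brewed dot has its plugin config registered, and whether dot works at all outside Jupyter:

```shell
# Check which dot is being picked up and its version
which dot
dot -V

# Regenerate graphviz's plugin config; a stale config after an
# upgrade is a common cause of '-Tsvg' failing (may need sudo)
dot -c || true

# Sanity check: render a trivial graph to SVG outside Jupyter
echo 'digraph { a -> b }' | dot -Tsvg -o /tmp/test_tree.svg
```

If the last command fails too, the problem is with the graphviz install itself rather than with draw_tree.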


(Jeremy Demlow) #585


Have you seen a good article on this topic? I understand why we want to up-sample the minority class, but I haven’t seen it done on a problem where I can follow along and get more than just the theoretical ideas.

This problem comes up so often that becoming a practitioner on this subject would be worthwhile, I believe.
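For concreteness, this is the kind of thing I mean — a minimal up-sampling sketch on made-up data (pure NumPy; real examples would do this inside a train/validation split):

```python
import numpy as np

# Toy imbalanced dataset: 90 negatives, 10 positives (made-up data)
rng = np.random.RandomState(42)
X = rng.randn(100, 3)
y = np.array([0] * 90 + [1] * 10)

# Sample minority-class rows with replacement until the classes match
minority_idx = np.where(y == 1)[0]
upsampled_idx = rng.choice(minority_idx, size=(y == 0).sum(), replace=True)

X_bal = np.vstack([X[y == 0], X[upsampled_idx]])
y_bal = np.concatenate([y[y == 0], y[upsampled_idx]])
print(np.bincount(y_bal))  # [90 90]
```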

Thanks in advance

(Matthew Krehbiel) #586

Does anyone know in which lesson (if any) that Jeremy goes over the bulldozer_dl notebook? Thanks!

(Jeremy Howard (Admin)) #587

I don’t think I covered it IIRC

(Nandha Kumar Murugesan) #588

Can you share the Jupyter notebooks for these videos (Intro to ML)?
I tried searching but I could not find them on GitHub or elsewhere.
They would be very helpful for following along and for practice.
Thank you @jeremy

(Mike) #589

Did you find the answer to this?
I think I might be having the same issue: I’m trying to apply what I have learnt in the Random Forest videos to the Kaggle house price competition, and am finding that after running proc_df on the test set I have more columns than I did in my training and validation sets.
Is that the problem you had too? Did you find a solution?
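One generic way to deal with a column mismatch like this (a sketch with made-up frames, not the actual competition data; I believe the old fastai library’s apply_cats is the proper fix for the categorical side) is to align the test columns to the training columns:

```python
import pandas as pd

# Made-up frames standing in for the processed train/test feature matrices
df_trn = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df_test = pd.DataFrame({'a': [5], 'b': [6], 'c': [7]})  # extra column 'c'

# Keep only the training columns; fill any that are missing with 0
df_test_aligned = df_test.reindex(columns=df_trn.columns, fill_value=0)
print(list(df_test_aligned.columns))  # ['a', 'b']
```

The extra columns usually come from categories (or NA-indicator columns) that appear in one set but not the other, so aligning the category codes before proc_df is the cleaner fix.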

(Kiran) #590

@jeremy: Great fastai library. One question: did you write all the code on your own, or did someone else also contribute?
Given how long you were Kaggle #1, it wouldn’t be surprising if you wrote it all on your own.


In lesson 10, we are introduced to the Naive Bayes algorithm for a bag-of-words sentiment analysis model. When computing r for a particular document, we take the (log) ratio of the probabilities of each word in the document appearing in each class. That all makes sense.

But why don’t we also include, in both the numerator and denominator of r, the probabilities of all the words in our vocabulary that didn’t appear in that particular document? For example, if the document we’re trying to classify is the statement “this movie sucks” and our entire vocabulary is composed of only the following five words {“movie”, “this”, “good”, “is”, “sucks”}, shouldn’t we include the probability that this document does not contain the words “good” and “is”, as opposed to only including the probability that the document contains the words “movie”, “this”, and “sucks”?
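For reference, here is a minimal sketch of the lesson's log-count ratio r on that toy vocabulary (the two documents and labels are made up; only word presence is counted, which is exactly the asymmetry the question is about):

```python
import numpy as np

# Binarized document-term matrix over vocab {movie, this, good, is, sucks}
# (made-up toy data; 1 = word present in the document)
vocab = ['movie', 'this', 'good', 'is', 'sucks']
X = np.array([[1, 1, 1, 1, 0],   # "this movie is good" -> positive
              [1, 1, 0, 0, 1]])  # "this movie sucks"   -> negative
y = np.array([1, 0])

# Per-class word probabilities with +1 smoothing, as in the lesson
p = (X[y == 1].sum(0) + 1) / ((y == 1).sum() + 1)
q = (X[y == 0].sum(0) + 1) / ((y == 0).sum() + 1)
r = np.log(p / q)  # log-count ratio per vocabulary word

print(np.round(r, 3))  # [ 0.  0.  0.693  0.693 -0.693]
```

As the sketch shows, r is built only from words that are present; absent words contribute nothing to the score for a document.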


(Jonas G F Pettersson) #592

Have you looked here:

(Jeremy Howard (Admin)) #593

IIRC I wrote nearly all of it on my own. There have been some PRs since the first course contributing new features, however, which have been much appreciated!

(Kiran) #594

Hi Jeremy,
How long did it take you to write the code, in terms of person-hours for the initial version? I would like to understand how far I am from the gold standard!

Also, it is amazing how well you program, making things simple in a few lines of code in a language new to you like Python. You are definitely one of the best minds in the world.

(Jonathan) #595

Hey Brad, I’m having the same issue.

How exactly did you change the file?


(Aseem Bansal) #596

@jeremy Would it be possible to edit the first post and add “this course is not on the website yet” or something similar? I am trying to get some new people to go through this course, but in the first video you say “watch the course on the website instead of YouTube”. People go to the website and then get confused by the deep learning course, which is the only one on the website. I clarify as soon as I find out, but it would be helpful to have something like that in the post itself.


And of course do not share these videos with anyone outside the course. My ability to share things like this depends on everyone being careful about the privacy of these pre-release materials.

They are still pre-release.

(Aseem Bansal) #598

This is now a public forum. When Jeremy originally posted this thread, the Part 1 international fellowship was going on, so I don’t think so. They have not been placed on the website yet; probably Jeremy has been busy. But that’s my guess.


I’m considering working through this before Deep Learning for Coders. Is there any benefit to waiting until these are officially released and on the website (additional resources released then, etc.)? And is there any timeline set for publishing these on the website?

This looks great, though; I was looking for something like this. Thanks for working on it.