(Jeremy Demlow) #585

ecdrid,

Have you seen a great article on this topic? I understand why we want to up-sample the class, but I haven’t seen it done on a problem where I can follow along and get more than just the theoretical ideas.

This problem comes up so often that becoming a practitioner on this subject would be worthwhile, I believe.
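
For anyone who wants something concrete to follow along with, here is a minimal sketch of random up-sampling in plain Python. The function name `upsample_minority` and the toy rows are made up for illustration; this is not taken from the course notebooks. It resamples each smaller class with replacement until every class is as large as the biggest one.

```python
import random
from collections import Counter


def upsample_minority(rows, label_of, seed=0):
    """Resample each class with replacement up to the size of the largest class.

    Hypothetical helper for illustration, not part of any library.
    """
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    target = max(len(members) for members in by_class.values())

    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # Draw extra copies (with replacement) until the class reaches `target`.
        balanced.extend(rng.choices(members, k=target - len(members)))

    rng.shuffle(balanced)
    return balanced


# Toy training set: 8 rows of class 0 and 2 rows of class 1.
train_rows = [({"x": i}, 0) for i in range(8)] + [({"x": 100 + i}, 1) for i in range(2)]

balanced_rows = upsample_minority(train_rows, label_of=lambda row: row[1])
class_counts = Counter(label for _, label in balanced_rows)
print(class_counts)
```

One design point: this should be applied to the training rows only, after the train/validation split, so that duplicated rows cannot end up on both sides of the split.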

(Matthew Krehbiel) #586

Does anyone know in which lesson (if any) Jeremy goes over the bulldozer_dl notebook? Thanks!

(Jeremy Howard) #587

I don’t think I covered it IIRC

(Nandha Kumar Murugesan) #588

@jeremy
Can you share the Jupyter notebooks for these videos (Intro to ML)?
I tried searching, but I could not find them on GitHub or anywhere else.
They would be very helpful for following along and for practice.
Thank you @jeremy

(Mike) #589

Did you find the answer for this?
I think I might be having the same issue. I’m trying to apply what I have learnt in the Random Forest videos to the Kaggle House Prices competition, and I’m finding that after running `proc_df` on the test set I have more columns than I did in my training and validation sets.
Is that the problem you had too? Did you find a solution?
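
One possible cause, sketched below in plain Python: if missing-value indicator columns (`<name>_na`) are added only for columns that contain a missing value in the data set being processed, then the training and test sets can end up with different indicator columns. The helpers `add_na_indicators` and `align_to` are hypothetical stand-ins written for this post, not fastai functions, and the numbers are made up.

```python
def add_na_indicators(rows, columns):
    """Add '<col>_na' flags, but only for columns with a missing value in THIS data set."""
    cols_with_na = [c for c in columns if any(r[c] is None for r in rows)]
    out = []
    for r in rows:
        new_row = dict(r)
        for c in cols_with_na:
            new_row[c + "_na"] = r[c] is None
        out.append(new_row)
    return out


def align_to(rows, train_columns, fill=False):
    """Keep exactly the training columns, in training order.

    Columns absent from a row are filled with `fill`; extra columns are dropped.
    """
    return [{c: r.get(c, fill) for c in train_columns} for r in rows]


columns = ["LotArea", "GarageArea"]

# Training data is missing a LotArea; test data is missing a GarageArea.
train = add_na_indicators(
    [{"LotArea": 8450, "GarageArea": 548}, {"LotArea": None, "GarageArea": 460}],
    columns,
)
test = add_na_indicators(
    [{"LotArea": 9600, "GarageArea": None}, {"LotArea": 11250, "GarageArea": 608}],
    columns,
)

train_columns = list(train[0].keys())
print(train_columns)          # indicator for LotArea only
print(list(test[0].keys()))   # indicator for GarageArea only: the mismatch

# Fix the column set to whatever the model was trained on.
# Filling the missing values themselves is a separate step, not shown here.
test_aligned = align_to(test, train_columns)
print(list(test_aligned[0].keys()))
```

The general idea is to decide the column set once, on the training data, and force every later data set into that shape before calling `predict`.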

(Kiran) #590

@jeremy: Great fastai library. One question: Did you write all the code on your own or did someone else also contribute?
Given your genius in being Kaggle #1 for such a long time, it would not be surprising if you wrote it all on your own.

#591

In lesson 10, we are introduced to the Naive Bayes algorithm for a bag-of-words sentiment analysis model. When computing `r` for a particular document, we take the (log) ratio of the probabilities of each word in the document appearing in each class. That all makes sense.

But how come we don’t also include, in both the numerator and denominator of `r`, the probability of all words in our vocabulary that didn’t appear in that particular document? For example, if the document we’re trying to classify is the statement “this movie sucks” and our entire vocabulary is composed of only the following five words {“movie”, “this”, “good”, “is”, “sucks”}, shouldn’t we include the probability that this document does not contain the words “good” and “is”, as opposed to only including the probability that the document contains the words “movie”, “this”, and “sucks”?

Thanks
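
To make the question concrete, here is a toy sketch comparing the two scores on the five-word vocabulary above. The variant that also multiplies in the probability of absent words is usually called Bernoulli Naive Bayes. The four-document corpus is made up, the add-one smoothing is an assumption, and the class priors are equal so they drop out; this is not the lesson's notebook code.

```python
import math

vocab = ["movie", "this", "good", "is", "sucks"]

# Made-up corpus: each document is the set of words it contains.
pos_docs = [{"this", "movie", "is", "good"}, {"good", "movie"}]
neg_docs = [{"this", "movie", "sucks"}, {"this", "is", "sucks"}]


def presence_prob(word, docs):
    """Smoothed P(word present | class): (count + 1) / (n_docs + 2)."""
    return (sum(word in d for d in docs) + 1) / (len(docs) + 2)


p = {w: presence_prob(w, pos_docs) for w in vocab}  # positive class
q = {w: presence_prob(w, neg_docs) for w in vocab}  # negative class

doc = {"this", "movie", "sucks"}

# Score using only the words that appear in the document.
present_only = sum(math.log(p[w] / q[w]) for w in doc)

# Bernoulli-style score: also add log((1 - p) / (1 - q)) for each absent word.
absent_terms = sum(
    math.log((1 - p[w]) / (1 - q[w])) for w in vocab if w not in doc
)
bernoulli = present_only + absent_terms

# A negative score favours the negative class.
print(present_only, bernoulli)
```

In this toy case both scores are negative, and the absence of “good” pushes the Bernoulli score further toward the negative class. With a realistic vocabulary almost every word is absent from any given document, so the absent-word terms are numerous; whether including them helps is an empirical question rather than something this sketch settles.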

Have you looked here:

(Jeremy Howard) #593

IIRC I wrote nearly all of it on my own. There have been some PRs since the first course contributing some new features, however, which have been much appreciated!

(Kiran) #594

Hi Jeremy
How long did it take for you to write the code? In terms of man hours to create the initial version. I would like to understand how far I am from the gold standard!

Also, it is amazing to see how well you program, making things simple with a few lines of code in a language as new to you as Python. You are definitely one of the best minds in the world.

(Jonathan) #595

Hey Brad, I’m having the same issue.

How exactly did you change the text.py file?

Thanks

(Aseem Bansal) #596

@jeremy Would it be possible to edit the 1st post and add “this course is not on the website yet” or something similar? I am trying to get some new people to go through this course, but in the first video you say “watch the course on the website instead of youtube”. People go to the website and then get confused by the deep learning course, which is the only one on the website. I clarify as soon as I find that out, but it would be helpful to have something like that in the post itself.

#597

And of course do not share these videos with anyone outside the course. My ability to share things like this depends on everyone being careful about the privacy of these pre-release materials.

Are they still pre-release?

(Aseem Bansal) #598

This is now a public forum. When Jeremy originally posted this thread, the Part 1 International Fellowship was going on, so I don’t think so. They have not been placed on the website yet. Probably Jeremy was busy, but that’s my guess.

#599

I’m considering working through this before deep learning for coders. Is there any benefit to waiting until these are officially released and on the website (additional resources that will be released then, etc)? And is there any timeline set for publishing these on the website?

This looks great though; I was looking for something like this. Thanks for working on it.

(Amar Sharma) #600

Thanks @jeremy for creating these amazing courses!
I want to ask whether I should take this course if I’ve already done the ML courses from Coursera. Does this course have extra tricks/tips?
Thanks.

(Jeremy Howard) #601

No particular benefit, other than a larger community and a dedicated forum when it’s released. Should be within a month (maybe more like 2 weeks).

(Kevin Bird) #602

Awesome!

#603

Ah, great. I will probably wait then as it isn’t too long. Need a bit of time to familiarize myself with Python data science libraries anyways as I’m coming from another programming language. Thanks!

#604

Thank you.