This is now a public forum. When Jeremy originally posted this thread, the Part 1 international fellowship was underway, so I don't think so; they haven't been placed on the website yet. Jeremy was probably busy, but that's just my guess.
I’m considering working through this before deep learning for coders. Is there any benefit to waiting until these are officially released and on the website (additional resources that will be released then, etc)? And is there any timeline set for publishing these on the website?
This looks great though- was looking for something like this. Thanks for working on it.
Thanks @jeremy for creating these amazing courses!
I want to ask whether I should take this course if I've already done the ML courses from Coursera. Does this course have extra tricks/tips?
No particular benefit, other than a larger community and a dedicated forum when it’s released. Should be within a month (maybe more like 2 weeks).
Ah, great. I will probably wait then as it isn’t too long. Need a bit of time to familiarize myself with Python data science libraries anyways as I’m coming from another programming language. Thanks!
In Machine Learning 1, Lesson 5 at 36:30, Jeremy talks about losing the temporal characteristics of the data if we choose the validation set randomly, but at the same time he insists that the temporal behaviour can be preserved by sorting the validation set by date after randomly sampling it. I am not able to digest this idea.
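To make my confusion concrete, here is a small sketch of what I understand him to mean: sample the validation rows at random, then sort them back into date order. The column names below are made up for illustration, not taken from the course notebooks:

```python
import numpy as np
import pandas as pd

# Sketch of "sample randomly, then sort by date" for a validation set.
# Column names are illustrative, not from the course notebooks.
df = pd.DataFrame({
    "saledate": pd.date_range("2020-01-01", periods=100, freq="D"),
    "price": np.random.rand(100),
})

# Draw the validation rows at random, then restore temporal order by sorting,
# so anything computed over time on the validation set still makes sense.
val = df.sample(n=20, random_state=42).sort_values("saledate")
```

Sorting restores the order of the sampled rows, but the sample itself is still drawn from the whole date range, which is the part I'm struggling to reconcile with the temporal argument.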
Yes, I find it has many gems (i.e. tricks for doing things faster and/or better) which Jeremy has personally collected over 25 years of ML practice, and which you can't find in textbooks or elsewhere.
At 30:35 of lesson 2, Jeremy gets a random sample of 30k rows. He then says the validation set should not change, and that the training set should not overlap with the dates (not sure which dates he is referring to).
The original validation set is made up of the last 12k rows. Since proc_df is run on a random subset of 30k rows, isn't it possible that some of the new, smaller training data consists of rows from the validation set? Furthermore, I would think that the smaller training set is no longer necessarily ordered by date, since rows were picked at random.
edit: I checked out the source code, and get_sample returns the data in sorted order, so that addresses that question. I still think it's possible that the training data could overlap with the original validation set.
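For what it's worth, the way I would avoid the overlap I'm worried about is to sample only from the rows that come before the validation cutoff. This is just a sketch with made-up sizes and column names, not the course's proc_df/get_sample code:

```python
import numpy as np
import pandas as pd

# Illustrative setup (sizes scaled down): rows sorted by date, with the last
# n_valid rows held out as a fixed, date-based validation set.
df = pd.DataFrame({
    "saledate": pd.date_range("2010-01-01", periods=5000, freq="D"),
    "y": np.random.rand(5000),
}).sort_values("saledate").reset_index(drop=True)

n_valid = 1200
train_full = df.iloc[:-n_valid]  # everything before the validation period
valid = df.iloc[-n_valid:]       # fixed validation set, never resampled

# Subsample only from train_full, so no sampled row can fall inside the
# validation period; sorting the index keeps the rows in date order.
train_sub = train_full.sample(n=3000, random_state=0).sort_index()
```

Sampling from train_full rather than from the whole DataFrame guarantees the subsampled training set and the validation set share no rows and no dates.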
Suggested change to line 15 of text.py:
texts.append(open(fname, 'r', encoding='utf-8').read())
@jeremy I am confused about whether to do the machine learning course or the deep learning course first. Which do you think would be better to start with?
They are different. If you don’t have any experience with dataset manipulation, cleaning and validation set creation, do the ML1 course first because that knowledge is assumed in the DL1 course. Personally I felt like it worked well for me to do ML1 followed by DL1. Also, fyi Jeremy has requested not to be personally tagged in posts unless he is the only person who can answer the question.
Thanks for uploading the rest of the ML class videos - they are really great. Will the notebooks for ML lessons 6-12 be released on GitHub?
One silly question. I've just completed the first part of the ML course, and I'm really compelled to ask when the second part is going to be available, even unofficially. Thanks so much for all these courses.
There’s no 2nd part of the MOOC - just a 2nd part for masters students at USF.
Oh, I was really looking forward to it :sweat_smile: . Thank you so much for all the courses. I'm done with ML and DL1, and they're among the best ML courses I've ever taken.
There is a small inconsistency/bug in both:
df_raw = pd.read_feather('tmp/raw')
should be replaced by:
df_raw = pd.read_feather('tmp/bulldozers-raw')
since this is the path under which ml1/lesson1-rf.ipynb saves the data. Alternatively, the first notebook could save its data as 'tmp/raw'.
update: two more notebooks have the same issue, so it's probably best just to fix the first notebook (ml1/lesson1-rf.ipynb) to save its data as 'tmp/raw' instead of changing four notebooks. On the other hand, 'tmp/raw' could collide with another lesson that may use tmp/raw.
Hi, @yinterian. Did you publish the Jupyter notebooks for the other lectures given at USF? If so, could you please tell us where? Any learning materials are much appreciated!
I have a question regarding TreeInterpreter. Jeremy explained in the course how it works for a single tree, but I don't quite see how the result output by TreeInterpreter is calculated when the input model is a random forest.
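Partly thinking out loud on this first question: my understanding is that TreeInterpreter computes bias and contributions per tree and then averages them over the forest, which works because a regression forest's prediction is simply the mean of its trees' predictions. Here is a quick sanity check of that underlying fact with scikit-learn (this is my own sketch, not the TreeInterpreter code itself):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# The key fact behind extending per-tree contributions to a forest: a random
# forest regressor's prediction is the mean of its individual trees'
# predictions, so averaging per-tree (bias, contributions) is consistent.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

row = X[:1]
forest_pred = rf.predict(row)
mean_of_trees = np.mean([t.predict(row) for t in rf.estimators_], axis=0)
# forest_pred and mean_of_trees should match (up to floating point).
```

If that averaging picture is right, then the forest-level contribution of a feature is just the average of that feature's contribution across all trees.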
A second question: there is a boolean input called joint_contribution. Is it a way to study feature interactions?
The third is: say we have a small dataset and use cross-validation to choose a model and tune its parameters. Once we've settled on the model, can we then split the dataset that was used for cross-validation into a training and a validation set, retrain a new model on the new training set with the same parameters as the one chosen by cross-validation, and do all the interpretation work on the new validation set?
The fourth and final question: in the course, it is said that some article shows the best way to deal with an unbalanced dataset is to duplicate the rows of the class with the fewest samples. Say we have a couple of classes and have to duplicate one class's rows three times to get a balanced dataset. Doesn't that make our model more prone to overfitting, at least on the duplicated rows? Can you please give me the title of this article?
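To make the fourth question concrete, by "duplicate" I mean oversampling the minority class until the classes balance, roughly like this (the DataFrame and column names are made up for illustration):

```python
import pandas as pd

# Minimal oversampling sketch: resample each class (with replacement) up to
# the majority-class size so every class ends up equally represented.
df = pd.DataFrame({"x": range(8), "label": ["a"] * 6 + ["b"] * 2})

n_max = df["label"].value_counts().max()

parts = []
for label, group in df.groupby("label"):
    # Sampling with replacement duplicates minority-class rows as needed.
    parts.append(group.sample(n=n_max, replace=True, random_state=0))

balanced = pd.concat(parts, ignore_index=True)
```

My worry is exactly about those repeated minority rows: the model sees identical samples several times, which seems like it would encourage memorizing them.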