Thanks Jeremy, I think I misunderstood. So does it mean that set_rf_samples, following these lines of code (which use split_vals to create the train and valid sets), only samples from the X_train dataset? If that's the case, it makes sense to me.
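For what it's worth, here is a toy numpy sketch of that understanding. split_vals is reproduced from the lesson notebook; the sampling step is purely illustrative of what set_rf_samples arranges, not fastai's actual implementation:

```python
import numpy as np

def split_vals(a, n):
    # split_vals from the lesson notebook: first n rows become the
    # training set, the remainder the validation set
    return a[:n], a[n:]

data = np.arange(10)
X_train, X_valid = split_vals(data, 8)

# set_rf_samples(k) then makes each tree bootstrap k rows, drawn only
# from whatever is passed to fit(), i.e. X_train, never X_valid
rng = np.random.default_rng(0)
tree_sample_idx = rng.integers(0, len(X_train), size=5)
tree_sample = X_train[tree_sample_idx]
```

So the validation rows never enter any tree's bootstrap sample; only the training rows do.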
I’m almost done with the DL course and was wondering what the benefits are of learning classical machine learning. Are there many types of datasets/problems where classical machine learning will outperform deep learning? If so, what are some examples? Basically, I’m trying to decide whether I should take this course after I complete the DL one, or just continue studying DL.
What are your thoughts on reinforcement learning? Do you have much experience with it? Any plans on teaching a course about it?
Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. Once this embedding has been produced, then the aforementioned tasks become straightforward: face verification simply involves thresholding the distance between the two embeddings; recognition becomes a k-NN classification problem; and clustering can be achieved using off-the-shelf techniques such as k-means or agglomerative clustering.
Here DL and ML are used together, serving different purposes, to produce a working solution.
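As a toy illustration of the pipeline quoted above: once the network (the DL part) has embedded each face as a vector, verification reduces to a distance threshold and recognition to k-NN (the classic-ML part). The embeddings and threshold below are made up for illustration:

```python
import numpy as np

# two hypothetical face embeddings produced by the network
emb_a = np.array([0.10, 0.90, 0.20])
emb_b = np.array([0.12, 0.88, 0.19])

def same_person(e1, e2, threshold=0.5):
    # face verification: Euclidean distance below a tuned threshold
    return np.linalg.norm(e1 - e2) < threshold
```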
You should definitely take this course next: nearly all the concepts are directly applicable to what you’ve learnt, and it will make you a better DL practitioner.
There’s still a lot of doubt about whether RL is actually doing anything useful. Random search is nearly as good for many of the things it’s been used for. So I’m holding off teaching anything about this until we have some genuine best practices to teach.
@parrt and friends have just written a new article on feature importance in random forests. Would love to get your feedback on this draft - let us know if anything is unclear, you spot any mistakes, etc.
Please don’t share on social media yet, until we’ve fixed up any little issues!
@jeremy Hello Jeremy, I have a question regarding the fast.ai library.
So after I apply train_cats to a dataframe, such as:
train_cats(df_raw)
I basically substitute the categorical variables with numerical ones.
As far as I understood, the previous categories contained in the dataframe are replaced in place, but if I want to retrieve them later (more precisely, to substitute them back into the dataframe), there is no easy way of doing that, since I have no explicit mapping between numerical and categorical values.
Could you give a few pointers on how this can be achieved?
Thanks and congratulations on the excellent course!
They’re still there - take a look at the data frame and you’ll see them! (We do this in the lesson, in fact).
We do discuss this in some detail in the video, so maybe try re-watching it and see if you can answer your question. If so, come back here and let us know what you find. If not, tell us what you can about your understanding, and we’ll try to fill in the gaps.
@jeremy Thanks Jeremy, indeed the categories remain present in the dataframe after using train_cats.
I still have trouble when I use the function proc_df, like this:
So my question here is: can I somehow get a mapping between the categorical values 1, 2 and 3 back to the original “pet” values (cat, dog, fish, respectively)?
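In case it helps, here is a plain-pandas sketch of where that mapping lives (the “pet” column is the hypothetical example from the question). Note the values 1, 2, 3 are consistent with how, if I recall correctly, the old fastai proc_df shifts pandas’ 0-based codes up by one so that 0 can stand for missing:

```python
import pandas as pd

df = pd.DataFrame({'pet': ['cat', 'dog', 'fish', 'dog']})
df['pet'] = df['pet'].astype('category')   # roughly what train_cats does

codes = df['pet'].cat.codes                # 0-based integer codes
categories = df['pet'].cat.categories      # Index(['cat', 'dog', 'fish'])

# mapping codes back to labels is an index lookup; if the numbers came
# from proc_df, subtract 1 first (codes 1, 2, 3 -> labels 0, 1, 2)
restored = categories[codes.to_numpy()]
```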
In Lesson 1 (Machine Learning), in the part where @jeremy is dealing with the strings in the data (https://youtu.be/CzdWqFTmn0Y?t=3535) (“low, medium, high”, etc.): when I try the same process to convert strings to categories using dataframe.cat.set_categories(), my entire data changes to NaN. And on further looking at dataframe.cat.codes, all the values change to -1. I have attached the Kaggle link to the data (train.csv); I am trying to change the ‘GarageType’ column to a category.
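A minimal pandas sketch of the likely cause (the column values below are invented, not from the actual train.csv): set_categories replaces any value not in the supplied list with NaN, and NaN’s integer code is -1, which would explain both symptoms at once:

```python
import pandas as pd

df = pd.DataFrame({'GarageType': ['Attchd', 'Detchd', 'BuiltIn']})
df['GarageType'] = df['GarageType'].astype('category')

# passing a category list that doesn't match the column's actual values
# turns every value into NaN, whose code is -1
bad = df['GarageType'].cat.set_categories(['low', 'medium', 'high'])

# passing the column's real values merely reorders the categories
ok = df['GarageType'].cat.set_categories(
    ['Attchd', 'BuiltIn', 'Detchd'], ordered=True)
```

So the fix is to pass set_categories the values that actually occur in GarageType, not the “low, medium, high” list from the lesson’s different column.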
I am new to Python and ML, so pardon my ignorance. I was looking at the question raised during Lesson 1 about automatically parsing the data and finding dates.
Now when I look at the pandas.read_csv documentation, there is an infer_datetime_format parameter. When I try it using:
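For reference, a minimal sketch of how infer_datetime_format is typically combined with parse_dates (the CSV contents here are invented; this is not the attempted code from the post):

```python
import io
import pandas as pd

csv = io.StringIO("saledate,price\n2011-11-16,9500\n2012-01-05,12000\n")

# infer_datetime_format only speeds up parsing of columns already listed
# in parse_dates; on its own it does not make read_csv detect date
# columns automatically (it is also deprecated in newer pandas releases)
df = pd.read_csv(csv, parse_dates=['saledate'], infer_datetime_format=True)
```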
@jeremy Hi Jeremy, I had a question on the bagging section of the RF notebook. My understanding is that we need to pass n_estimators for the number of trees. So how does:
m = RandomForestRegressor(n_jobs=-1)
m.fit(X_train, y_train)
print_score(m)
result in exactly 10 trees later in the example?
Edit: My bad. I looked at the documentation; 10 is the default.
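A quick sketch confirming this (the toy data is made up; the default was 10 in the scikit-learn release used in the course, and has been 100 since scikit-learn 0.22):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.random.rand(50, 3)
y = np.random.rand(50)

# when n_estimators is omitted, the library default is used, and the
# fitted forest holds exactly that many trees in m.estimators_
m = RandomForestRegressor(n_jobs=-1)
m.fit(X, y)
n_trees = len(m.estimators_)
```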