About the Intro to Machine Learning category


#23

I get a really low oob score of 0.4716 and an rmse of 0.027 on the red wine quality dataset. Why is my oob_score so low? Does it mean that I am overfitting?

I have trouble knowing how to handle small data sets. This one has only 1599 records. As soon as I create a validation set of size 30, my r^2 drops from 0.90 to 0.11! So I don’t use a validation set; instead I use oob_score_. Now r^2 is 0.90 but the oob_score_ is 0.47.

This is my model without a validation set:

m = RandomForestRegressor(n_estimators=80, max_features=0.5, n_jobs=-1, oob_score=True)
m.fit(df_trn, y)
print_score(m)

This is my model with validation set:

m = RandomForestRegressor(n_estimators=80, max_features=0.5, n_jobs=-1, oob_score=True)
m.fit(X_train, y_train)
print_score(m)
[0.0393104889825399, 0.08282278268007248, 0.9278235796906971, 0.11669103577122453, 0.4711981193537419]

Could someone tell me how to interpret these big differences in the results, and what I am doing wrong?
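One way to read the numbers above, assuming print_score is the course helper that returns [train RMSE, validation RMSE, train R^2, validation R^2, OOB score]: R^2 compares the squared error with the spread of the target on the same set, so a small RMSE can still give a low R^2 when the target itself varies little. The sketch below uses made-up numbers, not the wine data, and the helper names are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, both sums taken over the same set."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Two made-up targets: one with a narrow spread, one with a wide spread.
y_narrow = [1.60, 1.70, 1.80, 1.90]
y_wide = [1.0, 3.0, 5.0, 7.0]

# The same absolute prediction errors are applied to both.
errors = [0.08, -0.08, 0.08, -0.08]
pred_narrow = [y + e for y, e in zip(y_narrow, errors)]
pred_wide = [y + e for y, e in zip(y_wide, errors)]

print("narrow:", rmse(y_narrow, pred_narrow), r2_score_manual(y_narrow, pred_narrow))
print("wide:  ", rmse(y_wide, pred_wide), r2_score_manual(y_wide, pred_wide))
```

Both cases have the same RMSE, yet R^2 is far lower for the narrow target. Separately, the training R^2 is measured on rows the trees were fitted on, while oob_score_ uses, for each row, only the trees that did not see it, so some gap between the two is expected.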


(Ashirwad Sangwan) #24

Convert the training and test data in the same way if you don’t want the error to appear. Whenever you encode the categorical variables, apply the same method to both the training and the test data.
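A minimal sketch of that idea in plain Python: the category-to-code mapping is built once from the training column and then reused for the test column. The column values and the function names are made up for illustration, and reserving code 0 for categories unseen in training is an assumption of this sketch, not something stated in the thread.

```python
def fit_category_codes(values):
    """Build a category -> integer code mapping from the training values only."""
    # Codes start at 1 so that 0 can stand for "not seen in training".
    return {cat: i for i, cat in enumerate(sorted(set(values)), start=1)}

def encode(values, codes):
    """Encode values with an existing mapping; unseen categories become 0."""
    return [codes.get(v, 0) for v in values]

train_colors = ["red", "white", "red", "rose"]
test_colors = ["white", "red", "sparkling"]

codes = fit_category_codes(train_colors)   # fitted on training data only
train_encoded = encode(train_colors, codes)
test_encoded = encode(test_colors, codes)  # same mapping reused for the test data

print(codes)
print(train_encoded)
print(test_encoded)
```

Because both columns go through the same mapping, "red" gets the same code in training and test, which would not be guaranteed if each set were encoded independently.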


#25

Thank you, I finally managed to make it work.


(Anne Estoppey) #26

Hello everybody

Edit: I finally found the thread which tells you how to (re)install fastai 0.7 and the dependencies from Colab!
Seems to work :grinning:


#27

I’m using Google Colab too.

Do I need to run

!pip install fastai==0.7.0

every time I start a new notebook/chapter? Thanks


(Anne Estoppey) #28

Yes I think so. I have to do this too. Also when I take the same notebook again the next day.
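As far as I know, each new Colab runtime starts from a fresh environment, which is why the install has to be repeated. A small sketch of a check that could sit at the top of a notebook so the install only runs when needed; the helper name needs_install is hypothetical, and it only uses importlib.metadata from the standard library.

```python
from importlib import metadata

def needs_install(package, wanted_version):
    """Return True when the package is missing or installed at another version."""
    try:
        return metadata.version(package) != wanted_version
    except metadata.PackageNotFoundError:
        return True

if needs_install("fastai", "0.7.0"):
    # In a notebook cell, this is where the install command would go.
    print("Run: !pip install fastai==0.7.0")
else:
    print("fastai 0.7.0 is already installed in this runtime")
```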

Well, while you run the fastai ‘downgrade’ in Colab, you can get another coffee in the meantime…

IMO it is worth it though, it’s really excellent :smiley:


#29

@Anne
BTW, did you encounter crashes when doing the Lesson 3 grocery store section?

df_all.unit_sales = np.log1p(np.clip(df_all.unit_sales, 0, None)) 
add_datepart(df_all, 'date')
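For readers unfamiliar with the first line of that snippet: it clips negative sales to zero and then takes log(1 + x). A minimal sketch of the same transform on a plain list, using math.log1p from the standard library in place of NumPy so that it runs without the course environment; the values and the helper name are made up, and this does not attempt to reproduce the crash.

```python
import math

def log1p_clipped(values):
    """Clip negatives to 0, then apply log(1 + x) to each value."""
    return [math.log1p(max(v, 0.0)) for v in values]

unit_sales = [-2.0, 0.0, 1.0, 9.0]        # made-up values
transformed = log1p_clipped(unit_sales)

# expm1 inverts log1p, so the clipped values can be recovered.
restored = [math.expm1(t) for t in transformed]

print(transformed)
print(restored)
```

The negative entry comes back as 0 rather than -2, since clipping discards that information before the logarithm is taken.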

(Anne Estoppey) #30

Hello Andrew, which notebook is it in, and whereabouts in the notebook?

Cheers from Norway :grinning:


#31

@Anne, there’s no notebook for this; it was discussed in the first part of Lesson 3 (Grocery). https://youtu.be/YSFG_W8JxBo?t=1590