Wiki/lesson thread: Lesson 2


(Luis Ortega) #25

I’m using Windows 10. I had this issue even after installing Graphviz. The problem was that the PATH did not include the Graphviz folder where dot.exe resides.

I searched in Windows Explorer to locate dot.exe and added its folder to the system PATH. I had to restart the machine for the Jupyter kernel to pick up the new PATH.

HTH


#26

Thank you all for your replies!

I run Linux, so the Windows 10 solution does not apply to me in this case.

It turns out I had not activated the fastai environment:

source activate fastai
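To double-check from inside a notebook which environment the kernel is actually running in, something like the following should work. It is a small sketch using only the standard library; if the prefix does not point at the fastai environment, the kernel was probably started from a different one. Note that CONDA_DEFAULT_ENV is only set when the notebook server was launched from an activated conda environment.

```python
import os
import sys

def env_info():
    """Collect details about the Python environment this kernel runs in."""
    return {
        "executable": sys.executable,  # interpreter the kernel is using
        "prefix": sys.prefix,          # root folder of the active (conda/virtual) env
        # only present if Jupyter was started from an activated conda env
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }

for key, value in env_info().items():
    print(f"{key}: {value}")
```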

(Xoel López) #27

This is my first message. First of all, thanks for all this great content, @jeremy!!

I found your idea of building each tree of the RF from a subsample of the original training data very interesting. I tried that approach using set_rf_samples, and I expected it to take roughly the same time as training the RF on a subset of the data, as you said, but it didn’t. I submitted an issue on GitHub about this.

I noticed the same thing happens in your case: training takes 539 ms on a subsample of the data but 3.49 s with set_rf_samples. Why does this happen?
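A rough sketch of one plausible part of the explanation, assuming scikit-learn's forest code around the versions this course used: when bootstrapping, each tree is still fitted on the full X, and the drawn row indices are turned into a per-row weight vector with np.bincount. As I understand it, set_rf_samples only changes how many indices are drawn, so some per-tree work still scales with the full number of rows rather than with n. The helper names below are hypothetical, and the snippet only mimics the index/weight bookkeeping, not the tree fitting itself.

```python
import numpy as np

def subset_rows(n_samples, n, rng):
    """Approach 1 (hypothetical helper): pick n row indices to build a
    smaller training set, e.g. X[idx], y[idx]; later work only sees n rows."""
    return rng.randint(0, n_samples, n)

def bootstrap_weights(n_samples, n, rng):
    """Approach 2 (hypothetical helper): draw n indices, then convert them
    into per-row counts over ALL rows, as a bootstrap sample weight."""
    idx = rng.randint(0, n_samples, n)
    # np.bincount allocates and fills an array of length n_samples,
    # so this step scales with the full data set, not with n
    return np.bincount(idx, minlength=n_samples)

rng = np.random.RandomState(42)
n_samples, n = 400_000, 20_000

idx = subset_rows(n_samples, n, rng)
w = bootstrap_weights(n_samples, n, rng)

print(idx.shape)            # (20000,)
print(w.shape, w.sum())     # (400000,) 20000
print((w > 0).sum() < n)    # some rows are drawn more than once
```

In the weighted version every tree receives a length-n_samples weight vector alongside the full X, which is why I would not expect its timing to match training on a genuinely smaller DataFrame.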

Thanks!!


(Mayank Khanduja) #28

I am running on a Kaggle kernel and getting the error “No module named fastai.structured” after running from fastai.structured import *
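One likely cause, though I can't confirm it for this particular kernel: fastai.structured belongs to the older fastai 0.7 library this course is built on, and as far as I know newer fastai releases no longer include that module. A quick standard-library check of what the kernel can import (module_available is a hypothetical helper name):

```python
import importlib.util

def module_available(name):
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # find_spec imports the parent package for dotted names; this is
        # raised when that parent (e.g. "fastai") is missing or fails to import
        return False

print("fastai installed:", module_available("fastai"))
print("fastai.structured available:", module_available("fastai.structured"))
```

If fastai is installed but fastai.structured is not found, the kernel most likely has a newer fastai than the course notebooks expect.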


(Christian Baumberger) #29

Regarding proc_df(): when I look at the source code of proc_df, it seems to me that the subset rows are selected randomly rather than taking the first N rows. Therefore this subset will overlap with the validation set in the provided Jupyter notebook, right?
Second: I think I remember you saying that set_rf_samples cannot be used in combination with oob_score=True. But the provided notebook uses them together!?
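To illustrate the first question, here is a small sketch with made-up sizes (not the notebook's real data): row ids stand in for a time-ordered dataset whose last n_valid rows form the validation set. Sampling the subset from all rows, as the post describes proc_df's subset doing, pulls in validation rows; sampling only from the training rows cannot.

```python
import numpy as np

# Hypothetical stand-ins: row ids of a time-ordered dataset, with the
# last n_valid rows used as the validation set
n_rows, n_valid, n_subset = 1000, 120, 300
row_ids = np.arange(n_rows)
n_trn = n_rows - n_valid
trn_ids, val_ids = row_ids[:n_trn], row_ids[n_trn:]

rng = np.random.RandomState(0)

# Random subset drawn from ALL rows
subset_all = np.sort(rng.permutation(n_rows)[:n_subset])
overlap_all = np.intersect1d(subset_all, val_ids).size

# Random subset drawn from the training rows only
subset_trn = np.sort(rng.permutation(n_trn)[:n_subset])
overlap_trn = np.intersect1d(subset_trn, val_ids).size

print("validation rows in subset drawn from all rows:", overlap_all)
print("validation rows in subset drawn from training rows:", overlap_trn)  # 0
```

With a real DataFrame the same idea would be to split off the validation rows first and sample only from the training slice (for example with pandas' DataFrame.sample), so the subset can never contain validation rows.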