Another treat! Early access to Intro To Machine Learning videos


(sashank) #726

Can anyone help me with this?


(Kevin Bird) #727

I am listening to lesson 5 and I am not sure I understand the extrapolation section. As I understand it, you try to predict which records are in your validation set (in my case, a holdout set). Then you take the most important features from that model, drop each of them in turn, and rerun the main model. At that point I would expect you to keep the columns whose removal makes the score worse and drop any whose removal makes it better. But dropping Age doesn’t hurt the score, and it is still kept in. Why is it not dropped as well?

I have tried implementing this in a real-world scenario, and I am not finding any columns that improve the model when they shouldn’t. When I predict a previous month after removing all the data after that point, I get fairly decent results, but when I try to predict the following month, the results are not as good. I suspect data leakage of some sort, but I haven’t tracked it down yet.
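
For readers following along, the procedure described in this post can be sketched roughly as below. This is only an illustration on synthetic data, not the lesson notebook's code: the column names (`row_id`, `size`, `age`), the 800/200 time-ordered split, and the helper `valid_score` are all made up for the example, and plain scikit-learn forests stand in for whatever model is actually in use.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
n, n_train = 1000, 800

# Synthetic, time-ordered data. "row_id" is a pure proxy for time (hypothetical column).
df = pd.DataFrame({
    "row_id": np.arange(n),
    "size": rng.normal(size=n),
    "age": rng.normal(size=n),
})
y = 3 * df["size"] + 0.5 * df["age"] + rng.normal(scale=0.1, size=n)

# Time-based split: the last 200 rows play the role of the validation/holdout set.
X_train, X_valid = df.iloc[:n_train], df.iloc[n_train:]
y_train, y_valid = y.iloc[:n_train], y.iloc[n_train:]

# Step 1: train a classifier to tell validation rows from training rows.
is_valid = np.r_[np.zeros(n_train), np.ones(n - n_train)]
clf = RandomForestClassifier(n_estimators=40, min_samples_leaf=3, random_state=0)
clf.fit(df, is_valid)
drift_importance = pd.Series(
    clf.feature_importances_, index=df.columns
).sort_values(ascending=False)

# Step 2: refit the real model with each column removed and compare validation scores.
def valid_score(cols):
    """R^2 on the validation rows for a forest trained on the given columns only."""
    model = RandomForestRegressor(n_estimators=40, min_samples_leaf=3, random_state=0)
    model.fit(X_train[cols], y_train)
    return model.score(X_valid[cols], y_valid)

all_cols = list(df.columns)
baseline = valid_score(all_cols)
scores_without = {c: valid_score([x for x in all_cols if x != c]) for c in all_cols}

print(drift_importance)
print("baseline:", baseline)
print(scores_without)
```

In this toy setup `drift_importance` should rank `row_id` first, since it alone separates the later rows from the earlier ones. Each entry of `scores_without` can then be compared against `baseline` to see whether removing that column helps or hurts on the validation rows.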


(Kevin Bird) #728

I have used it, and from what I can tell it does a pretty good job. I don’t think it interprets the model directly, though; I think it uses a simplified version of the model to make its assumptions. So the feature importance list from the model will sometimes differ from the one the SHAP library gives. Overall I definitely think it has a lot of potential. I’ve been using it with XGBoost with pretty decent success.


(sashank) #729

Yes, the issue is with creating the feather file. I am not sure what the internal issue is, but I stopped creating the feather file and that solved the issue for me.


(SA) #730

I do not have any ML experience. Should I watch these ML lectures or do DL 1 & 2? Is this ML course complete? That is, are all the materials in the Jupyter notebooks, or is it still in progress? I saw there are only 5 notebooks in the ML course repository.
Also, I have almost completed the first lecture. Should I know all the details of scikit, pandas, etc.?
I have done a bit of matplotlib. My problem with libraries is that I keep forgetting the module/library-specific commands because there are so many of them. Similarly, in the lecture there are a lot of attributes and dot notations, but how does one remember all that? The documentation is there, but how will I use features if I don’t know or remember them at the right time?
Also, after watching one lecture, how much time should I dedicate to going through the notebook and Kaggle datasets before watching the other lectures?

I am thinking of going through this MOOC to gain a better understanding of pandas, scikit, data cleaning, etc. (https://courses.edx.org/courses/course-v1:UCSanDiegoX+DSE200x+1T2018/course/)
Is that needed, or should I just go through the notebook?


#731

I’m using the Paperspace fastai server template, but for lesson 1 the data folder isn’t there, and I haven’t been able to download the data using the methods explained in the videos (I can’t get the correct link).
If someone can help me out with the correct link to use, or an alternate way of getting the data there, it would be really helpful!


(Sumit) #732

Hey @spock,

First of all, don’t worry, just start. It doesn’t matter whether you know everything or nothing.

Here I’ll quote what @jeremy & @rachel said: learn things on an as-needed basis. Don’t try to learn everything that you might need first, otherwise you’ll never get around to learning the stuff you actually want to learn.

Although I had a little bit of ML experience, I still went through the ML videos first, and it helped me a lot.

Yes, it is.

Don’t worry, most of us face the same issue. In time you’ll develop your own way of recalling or finding whatever you need.

One more thing, I believe in the community here, I and others are here to help.

Cheers !!
Sumit


(SA) #733

Thanks. The reason I was asking whether the ML course is complete or not is that it had only 5 notebooks.
Also, is doing the ML part necessary, or can I skip to DL 1 and 2? Before ML was launched, people were starting with DL 1 and 2 by default, right? I guess doing the ML part provides a better foundation for DL 1 and 2?
Also, how much time should I dedicate to each lecture, and apart from reading the notebooks and experimenting, what else are we supposed to do? When should I assume that I am done with the first lecture’s material?


(Ali) #734

Thank you for the link


(Kyle Nesgood) #735

Just throwing in my two cents, but I don’t consider this a “must-do” course. As an example, I listened to all of the DL course 1 lectures straight through before going back and dissecting each lesson. When I went back, I started with whatever lesson I was interested in, listened for a while / took flashcards, then used it to refine something I had been working on (a Kaggle competition, solo research, etc.).

Each of us will have a unique style that teaches ourselves “the best”. Just jump in and enjoy the ride - don’t become the roadblock to your own enjoyment of the material!


(SA) #736

Thanks


(Sumit) #737

Hi,

Can anyone help me resolve this issue?

Thanks !!


(Gerardo Garcia) #738

Are you in the fastai folder?
Did you activate fastai?

Follow the steps here
FASTAI


(Sumit) #740

Hi @gerardo,

I’m running it on my local computer. Yes, I have followed the steps mentioned there,
but I’m still facing the same issue.

Thanks,
Sumit


(Gerardo Garcia) #741

Do you have an NVIDIA card in your computer,
or are you using the CPU only?

The previous post has an explanation of the CPU and GPU installation.


(Sumit) #742

No.

Yes.

I’m trying to run lesson4-mnist_sgd (ML). I think it doesn’t require a GPU?
Please correct me if I’m wrong.


(Sumit) #743

@gerardo,

Please let me know whether I’m doing this correctly.

When I first started the course, I cloned the repo and ran jupyter notebook, and then I would run the cells and do my experiments in the notebook. But I never used the activate command.

Thanks,
Sumit


(Gerardo Garcia) #744

From the fast.ai GitHub repository:

CPU only environment
Use this if you do not have an NVidia GPU. Note you are encouraged to use Paperspace to access a GPU in the cloud by following this guide.
conda env update -f environment-cpu.yml
Anytime the instructions say to activate the Python environment, run conda activate fastai-cpu or source activate fastai-cpu.


(Sumit) #745

Thanks, @gerardo for all the help :smiley:


(Dien Hoa TRUONG) #746

In ML lesson 4 [6:30], Jeremy indicated that when decreasing the set_rf_samples number, we are actually decreasing the power of the estimator and increasing the correlation. But I think the correlation should decrease, right? In this case, we are less likely to choose the same rows for each individual tree.

Actually, he put a decreasing arrow but said increasing so I’m quite confused.
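
One way to check the direction of the effect empirically is to measure how correlated the individual trees' predictions are at different subsample sizes. The sketch below is an assumption-laden stand-in rather than the lesson's code: it uses scikit-learn's `max_samples` argument (available in scikit-learn 0.22 and later) in place of fastai's `set_rf_samples`, synthetic data from `make_regression`, and a hypothetical helper `mean_tree_correlation`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data; the first 1500 rows train the forest, the rest are held out.
X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
X_train, y_train = X[:1500], y[:1500]
X_test = X[1500:]

def mean_tree_correlation(max_samples):
    """Average pairwise correlation between the trees' held-out predictions.

    max_samples is the number of rows drawn for each tree (None = as many as the training set).
    """
    forest = RandomForestRegressor(
        n_estimators=20, max_samples=max_samples, bootstrap=True, random_state=0
    )
    forest.fit(X_train, y_train)
    # One row of predictions per tree.
    preds = np.stack([tree.predict(X_test) for tree in forest.estimators_])
    corr = np.corrcoef(preds)
    # Keep only the off-diagonal (tree-vs-other-tree) entries.
    upper = corr[np.triu_indices_from(corr, k=1)]
    return float(upper.mean())

corr_small = mean_tree_correlation(50)    # each tree sees only 50 rows
corr_full = mean_tree_correlation(None)   # each tree sees a full-size bootstrap sample

print("mean correlation, 50 rows per tree:  ", corr_small)
print("mean correlation, full bootstrap:    ", corr_full)
```

Comparing the two printed numbers shows which way the tree-to-tree correlation moves as the per-tree sample shrinks on this particular dataset. It does not by itself say anything about what was intended in the video.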