Wiki thread: lesson 1

I wrote a document on How to Install Fastai v0.7 on AWS for Fastai Machine Learning Course that details every installation step to get you started on the Fastai ML course using AWS. It took me several hours to get the installation done, and I don't want others to spend the same time hunting down all the fixes.


Thanks Jo! Is this for the ML course or for deep learning?

It's for version 3 of the deep learning classes, both of which use fastai v1.

Hello. scikit-learn runs only on the CPU, so it is fine to use only the CPU.

I am just starting the ML course, lesson 1, but I am not able to download fastai on my system.

Can you please tell me how I should proceed?

I have just started the 1st lesson of the ML course on fastai, and I am facing difficulty downloading the fastai library.
Can you please guide me on whether I should use a 3rd-party source for this lesson, or whether there is a way I can download the fastai library to my local machine?

Please guide me through the steps.

@jeremy @rachel

Hi,

First of all - thanks so much for doing this, I have learned a lot just in the first lesson.

With Blue Book for Bulldozers I am getting a log RMSE of 0.22-0.25 using the split_vals on the train dataset as per the lecture.

BUT

I decided to try the models on the Kaggle Valid and ValidSolution as test set and y_true, and it really scored poorly (around 0.43-0.50) – nowhere near the leaderboard results!!

I would appreciate any insight into why this is, and how one can optimise further.

  • What are the approaches to try?
  • Is it just the best a RF can achieve for this problem, or am I doing something wrong?

My notebook and notes are here
https://colab.research.google.com/drive/1qRTWrsonAlwUggshDwd9o5oQMNUJUlap

Try running !pip install fastai==0.7

Knowing the course wants Python 3.6 is great. I made several conda environments to see how the errors changed. I saw that some people were downgrading their torchvision(?) package to get fewer errors, but I think that must be a bad idea. I added

import warnings
warnings.filterwarnings('ignore')

to the top of the Jupyter notebook, and everything looks much less confusing now. BTW, the only way I've been able to get TensorFlow running on my GPU is using Lambda Stack's script. Cheers, and thank you.
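One caveat with the blanket filter above: it hides every warning, including ones you might actually want to see. A narrower option (just a sketch, not anything course-specific) is to silence only the deprecation noise:

```python
import warnings

# Silence only DeprecationWarning instead of every warning,
# so genuinely important messages still show up.
warnings.filterwarnings("ignore", category=DeprecationWarning)

def legacy_call():
    # Stand-in for a library function that emits deprecation noise.
    warnings.warn("legacy_call is deprecated", DeprecationWarning)
    return "ok"

print(legacy_call())  # runs without printing the warning
```

You can also pass a `module=` or `message=` pattern to `filterwarnings` to target a single noisy library rather than a whole warning category.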

Hi,
I’m from 2019, so maybe this thread is long dead, but I started doing the course. Is this the right place to post questions about the “homework” where Jeremy suggested we try some kaggle competitions on our own using the techniques from lesson 1?

If so, well I went and tried https://www.kaggle.com/c/house-prices-advanced-regression-techniques/ (House Prices) and fell down a little. The test.csv data contains a bunch of columns that aren't in train.csv (AFAICT that is the issue behind this error):

ValueError: Number of features of the model must match the input. Model n_features is 83 and input n_features is 91

Running list(set(test_df.columns) - set(df.columns)) I see

['BsmtFinSF1_na',
 'GarageCars_na',
 'BsmtHalfBath_na',
 'TotalBsmtSF_na',
 'BsmtUnfSF_na',
 'BsmtFinSF2_na',
 'GarageArea_na',
 'BsmtFullBath_na']

Is the right thing to do here just remove those columns from the test data set?

Many thanks in advance (FWIW that is what I am going to do, just so I can submit to kaggle, then move on to lesson 2)

UPDATE: it's weirder than I thought. The raw dataframes have the same number of columns, but after running proc_df the test dataset has 91 columns vs. the train set's 83; the difference is all those extra _na columns listed above. I'm not sure what to do about it.
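For anyone hitting the same thing: proc_df adds a boolean _na indicator column for each numeric column that has missing values, so train and test diverge whenever missingness differs between the two files. If I'm reading structured.py right, proc_df returns a na_dict from the train call that you can pass back in on the test call so both get the same columns; failing that, you can simply align the test frame to the train columns with pandas. A hedged sketch of just the alignment step (the column names here are made up):

```python
import pandas as pd

# Toy stand-ins for the processed train/test frames: the test frame
# has an extra *_na indicator column the model never saw in training.
train_df = pd.DataFrame({"GarageArea": [400.0, 500.0], "YearBuilt": [1990, 2005]})
test_df = pd.DataFrame({
    "GarageArea": [450.0, 0.0],
    "YearBuilt": [1999, 2010],
    "GarageArea_na": [False, True],  # extra indicator column
})

# Keep only the columns the model was trained on, in the same order;
# any train column missing from test is created and filled with 0.
test_aligned = test_df.reindex(columns=train_df.columns, fill_value=0)
print(list(test_aligned.columns))  # same columns as train_df
```

Dropping the extra columns (as below) works too, but reindexing also protects you in the opposite case, where test is missing a column that train had.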

UPDATE: I just dropped those columns, and I managed to submit to kaggle. OK, so, 3174 out of 4845 is not great (certainly not top 25% as was suggested in the lesson) but given I really have no idea what I am doing, I at least have a score to improve on. Thanks for the lesson, on to the next one!

Cheers
Russell

One more question: at the end of the lesson 1 lecture, there is still a large chunk of the lesson 1 notebook that is not covered. Do I do this on my own before lecture 2, or is it covered in lecture 2?

It's in lecture two.

Hi, I just started this course on the weekend.

Does anyone have a suggestion for which Kaggle competitions might be good to start with, especially for this first part of trying out the technique from lesson 1?

Also if anyone else is just starting, or has started recently and would like to discuss as we go, please let me know.

Thanks.

Has anyone tried opening the ML for Coders notebooks in the Google Colab environment? Does it work?


I don't get it. Can I edit others' posts here? I don't see the pencil the OP mentioned.

You can’t edit other people’s posts on this platform, only your own. Or if the author made it a wiki page.

Oh, I'm being stupid.

I got an error when importing the fastai.structured module. I was able to resolve it; I'm not sure how this impacts the rest of the code base.
The error was on importing the Imputer class from sklearn.

I checked the scikit-learn GitHub page and learned that Imputer is deprecated and has been replaced with SimpleImputer, which is also imported from a different module.

I made a couple of changes to structured.py.
Commented out: from sklearn.preprocessing import LabelEncoder, Imputer, StandardScaler
Added:
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.impute import SimpleImputer
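For anyone curious what the replacement actually does: with strategy="mean" (the old Imputer default), each missing value in a column is just replaced by that column's mean over the non-missing entries. A dependency-free sketch of that behaviour for a single column, to make the swap less mysterious:

```python
import math

def impute_mean(column):
    """Replace NaN entries with the mean of the non-missing values,
    mimicking SimpleImputer(strategy="mean") on a single column."""
    observed = [x for x in column if not math.isnan(x)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(x) else x for x in column]

print(impute_mean([1.0, float("nan"), 3.0]))  # [1.0, 2.0, 3.0]
```

SimpleImputer does the same thing column-by-column across the whole array (and supports "median", "most_frequent", and "constant" strategies), so the two-line import change above should be behaviour-preserving for structured.py.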


Hi, I just now started this machine learning course.
I used the code below to update the fastai libraries:
!curl -s https://course.fast.ai/setup/colab | bash

and when I tried to import the libraries, it showed me the following errors:

No module named fastai.structured
No module named ‘pandas_summary’

What should I do?
@tech-novic

Thank you, you saved my day (week maybe)!