Wiki thread: lesson 1

(Jonas) #88

Question about proc_df

Hi everyone,
I have a question about how to handle training and test sets in which different columns contain NaN. I used the Kaggle data from the housing price competition. In the training data, neither column A nor column B contains NaN, but in the test data column B has a few NaN values, so proc_df creates a column B_na. Now the test set has one more column than the training set and can’t be used.

To make it work, I just dropped all the feature_na columns that proc_df created in both the test and training sets. What would be a better way? Should I create an _na column for every column, filled with False when the column has no NaN?
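
Both options (dropping the extra indicators, or adding all-False ones) come down to making the test set's columns match the training set's. A minimal pandas sketch of that alignment step, using a hypothetical helper align_to_train (not part of fastai) that you would run on the already-processed DataFrames:

```python
import pandas as pd

def align_to_train(train_df: pd.DataFrame, test_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of test_df with exactly train_df's columns, in the same order.

    Hypothetical helper, not a fastai function.
    - Columns only the test set has (e.g. B_na, created because B had NaN only
      in the test data) are dropped, since the model never saw them.
    - *_na indicator columns only the training set has are added and filled
      with False, i.e. "nothing was missing here".
    - Any other missing column is left as NaN so the gap stays visible.
    """
    aligned = test_df.reindex(columns=train_df.columns)
    for col in train_df.columns:
        if col.endswith("_na") and col not in test_df.columns:
            aligned[col] = False
    return aligned

# Jonas's case: B has NaN only in the test data, so only the test set has B_na.
train = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
test = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 3.5], "B_na": [False, True]})
print(align_to_train(train, test).columns.tolist())  # ['A', 'B']
```

One caveat: the B values that were missing in the test set are still filled, but without a B_na column the model has no way of knowing which rows those were.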

Thanks for your help,
Jonas

(Jackson Isaac) #89

You can add a folder (e.g., custom) inside the fastai module directory. Your directory might look something like this:

fastai/
– tabular
– utils
– …
– custom

You can add user-defined modules under the custom folder; git pull won’t overwrite or revert your changes. If you want git to track your custom files as well, you may want to fork the fastai repo and push your custom code to the fork.

In your code, you would then import the function like this:
from fastai.custom import function_name

You can add the module to an existing folder as well, but keeping it in a separate folder makes it easier to keep your own code apart from upstream.
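
One detail that is easy to miss: from fastai.custom import function_name only works if custom/__init__.py defines or imports function_name; otherwise you import from the submodule, e.g. from fastai.custom.my_helpers import function_name. The sketch below builds the same layout in a throwaway directory so nothing in a real install is touched; demo_fastai and my_helpers are hypothetical names standing in for the fastai folder and your own module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def build_demo_tree(root: Path) -> None:
    """Mirror the suggested layout in a throwaway directory.

    'demo_fastai' is a hypothetical stand-in for the real fastai package folder,
    and 'my_helpers.py' is a hypothetical user module.
    """
    custom = root / "demo_fastai" / "custom"
    custom.mkdir(parents=True)
    (root / "demo_fastai" / "__init__.py").write_text("")
    # The user's own code lives in a module under custom/.
    (custom / "my_helpers.py").write_text("def function_name(x):\n    return x * 2\n")
    # Re-exporting here is what lets 'from demo_fastai.custom import function_name' work.
    (custom / "__init__.py").write_text("from .my_helpers import function_name\n")

root = Path(tempfile.mkdtemp())
build_demo_tree(root)
sys.path.insert(0, str(root))   # make the throwaway package importable
importlib.invalidate_caches()   # the files were created after the interpreter started

from demo_fastai.custom import function_name
print(function_name(21))  # 42
```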

(Abhinav Verma) #90

Yeah, sometimes the server is down. It happens quite often on Kaggle.

(Dennis) #91

Hi,
As suggested in the video, I set up an account with Crestle.
However, the course data was not in there.
Then I followed the setup instructions.

Now I am getting the following error when importing fastai.imports:
ModuleNotFoundError: No module named 'cv2'

What do I need to do?

Thanks!

(Edward Easling) #92

I had a similar issue. I was able to get it working by installing all of the packages listed in this blog post.
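
Before working through a package list, it can help to check which imports are actually missing in the Python that runs the notebook (run it inside the notebook, since the kernel's Python may differ from the one your shell's pip installs into). A minimal standard-library sketch; the module names are examples only, not the full list fastai.imports needs. Note that cv2 is installed from PyPI as opencv-python, so the import name and the package name differ.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names in `names` that this Python environment cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# 'cv2' comes from the error above; 'numpy' and 'pandas' are only examples.
print(missing_modules(["cv2", "numpy", "pandas"]))
```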

(Dennis) #93

Thank you so much! This blog post should be pinned at the beginning of this page.

#94

Hi,
I am a bit stuck after setting up fastai on Ubuntu (Windows Subsystem for Linux).
I tried to run the code from the Jupyter notebook lesson-rf1, but it gets stuck at the second cell and throws this error:

Any help would be appreciated. 🙂

Edit: Found a solution here.

(Jeroen van Vliet) #95

I have deployed the DSVM image (Linux version) in Azure and I’m able to connect to the machine via an SSH session. However, when I browse to the machine remotely and log in successfully, I receive an error message:
500 : Internal Server Error.

Your help is much appreciated.

KR, Jeroen

(Ruben) #96

In the 3rd video/lesson he explains that you use the dictionary values stored in nas to “process” the test set in the same way you processed the training set, in order to produce the predictions you want to submit.

train_set = f(train_csv)
Here nas records part of what f did to the training data (it is related to f, not equal to it),
so then you do
val_set = f(val_csv)
reusing the same nas.
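
To make the f/nas idea concrete, here is a minimal sketch of the missing-value part only. process is a hypothetical, simplified stand-in, not fastai's proc_df: called without nas it learns a median for each numeric column that has missing values; called with nas it reuses exactly those fillers and indicator columns, so both sets end up with the same columns. In fastai 0.7, if I recall the source correctly, the same effect comes from passing the training nas back in through proc_df's na_dict argument.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def process(df, nas=None):
    """Hypothetical stand-in for the missing-value handling of proc_df.

    nas=None  -> training mode: learn a median filler for each numeric column with NaN.
    nas given -> reuse those fillers and *_na columns so the output matches training.
    Returns (processed_df, nas).
    """
    df = df.copy()
    learning = nas is None
    nas = {} if learning else dict(nas)
    for name in list(df.columns):  # snapshot, so new *_na columns are not revisited
        col = df[name]
        if not is_numeric_dtype(col):
            continue
        if learning and col.isna().any():
            nas[name] = col.median()
        if name in nas:
            df[name + "_na"] = col.isna()
            df[name] = col.fillna(nas[name])
        elif col.isna().any():
            # NaN only in this set: fill it, but add no *_na column,
            # because the model never saw one during training.
            df[name] = col.fillna(col.median())
    return df, nas

train = pd.DataFrame({"A": [1.0, None, 3.0], "B": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"A": [5.0, 6.0], "B": [None, 2.0]})

train_set, nas = process(train)        # learns nas from the training data
test_set, _ = process(test, nas=nas)   # same f, same nas
print(train_set.columns.tolist())  # ['A', 'B', 'A_na']
print(test_set.columns.tolist())   # ['A', 'B', 'A_na']
```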

(Akshai Rajendran) #97

To add to what @jacksonisaac mentioned: we have to fill the missing values with something. Filling them with the median keeps the median of the column unchanged; the mean moves only slightly when it is close to the median, and the standard deviation shrinks somewhat, because every filled value sits at the center of the data. Filling with other values will affect the distribution differently.
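
A quick way to see this on a made-up column (illustrative data only): fill the gaps with the median and compare the summary statistics before and after.

```python
import numpy as np
import pandas as pd

# Made-up column with two missing values; pandas skips NaN in these statistics.
s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, np.nan, np.nan])
filled = s.fillna(s.median())

print(s.median(), filled.median())        # median is unchanged
print(s.mean(), filled.mean())            # mean moves toward the median
print(s.std(ddof=0), filled.std(ddof=0))  # population std shrinks
```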

#98

proc_df() converts my DataFrame input into a list

I am trying to predict output for the test data.

type(test_df)  # output: pandas.core.frame.DataFrame
# test_df is a DataFrame of numeric values, some of them missing
test_df = proc_df(test_df)
# test_df is now a list
type(test_df)  # output: list

test_df is being converted to a list, and I have no idea why.
Any help? Thanks 🙂

(Sfundo Mhlungu) #99

Hello everyone, can someone please help? I am currently doing the Intro to Machine Learning course (part 2 specifically), and I decided to take a leap of faith and analyse a random dataset. I got the scores below, but because I’ve never seen values like these, it’s really hard to say what they mean; I am used to smaller scores, e.g. 0.025.

Here is what I got:
a) RMSE of the training set, b) RMSE of the validation set, c) score of the training set, d) score of the validation set
[2.6473913845249957, 5.991807398439978, 0.9940291743929885, 0.8914779738266136]

I’ve tried everything, but nothing is working.

If you want, I can upload everything to GitHub for a closer look.

Thanks in advance
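
RMSE is measured in the same units as the target, so its size mostly reflects the scale of y; if the "score" values come from a scikit-learn regressor's .score method (an assumption), they are R², which is unitless. A minimal NumPy sketch on made-up numbers showing that rescaling the target changes RMSE but not R²:

```python
import numpy as np

def rmse(pred, actual):
    """Root mean squared error, in the same units as the target."""
    pred, actual = np.asarray(pred, dtype=float), np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

def r2(pred, actual):
    """Coefficient of determination, 1 - SS_res / SS_tot (unitless)."""
    pred, actual = np.asarray(pred, dtype=float), np.asarray(actual, dtype=float)
    ss_res = np.sum((actual - pred) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

# Made-up targets and predictions, purely illustrative.
y = np.array([10.0, 20.0, 30.0, 40.0])
p = np.array([12.0, 18.0, 33.0, 39.0])

print(rmse(p, y), r2(p, y))
# Same data expressed in units 10x smaller: RMSE grows 10x, R^2 stays the same.
print(rmse(p * 10, y * 10), r2(p * 10, y * 10))
```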

(Harry Chan) #100

In the course video, @jeremy said something about Machine Learning Driven EDA. I’ve finished lesson 4 and I still don’t really understand what that means. Will he talk about it later in the course?

(Harry Chan) #101

It’d definitely help if you could share your data and code for the result!

(Harry Chan) #102

The first item of the list is the DataFrame with the independent variables, and the second is the dependent variable (if you passed its column name).
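
In the fastai 0.7 code used by this course, as far as I recall, proc_df returns a list [df, y, nas] (with a fourth item, the scaling mapper, when do_scale=True), so the fix is to unpack it rather than keep the whole list. proc_df_standin below is a hypothetical stand-in that only mimics that return shape and does no real preprocessing:

```python
import pandas as pd

def proc_df_standin(df, y_fld=None):
    """Hypothetical stand-in mimicking only the *shape* of proc_df's return value.

    Returns a list [X, y, nas]; y is None when no dependent column is given.
    """
    df = df.copy()
    y = df.pop(y_fld).values if y_fld is not None else None
    nas = {}
    return [df, y, nas]

test_df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
result = proc_df_standin(test_df)
print(type(result))       # <class 'list'>
X_test, _, nas = result   # unpack instead of keeping the whole list
print(type(X_test))       # <class 'pandas.core.frame.DataFrame'>
```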

(Christina Seventikidou) #103

Hello! I’m trying to find ml1, but it isn’t there. I use Google Cloud and followed this path: tutorials/fastai/course-v3//nbs/dl1. There is no ml1 option there, nor anywhere else in tutorials!
I have updated the course repo. Has anybody had the same problem, or does anyone have an idea how I could fix it?
Thanks!
