I just started the Intro to ML course, and am having problems with the bulldozers dataset used for the first ML problem.
Every time I call the fit() method on the Random Forest, everything freezes.
Here’s what I’ve done so far:
- I had some problems getting fastai installed on my computer, so I just used plain pandas instead.
- I don't think that's the issue, since from what I can tell fastai mostly wraps functions from sklearn and pandas.
- I saved the CSV file to feather format and reloaded it.
- I switched all columns with dtype 'object' to dtype 'category'.
- I manually extracted all the date parts from the 'saledate' column, similar to what's done in the original Jupyter notebook.
- I filled all missing values in numeric columns with the column median.
- I used pd.get_dummies(df, sparse=True) to generate dummy variables. The resulting dataframe has about 7,700 columns and a memory usage of about 50 MB.
- I used train_test_split from sklearn to generate a test set with 12,000 rows.
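To make the above concrete, here is a condensed sketch of my preprocessing. The column names and the toy dataframe are stand-ins for the real bulldozers data (which I load from the feather file), but the operations are the same ones I ran:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the bulldozers frame (the real one is loaded from feather)
df = pd.DataFrame({
    "saledate": pd.to_datetime(["2011-01-15", "2011-06-03", "2012-02-20", "2012-09-30"]),
    "UsageBand": ["Low", "High", None, "Medium"],
    "MachineHoursCurrentMeter": [68.0, np.nan, 2838.0, 722.0],
    "SalePrice": [9500, 14000, 50000, 16000],
})

# object dtype -> category dtype
for col in df.select_dtypes("object"):
    df[col] = df[col].astype("category")

# manually pull the date parts out of 'saledate', then drop it
df["saleYear"] = df.saledate.dt.year
df["saleMonth"] = df.saledate.dt.month
df["saleDay"] = df.saledate.dt.day
df = df.drop(columns="saledate")

# fill missing numeric values with the column median
for col in df.select_dtypes("number"):
    df[col] = df[col].fillna(df[col].median())

# one-hot encode the categoricals (sparse, as in my run)
df = pd.get_dummies(df, sparse=True)

# split off the held-out rows
X = df.drop(columns="SalePrice")
y = df["SalePrice"]
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25)
```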
I first tried this on my laptop, which has 12 GB of RAM, with about 9 GB free when I called fit() on the Random Forest. I've tried it both with n_jobs=-1 and without setting it at all.
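For reference, the fit call itself is just the stock sklearn estimator. A minimal self-contained sketch (with small random stand-in data here, since it's only the real processed frame that triggers the freeze):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# tiny synthetic stand-in for the processed training frame
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
y_train = rng.normal(size=100)

# n_jobs=-1 uses all cores; this is the call that hangs on the full dataset
m = RandomForestRegressor(n_jobs=-1, random_state=0)
m.fit(X_train, y_train)
```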
I then also tried uploading everything to Crestle, figuring I was running out of RAM locally, but this didn't work either.
I created a new folder for this class in the home directory and uploaded the dataset, but the notebook kernel terminated itself shortly after calling fit().
I'm not sure whether I did something incorrect in my processing, although as far as I can tell I performed the same operations on my dataframe as the original Jupyter notebook does, just without using the fastai library.
Does anyone know what I might be doing wrong?