I built a really simple model with fastai to show someone how easy it is to build models with it, but it was embarrassing to realize that the model didn't work on the house price dataset I tested. I don't have a test set, but validation-set accuracy stays below 10%. Then I tried the Titanic dataset and got around 80% accuracy without any modifications. My suspicion is that the NaN values in the house price dataset cause some problems, but I'm not sure. Can someone review the code below to check whether I forgot some important part? And if those NaNs are causing errors, is there an easy proc I can use, or do I need to write my own code?
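To test the NaN suspicion directly, a quick first step is to count missing values per column with pandas before training. This is a minimal sketch on a tiny made-up frame. The column names are borrowed from the Kaggle house price data only for illustration, and the values are invented. On the real data you would run the same two lines on the `df` loaded from `train.csv`.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the real train.csv (values are illustrative only).
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],
    "Alley": [np.nan, "Grvl", np.nan, np.nan],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Count missing values per column, keep only columns that have any, largest first.
na_counts = df.isna().sum()
na_counts = na_counts[na_counts > 0].sort_values(ascending=False)
print(na_counts)
```

If the target column itself shows up in this list, that would be worth fixing before looking at the procs.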
```python
from fastai.tabular import *
import pandas as pd

target_column = 'SalePrice'
df = pd.read_csv('/path/train.csv')
# shuffle the rows so the first 250 form a random validation slice
df = df.sample(frac=1).reset_index(drop=True)
valid_idx = range(0, 250)  # range(0, min(10000, max(int(len(df)*0.05), 64)))
# cont_cat_split returns (cont_names, cat_names); only cat_names is kept here
_, cat_names = cont_cat_split(df, dep_var=target_column)
procs = [FillMissing, Categorify, Normalize]
data = TabularDataBunch.from_df('.', df, target_column, valid_idx=valid_idx,
                                procs=procs, cat_names=cat_names)
learn = tabular_learner(data, layers=[200, 200, 100], metrics=accuracy)
learn.fit_one_cycle(5, 1e-2)
```
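On the "do I need to write my own code" part: `FillMissing` is already in the procs list above. As far as I understand fastai v1's version, for each continuous column that has NaNs in the training set it adds a boolean `<column>_na` flag and fills the gaps with the median, while NaNs in categorical columns are handled by `Categorify`. The sketch below reproduces only the continuous-column behaviour in plain pandas, to show roughly what happens to the data. `fill_missing_sketch` is a hypothetical helper, not a fastai function. Unlike the real proc, it does not store the training median and reuse it for the validation set.

```python
import numpy as np
import pandas as pd

def fill_missing_sketch(df, cont_names):
    """Hypothetical helper: median-fill continuous NaNs and add `<col>_na` flag columns."""
    df = df.copy()  # leave the caller's frame untouched
    for name in cont_names:
        if df[name].isna().any():
            # Record where values were missing before filling them.
            df[f"{name}_na"] = df[name].isna()
            df[name] = df[name].fillna(df[name].median())
    return df

# Made-up example data; GrLivArea has no NaNs, so it gets no flag column.
train = pd.DataFrame({
    "LotFrontage": [60.0, np.nan, 80.0, 70.0],
    "GrLivArea": [1710, 1262, 1786, 1717],
})
filled = fill_missing_sketch(train, ["LotFrontage", "GrLivArea"])
print(filled)
```

Here the median of the non-missing `LotFrontage` values (60, 80, 70) is 70, so the single gap becomes 70.0 and `LotFrontage_na` marks that row as `True`.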