I am trying to solve the Kaggle Blue Book for Bulldozers challenge with fastai's deep learning tabular API (fastai.tabular). Train.csv can be downloaded from https://www.kaggle.com/c/bluebook-for-bulldozers/data
My code is below:
from fastai.tabular import *
import pandas as pd
# path is a Path object pointing to the folder that contains Train.csv
train_df = pd.read_csv(path/'Train.csv', low_memory=False, parse_dates=['saledate'])
# Use log(SalePrice) as the target because the competition is evaluated on RMSLE
train_df.SalePrice = np.log(train_df.SalePrice)
# Expand saledate into date-part columns (saleYear, saleMonth, ..., saleElapsed)
add_datepart(train_df, 'saledate')
# Add a column for the age of the bulldozer at sale time
train_df['age'] = train_df['saleYear'] - train_df['YearMade']
dep_var = 'SalePrice'
cat_names = ['SalesID', 'MachineID', 'ModelID', 'datasource', 'auctioneerID', 'YearMade', 'UsageBand', 'fiModelDesc', 'fiBaseModel', 'fiSecondaryDesc', 'fiModelSeries',
             'fiModelDescriptor', 'ProductSize', 'fiProductClassDesc', 'state', 'ProductGroup', 'ProductGroupDesc', 'Drive_System', 'Enclosure', 'Forks', 'Pad_Type', 'Ride_Control', 'Stick', 'Transmission',
             'Turbocharged', 'Blade_Extension', 'Blade_Width', 'Enclosure_Type', 'Engine_Horsepower', 'Hydraulics', 'Pushblock', 'Ripper', 'Scarifier', 'Tip_Control', 'Tire_Size', 'Coupler', 'Coupler_System',
             'Grouser_Tracks', 'Hydraulics_Flow', 'Track_Type', 'Undercarriage_Pad_Width', 'Stick_Length', 'Thumb', 'Pattern_Changer', 'Grouser_Type', 'Backhoe_Mounting', 'Blade_Type', 'Travel_Controls',
             'Differential_Type', 'Steering_Controls', 'saleYear', 'saleMonth', 'saleWeek', 'saleDay', 'saleDayofweek', 'saleDayofyear', 'saleIs_month_end', 'saleIs_month_start', 'saleIs_quarter_end',
             'saleIs_quarter_start', 'saleIs_year_end', 'saleIs_year_start']
cont_names = ['MachineHoursCurrentMeter', 'saleElapsed', 'age']
procs = [FillMissing, Categorify, Normalize]
# Make a subset for a trial run
df = train_df.head(5000).copy()
# Change all categorical columns to the pandas category dtype
for col in cat_names:
    df[col] = df[col].astype('category')
# Create the TabularDataBunch, validating on the last 500 rows of the subset (indices 4500-4999)
data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
        .split_by_idx(list(range(4500, 5000)))
        .label_from_df(cols=dep_var, label_cls=FloatList)
        .databunch())
This raises the following error at the line .label_from_df(cols=dep_var, label_cls=FloatList):
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'unsigned long'
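To help narrow this down, here is a minimal sketch of a check that could be run on the subset before building the data bunch. It uses only pandas; the helper name `summarize_columns` and the toy frame are hypothetical and made up for illustration, and I am not assuming that dtypes, missing values, or the index are the actual cause of the error.

```python
import pandas as pd


def summarize_columns(df, cols):
    """Return dtype, missing count, and unique count for each listed column."""
    rows = []
    for col in cols:
        rows.append({
            "column": col,
            "dtype": str(df[col].dtype),
            "n_missing": int(df[col].isna().sum()),
            "n_unique": int(df[col].nunique(dropna=True)),
        })
    return pd.DataFrame(rows).set_index("column")


# Toy stand-in for the real subset; the values are invented for illustration.
toy = pd.DataFrame({
    "SalePrice": [9.2, 10.1, 11.0, 10.5],
    "UsageBand": pd.Series(["Low", None, "High", "Low"]).astype("category"),
    "YearMade": pd.Series([2004, 1996, 2001, 2004]).astype("category"),
})

summary = summarize_columns(toy, ["SalePrice", "UsageBand", "YearMade"])
print(summary)
# A non-unique index is another thing worth ruling out before splitting by position.
print("index is unique:", toy.index.is_unique)
```

On the real data this would be called as summarize_columns(df, cat_names + cont_names + [dep_var]).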
I looked at other threads where users hit the same error, but found no working solution. Is fastai's tabular deep learning API not suitable for regression problems?
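For context on the regression setup: the reason for taking the log of SalePrice is that a plain RMSE on log prices is the same quantity as RMSLE on the raw prices. A minimal standard-library sketch, assuming RMSLE is defined with a plain log (not log(1 + x)); the prices are invented for illustration:

```python
import math


def rmse(pred, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def rmsle(pred, actual):
    """Root mean squared log error, assuming a plain natural log."""
    return math.sqrt(
        sum((math.log(p) - math.log(a)) ** 2 for p, a in zip(pred, actual)) / len(actual)
    )


# Made-up prices, only to show the equivalence.
actual_prices = [66000.0, 57000.0, 10000.0]
predicted_prices = [60000.0, 59000.0, 12000.0]

# What the model sees after the log transform of the target.
log_actual = [math.log(a) for a in actual_prices]
log_pred = [math.log(p) for p in predicted_prices]

score_on_prices = rmsle(predicted_prices, actual_prices)
score_on_logs = rmse(log_pred, log_actual)
print(score_on_prices, score_on_logs)
```

So a regression model trained with a mean squared error loss on the log target optimizes the competition metric directly.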
Thanks