Hello,
I am trying to follow the tabular example for fastai v1, but I have run into one problem. The dependent variable in the example is categorical (true or false), while mine is continuous (sale prices). I can’t find anything in the docs on how to set the dependent variable type to continuous. When I build the data bunch as follows and apply the transformations, I get over 600 categories (one for each distinct price), which is not what I want:
from fastai.tabular import *  # fastai v1; df and path are defined earlier
dep_var = 'SalePrice'
cat_names = ['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour',
'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond',
'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2',
'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu',
'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
'PoolQC', 'Fence', 'MiscFeature', 'MoSold', 'YrSold', 'SaleType', 'SaleCondition']
cont_names = ['LotFrontage', 'LotArea', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF',
'1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath',
'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageCars',
'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal']
procs = [FillMissing, Categorify, Normalize]
n_df = len(df)
p_valid = 0.2
n_valid = int(n_df * p_valid)
valid_idx = range(n_df-n_valid, n_df)
valid_idx
data = TabularDataBunch.from_df(
path, df, dep_var, valid_idx=valid_idx, procs=procs, cat_names=cat_names, cont_names=cont_names,
)
Now
data.train_ds.y
returns:
CategoryList (1168 items)
[Category 181500, Category 223500, Category 140000, Category 250000, Category 143000]...
Path: data/house
and
data.train_ds.y.c
returns 587, the number of unique categories.
This results in an error during validation, because the validation set contains ‘categories’ (in fact prices) that are not present in the training set.
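To make the failure mode concrete, here is a minimal sketch in plain Python that does not use fastai at all. The prices are made-up toy values (not my real data), and the variable names are only for illustration. It applies the same 20% tail split as above and treats every distinct price as its own class, which is what I assume is happening to my labels:

```python
# Toy stand-in for the SalePrice column (made-up values, not the real dataset).
prices = [181500, 223500, 140000, 250000, 143000,
          307000, 200000, 129900, 118000, 129500]

# Same split logic as above: the last 20% of rows become the validation set.
n_valid = int(len(prices) * 0.2)
train, valid = prices[:-n_valid], prices[-n_valid:]

# Treat each distinct training price as a class, as a category label would.
train_classes = set(train)

# Validation prices that have no matching class in the training set.
unseen = [p for p in valid if p not in train_classes]

print(len(train_classes), "classes in training")
print("validation prices with no class:", unseen)
```

With a continuous target almost every validation price is new, so nearly every validation row ends up without a class.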
As stated above, I can’t find any information on how to treat the dep var as continuous.
Does anyone have an idea?
Thanks!