How can I treat the dependent variable of a TabularDataBunch as continuous?


(Maik) #1

Hello,
I am trying to follow the tabular example for fastai v1, but I have run into one problem. The dependent variable in the example is categorical (true or false), while mine is continuous (sale prices). I can’t find any information in the docs on how to set the dependent variable type to continuous. When I load the data bunch as follows and then apply the transformations, I get over 600 categories (one for each price), which is not what I want:

dep_var = 'SalePrice'
cat_names = ['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour',
            'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
            'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
            'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond',
            'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2',
            'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu',
            'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
            'PoolQC', 'Fence', 'MiscFeature', 'MoSold', 'YrSold', 'SaleType', 'SaleCondition']
cont_names = ['LotFrontage', 'LotArea', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF',
             '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 
             'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageCars', 
              'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal']
procs = [FillMissing, Categorify, Normalize]

n_df = len(df)
p_valid = 0.2
n_valid = int(n_df * p_valid)

valid_idx = range(n_df-n_valid, n_df)
valid_idx

data = TabularDataBunch.from_df(
    path, df, dep_var, valid_idx=valid_idx, procs=procs, cat_names=cat_names, cont_names=cont_names,    
)

Now

data.train_ds.y

returns:

CategoryList (1168 items)
[Category 181500, Category 223500, Category 140000, Category 250000, Category 143000]...
Path: data/house

and

data.train_ds.y.c

returns 587, i.e. the number of unique categories.
This results in an error during validation, because the validation set contains ‘categories’ (in fact prices) that are not present in the training set.

As stated above, I can’t find any information on how to treat the dep var as continuous.

Does anyone have an idea?
Thanks!


(Maik) #2

So I found a way to treat the price as a float.
Either by transforming it like so:

train_df['SalePrice'] = train_df['SalePrice'].astype('float')

or by labeling it with:

data = (TabularList.from_df(train_df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
                           .split_by_idx(valid_idx)
                           .label_from_df(cols=dep_var, label_cls=FloatList)
                           .databunch())
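
In both cases the dependent variable should now be handled as a FloatList (a regression target) rather than a CategoryList. A quick sanity check, as a minimal sketch (assuming data is the DataBunch created just above and train_df is the same DataFrame as before):

# The labels should now be a FloatList instead of a CategoryList,
# i.e. a continuous regression target rather than 587 categories.
type(data.train_ds.y)          # expected: fastai.data_block.FloatList
train_df['SalePrice'].dtype    # expected: dtype('float64') after the astype cast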

Training, however, now fails with a different error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-70-c6076a6ce3f3> in <module>
----> 1 learn.fit(1, 1e-2)

~/fastai/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    161         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    162         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 163             callbacks=self.callbacks+callbacks)
    164 
    165     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/fastai/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96 

~/fastai/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     87             if hasattr(data,'valid_dl') and data.valid_dl is not None and data.valid_ds is not None:
     88                 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 89                                        cb_handler=cb_handler, pbar=pbar)
     90             else: val_loss=None
     91             if cb_handler.on_epoch_end(val_loss): break

~/fastai/lib/python3.6/site-packages/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     52             if not is_listy(yb): yb = [yb]
     53             nums.append(yb[0].shape[0])
---> 54             if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
     55             if n_batch and (len(nums)>=n_batch): break
     56         nums = np.array(nums, dtype=np.float32)

~/fastai/lib/python3.6/site-packages/fastai/callback.py in on_batch_end(self, loss)
    237         "Handle end of processing one batch with `loss`."
    238         self.state_dict['last_loss'] = loss
--> 239         stop = np.any(self('batch_end', not self.state_dict['train']))
    240         if self.state_dict['train']:
    241             self.state_dict['iteration'] += 1

~/fastai/lib/python3.6/site-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the `CallbakHandler` functions."
--> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    188         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    189 

~/fastai/lib/python3.6/site-packages/fastai/callback.py in <listcomp>(.0)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the `CallbakHandler` functions."
--> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    188         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    189 

~/fastai/lib/python3.6/site-packages/fastai/callback.py in on_batch_end(self, last_output, last_target, **kwargs)
    272         if not is_listy(last_target): last_target=[last_target]
    273         self.count += last_target[0].size(0)
--> 274         self.val += last_target[0].size(0) * self.func(last_output, *last_target).detach().cpu()
    275 
    276     def on_epoch_end(self, **kwargs):

~/fastai/lib/python3.6/site-packages/fastai/metrics.py in accuracy(input, targs)
     37     input = input.argmax(dim=-1).view(n,-1)
     38     targs = targs.view(n,-1)
---> 39     return (input==targs).float().mean()
     40 
     41 def error_rate(input:Tensor, targs:Tensor)->Rank0Tensor:

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'other'

Excuse me if these errors seem trivial, but as a beginner it is really hard to figure this out alone.

Best regards,
Maik


(Maik) #3

OK, I found this issue, which deals with the same error.
Training now completes without any errors.
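
For reference, the traceback points at the accuracy metric: it takes the argmax of the predictions and compares it against integer class targets, which fails once the targets are floats. A minimal sketch of a learner set up for regression instead (assuming data is the FloatList DataBunch from the previous post; the layer sizes are an arbitrary choice):

from fastai.tabular import *

# Use a regression metric (or no metrics at all) instead of `accuracy`,
# which only makes sense for classification targets.
learn = tabular_learner(data, layers=[200, 100], metrics=root_mean_squared_error)
learn.fit(1, 1e-2)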

Now I have to tune the parameters to get a decent result.

Cheers


#4

Nice job figuring it out!