How can I treat the dependent variable of a TabularDataBunch as continuous?


(Maik) #1

Hello,
I am trying to follow the tabular example for fastai v1, but I have run into one problem. The dependent variable in the example is categorical (true or false), while mine is continuous (sale prices). I can’t find any information in the docs on how to set the dependent variable type to continuous. When I load the data bunch as follows and then apply the transformations, I get over 600 categories (one for each price), which is not what I want:

dep_var = 'SalePrice'
cat_names = ['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour',
            'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
            'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
            'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond',
            'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2',
            'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu',
            'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
            'PoolQC', 'Fence', 'MiscFeature', 'MoSold', 'YrSold', 'SaleType', 'SaleCondition']
cont_names = ['LotFrontage', 'LotArea', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF',
             '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 
             'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageCars', 
              'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal']
procs = [FillMissing, Categorify, Normalize]

n_df = len(df)
p_valid = 0.2
n_valid = int(n_df * p_valid)

valid_idx = range(n_df-n_valid, n_df)
valid_idx

data = TabularDataBunch.from_df(
    path, df, dep_var, valid_idx=valid_idx, procs=procs, cat_names=cat_names, cont_names=cont_names,    
)

Now

data.train_ds.y

returns:

CategoryList (1168 items)
[Category 181500, Category 223500, Category 140000, Category 250000, Category 143000]...
Path: data/house

and

data.train_ds.y.c

returns 587, i.e. the number of unique categories.
This results in an error during validation, because the validation set contains ‘categories’ (in fact prices) that are not present in the training set.

As stated above, I can’t find any information on how to treat the dep var as continuous.

Does anyone have an idea?
Thanks!


(Maik) #2

So I found a way to treat the price as a float.
Either by transforming it like so:

train_df['SalePrice'] = train_df['SalePrice'].astype('float')

or by labeling it with:

data = (TabularList.from_df(train_df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
                           .split_by_idx(valid_idx)
                           .label_from_df(cols=dep_var, label_cls=FloatList)
                           .databunch())
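
In both cases the dependent variable should now be handled as a FloatList (a regression target) rather than a CategoryList. A quick sanity check, as a minimal sketch (assuming data is the DataBunch created just above and train_df is the same DataFrame as before):

# The labels should now be a FloatList instead of a CategoryList,
# i.e. a continuous regression target rather than 587 categories.
type(data.train_ds.y)          # expected: fastai.data_block.FloatList
train_df['SalePrice'].dtype    # expected: dtype('float64') after the astype cast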

Training, however, now fails with a different error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-70-c6076a6ce3f3> in <module>
----> 1 learn.fit(1, 1e-2)

~/fastai/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    161         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    162         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 163             callbacks=self.callbacks+callbacks)
    164 
    165     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/fastai/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96 

~/fastai/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     87             if hasattr(data,'valid_dl') and data.valid_dl is not None and data.valid_ds is not None:
     88                 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 89                                        cb_handler=cb_handler, pbar=pbar)
     90             else: val_loss=None
     91             if cb_handler.on_epoch_end(val_loss): break

~/fastai/lib/python3.6/site-packages/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     52             if not is_listy(yb): yb = [yb]
     53             nums.append(yb[0].shape[0])
---> 54             if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
     55             if n_batch and (len(nums)>=n_batch): break
     56         nums = np.array(nums, dtype=np.float32)

~/fastai/lib/python3.6/site-packages/fastai/callback.py in on_batch_end(self, loss)
    237         "Handle end of processing one batch with `loss`."
    238         self.state_dict['last_loss'] = loss
--> 239         stop = np.any(self('batch_end', not self.state_dict['train']))
    240         if self.state_dict['train']:
    241             self.state_dict['iteration'] += 1

~/fastai/lib/python3.6/site-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the `CallbakHandler` functions."
--> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    188         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    189 

~/fastai/lib/python3.6/site-packages/fastai/callback.py in <listcomp>(.0)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the `CallbakHandler` functions."
--> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    188         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    189 

~/fastai/lib/python3.6/site-packages/fastai/callback.py in on_batch_end(self, last_output, last_target, **kwargs)
    272         if not is_listy(last_target): last_target=[last_target]
    273         self.count += last_target[0].size(0)
--> 274         self.val += last_target[0].size(0) * self.func(last_output, *last_target).detach().cpu()
    275 
    276     def on_epoch_end(self, **kwargs):

~/fastai/lib/python3.6/site-packages/fastai/metrics.py in accuracy(input, targs)
     37     input = input.argmax(dim=-1).view(n,-1)
     38     targs = targs.view(n,-1)
---> 39     return (input==targs).float().mean()
     40 
     41 def error_rate(input:Tensor, targs:Tensor)->Rank0Tensor:

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'other'

Excuse me if these errors seem trivial, but as a beginner it is really hard to figure this out alone.

Best regards,
Maik


(Maik) #3

OK, I found this issue, which deals with the same error.
Training now completes without any errors.
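
For reference, the traceback points at the accuracy metric: it takes the argmax of the predictions and compares it against integer class targets, which fails once the targets are floats. A minimal sketch of a learner set up for regression instead (assuming data is the FloatList DataBunch from the previous post; the layer sizes are an arbitrary choice):

from fastai.tabular import *

# Use a regression metric (or no metrics at all) instead of `accuracy`,
# which only makes sense for classification targets.
learn = tabular_learner(data, layers=[200, 100], metrics=root_mean_squared_error)
learn.fit(1, 1e-2)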

Now I have to tune the parameters to get a decent result.

Cheers


#4

Nice job figuring it out!