Issue with continuous dependent variable when using TabularDataBunch

Hi

I am testing out the tabular learner on the Kaggle housing dataset. When I run the code below, I keep getting a warning.

dep_var = 'SalePrice' 

cat_names = ['MSSubClass', 'MSZoning', 'Street', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood',
            'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd',
            'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating',
            'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional','GarageType', 'GarageFinish', 'GarageQual',
            'GarageCond', 'PavedDrive', 'SaleType', 'SaleCondition', 'YearBuilt', 'YearRemodAdd', 'BedroomAbvGr', 
             'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'MoSold', 'YrSold']

cont_names = ['LotFrontage', 'LotArea', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea',
              'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal']

procs = [FillMissing, Categorify, Normalize]

The warning I receive is as follows:

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py:537: UserWarning: You are labelling your items with CategoryList.
Your valid set contained the following unknown labels, the corresponding items have been discarded.
149300, 155835, 301500, 294000, 137900...
  if getattr(ds, 'warn', False): warn(ds.warn)

I believe this is because the dependent variable is being treated as a category, so validation rows whose SalePrice value never appears in the training set are being dropped. If that is the case, how do I fix it?

I would appreciate it if someone could help me understand this warning.
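
To make the suspicion above concrete, here is a minimal pure-Python sketch of what appears to be happening. It assumes the class vocabulary is built from the training labels only, which is what the warning text suggests. The function name `unknown_labels` and the training prices are illustrative and not part of fastai; the two validation prices are taken from the warning.

```python
def unknown_labels(train_labels, valid_labels):
    """Return validation labels that are absent from the training 'classes'."""
    # With a categorical label, every distinct training value becomes a class.
    classes = set(train_labels)
    # Any validation value outside that set has no class to map to.
    return [y for y in valid_labels if y not in classes]


# Illustrative training prices; a continuous target rarely repeats exactly.
train_prices = [208500, 181500, 223500, 140000]
valid_prices = [149300, 155835, 140000]

print(unknown_labels(train_prices, valid_prices))
```

Because sale prices are continuous, most validation values are new "classes", so most validation rows would be discarded under this reading.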

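Are you declaring your label_cls as FloatList, the way the Rossmann notebook does? Without it, the labels are treated as categories (a CategoryList), which is what the warning is reporting.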
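
A rough sketch of what that could look like with the fastai v1 data block API, assuming a DataFrame `df` holding the training CSV and a list `valid_idx` of validation row indices. The wrapper function `make_regression_databunch` is a hypothetical helper for illustration only; the fastai import sits inside it so the snippet can be pasted and read without fastai v1 installed.

```python
def make_regression_databunch(df, dep_var, cat_names, cont_names, valid_idx, path='.'):
    """Build a tabular DataBunch whose labels are floats (regression)."""
    # Imported lazily; requires fastai v1 (the 1.0.x series).
    from fastai.tabular import TabularList, FillMissing, Categorify, Normalize
    from fastai.data_block import FloatList

    procs = [FillMissing, Categorify, Normalize]
    return (TabularList.from_df(df, path=path, cat_names=cat_names,
                                cont_names=cont_names, procs=procs)
            .split_by_idx(valid_idx)
            # label_cls=FloatList keeps SalePrice as a continuous target
            # instead of letting it be labelled as a CategoryList.
            .label_from_df(cols=dep_var, label_cls=FloatList)
            .databunch())
```

Usage would be along the lines of `data = make_regression_databunch(df, dep_var, cat_names, cont_names, valid_idx)`, using the `dep_var`, `cat_names` and `cont_names` from the first post.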

I am actually following the ‘lesson4-tabular.ipynb’ notebook. The Rossmann notebook is for the older fastai, which is v0.7.

You should follow the Rossmann notebook from the course-v3 repo for regression. It should be in lesson 6 (IIRC).

Hey, thanks a lot! I got it fixed.
