Hello all! I’m having a ever-persistant problem with the Kaggle Housing Competition. I use OneHot encoding (pd.get_dummies) to convert the data into all numbers. A CSV file of the converted data can be found below in the DropBox. However, upon using model.fit, I get 0% accuracy (image posted below). I then go on to doing predictions (Test), and the MatPlotLib graph indicates that my model is predicting the same number every time! Perhaps there is a problem with compatibility or something?
I am extremely grateful for any help - this problem has been plaguing me for about a month now! - all pictures and files (data/CSV/ipynb) can be found below!
Click Here to Access DropBox Folder
HousePrices.csv - Original Data - Also available for download on Kaggle website
EditedHousePrices (1).csv - OneHot Encoded Data Made by Code
HousePrices.ipynb - Python Notebook of the Code
Images of Compiled Code:
You might want to use mse or rmse as your metric as you’re working on a regression problem.
Hi binga, thanks for the reply!
Thank you for pointing that out. I’m generally new to DL so still learning the basics.
When I set MSE as the Metric, I get a loss and MSE of nan. Would you happen to know why this happens, and how it can be fixed? Thanks!
I used this to normalize: https://stackoverflow.com/questions/37935920/quantile-normalization-on-pandas-dataframe
The results were still the same (nan). I will continue to look for an alternative way to normalize, but am I doing this correctly?
Edit: I’ve also tried using this, but to no avail: (same nan error)
Further investigation shows that my deep learning data (EditedHousingPrices) has some infinite/nan values - print(np.all(np.isfinite(df_with_dummies))) returns false! np.isnan returns true!
How would I remove/edit these values? Wasn’t able to find much info regarding this.
Yet another update…
I’ve figured out how to replace the NaN/Infinite values. Training gives a ridiculously big loss/MSE of numbers around 38426047151.3425. Then test predictions give an RMS of 198,138. Which is terrible given that the home prices in the data are all around $200,000. Am I doing something wrong?
The graph shows that although the model is trying to predict, it’s failing terribly…
I really appreciate your help on this!
I’ve increased the Neurons in the first dense layer to 290 and increased the epochs to 300, which has gotten my RMS down to 25,000! Other than increasing epochs, are there any other things I can do to make my model better? e.g. Increasing the number of dense layers?
Thanks so much!