Kaggle Zestimate competition

shushi2000 · May 26, 2017, 10:58pm

This house price prediction competition looks fun and has a really long running time. The most exciting thing is that the predictions will be checked against the real final house prices, rather than some pre-determined LB values. Let’s use this thread if anyone wants to discuss it.

As of now, I have only taken a brief look at the datasets, and I’m not sure which type of model to use yet.

rashudo · May 27, 2017, 6:40am

It is very exciting, if only because of the large prize pot. However, the rules are somewhat sketchy… you only get the second round prize if you beat their private, secret, undisclosed benchmark. So if they were so inclined, they would never have to pay the second round prizes.

shushi2000 · May 29, 2017, 4:58pm

I don’t see myself winning the prize anyway. Just want to participate more, learn more, and build a better portfolio on Kaggle.

davecg · May 29, 2017, 10:28pm

I really doubt they’d do that.

If they can improve their Zestimate, I’d imagine that would be worth a lot more than $1.2 million to them.

It’s probably just a hedge in case no one serious signed on to their competition.

s.s.o · July 17, 2017, 9:37am

Submissions are evaluated on Mean Absolute Error between the predicted log error and the actual log error. The log error is defined as:
logerror=log(Zestimate)−log(SalePrice)
and it is recorded in the transactions training data.
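
For reference, a minimal sketch of the metric as described above. It assumes `log` means the natural logarithm; the prices are made up for illustration, and the function names are just placeholders, not anything from the competition kit.

```python
import numpy as np

def log_error(zestimate, sale_price):
    """logerror = log(Zestimate) - log(SalePrice). Natural log is an assumption."""
    zestimate = np.asarray(zestimate, dtype=float)
    sale_price = np.asarray(sale_price, dtype=float)
    return np.log(zestimate) - np.log(sale_price)

def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between actual and predicted log errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up prices, only to show the shape of the calculation.
zestimate = np.array([300_000.0, 410_000.0, 95_000.0])
sale_price = np.array([310_000.0, 400_000.0, 100_000.0])

actual = log_error(zestimate, sale_price)
predicted = np.zeros_like(actual)  # naive baseline: always predict a log error of 0
score = mean_absolute_error(actual, predicted)
print(score)
```

This is the same quantity that a built-in mean absolute error loss in a DL library would report, as long as the targets are the log errors themselves.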

Has anybody tried to map this loss function to Keras or any other DL libs? Any suggestions are welcome for getting the Keras loss or accuracy closer to the Kaggle evaluation score …

josca · July 17, 2017, 12:52pm

The loss function is just the mean absolute error, which is already supported by the Keras library and most other DL libraries. I have implemented the approach outlined by Jeremy Howard in lesson 14, i.e. using embeddings on “categorical variables” and playing around with some different MLP architectures. It seems to be working really well so far, but I haven’t gotten around to making a submission yet. Best of luck with the competition ;-).

s.s.o · July 18, 2017, 8:55am

@josca thank you … you too.

The problem is that the PL score is very different from the DL accuracy I get from Keras using MAE loss. So it is really hard to keep the two loss evaluations in sync. You have to make a lot of submissions to improve the DL model, and each score is like rolling a die.

Good luck.

josca · July 19, 2017, 6:25am

Arhhh, I see. I think the reason for this is that the primary contribution to the loss function seems to come from a small set of outliers. This means that your train/validation split has a big effect on your validation loss. Even though randomness ensures that the expected number of outliers in the validation set is the same every time, in practice I get relatively big differences in the calculated mean absolute error if I just make a new random split each iteration.
Hence, I think that the primary reason that your loss calculated on your validation set differs from the PL loss is due to your validation set containing a different number of outliers than the PL validation set.
Does it make sense?
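
The split-to-split variation described above can be illustrated with a rough simulation. Everything here is synthetic and assumed for illustration: the target values, the number and size of the outliers, and the constant-median "model" are stand-ins, not the competition data or anyone's actual network.

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def split_maes(y, n_splits=20, val_fraction=0.2, seed=0):
    """Validation MAE of a constant-median predictor over repeated random splits."""
    rng = np.random.default_rng(seed)
    n_val = int(len(y) * val_fraction)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        val, train = idx[:n_val], idx[n_val:]
        constant = np.median(y[train])  # stand-in for a trained model
        scores.append(mae(y[val], np.full(n_val, constant)))
    return np.array(scores)

rng = np.random.default_rng(42)

# Synthetic targets: mostly small errors plus a handful of large outliers.
y = rng.normal(0.0, 0.02, size=5000)
outliers = rng.choice(len(y), size=25, replace=False)
y[outliers] = rng.normal(0.0, 2.0, size=25)

with_outliers = split_maes(y)
without_outliers = split_maes(np.delete(y, outliers))

# Spread of validation MAE across splits, with and without the outliers.
print(with_outliers.std(), without_outliers.std())
```

With these assumed numbers, the validation MAE moves around noticeably more from split to split when the outliers are present. Averaging over several folds, rather than trusting a single split, is one way to get a steadier local estimate.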

s.s.o · August 14, 2017, 12:08pm

Hi all,

How are you guys doing with PL? Are you still using DL?
I finally got below 0.065 with DL… With others I got better results.

I have a question: with DL, my predictions tend to be more positive, and when I checked others’ results, their predictions have more negative values. Do you have a smart guess as to why that could happen?