Porto Seguro Winning Solution -- Representation learning


Very interesting info, Ben, thank you for sharing :slight_smile:

By ‘much better results’, do you mean the ability to reproduce MNIST images from your validation set? What cost function does one normally use for that - some distance measure on pixel values?

And in general, would we use the ability of an autoencoder to reproduce the inputs as a way of gauging the quality of the middle-layer activations?

(Ben Eacrett) #42

I saw much better results than with just noising the data, and better than with adding L1 regularization (which is what the tutorials I followed - from the Keras blog - used). Note - I did not try batch norm.

Error - I used the same as in the tutorials: just MSE on the reproductions. And yes, ‘better results’ meant lower error (perhaps subjective, but the outputs also looked better visually at similar error levels).

My interpretation is that the ability to closely reproduce the inputs implies that the AE has been able to learn / extract latent structure in the data (which should correlate with the potential usefulness of the encodings).
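For anyone following along, here is a minimal sketch of the setup being described (noisy inputs, clean targets, MSE on the reproductions), in the style of the Keras blog tutorials mentioned above; the layer sizes and noise level are illustrative, not the exact ones from those tutorials or from these experiments:

```python
import numpy as np
from keras.datasets import mnist
from keras.layers import Input, Dense
from keras.models import Model

# Load and flatten MNIST, scaled to [0, 1]
(x_train, _), _ = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

input_dim, bottleneck_dim = 784, 32                   # illustrative sizes

inp = Input(shape=(input_dim,))
h = Dense(128, activation='relu')(inp)
code = Dense(bottleneck_dim, activation='relu')(h)    # middle-layer activations
h = Dense(128, activation='relu')(code)
out = Dense(input_dim, activation='sigmoid')(h)

autoencoder = Model(inp, out)
encoder = Model(inp, code)                            # used later to extract encodings
autoencoder.compile(optimizer='adam', loss='mse')     # MSE on the reproductions

# Denoising setup: corrupt the inputs, but reproduce the clean originals
x_noisy = np.clip(x_train + 0.3 * np.random.normal(size=x_train.shape), 0.0, 1.0)
autoencoder.fit(x_noisy, x_train, epochs=30, batch_size=256, validation_split=0.1)
```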

As a side note, for a lot of deep-learning-based solutions it makes sense that AEs have not retained popularity - you would expect the network you are using to extract this structure anyway (I think Jeremy commented or alluded to this elsewhere). In the structured-data case we’re discussing here, perhaps we can look at the AE as doing something analogous to embeddings: extracting a rich feature representation.
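Following that analogy: once a denoising autoencoder like the one sketched above has been trained on the tabular features rather than images, its middle-layer activations can be pulled out and used as inputs to any downstream model, much like learned embeddings. A minimal continuation of the sketch above (`X` and `y` are placeholders for a preprocessed feature matrix and target, not anything specific from this thread):

```python
from sklearn.linear_model import LogisticRegression

# Encode a (preprocessed) feature matrix with the trained encoder from above
# and use those activations as inputs to a downstream classifier.
X_encoded = encoder.predict(X)           # X: placeholder feature matrix
clf = LogisticRegression(max_iter=1000)
clf.fit(X_encoded, y)                    # y: placeholder binary target
```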

(Kerem Turgutlu) #43

Still experimenting (sample data is used in the results below):

  1. CV Gini scores with an ordinary MLP using only OHE and raw features:


  2. My second attempt was to use a ReLU-activated MLP as an autoencoder, but it failed badly (even worse than the above).

  3. When I used a linear activation in the bottleneck and ReLU for the other layers, these are the Gini CV scores (a minimal sketch of this architecture follows below):
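
For reference, here is a minimal PyTorch-style sketch of what item 3 could look like - ReLU hidden layers with a purely linear bottleneck, trained with MSE on the reconstructions. This is not the actual class definition from these experiments; the layer sizes and training loop are illustrative:

```python
import torch
import torch.nn as nn

class BottleneckAE(nn.Module):
    """Autoencoder with ReLU hidden layers and a linear (no activation) bottleneck."""
    def __init__(self, n_features, bottleneck=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 500), nn.ReLU(),
            nn.Linear(500, bottleneck),            # linear bottleneck: no activation here
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 500), nn.ReLU(),
            nn.Linear(500, n_features),            # reconstruction of the inputs
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, loader, epochs=10, lr=1e-3):
    """Train with MSE on the reconstructions."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for (xb,) in loader:                       # loader yields batches of the feature matrix
            opt.zero_grad()
            loss = loss_fn(model(xb), xb)
            loss.backward()
            opt.step()

# Usage (illustrative): x is a FloatTensor of OHE + raw features
# loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(x),
#                                      batch_size=128, shuffle=True)
# model = BottleneckAE(n_features=x.shape[1]); train(model, loader)
```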



Should someone find this thread sometime down the road… here is a really neat paper that provides a nice overview of autoencoders and their applicability.

Extracting and Composing Robust Features with Denoising Autoencoders by Vincent et al.

(Even Oldridge) #45

I’m taking a look at your GitHub, as this is of interest to me right now, and one thing I did notice was that the winning solution wasn’t entirely linear - only the middle layer was.

I’m curious about how the embedding model worked. That was my first instinct as well, especially since some of my categorical variables have huge cardinalities.

Are you going to be in part II this spring? If so maybe we can work on this together.

(Kerem Turgutlu) #46

Hi Even,

I will be taking Part 2 starting next week as well. The DAE should have ReLU activations everywhere except the middle layer, which has a linear activation. I don’t know what the main motivation behind this is. If that is not the case in my class definition, I should check it. The embedding model as it is didn’t work much better than the XGBoost models shared on Kaggle.
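For anyone comparing approaches later: the embedding model referred to here is, in general terms, one learned embedding per categorical column, concatenated with the continuous features and passed through an MLP. A hedged PyTorch-style sketch of that general pattern (not the actual model from this thread; cardinalities, embedding sizes and layer widths are made up for illustration):

```python
import torch
import torch.nn as nn

class EmbeddingMLP(nn.Module):
    """One embedding table per categorical column, concatenated with the
    continuous features and passed through a small MLP."""
    def __init__(self, cardinalities, n_cont, emb_dim=8, hidden=200):
        super().__init__()
        # Cap each embedding size with a simple rule of thumb based on cardinality
        self.embs = nn.ModuleList(
            [nn.Embedding(c, min(emb_dim, (c + 1) // 2)) for c in cardinalities]
        )
        emb_total = sum(e.embedding_dim for e in self.embs)
        self.mlp = nn.Sequential(
            nn.Linear(emb_total + n_cont, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                  # single logit for a binary target
        )

    def forward(self, x_cat, x_cont):
        # x_cat: LongTensor (batch, n_categorical), x_cont: FloatTensor (batch, n_cont)
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embs)]
        return self.mlp(torch.cat(embedded + [x_cont], dim=1))
```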