I saw much better results than just adding noise to the data, and better than adding L1 regularization (which is what the tutorials I followed, from the Keras blog, used). Note: I did not try batch norm.
Error: I used the same as in the tutorials, just MSE on the reconstructions. And yes, ‘better results’ meant lower error (perhaps subjective, but the reconstructions also looked better visually at similar error levels).
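For concreteness, here's a minimal sketch of that kind of setup, along the lines of the Keras blog tutorials: a small dense AE trained with MSE on its own inputs, with a `GaussianNoise` layer for the denoising variant and a comment marking where the L1 activity regularizer would go instead. The layer sizes, noise level, and data below are all placeholders, not the exact values I used:

```python
import numpy as np
from tensorflow.keras import layers, models, regularizers

n_features = 32   # hypothetical width of the structured input
encoding_dim = 8  # hypothetical size of the latent code

inputs = layers.Input(shape=(n_features,))
# Denoising variant: corrupt the inputs (active during training only).
noisy = layers.GaussianNoise(0.1)(inputs)
encoded = layers.Dense(16, activation="relu")(noisy)
# For the sparse (L1) variant, you would drop the noise layer and add
# activity_regularizer=regularizers.l1(1e-5) to this bottleneck instead.
code = layers.Dense(encoding_dim, activation="relu")(encoded)
decoded = layers.Dense(16, activation="relu")(code)
outputs = layers.Dense(n_features, activation="linear")(decoded)

autoencoder = models.Model(inputs, outputs)
# Plain MSE on the reconstructions, as in the tutorials.
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(1000, n_features).astype("float32")  # stand-in data
autoencoder.fit(x, x, epochs=10, batch_size=64, verbose=0)
```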
My interpretation is that the ability to closely reconstruct inputs implies that the AE has learned / extracted latent structure in the data (which should correlate with the potential usefulness of the encodings).
As a side note, it makes sense that AEs have not retained popularity for a lot of deep-learning-based solutions: you would expect the network you are using to extract this structure anyway (I think Jeremy commented on, or alluded to, this elsewhere). In the structured data case we're discussing here, perhaps we can look at the AE as doing something analogous to embeddings: extracting a rich feature representation (see the sketch below).
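Continuing the hypothetical sketch above, treating the AE like an embedding extractor just means cutting the model at the bottleneck and using the codes as features for whatever downstream model you like:

```python
from tensorflow.keras import models

# Reuse the trained layers from the sketch above: everything up to the
# bottleneck becomes a standalone feature extractor.
encoder = models.Model(autoencoder.input, code)

# The codes can then feed a downstream model, much like entity
# embeddings would for categorical columns.
features = encoder.predict(x)  # shape: (num_samples, encoding_dim)
```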