I am referring to this brilliant blog post on autoencoders.

I came across this statement, which says vanilla autoencoders run the risk of overfitting.

“Since the autoencoder learns the identity function, we are facing the risk of “overfitting” when there are more network parameters than the number of data points.”

The parameters (𝜃,𝜙) are learned together to output a reconstructed data sample same as the original input, 𝑥≈𝑓𝜃(𝑔𝜙(𝑥)), or in other words, to learn an identity function.

So we apply the encoder, g(x) = z, and then a decoder function f(z), which yields the reconstruction x'. We then use either an MSE loss or a cross-entropy loss (depending on the output activation) and minimize it with SGD until x' ≈ x.
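To make sure I have the pipeline right, here is a minimal sketch of what I understand is happening (my own toy construction, not from the blog: a purely linear encoder/decoder pair trained with plain gradient descent on MSE; all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 samples, 4 features each
X = rng.normal(size=(5, 4))

d, k = 4, 2                              # input dim, bottleneck dim
W_enc = rng.normal(size=(d, k)) * 0.1    # encoder weights (g_phi)
W_dec = rng.normal(size=(k, d)) * 0.1    # decoder weights (f_theta)

loss0 = ((X @ W_enc @ W_dec - X) ** 2).mean()  # loss before training

lr = 0.05
for step in range(2000):
    Z = X @ W_enc              # z = encoder(x)
    X_hat = Z @ W_dec          # x' = decoder(z)
    err = X_hat - X
    loss = (err ** 2).mean()   # MSE reconstruction loss

    # Gradients of the MSE w.r.t. both weight matrices
    grad_out = 2 * err / err.size
    grad_W_dec = Z.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)

    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc

print(f"loss before: {loss0:.4f}, after: {loss:.4f}")
```

Since the bottleneck (k = 2) is narrower than the input (d = 4), the network cannot represent the identity exactly and is forced to compress, which, as I understand it, is the whole point.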

An identity function is a function that simply returns its input: f(x) = x.

Can someone please explain what they mean by "same as learning an identity function"?

The only difference I see is that in a regular deep neural net the loss function measures the distance between the predicted label and the training label (the label can be a text caption, a boolean value, a pixel value, whatever), whereas in an AE the loss function measures the distance between the predicted pixel values and the input pixel values.

Why does learning an identity function lead to overfitting when there are more network parameters than input data points?
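Here is a toy numpy sketch of what I think the overfitting scenario looks like (again my own construction, not from the blog): when the hidden layer is wider than the number of training points, a linear autoencoder can reproduce the training set exactly by solving for the decoder in closed form, i.e. it memorizes rather than learns any structure, and a fresh point is not reconstructed well.

```python
import numpy as np

rng = np.random.default_rng(1)

# 3 training points in 4 dims, but a 6-unit hidden layer:
# 4*6 + 6*4 = 48 parameters vs only 3*4 = 12 training values.
X = rng.normal(size=(3, 4))
W_enc = rng.normal(size=(4, 6))   # random (untrained) encoder
Z = X @ W_enc                     # codes; full row rank (3)

# Decoder that exactly reproduces the training set
# (least-squares solve; since Z has full row rank, the fit is exact).
W_dec = np.linalg.pinv(Z) @ X

train_err = np.abs(X @ W_enc @ W_dec - X).max()   # ~ 0: memorized

# A fresh point is NOT reconstructed as its own identity
x_new = rng.normal(size=(1, 4))
test_err = np.abs(x_new @ W_enc @ W_dec - x_new).max()

print(f"train error: {train_err:.2e}, test error: {test_err:.2e}")
```

If this intuition is right, the "identity" is only learned on the training points themselves, which is exactly what overfitting means here.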

When they refer to input data points, are they talking about individual training images or the learned weights?

Many Thanks,

Pradeep