Lesson 15 official topic

In the autoencoder notebook, we have this setup -

ae = nn.Sequential(              # 28x28
    nn.ZeroPad2d(2),             # 32x32
    conv(1,2),                   # 16x16
    conv(2,4),                   # 8x8
    conv(4,8),                   # 4x4
    deconv(8,4),                 # 8x8
    deconv(4,2),                 # 16x16
    deconv(2,1, act=False),      # 32x32
    nn.ZeroPad2d(-2),            # 28x28
    nn.Sigmoid()
).to(def_device)

The sigmoid in the last layer maps everything to between 0 and 1. However, the original images have values of 0-255 for each pixel. Wouldn’t this always produce wrong loss values with MSE? Shouldn’t we set up the network to output 0-255 values for each pixel, like the original images?

I asked ChatGPT this question and this is their answer -

You’re correct. If you’re using the Fashion-MNIST dataset and your final layer in the neural network is a Sigmoid activation, the output values will be in the range [0, 1]. Since the original pixel values in Fashion-MNIST images range from 0 to 255, directly using a Sigmoid output without any scaling would mean that the output pixel values will not match the original input pixel values in terms of scale.


Is this because we have already done the normalization as part of the transformation when we convert the values to tensors?

Thanks!

Right, so TF.to_tensor already scaled our original image data to 0-1. That’s why we should use Sigmoid at the end. Thanks!
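
For anyone who lands here with the same question, here is a minimal sketch of why the scales line up. It is plain Python so it runs without the notebook: `scale_like_to_tensor` is a hypothetical stand-in that mimics what TF.to_tensor does to 8-bit image data (divide by 255), and the pixel values and pre-sigmoid outputs are made up for illustration.

```python
import math

def scale_like_to_tensor(pixels):
    # Hypothetical stand-in: mimics the divide-by-255 scaling that
    # to_tensor applies to 8-bit (0-255) image data.
    return [p / 255 for p in pixels]

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1 / (1 + math.exp(-x))

def mse(pred, target):
    # Mean squared error over two equal-length lists.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

raw = [0, 64, 128, 255]             # made-up 8-bit pixel values
scaled = scale_like_to_tensor(raw)  # what the model actually sees as targets

logits = [-4.0, -1.0, 0.0, 4.0]     # made-up pre-sigmoid network outputs
pred = [sigmoid(z) for z in logits]

loss_scaled = mse(pred, scaled)     # prediction and target share the 0-1 scale
loss_raw = mse(pred, raw)           # sigmoid output can never reach 0-255 targets

print(f"targets after scaling: {scaled}")
print(f"MSE vs scaled targets: {loss_scaled:.4f}")
print(f"MSE vs raw targets:    {loss_raw:.1f}")
```

The loss against the scaled targets is small and can go to zero as the predictions improve, while the loss against the raw 0-255 values stays huge no matter what the network does. So the concern in the original question would be valid only if the targets had not been scaled first.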