Lesson 15 official topic

ayh2 · November 24, 2023, 7:38am

In the autoencoder notebook, we have this setup -

ae = nn.Sequential(             # 28x28
    nn.ZeroPad2d(2),            # 32x32
    conv(1,2),                  # 16x16
    conv(2,4),                  # 8x8
    conv(4,8),                  # 4x4
    deconv(8,4),                # 8x8
    deconv(4,2),                # 16x16
    deconv(2,1, act=False),     # 32x32
    nn.ZeroPad2d(-2),           # 28x28
    nn.Sigmoid()
).to(def_device)
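For anyone checking the size comments above, here is a rough sketch of the arithmetic in plain Python. It assumes that `conv` is a 3x3 convolution with stride 2 and padding 1, and that `deconv` doubles the spatial size (for example, 2x upsampling followed by a size-preserving conv). Those details are not shown in this thread, so treat them as assumptions rather than the notebook's exact definitions.

```python
def conv_out(n, k=3, s=2, p=1):
    """Spatial size after a conv layer: floor((n + 2p - k) / s) + 1.

    k, s, p are ASSUMED values for the notebook's `conv` helper.
    """
    return (n + 2 * p - k) // s + 1

size = 28
sizes = [size]

size += 2 * 2                # nn.ZeroPad2d(2) adds 2 pixels on every side
sizes.append(size)

for _ in range(3):           # the three conv layers, each assumed stride 2
    size = conv_out(size)
    sizes.append(size)

for _ in range(3):           # the three deconv layers, each assumed to double the size
    size *= 2
    sizes.append(size)

size -= 2 * 2                # nn.ZeroPad2d(-2) crops 2 pixels from every side
sizes.append(size)

print(sizes)
```

Padding to 32x32 first is what lets the three halvings land on whole numbers before the decoder doubles back up.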

The sigmoid in the last layer maps everything to between 0 and 1. However, the original images have pixel values from 0 to 255. Wouldn’t this always produce wrong loss values with MSE? Shouldn’t we set up the network to output 0-255 values for each pixel, like the original images?

I asked ChatGPT this question and this is their answer -

You’re correct. If you’re using the Fashion-MNIST dataset and your final layer in the neural network is a Sigmoid activation, the output values will be in the range [0, 1]. Since the original pixel values in Fashion-MNIST images range from 0 to 255, directly using a Sigmoid output without any scaling would mean that the output pixel values will not match the original input pixel values in terms of scale.

Is this because we have already done the normalization as part of the transformation when we convert the values to tensors?

Thanks!

ayh2 · November 25, 2023, 2:17pm

Right, so TF.to_tensor scaled our original image data to the 0-1 range. That’s why we should use Sigmoid at the end. Thanks!
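To make the scale argument concrete, here is a small sketch in plain Python (no PyTorch needed). It mimics the divide-by-255 scaling that `to_tensor` applies to 8-bit images and compares the MSE of sigmoid outputs against scaled and unscaled targets. The pixel values and logits are made up for illustration, and `sigmoid` and `mse` are hand-written stand-ins, not the notebook's code.

```python
import math

def sigmoid(x):
    """Logistic function; its output always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def mse(pred, target):
    """Mean squared error over two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

raw_pixels = [0, 64, 128, 255]            # hypothetical 8-bit pixel values
scaled = [p / 255 for p in raw_pixels]    # the same 0-1 scaling to_tensor performs

logits = [-4.0, -1.1, 0.0, 4.0]           # made-up pre-sigmoid network outputs
preds = [sigmoid(z) for z in logits]

loss_scaled = mse(preds, scaled)          # targets on the same 0-1 scale as preds
loss_raw = mse(preds, raw_pixels)         # targets left on the 0-255 scale

print(f"MSE vs scaled targets: {loss_scaled:.6f}")
print(f"MSE vs raw targets:    {loss_raw:.1f}")
```

Against the scaled targets the loss can get close to zero. Against the raw 0-255 targets it stays large no matter what the network outputs, because a sigmoid can never exceed 1.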

madfatlad · March 16, 2025, 3:31am

I tried going deeper into the efficiency of CNNs in this blog post: Raghav’s Blog