Is there any way to use unet_learner for autoencoders on image datasets

Hey everybody! Since we use a fully convolutional network for segmentation tasks, I was wondering if we can use the same learner for an image autoencoder as well. Perhaps changing the loss function to MSE loss and making the label the same as the input would work. If anyone has already worked on this problem, please let me know; if you have any ideas, feel free to post (it would be great if you could share your code as well).


Yeah, so I used unet_learner for an autoencoder, and it turns out to be quite straightforward. I changed only three things from the CamVid Tiramisu notebook from lesson 3, and the results look quite good. But I hope the network has actually learnt something and hasn’t just bypassed all the layers altogether via the skip connections in the U-Net. It would be great if someone could verify this.

The changes made to the CamVid notebook from lesson 3 are as follows:

  1. Changed `SegmentationItemList` to `ImageImageList`
  2. Changed the label function to `def get_y_fn(x): return x`
  3. Changed the loss function to `torch.nn.MSELoss()`, since this becomes a regression task.
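The three changes boil down to one idea: train with the input as its own target under MSE. A minimal sketch of that objective in plain PyTorch (the model here is a toy conv encoder/decoder standing in for fastai’s U-Net, not the actual `unet_learner` model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the U-Net: a tiny conv encoder/decoder (not fastai's model).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),  # "encoder"
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),    # "decoder"
)
loss_func = nn.MSELoss()                       # change 3: regression loss
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(8, 3, 32, 32)                   # a fake batch of "images"
y = x                                          # change 2: the label IS the input

first = loss_func(model(x), y).item()          # reconstruction error before training
for _ in range(50):
    opt.zero_grad()
    loss = loss_func(model(x), y)
    loss.backward()
    opt.step()
final = loss.item()                            # should drop below `first`
```

Change 1 (`ImageImageList`) is just fastai’s way of making the target an image rather than a mask, which is what `y = x` expresses here.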

I am attaching the notebook link with this post; you can play around with it on Colab.
P.S.: There are almost no comments in the notebook except where I have changed something from the original, as I am assuming you already know what autoencoders are and why we use them. The network is trained only on half-sized images, because Google Colab’s GPU memory was filling up when I tried to train on full-sized images; the full-sized results in the notebook are for the CamVid Tiny dataset.


Unfortunately, that is most probably what happens. Also think about how you want to apply your autoencoder. If you want to reconstruct the exact input image (for compression or whatever), you will have to use the encoder and decoder separately somewhere (e.g. compress with the encoder on a server, let the client decompress using the decoder). At the point where the decoder is used, you will usually not have the inputs to those skip connections; and if you did, in plain reconstruction you could skip the whole process, because the first skip connection already contains your image.

Your network might, however, still be useful for something like denoising, where you usually run the whole autoencoder on one machine. For plain reconstruction that is usually not the case, which kind of kills those skip connections :confused:
So if you throw a crappify function or noise of whatever kind at your input images (see the last lessons in part 1), your U-Net approach might be a very decent baseline for a denoiser.
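The denoising setup just means the target stays clean while the input is corrupted. A hypothetical crappify step sketched in plain PyTorch (the noise level and clamp range are arbitrary choices, not from the lessons):

```python
import torch

def crappify(x, noise_std=0.1):
    """Corrupt a batch of images with Gaussian noise (one simple 'crappify')."""
    noisy = x + noise_std * torch.randn_like(x)
    return noisy.clamp(0.0, 1.0)  # keep pixel values in [0, 1]

clean = torch.rand(4, 3, 32, 32)   # the "ground truth" images
noisy = crappify(clean)            # what the network actually sees

# Training then pairs them the other way around from plain reconstruction:
#   loss = mse(model(noisy), clean)
```

Because input and target now differ, the skip connections can no longer carry the answer through unchanged, which is what makes this a more honest task for a U-Net.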


I really don’t think that a U-Net is well suited to the task; the skip connections are the problem.
I would try a custom-defined model with:

  • Encoder (can be a ResNet-type encoder)
  • Decoder (a simple decoder, like the one in the U-Net)
    So, you can start with a U-Net where you remove the skip connections.
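To make the difference concrete: a U-Net decoder block concatenates the upsampled features with the matching encoder feature map, so removing the skip just means decoding from the upsampled path alone. A schematic sketch (these are made-up classes, not fastai’s `UnetBlock`):

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Upsample-then-conv block. With use_skip=True it mimics a U-Net block
    (concatenating the encoder feature map); with False it is a plain decoder."""
    def __init__(self, up_ch, skip_ch, out_ch, use_skip=True):
        super().__init__()
        self.use_skip = use_skip
        self.up = nn.ConvTranspose2d(up_ch, up_ch, 2, stride=2)
        in_ch = up_ch + (skip_ch if use_skip else 0)
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, x, skip=None):
        x = self.up(x)
        if self.use_skip:
            x = torch.cat([x, skip], dim=1)  # the U-Net merge
        return torch.relu(self.conv(x))

bottleneck = torch.rand(1, 64, 8, 8)     # features from the encoder bottom
enc_feat = torch.rand(1, 32, 16, 16)     # encoder feature map at this scale
with_skip = DecoderBlock(64, 32, 16, use_skip=True)(bottleneck, enc_feat)
no_skip = DecoderBlock(64, 32, 16, use_skip=False)(bottleneck)
```

With `use_skip=False` the decoder is forced to reconstruct everything from the bottleneck, which is exactly what you want for a representation-learning autoencoder.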

What if I want to use the encoder part of the network as a pretrained model? Let’s say I have some images from a medical dataset; rather than using a model pretrained on ImageNet, the encoder part of this network (after training it as an autoencoder on those medical images) might give better results, won’t it?


I will actually test this out, to see whether using this kind of autoencoder training for pretraining actually helps.


It can be useful, if the encoder actually encodes the image structure.
Maybe you can check the last layers of your U-Net and investigate how large the activations from the bottleneck are compared to the outputs of the very first skip connection. I would assume that if your network just skips the whole thing and uses the first skip connection, the activations from the actual AE path should go to zero, so that your merge layer outputs the same as the skip connection itself.
Similarly (in order to produce those zero activations), the weights might be going to zero as well.
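One way to run that check is with a forward hook that records the non-skip path. A minimal sketch on a made-up model (here the "merge" is additive rather than fastai’s concatenation, which keeps the comparison between the two paths direct):

```python
import torch
import torch.nn as nn

class TinySkipAE(nn.Module):
    """Minimal skip-connected autoencoder, just to show the diagnostic."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 8, 3, stride=2, padding=1)
        self.dec = nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1)

    def forward(self, x):
        up = self.dec(self.enc(x))   # the "actual AE" path
        return up + x                # the skip/merge

model = TinySkipAE()
acts = {}
# Record the decoder output (the non-skip path) via a forward hook.
model.dec.register_forward_hook(lambda m, i, o: acts.update(dec=o.detach()))

x = torch.rand(1, 3, 32, 32)
model(x)
# If the trained network were ignoring the AE path, this norm would collapse
# toward 0 while the skip (x itself) carries the whole image.
dec_norm = acts['dec'].norm().item()
skip_norm = x.norm().item()
```

On a trained model you would register the same kind of hook on the last `UnetBlock`’s inputs and compare the two norms batch by batch.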

Further, you can take a look at the latent space; maybe try t-SNE on your latent vectors. Usually, when your encoder produces nice features, t-SNE might be able to cluster your data points nicely based on that representation.
Probably you can also just use the encoder as pretraining, as you said, and compare classification performance / training speed against a randomly initialized model.
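Extracting the latent vectors for that t-SNE check might look like this. The encoder below is a hypothetical stand-in for your trained U-Net encoder, and the actual t-SNE call (which assumes scikit-learn is installed) is left as a comment:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained encoder: conv features pooled to a vector.
encoder = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

images = torch.rand(16, 3, 32, 32)       # a batch of inputs
with torch.no_grad():
    latents = encoder(images)            # one latent vector per image: (16, 8)

# With scikit-learn installed, the 2-D embedding for plotting would be:
# from sklearn.manifold import TSNE
# emb = TSNE(n_components=2, perplexity=5).fit_transform(latents.numpy())
```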


I’m also doing a similar thing.
I want to create an autoencoder and use the reconstructed photo for a dual loss (segmentation and classification loss).
But I have no idea how to create the label function for a dual loss.
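There is no single right answer for the label function, but one common pattern is to make the target a tuple and sum weighted per-task losses. A hypothetical sketch (the names, shapes, and weights are all made up for illustration):

```python
import torch
import torch.nn.functional as F

def dual_loss(preds, targets, w_seg=1.0, w_cls=1.0):
    """Weighted sum of a segmentation loss and a classification loss.
    preds = (seg_logits, cls_logits); targets = (seg_mask, cls_label)."""
    seg_logits, cls_logits = preds
    seg_mask, cls_label = targets
    seg = F.cross_entropy(seg_logits, seg_mask)    # per-pixel cross-entropy
    cls = F.cross_entropy(cls_logits, cls_label)   # per-image cross-entropy
    return w_seg * seg + w_cls * cls

seg_logits = torch.randn(2, 5, 16, 16)   # 5 segmentation classes
cls_logits = torch.randn(2, 10)          # 10 image-level classes
seg_mask = torch.randint(0, 5, (2, 16, 16))
cls_label = torch.randint(0, 10, (2,))
loss = dual_loss((seg_logits, cls_logits), (seg_mask, cls_label))
```

The label function then just has to return the `(seg_mask, cls_label)` tuple for each item, with the model producing the matching tuple of outputs.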


Were you able to create the autoencoder, at least?

I use the superres model as a basis and create a smaller image.


Would you mind sharing this?
I built something similar by modifying the UnetBlock (removing the Merge).