Training an Autoencoder using U-Net architecture

Has anyone seen any resources (notebooks) discussing training autoencoders using U-Net on fastai?

So far I haven’t seen a tutorial that uses fastai to train an autoencoder.

IIRC there was a post by Jeremy where he said that training an encoder usually works better on a real task. So if you have labels I would just use them to train an encoder.

@jc-denton Thank you for the response :slight_smile: Would you know where I could find the post? I searched earlier and couldn’t find it

A U-Net is not a good autoencoder because it has bypass connections, so it would probably not learn anything useful. Please try it and let us know if it works for you!
I built a small example some time ago:

I may turn this notebook into a blog post.

Thanks for the response and resource :slight_smile: I will definitely try this and keep you posted.

Even though I understand that the bypass/skip connections would probably prevent the model from learning anything, would it be possible to remove the connections and train using a U-Net?
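For what it's worth, removing the skips just means the decoder never sees the encoder's intermediate activations, only the bottleneck. A minimal PyTorch sketch of that shape (not fastai's `DynamicUnet`; all names here are illustrative):

```python
import torch
import torch.nn as nn

def down(c_in, c_out):
    # stride-2 conv halves spatial size while increasing channels
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU())

def up(c_in, c_out):
    # transposed conv doubles spatial size back
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1), nn.ReLU())

class NoSkipUnet(nn.Module):
    """U-Net-shaped encoder/decoder with the skip connections removed:
    the decoder only sees the bottleneck, so it has to compress."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(down(1, 16), down(16, 32), down(32, 64))
        self.dec = nn.Sequential(up(64, 32), up(32, 16),
                                 nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1))
    def forward(self, x):
        return self.dec(self.enc(x))

out = NoSkipUnet()(torch.randn(2, 1, 32, 32))
print(out.shape)  # torch.Size([2, 1, 32, 32])
```

At that point it is really just a plain convolutional encoder/decoder; the "U" shape only matters because of the skips.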

Of course you can do that.

Any suggestions to where I could look to figure out how to do this? Sorry if this is a basic question, I am still in the process of finding tutorials, reading through the documentation and figuring this all out. Any direction is helpful :slight_smile:

The segmentation tutorial?

Isn’t the superresolution notebook a U-Net autoencoder? I don’t know if it’s still available in fastai v2, but in fastai v1 Jeremy “crappified” an image by blurring it and adding some noise (numbers) and then tried to re-create it. I guess this counts as an autoencoder since it does not have any labels.

No, it has labels. The labels are the non-crappified images. An autoencoder needs the bottleneck in the middle to force the model to represent the image in a reduced latent space. The U-Net has shortcuts that prevent this. If you leave them in, the model essentially becomes:

f(x) = x
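To make the f(x) = x point concrete: with a bypass around a block, the network can drive the block's contribution to zero and pass the input straight through. A toy illustration with an additive (ResNet-style) skip — U-Net actually concatenates, but the shortcut similarly lets reconstruction bypass the bottleneck:

```python
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    """A block with a bypass: out = x + inner(x).
    If training pushes inner's weights toward zero, the block
    collapses to the identity -- nothing is learned about x."""
    def __init__(self):
        super().__init__()
        self.inner = nn.Conv2d(1, 1, 3, padding=1)
        nn.init.zeros_(self.inner.weight)  # simulate the degenerate solution
        nn.init.zeros_(self.inner.bias)
    def forward(self, x):
        return x + self.inner(x)

x = torch.randn(1, 1, 8, 8)
print(torch.allclose(SkipBlock()(x), x))  # True
```

Since copying the input through the skip already gives zero reconstruction loss, there is no pressure on the bottleneck to learn a useful representation.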

Nice: this thing has LaTeX support!

I do not interpret it like that.
An autoencoder has autogenerated labels, which means that a human does not need to classify the images. A crappified label is autogenerated in the same way as, e.g., making tiles of your image and letting the autoencoder sort out the tile order.

When you put the crappified image through your U-Net, it uses the shortcut information to know where the items are located in the image, and the bottom of the U to know what the items are. E.g., if you have something that looks like a cat but can’t see the fur, the deconvolutions will generate it for you. Or if it’s a vase, it will generate another surface with a nice outline.

It needs all that information to succeed, and that’s why it will be forced to compress the image into embeddings. Or am I wrong about this?
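The self-supervised setup being described boils down to: degrade the image, then train the model to map degraded → clean, so the clean image is its own label. A rough sketch of such a degradation step (this is not fastai's actual `crappify`, which also drew a random number on the image and used JPEG artifacts; parameters here are illustrative):

```python
import torch
import torch.nn.functional as F

def crappify(img: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Toy crappification: downscale-then-upscale (blur) plus Gaussian noise.
    img is a (C, H, W) float tensor in [0, 1]."""
    x = img.unsqueeze(0)
    small = F.interpolate(x, scale_factor=1 / scale,
                          mode="bilinear", align_corners=False)
    blurry = F.interpolate(small, size=img.shape[1:],
                           mode="bilinear", align_corners=False)
    noisy = blurry + 0.05 * torch.randn_like(blurry)
    return noisy.clamp(0, 1).squeeze(0)

img = torch.rand(3, 64, 64)
target, inp = img, crappify(img)  # the label is the clean image itself
```

So no human labeling is needed, but the training target is still a separate tensor from the input — which is the crux of the disagreement above.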

I see autoencoders more as a tool to encode images into a lower-dimensional manifold.
Superres, or any crappification-based model, I see more as a generative model. Even if the architecture is the same, the distinction is more of a mathematical definition.

I’ve tried to make an autoencoder as a U-Net for the MNIST dataset, to re-create the image by compressing it to a lower dimension and then expanding it again. I guess you would also consider that an autoencoder?
The result is very soft images with a lot of the noise removed.
Isn’t the super resolution U-Net architecture almost the same, but with the difference of some leakage to be able to create crisp corners from soft images?

Both of these models are generative right?

In the superres notebook they use FeatureLoss, based on the gram matrix. The MSE loss is what causes the blurriness (you are minimizing to be good on average, pixel-wise).
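For reference, the gram-matrix part of a feature loss compares channel-correlation statistics of activations rather than raw pixels, which is why it avoids the averaged-out blur of MSE. A sketch of just the gram computation (not fastai's `FeatureLoss` implementation):

```python
import torch

def gram_matrix(feats: torch.Tensor) -> torch.Tensor:
    """feats: (B, C, H, W) activations from some layer of a pretrained net.
    Returns (B, C, C) channel-by-channel correlations, normalized by size."""
    b, c, h, w = feats.shape
    f = feats.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

g = gram_matrix(torch.randn(2, 8, 16, 16))
print(g.shape)  # torch.Size([2, 8, 8])
```

The full loss then compares gram matrices (and feature maps) of prediction vs target at several layers of a frozen pretrained network, so textures are matched statistically instead of pixel-by-pixel.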

If your problem is to encode images and recover the latent vector (to use it elsewhere, for instance), I would recommend using an encoder/decoder instead of the U-Net.
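i.e. something where the latent vector is an explicit intermediate you can grab. A minimal plain-PyTorch sketch (sizes assume 28×28 single-channel MNIST-style inputs; names and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Encoder compresses to a flat latent vector; decoder reconstructs.
    The latent is returned explicitly so it can be reused elsewhere."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 7 * 7), nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),   # 7 -> 14
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(), # 14 -> 28
        )
    def forward(self, x):
        z = self.encoder(x)          # the latent vector you'd reuse
        return self.decoder(z), z

recon, z = ConvAutoencoder()(torch.randn(4, 1, 28, 28))
print(recon.shape, z.shape)  # torch.Size([4, 1, 28, 28]) torch.Size([4, 32])
```

With no skips between encoder and decoder, everything the decoder knows has to pass through `z`, which is exactly the bottleneck property discussed above.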

So did you get the AE to work with the U-Net?