Encoder-decoder using the cnn_learner function

For the sake of practice, I’m trying to build a model that takes an image of a 3 and returns an image of a 7.

My Xs are tensors of 3s and my ys are tensors of 7s.
I managed to run the following model, which seems to do the job:

simple_net = nn.Sequential(
    nn.Linear(28*28, 30),   # 784 input pixels -> 30 activations
    nn.ReLU(),
    nn.Linear(30, 28*28)    # 30 activations -> 784 output pixels
)

def loss(p, t):
    p = p.sigmoid()              # squash outputs into the [0, 1] pixel range
    criterion = nn.MSELoss()
    return criterion(p, t)

learn = Learner(dls, simple_net, opt_func=SGD, loss_func=loss)
learn.fit(80, 0.1)
preds, targs = learn.get_preds()
show_image(preds[11].view(28,28)), show_image(targs[0].view(28,28))

which results in this:

[screenshots: the predicted image and the target image]

However, when I try to use the cnn_learner function I’m getting an error:

learn = cnn_learner(dls, resnet18, pretrained=False,
                    loss_func=loss)

AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>()
      1 learn = cnn_learner(dls, resnet18, pretrained=False,
----> 2                     loss_func=loss)

1 frames
/usr/local/lib/python3.6/dist-packages/fastai/vision/learner.py in cnn_learner(dls, arch, loss_func, pretrained, cut, splitter, y_range, config, n_out, normalize, **kwargs)
    170     meta = model_meta.get(arch, _default_meta)
    171     if n_out is None: n_out = get_c(dls)
--> 172     assert n_out, "n_out is not defined, and could not be inferred from data, set dls.c or pass n_out"
    173     if normalize: _add_norm(dls, meta, pretrained)
    174     if y_range is None and 'y_range' in config: y_range = config.pop('y_range')

AssertionError: n_out is not defined, and could not be inferred from data, set dls.c or pass n_out

Can you please help me figure out how to make the cnn_learner code work?

Hi again. cnn_learner is designed to map an image into a set of classes; it is not designed to map an image to an image. fastai cannot figure out from the data the number of classes (n_out) that you want to map into.

I suppose you could wrangle resnet into doing this task by using a multilabel custom head that maps features back into 28x28 pixel intensities. But the result would not be an encoder/decoder.
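
If you just want the assertion itself to go away, the error message points at the workaround: set dls.c or pass n_out explicitly. Something like the following (untested, and still not a true encoder/decoder) would at least construct the learner, with one output activation per pixel:

learn = cnn_learner(dls, resnet18, pretrained=False,
                    n_out=28*28,      # 784 outputs, one per pixel, instead of a class count
                    loss_func=loss)   # the sigmoid + MSE loss defined above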

I suspect you are working under a conceptual misunderstanding here. The encoder takes an image and reduces it to a small number of activations. The decoder takes those activations and maps them back to the same (not a different) image. Architectures have already been invented to do this task well. You might want to practice by experimenting with one of them.
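
For reference, a minimal autoencoder sketch in plain PyTorch (untested, and assuming flattened 28x28 inputs like your simple_net above) would look like this; the key point is that the target is the input itself:

# encoder: compress the 784 input pixels into a small bottleneck of activations
# decoder: reconstruct the *same* 784 pixels from that bottleneck
autoencoder = nn.Sequential(
    nn.Linear(28*28, 30),   # encoder
    nn.ReLU(),
    nn.Linear(30, 28*28)    # decoder
)

# trained with a reconstruction loss against the input itself, e.g.
# loss = F.mse_loss(autoencoder(x).sigmoid(), x)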

HTH, Malcolm

To add onto Malcolm’s answer, what you are asking for goes deep into the territory of GANs, rather than the models you’ve likely read about so far :slight_smile: The task is image generation. To learn more about it, I would recommend watching course-v3 and taking a look at its ported notebooks here (lesson 7).

@muellerzr @Pomo
Thank both of you for your answers!

I was actually just toying with this idea to familiarize myself with fastai, using an out-of-the-box idea.
I wanted to see what would come out of this. Didn’t expect much :sweat_smile:

However, there’s something that has been bugging me for days now, which I can’t figure out, and I haven’t been able to get help with it either.

I’m unable to use the predict() method to generate a new prediction.

Could you please look at the following threads and let me know where I went wrong?

Many thanks!

You need to look at the shape of the image you are trying to predict to see if it’s the shape that predict() expects. If it is, the next step is further debugging within fastai. If not, debug your own code that generates the image.

Yes, I checked, and the shape of the tensor is correct.

What else can I do?
I don’t believe I have the knowledge to debug fastai code…

Hi. Here are some questions that will help you to debug…

What is the type and shape of the input image that you send to learn.predict()? Have you displayed it?

What is the type and shape of a batch used for training?

What is the type and shape that learn.predict() expects?
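
For example, a rough way to answer the first two questions (assuming a fastai v2 DataLoaders, and that the item you want to predict on is already in memory):

# compare what the model was trained on with what you are about to predict on
xb, yb = dls.one_batch()
print(type(xb), xb.shape)   # batch of inputs, e.g. [bs, channels, h, w]
print(type(yb), yb.shape)   # batch of targets

item = ...                  # hypothetical: the single item you plan to send to learn.predict()
print(type(item), getattr(item, 'shape', None))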

Thanks @Pomo for helping me out.

The NN expects a 1x784 tensor.
I made sure of that by passing different size and got the following:
RuntimeError: size mismatch, m1: [28 x 28], m2: [784 x 30] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:41

(m1 representing the tensor I’m passing)

So passing learn.predict(t_3[0].view(1,28*28)) should be ok in terms of tensor size.

The data used for training was a DataLoaders object comprised of train and test DataLoader objects, each built from a zip of X (6000x784) and y (6000x1, binary).

I’m certain that up to the predict function the code is correct because I copied it from the 4th lesson
https://github.com/fastai/fastbook/blob/master/clean/04_mnist_basics.ipynb

This makes no sense. You are using a CNN based on resnet. That CNN expects a batch of input images of shape [bs, #channels, height, width].

Sorry, I’m not sure I understand.
Do you mean it can’t classify a single image, only a batch?

You are mixing up the model itself with the predict function. Neither takes the [1,784] tensor that you are giving it. The NN takes a batch of images of shape [bs, channels, h, w]. Predict takes a single image of shape [channels, h, w]. Or a file path which may be simpler.

There are several confusions here. A suggestion: use predict with a file path for now. It will get the task done and you will be able to move forward. Complete all the course lessons without having to understand every detail. By the end, the concepts will be much clearer and you will have developed your own debugging approaches. You will be able to go back and diagnose exactly what was going wrong. :slightly_smiling_face: :slightly_smiling_face: :slightly_smiling_face:
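
For instance, an untested sketch of the file-path route (the path here is hypothetical, assuming path = untar_data(URLs.MNIST_SAMPLE) as in the lesson 4 notebook):

img_path = path/'valid'/'3'/'some_image.png'    # hypothetical file name
pred, pred_idx, probs = learn.predict(img_path) # return values as for a classification learner

# or load it as a PILImage first; fastai applies the training-time transforms
# and adds the batch dimension itself
pred, pred_idx, probs = learn.predict(PILImage.create(img_path))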

Thanks for the advice @Pomo
I do hope that things clear up a bit by the time I finish the course.

As for your suggestion to use a file, I tried this before and was unsuccessful (see screenshot below).
I also tried passing the image in various shapes: (1,784), (28,28), (1,1,1,784). Nothing seems to work.

Can’t tell you how frustrating this is… I appreciate your help and suggestions.

BTW, the only thing that worked for me (which proves that predicting is possible) is the following:

!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastai.vision.all import *
from fastbook import *
matplotlib.rc('image', cmap='Greys')

path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn=cnn_learner(dls,resnet18,pretrained=False,loss_func=F.cross_entropy,metrics = accuracy)
learn.fit_one_cycle(1,0.1)
(tr,vl)= dls.one_batch()
learn.get_preds(dl=[(tr,vl)])

I see from looking at the stack trace that you are sending predict() a png file rather than a PIL image. predict() says that it does not accept this type of file. Maybe that’s the problem.

Interesting.
I trained the model using these png images.
It would be very strange if I couldn’t predict with the same format I used to train.

But right now everything seems possible :wink:

Do you know how I can convert my png to PIL?

I could research it and figure it out, but I’m not interested in doing so. :slightly_smiling_face:

Gotcha.
Appreciate your help!

Hi @Pomo,
Just wanted to let you know that, weirdly enough, the problem seems to stem from the loss function.

Thanks for your time.

Glad you got it working.