Encoder-decoder using the cnn_learner function

For the sake of practice, I’m trying to build a model that takes an image of a 3 and returns an image of a 7.

My Xs are tensors of 3s and my ys are tensors of 7s.
I managed to run the following model, which seems to do the job:

simple_net = nn.Sequential(
    nn.Linear(28*28, 30),   # 784 input pixels -> 30 activations
    nn.ReLU(),
    nn.Linear(30, 28*28)    # 30 activations -> 784 output pixels
)

def loss(p, t):
    p = p.sigmoid()              # squash outputs into the [0, 1] pixel range
    criterion = nn.MSELoss()
    return criterion(p, t)

learn = Learner(dls, simple_net, opt_func=SGD, loss_func=loss)
learn.fit(80, 0.1)
preds, targs = learn.get_preds()
show_image(preds[11].view(28,28)), show_image(targs[0].view(28,28))

which results in this:

[screenshots: the predicted image and the target image]

However, when I try to use the cnn_learner function I’m getting an error:

learn = cnn_learner(dls, resnet18, pretrained=False,
                    loss_func=loss)

AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>()
      1 learn = cnn_learner(dls, resnet18, pretrained=False,
----> 2                     loss_func=loss)

1 frames
/usr/local/lib/python3.6/dist-packages/fastai/vision/learner.py in cnn_learner(dls, arch, loss_func, pretrained, cut, splitter, y_range, config, n_out, normalize, **kwargs)
    170     meta = model_meta.get(arch, _default_meta)
    171     if n_out is None: n_out = get_c(dls)
--> 172     assert n_out, "n_out is not defined, and could not be inferred from data, set dls.c or pass n_out"
    173     if normalize: _add_norm(dls, meta, pretrained)
    174     if y_range is None and 'y_range' in config: y_range = config.pop('y_range')

AssertionError: n_out is not defined, and could not be inferred from data, set dls.c or pass n_out

Can you please help me figure out how to make the cnn_learner code work?

Hi again. cnn_learner is designed to map an image into a set of classes; it is not designed to map an image to an image. fastai cannot figure out from the data the number of classes (n_out) that you want to map into.

I suppose you could wrangle resnet into doing this task by using a multilabel custom head that maps features back into 28x28 pixel intensities. But the result would not be an encoder/decoder.
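
If you just want the assertion itself to go away, the error message points at the workaround: set dls.c or pass n_out explicitly. Something like the following (untested, and still not a true encoder/decoder) would at least construct the learner, with one output activation per pixel:

learn = cnn_learner(dls, resnet18, pretrained=False,
                    n_out=28*28,      # 784 outputs, one per pixel, instead of a class count
                    loss_func=loss)   # the sigmoid + MSE loss defined above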

I suspect you are working under a conceptual misunderstanding here. The encoder takes an image and reduces it to a small number of activations. The decoder takes those activations and maps them back to the same (not a different) image. Architectures have already been invented to do this task well. You might want to practice by experimenting with one of them.
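
For reference, a minimal autoencoder sketch in plain PyTorch (untested, and assuming flattened 28x28 inputs like your simple_net above) would look like this; the key point is that the target is the input itself:

# encoder: compress the 784 input pixels into a small bottleneck of activations
# decoder: reconstruct the *same* 784 pixels from that bottleneck
autoencoder = nn.Sequential(
    nn.Linear(28*28, 30),   # encoder
    nn.ReLU(),
    nn.Linear(30, 28*28)    # decoder
)

# trained with a reconstruction loss against the input itself, e.g.
# loss = F.mse_loss(autoencoder(x).sigmoid(), x)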

HTH, Malcolm

To add onto Malcolm’s answer, what you are asking for goes deep into the territory of GANs, rather than the models you’ve likely read about so far :slight_smile: The task is image generation. To learn more about it, I would recommend watching course-v3 and taking a look at its ported notebooks here (lesson 7).

@muellerzr @Pomo
Thank both of you for your answers!

I was actually just toying with this idea to familiarize myself with fastai, using an out-of-the-box idea.
I wanted to see what would come out of this. Didn’t expect much :sweat_smile:

However, there’s something that has been bugging me for days now, which I can’t figure out, and I haven’t been able to get help with it either.

I’m unable to use the predict() method to generate a new prediction.

Could you please look at the following threads and let me know where I went wrong?

Many thanks!

You need to look at the shape of the image you are trying to predict to see if it’s the shape that predict() expects. If it is, the next step is further debugging within fastai. If not, debug your own code that generates the image.

Yes, I checked, and the shape of the tensor is correct.

What else can I do?
I don’t believe I have the knowledge to debug fastai code…

Hi. Here are some questions that will help you to debug…

What is the type and shape of the input image that you send to learn.predict()? Have you displayed it?

What is the type and shape of a batch used for training?

What is the type and shape that learn.predict() expects?
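
For example, a rough way to answer the first two questions (assuming a fastai v2 DataLoaders, and that the item you want to predict on is already in memory):

# compare what the model was trained on with what you are about to predict on
xb, yb = dls.one_batch()
print(type(xb), xb.shape)   # batch of inputs, e.g. [bs, channels, h, w]
print(type(yb), yb.shape)   # batch of targets

item = ...                  # hypothetical: the single item you plan to send to learn.predict()
print(type(item), getattr(item, 'shape', None))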

Thanks @Pomo for helping me out.

The NN expects a 1x784 tensor.
I made sure of that by passing different size and got the following:
RuntimeError: size mismatch, m1: [28 x 28], m2: [784 x 30] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:41

(m1 representing the tensor I’m passing)

So passing learn.predict(t_3[0].view(1,28*28)) should be ok in terms of tensor size.

The data used for training was a DataLoaders object comprised of train and test DataLoader objects, each built from a zip of X (6000x784) and y (6000x1, binary).

I’m certain that up to the predict function the code is correct because I copied it from the 4th lesson
https://github.com/fastai/fastbook/blob/master/clean/04_mnist_basics.ipynb

This makes no sense. You are using a CNN based on resnet. That CNN expects a batch of input images of shape [bs, #channels, height, width].

Sorry, I’m not sure I understand.
Do you mean it can’t classify a single image, only a batch?

You are mixing up the model itself with the predict function. Neither takes the [1,784] tensor that you are giving it. The NN takes a batch of images of shape [bs, channels, h, w]. Predict takes a single image of shape [channels, h, w]. Or a file path which may be simpler.

There are several confusions here. A suggestion: use predict with a file path for now. It will get the task done and you will be able to move forward. Complete all the course lessons without having to understand every detail. By the end, the concepts will be much clearer and you will have developed your own debugging approaches. You will be able to go back and diagnose exactly what was going wrong. :slightly_smiling_face: :slightly_smiling_face: :slightly_smiling_face:
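
For instance, an untested sketch of the file-path route (the path here is hypothetical, assuming path = untar_data(URLs.MNIST_SAMPLE) as in the lesson 4 notebook):

img_path = path/'valid'/'3'/'some_image.png'    # hypothetical file name
pred, pred_idx, probs = learn.predict(img_path) # return values as for a classification learner

# or load it as a PILImage first; fastai applies the training-time transforms
# and adds the batch dimension itself
pred, pred_idx, probs = learn.predict(PILImage.create(img_path))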

Thanks for the advice @Pomo
I do hope that things clear up a bit by the time I finish the course.

As for your suggestion to use a file, I tried this before and was unsuccessful (see screenshot below).
I also tried passing the image in various shapes: (1,784), (28,28), (1,1,1,784). Nothing seems to work.

Can’t tell you how frustrating this is… I appreciate your help and suggestions.

BTW, the only thing that worked for me (which proves that predicting is possible) is the following:

!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastai.vision.all import *
from fastbook import *
matplotlib.rc('image', cmap='Greys')

path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn=cnn_learner(dls,resnet18,pretrained=False,loss_func=F.cross_entropy,metrics = accuracy)
learn.fit_one_cycle(1,0.1)
(tr,vl)= dls.one_batch()
learn.get_preds(dl=[(tr,vl)])

I see from looking at the stack trace that you are sending predict() a png file rather than a PIL image. predict() says that it does not accept this type of file. Maybe that’s the problem.

Interesting.
I trained the model using these png images.
It would be very strange if I couldn’t predict with the same format I used to train.

But right now everything seems possible :wink:

Do you know how I can convert my png to PIL?

I could research it and figure it out, but I’m not interested in doing so. :slightly_smiling_face:

Gotcha.
Appreciate your help!

Hi @Pomo,
Just wanted to let you know that, weirdly enough, the problem seems to stem from the loss function.

Thanks for your time.

Glad you got it working.