Cats or dogs or neither?

@jeremy: First of all, congratulations, version 2 is even better than version 1. Very addictive, I would say.

I wanted to build a ‘Forgery detector’ to distinguish original signatures from forged ones. But I ran into a problem (in conceptualising the model).

  1. We are classifying a ‘cat’ or a ‘dog’ by detecting the ‘presence’ of certain features. Can we also classify by the ‘absence’ of those features using the neural net we have developed so far in Lesson 1? Or is it possible to make it ‘Cat’ vs ‘Not a cat’ by tweaking the last layer?

  2. Is it also possible (by extension of the above) to form 4 classes: Cat, Dog, Neither, Can’t Say?

  3. What will happen if we try to maximise the loss instead of minimising… will it help to find out the ‘not this’ type of classification?

Am I attempting something stupid here?


Sure, if you give enough examples for Cat and Not Cat, the neural network will pick up the difference between the two. But I don’t think you want to tweak the last layer yourself. You want the network to learn it, and you can inspect the weights in the final layer for these two nodes (Cat and Not Cat) if that’s useful to you.

If you can generate enough labelled example images for all four classes, the neural network will classify them. But you can possibly derive the class “Can’t Say” when the model is unsure, for example when the highest predicted probability is low (around 0.5 in the two-class case). So you don’t need the fourth class. Cat, Dog and Neither might be all that’s needed, and you can derive the fourth from the probabilities whenever the model is unsure.
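To make the “derive Can’t Say from the probabilities” idea concrete, here is a minimal sketch. The function name `classify_with_abstain` and the 0.6 threshold are my own assumptions, not something from the course library; you would tune the threshold on a validation set.

```python
def classify_with_abstain(probs, labels, threshold=0.6):
    """Return the most likely label, or "Can't Say" if the model is unsure.

    probs: predicted class probabilities, one per label.
    threshold: hypothetical cut-off below which we refuse to answer.
    """
    # Index of the class with the highest predicted probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return "Can't Say"
    return labels[best]


labels = ["Cat", "Dog", "Neither"]
print(classify_with_abstain([0.92, 0.05, 0.03], labels))  # confident prediction
print(classify_with_abstain([0.45, 0.40, 0.15], labels))  # no class clears the threshold
```

The model itself still only has three outputs; the fourth answer is a decision rule applied after prediction.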

I am not sure I understand this. The loss has no upper bound, so if you maximize it, it will grow toward infinity and the resulting network weights are unlikely to be useful.
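A toy illustration of why maximizing does not help, assuming a single-weight logistic model and one training example (my own simplification, not the Lesson 1 network). Stepping up the gradient instead of down just pushes the weight further and further in the wrong direction:

```python
import math


def bce_loss(w, x=1.0, y=1.0):
    """Binary cross-entropy for one example with logit w * x."""
    z = w * x
    # Numerically stable log(1 + exp(s)), with s = -z for y = 1 and s = z for y = 0.
    s = -z if y == 1.0 else z
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))


def grad(w, x=1.0, y=1.0):
    """d(loss)/dw = (sigmoid(w * x) - y) * x."""
    sig = 1.0 / (1.0 + math.exp(-w * x))
    return (sig - y) * x


w, lr = 0.0, 1.0
losses = []
for step in range(200):
    losses.append(bce_loss(w))
    w += lr * grad(w)  # gradient ASCENT: move up the loss surface

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}, w = {w:.1f}")
```

The loss keeps climbing with every step and never settles, so there is no “not this” solution waiting at the top.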

A couple of suggestions:

  1. Try to ask questions within the existing threads for Weeks 1-7; that way others can benefit from the questions and we don’t end up with too many threads.
  2. Don’t @ Jeremy unless you feel that he’s the only one who can answer your question. There are a number of students who went through Part 1 V2 and might be able to help you.

If these responses do not address your questions, please say so here. I am sure another student might be able to help, or perhaps Jeremy might chime in as well.


Thank you Ramesh. I wanted to @jeremy only to congratulate him. But thanks for the steer on how to use the forum better.

I am slightly doubtful about ‘if you give enough examples, it will learn to differentiate cat vs not cat’. I will try it though and let you know, thanks. In my case of differentiating ‘real’ vs ‘forged’ signatures, I am not getting good results, but that may be due to bad or inadequate samples. In a practical scenario, I can’t collect 1000 real signatures for training. At best I can get 10. So maybe I am trying an unsuitable use case here.

@Renga imho you have very little data. If you only have samples in the hundreds, you should probably use data augmentation to improve your results. From my understanding of your use case, you can try shear, blur, varying zoom and slight rotation to generate more samples.
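Here is a rough, dependency-free sketch of the rotation, shear and zoom part, treating an image as a nested list of pixel values. The function names and parameter ranges are my own assumptions; in practice you would use the transforms from an image library (and add blur there), but the idea is the same.

```python
import math
import random


def affine_augment(img, angle_deg=0.0, shear=0.0, zoom=1.0, fill=0):
    """Rotate, shear and zoom an image about its centre (nearest neighbour)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: for each output pixel, find its source pixel.
            dx, dy = (x - cx) / zoom, (y - cy) / zoom
            dx -= shear * dy                     # undo the horizontal shear
            sx = cos_a * dx + sin_a * dy + cx    # undo the rotation
            sy = -sin_a * dx + cos_a * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def random_variants(img, n, seed=0):
    """Generate n slightly distorted copies of one image."""
    rng = random.Random(seed)
    return [
        affine_augment(
            img,
            angle_deg=rng.uniform(-5, 5),   # slight rotation
            shear=rng.uniform(-0.1, 0.1),   # slight shear
            zoom=rng.uniform(0.9, 1.1),     # slight zoom in or out
        )
        for _ in range(n)
    ]


# Toy 8x8 "signature": a single diagonal stroke.
stroke = [[1 if x == y else 0 for x in range(8)] for y in range(8)]
variants = random_variants(stroke, 5)
print(len(variants), "augmented copies of shape", len(variants[0]), "x", len(variants[0][0]))
```

Keep the distortions small for signatures: a large rotation or shear could turn a genuine signature into something that looks forged.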

Another recommendation: look online for a signature dataset similar to yours. Train on it first, as it will have more samples than you currently have. Then transfer the learned weights and fine-tune on your own dataset.

Hope it helps



@gokkulnath Thank you, thank you, thank you… I have already started with augmentation, but you are spot on, I guess.

Yes, this problem is solvable if you have, say, 100 examples of cats and 200 examples of non-cats (other things). The network will learn which features get generated for a cat and draw a decision boundary around them. Data augmentation will also help to prevent over-fitting.

This may be a different problem. You are trying to uniquely identify the person. This is similar to the “facial recognition” that happens in phones and other places. I would suggest you look into the face recognition papers such as Deep Face and FaceNet. FaceNet uses a loss called Triplet Loss: minimize the distance between two signatures that are the same while maximizing the distance between real and forged images. So you have three inputs, two signatures that are the same and one that’s different, and you calculate the triplet loss over them. You don’t need lots of examples of real images; just create combinations using 10 similar images from, say, 100 people, with the 990 (99 * 10) signatures from everyone else serving as the dissimilar ones for each person.
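A minimal sketch of the triplet idea, assuming each signature has already been turned into an embedding vector by some network. The helper names, the squared-distance form and the 0.2 margin are my own choices for illustration:

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def make_triplets(signatures):
    """Yield (anchor, positive, negative): two signatures from one person
    and one signature from any other person."""
    for person, own in signatures.items():
        others = [s for p, sigs in signatures.items() if p != person for s in sigs]
        for anchor, positive in combinations(own, 2):
            for negative in others:
                yield anchor, positive, negative


# Toy 2-D embeddings: 3 people, 2 signatures each (made-up numbers).
signatures = {
    "A": [(0.0, 0.0), (0.1, 0.0)],
    "B": [(1.0, 0.0), (1.0, 0.1)],
    "C": [(0.0, 1.0), (0.1, 1.0)],
}
triplets = list(make_triplets(signatures))
print(len(triplets), "triplets from", sum(len(s) for s in signatures.values()), "signatures")
print("loss of first triplet:", triplet_loss(*triplets[0]))
```

Even a handful of signatures per person produces many triplets, which is why this setup copes with only 10 real signatures each. The loss is zero once the genuine pair is closer together than the mismatched pair by at least the margin.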

Because signatures are very different from ImageNet photos, you might not be able to use the ImageNet pre-trained networks. So you may need to find an alternative pre-trained model to build on, or train a model from scratch, which would require 10 signatures each from a lot of people (probably 10K).

Fast.AI does not have an abstraction that solves this problem directly. But you can build this model in PyTorch and use Fast.AI to train it. Because of the complexity involved, you might want to work through the entire Part 1 before venturing into it. But this is a good problem and has practical implications. Good luck, and let us know how it goes.


If Google images search results are good then we can use the various downloaders to automate the process
Search in GitHub repository…

By the way, have you had a good experience with crawling Google Images?
I didn’t… it was hard to make the API work and get the needed results from search.

But I had a good experience with flickr crawling. Used this to crawl:

@ramesh Got it. This is immensely helpful. I will try a few more options with augmented data, and if that doesn’t work out, I will choose some other use case. But first and foremost, thank you so much for taking so much time to reply to my queries. The fraternity is awesome in


Just a quick question:
For the ‘uniquely identify the person’ problem, I think the number of classes grows with the number of persons we need to identify?
If yes, I suspect performance will be drastically affected (say, number of people > 1000).
(ImageNet has 1000 classes, but it was trained on millions of images, which is not the case here.)
If that is true, how do we handle it? (The number of images per class will become small.)
Do you have any idea how this is solved in a production environment?

It would be great if you can give some insight.

I don’t have experience with the problem. But as far as I know they use “Triplet Loss”, and I don’t think they are limited to 1000 people or any other fixed number. I would suggest you review the Deep Face and FaceNet papers, and maybe Google how facial recognition is done. It might offer some ideas. I have not worked on this problem, so I don’t think I have anything specific to share here.

Thanks a lot. I will look into it :)