Share your work here ✅

I am working on an object verification problem where new categories need to be added regularly. (It is similar to face verification: we first show the identity card, and a model verifies whether the face matches.)

I found two approaches in how-to-add-a-new-category-to-a-deep-learning-model . It says we can either retrain the model starting from the old weights with the new category added, or use content-based image retrieval. The latter means deciding the category from the feature vector of the last layer (by calculating Hamming distance and Euclidean distance). The technique is described in this paper: Deep Learning of Binary Hash Codes for Fast Image Retrieval

Below is an image illustrating the concept of this technique:

I have tested the technique on the MNIST dataset, training on digits 0 to 7 only; digits 8 and 9 will be added later. For now I use only the binarized code of the last layer. I tested with digit 9 and the results are quite OK: every other digit has low similarity (<60%), except 7, which has 90% similarity with 9. I will continue testing with Euclidean distance instead of this binary Hamming distance.
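A rough sketch of the binary-code comparison described above, in case it helps anyone following along. It is not the code from the blog post: the function names `binarize` and `hamming_similarity`, the 0.5 threshold, and the toy activation values are all assumptions made for illustration.

```python
def binarize(activations, threshold=0.5):
    """Turn latent-layer activations in [0, 1] into a 0/1 code.

    The 0.5 threshold is an assumption; adjust it to your own layer.
    """
    return [1 if a >= threshold else 0 for a in activations]


def hamming_similarity(code_a, code_b):
    """Fraction of bit positions on which two equal-length codes agree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    matches = sum(1 for a, b in zip(code_a, code_b) if a == b)
    return matches / len(code_a)


# Toy activations, made up for illustration (not real model outputs).
reference = binarize([0.9, 0.1, 0.8, 0.7, 0.2, 0.6, 0.3, 0.95])
query = binarize([0.8, 0.2, 0.4, 0.9, 0.1, 0.7, 0.6, 0.85])

similarity = hamming_similarity(reference, query)
print(f"similarity: {similarity:.0%}")
```

With short codes a single flipped bit moves the similarity a lot, which may be one reason two visually close digits end up with very similar codes.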

I have written a blog post about this here: Object Verification with Deep Learning of Binary Hash Codes
The source code is quite messy, but if you are interested, you can find it here

I would really appreciate it if someone could suggest some techniques for dealing with this problem. Thank you in advance.


All - I took a stab at the Amazon Bin Challenge, i.e. counting the number of items in Amazon pods:

Here is the github:

I have a Python script in the repo that will download a subset of the data for you, in case you want to try this challenge.

Here are initial results:



Hi Matthew,

That is the “data wrangling” part - prepping raw data into a format more useful for analysis or ML models. How you slice and dice depends on what you are trying to do, and the design of your model. It looks like you may want to construct tabular data and perhaps look at relationships of other dimensions based on day of the week and/or time.

If that’s the case, you might be interested in the concept of “embeddings for categorical variables”. Here is a useful blog post and workshop video from Rachel on the subject:

wow, thanks! Will take a look and ask more questions!

I don’t think our normal convnets will work well for that. You’ll need to use object detection, which we should be covering in lesson 6.


Ah OK, thanks, that's good to know!

Regarding music spectrograms, couldn't we do something like this for data augmentation?

  • We can split each audio file (let’s say 30 seconds long) into 10 chunks of 3 seconds each. Each window of the song is tagged with the genre of the original thirty-second clip.
  • If a song is labelled rock, then all 10 windows that come out of the split are labelled rock too. This trick gives 10x more data than the original dataset.
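The splitting step above could look roughly like this. It is only a sketch on a plain list of samples: the helper names `split_into_windows` and `augment_clip` and the toy sample rate are assumptions, and a real pipeline would load the audio and build a spectrogram for each window afterwards.

```python
def split_into_windows(samples, sample_rate, window_seconds=3.0):
    """Cut a clip into consecutive, non-overlapping windows.

    Any trailing samples that do not fill a whole window are dropped.
    """
    window_len = int(sample_rate * window_seconds)
    if window_len <= 0:
        raise ValueError("window must contain at least one sample")
    n_windows = len(samples) // window_len
    return [
        samples[i * window_len:(i + 1) * window_len]
        for i in range(n_windows)
    ]


def augment_clip(samples, sample_rate, genre, window_seconds=3.0):
    """Return (window, genre) pairs; every window inherits the clip's genre."""
    windows = split_into_windows(samples, sample_rate, window_seconds)
    return [(window, genre) for window in windows]


# Toy 30-second clip at a made-up sample rate of 100 Hz.
sample_rate = 100
clip = [0.0] * (30 * sample_rate)
labelled_windows = augment_clip(clip, sample_rate, "rock")
print(len(labelled_windows))
```

One thing to watch when doing this: keep all windows of a given song on the same side of the train/validation split, otherwise the validation score will be inflated by near-duplicate audio.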

That sounds like a good plan. I haven’t tried it myself, but I’ve read some papers proposing that with success.

A little late to the game but here is my experiment with creating and deploying a cuisine-type classifier (5 cuisines, with ~200 images each in train+valid).


  • Surprisingly, only resnet50 was able to bring the accuracy to >70%. I struggled to get resnet32 beyond 60% (I tried changing lr, batchsize, image_size and data augmentations, and ran more epochs until I saw a small sign of over-fitting)
  • Increasing the image size from 224 to 299, using flip_vert=True in data augmentation, and using resnet50 brought the accuracy to 70%


Next Steps:

  • add other popular cuisines (French, Italian, etc.) and an ‘other’ category for non-food images, then check the model performance
  • web app: add some default sample images that the user can select and test, the ability to pass a URL, GradCAM activation maps, and class probabilities as a graph

If you haven’t already, check out imgcat and imgls. They make working with image files on a remote machine from a terminal so much easier. See


What do you mean by object verification? Are you saying you have, say, 5 pre-trained categories A, B, C, D, E, and given a new object you want to be able to classify it as A, B, C, D, E or none of them? The method you are citing is an instance-based method and is used to find similar objects by comparing distances between feature vectors extracted from the images. I am working on an image-similarity problem, so maybe we have the same use case.

Yeah, kind of similar. I want to check whether an object with identity A really is A (a double check). For example, I have a package of apples and want to check that all the items inside are apples and not mixed up with oranges.

But from time to time I also need to add new categories to my model (F, G, … rather than just A, B, C, D, E), and because new categories will be added regularly, I don’t want to retrain the model very often.

With this method, only the last-layer feature vector needs to be compared to verify the identity (apparently, we should still fine-tune the model so that it represents the characteristics of this dataset).

Fortunately, I think we are working on very similar problems. I’m thinking about how to create a reference feature vector from the training set (at first I will just take the mean) and calculate the Euclidean distance to it. I will post an update when I finish, but I would really appreciate it if someone could shed some light on this problem :smiley:
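For anyone curious, the mean-reference idea can be sketched in a few lines. This is only an illustration under assumptions: the names `mean_vector` and `verify`, the 2-dimensional toy features and the distance threshold are all made up, and in practice the threshold would need tuning on held-out data.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def verify(feature, reference, max_distance):
    """Accept the claimed identity if the feature is close to the reference."""
    return math.dist(feature, reference) <= max_distance


# Toy 2-D features for the "apple" class (real ones come from the network).
apple_train = [[1.0, 0.0], [0.5, 0.5], [1.5, -0.5]]
apple_reference = mean_vector(apple_train)

# The threshold of 1.0 is a hypothetical value chosen for this toy data.
is_apple = verify([1.0, 0.5], apple_reference, max_distance=1.0)
is_orange_apple = verify([-1.0, 2.0], apple_reference, max_distance=1.0)
print(is_apple, is_orange_apple)
```

Adding a new category then only means computing one more reference vector, with no retraining, as long as the existing features separate it well enough.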

So really you don’t trust the label of A and want to verify it.

I would try an unsupervised learner such as a CNN autoencoder and use a bottleneck layer as the feature vector for each image during inference.

First you will need to train it on A to E categories of images to produce a set of filters that can represent your dataset and then you can run inference on each image to obtain the feature vector.

Store these feature vectors in an approximate nearest neighbour search package like Annoy with the label.

Then given a new image you can run inference on it to extract the feature vector then search the AKNN for the K nearest images and obtain their labels and distances. From here you can rank which label is most likely given K nearest images and distances.
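As a sketch of the lookup-and-rank step: the snippet below uses an exact brute-force search in plain Python as a stand-in for Annoy, so it shows the logic only and not Annoy's API. The inverse-distance weighting is just one possible way to rank labels, and the names `k_nearest` and `rank_labels` and the toy vectors are assumptions.

```python
import math
from collections import defaultdict


def k_nearest(index, query, k):
    """Return the k closest (distance, label) pairs, nearest first.

    `index` is a list of (feature_vector, label) tuples. This is an
    exact search; an approximate index would replace this function.
    """
    scored = sorted((math.dist(vector, query), label) for vector, label in index)
    return scored[:k]


def rank_labels(neighbours, eps=1e-9):
    """Rank labels by summed inverse distance (closer neighbours count more)."""
    scores = defaultdict(float)
    for distance, label in neighbours:
        scores[label] += 1.0 / (distance + eps)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 2-D feature vectors with labels (made up for illustration).
index = [
    ([0.0, 0.0], "A"),
    ([0.1, 0.0], "A"),
    ([1.0, 1.0], "B"),
    ([1.1, 0.9], "B"),
    ([5.0, 5.0], "C"),
]

neighbours = k_nearest(index, [0.05, 0.1], k=3)
ranking = rank_labels(neighbours)
best_label = ranking[0][0]
print(best_label)
```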

How to deal with new image categories?

For new categories you will want at least 3 images in the category. Run the images from the new category through the network, using inference only, to extract the feature vectors. Then add the feature vectors to the AKNN. Then, for each image in the new category, search for the K nearest images. If your algorithm ranks the 2 other images in the category as the highest matches, you should be good to use the current network without fine-tuning. If there is a mismatch, you can fine-tune the autoencoder and recompute the feature vectors for the existing images.
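The check for a new category could be sketched like this, again with a brute-force search standing in for the AKNN. The function names, the item ids and the toy vectors are hypothetical; the rule implemented is the one described above, i.e. every new image must have the other new images as its closest matches.

```python
import math


def nearest_labels(index, query_id, k):
    """Labels of the k items closest to `query_id`, excluding itself.

    `index` maps an item id to a (feature_vector, label) tuple.
    """
    query_vector, _ = index[query_id]
    scored = sorted(
        (math.dist(vector, query_vector), label)
        for item_id, (vector, label) in index.items()
        if item_id != query_id
    )
    return [label for _, label in scored[:k]]


def new_category_fits(index, new_ids, new_label):
    """True if each new image's top matches are all from the new category."""
    k = len(new_ids) - 1
    return all(
        label == new_label
        for item_id in new_ids
        for label in nearest_labels(index, item_id, k)
    )


# Toy index of existing categories (vectors made up for illustration).
existing = {
    "a1": ([0.0, 0.0], "A"),
    "a2": ([0.1, 0.1], "A"),
    "b1": ([1.0, 1.0], "B"),
    "b2": ([1.2, 1.0], "B"),
}

# A new category whose images cluster together: no fine-tuning needed.
good = dict(existing)
good.update({"f1": ([5.0, 5.0], "F"), "f2": ([5.1, 4.9], "F"), "f3": ([4.9, 5.2], "F")})
fits_good = new_category_fits(good, ["f1", "f2", "f3"], "F")

# A new category scattered among the old ones: a sign to fine-tune.
bad = dict(existing)
bad.update({"g1": ([0.05, 0.0], "G"), "g2": ([3.0, 3.0], "G"), "g3": ([6.0, 6.0], "G")})
fits_bad = new_category_fits(bad, ["g1", "g2", "g3"], "G")
print(fits_good, fits_bad)
```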

There are probably more complicated things you could do to avoid having to fine-tune one network and recompute feature vectors, such as having a separate autoencoder per category.


Thank you so much for your suggestion @maral , it is very helpful. I was also thinking about using an autoencoder, but because I don’t have much experience with it, I first tried an easy method that I know :smiley: . I will read your post carefully and try it out.

P.S. By the way, do you have any example code or posts that I can read about CNN autoencoders?

Hi, do you have your notebook or a gist available somewhere? I would love to see the audio-to-image conversion part.

Excellent one. I am working on a similar problem and got stuck while creating the web app. Could you please shed some light on how you deployed this as a web app?


Hi @hkristen,
Great work and thanks for the link to the dataset.
I am also working on a similar model. I created a small model to classify plants from pictures of their leaves. As a starting point I trained a model on 3 species with 94% accuracy. The dataset consisted of images manually downloaded from Google, with around 200 samples for each category.
Did you work on this project any further?

See here:

I am also interested in autoencoders, but for a completely different application.
If you want, we could join forces on a fastai autoencoder setup.

Kind regards


Hi. I’m still cleaning up the notebook, but you can check this notebook from Bashir, which I used as a reference.

I just followed the deployment guide on course docs page: