I created a very small data set by downloading a few images of three different international currencies from Google Images. It’s definitely not the greatest data set in the world, but I was expecting to get better accuracy than what I’m getting (it would be an easy problem for a human). Typically I get about 80% accuracy, but it does vary a bit.
Challenges of this data set
There is very little training and validation data. For example, for the Hong Kong Dollar there are only 12 training images.
There are bills in the validation set from years that have no equivalent in the training set. For example, the Hong Kong validation set contains red-colored bills, but none of the bills in the training set have this style/color.
The image sizes vary widely, ranging from 1000 (w) x 500 (h) to 450 (w) x 200 (h).
One thing that I thought would improve things a lot, but didn’t, was switching from center cropping to no cropping. With center cropping, it was transforming images like this:
to images like this:
Whereas with no cropping it would squash the images down to images like this:
Which seemed like it would be a lot better, since at least it preserves the word “Indonesia”. So far, though, it hasn’t seemed to improve the accuracy at all.
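To make the two strategies concrete, here is a minimal PIL sketch contrasting center-crop-then-resize with direct squish-resize. This is my own illustration, not the fastai internals; the helper names are mine, and the synthetic image stands in for a real bill photo.

```python
from PIL import Image

def center_crop_resize(img, size):
    """Center-crop to a square, then resize; can cut off details near the edges."""
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size))

def squish_resize(img, size):
    """Resize directly to size x size; distorts aspect ratio but keeps all content."""
    return img.resize((size, size))

# A synthetic 1000x500 "bill" stands in for a real image.
bill = Image.new("RGB", (1000, 500), color=(200, 180, 120))
cropped = center_crop_resize(bill, 224)   # loses the left/right 250px of the bill
squished = squish_resize(bill, 224)       # keeps everything, but squashed
```

With a wide bill, the center crop discards the left and right quarters (where text like “Indonesia” often sits), while the squish keeps everything at the cost of distortion.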
Also, I didn’t seem to get any improvement from data augmentation (I tried transforms_basic and transforms_side_on). I didn’t try test-time augmentation (TTA) since I don’t have a test set, and I’m a bit confused about whether I actually need one or not.
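For anyone curious what “side-on” style augmentation roughly does, here is a plain-PIL approximation: a random horizontal flip plus a small rotation. This is only my sketch of the idea; fastai’s transforms_side_on uses its own parameters and additional transforms.

```python
import random
from PIL import Image

def side_on_augment(img, max_rotate=10):
    """Randomly flip left-right and apply a small rotation --
    the kind of variation appropriate for photos taken side-on."""
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img.rotate(random.uniform(-max_rotate, max_rotate))
```

Each call produces a slightly different view of the same image, which only helps if the validation set actually contains that kind of variation.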
I think more data is extremely rarely a bad thing. I would say if you can get more data, do that.
As for your cropping problem: squeezing (which, as I recently discovered, is the default behavior in Keras) is generally not a good idea. Are you doing random cropping? If not, try random cropping and TTA.
Jeremy uses Test Time Augmentation (TTA) to account for the cropping missing pertinent parts of the image.
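To sketch how random cropping and TTA fit together: take several random crops of the same image at prediction time and average the model’s outputs, so a crop that misses a distinctive region gets compensated by one that hits it. The `model` below is a hypothetical callable returning class probabilities, not the fastai API.

```python
import random
from PIL import Image

def random_crop(img, size):
    """Take one random size x size crop; call repeatedly for multiple crops."""
    w, h = img.size
    left = random.randint(0, w - size)
    top = random.randint(0, h - size)
    return img.crop((left, top, left + size, top + size))

def tta_predict(model, img, n_aug=4, size=224):
    """Test-time augmentation: average the model's predictions
    over several random crops of the same image."""
    preds = [model(random_crop(img, size)) for _ in range(n_aug)]
    return [sum(p) / n_aug for p in zip(*preds)]
```

Because each crop sees a different part of the bill, the averaged prediction is less sensitive to any single crop missing a pertinent region.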
Your model is definitely learning, but it’s overfitting to the training dataset - maybe because each of the notes (particularly the HK / Indonesian ones) is fairly unique?
As you can see, the training loss kept going down -
Also, the confusion matrix shows that the Indian Rupee notes are fairly distinct, so you may not need more of those, but you may need a lot more Indonesian and even HK Dollar images.
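For readers following along, a confusion matrix like the one referenced can be computed with nothing but the standard library. The labels below are made up for illustration; rows are actual classes, columns are predictions.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, pred)] for pred in classes] for actual in classes]

classes = ["HKD", "IDR", "INR"]
y_true = ["INR", "INR", "HKD", "IDR", "HKD"]   # hypothetical true labels
y_pred = ["INR", "INR", "IDR", "HKD", "HKD"]   # hypothetical predictions
cm = confusion_matrix(y_true, y_pred, classes)
# A clean diagonal for INR with off-diagonal mass between HKD and IDR
# would match the pattern described above.
```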
If all of your bills are nicely horizontally aligned, and the right way around, then the regular data augmentation definitely won’t help (as you noticed), since the augmentation isn’t creating variation that you actually see in the validation set.
@metachi I will definitely try random cropping, that sounds promising. Do you happen to know if it’s one random crop per image, or will it generate multiple crops per image? I feel like if it’s only one, it might miss some very distinctive parts of the image.
I’m not using Test Time Augmentation, and in fact I don’t even have a test set. I didn’t fully understand the need for one. Can TTA still be used?
@jeremy that makes sense, and probably highlights how unrealistic this data set is. I’m guessing in most “real world” image data sets data augmentation would be beneficial.
It has the same emblem with the Unicorn and the Lion over the picture of the ships sailing in, but with different color schemes. I’m starting to wonder if normalizing these to grayscale would improve the results, because it seems like the color channels might just be making the problem more complicated for the algorithm.
I’ll try it and see if it improves accuracy. Does the fast.ai library have a way to treat all images as grayscale?
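I don’t know of a dedicated fastai option for this, but one way to sketch the idea with PIL is to collapse to grayscale and then replicate the single channel back to three, so a pretrained RGB network can still consume the images. The helper name is mine.

```python
from PIL import Image

def to_grayscale_rgb(img):
    """Drop color information, then replicate the single luminance
    channel back to 3 channels for a pretrained RGB network."""
    return img.convert("L").convert("RGB")

# A synthetic reddish "bill" stands in for a real image.
bill = Image.new("RGB", (450, 200), color=(180, 40, 40))
gray = to_grayscale_rgb(bill)  # still RGB mode, but all channels equal
```

Applying this to both training and validation images would remove the color differences between note styles while keeping the network input shape unchanged.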
@tleyden This is a great project! Very fun. I’m thinking it might be easier to do this with coins, since they at least have equal width and height. But bills are more fun.
Is there an easy way to visualize the filters at the various layers? For example, given the fine tuned network which is giving 70% accuracy on the currency data set, generate something like this:
I’m wondering if I’d see filters that had components of the currency bills like:
or if there would still be cat and dog faces and all the other high level images from ImageNet?
Given the low accuracy, I’m guessing it would be the latter.
To try to fix that, how could the deeper layer weights be completely randomized in a resnet34 model? E.g., keep just the weights that correspond to lower-level features from layer 1 and layer 2, like:
and then retrain all of the other layers from scratch, so that hopefully, after that training, the later layers would represent higher-level features of the currencies.
@tleyden I think you can get more images for Hong Kong dollars from Google images.
There are three note-issuing banks: Standard Chartered Bank, The Hongkong and Shanghai Banking Corporation (also known as HSBC), and Bank of China. They each issue bank notes in five denominations (i.e. HKD20, HKD50, HKD100, HKD500 and HKD1,000).
The government of the Hong Kong Special Administrative Region (HKSAR) issues only the HKD10 bank note. So you should have a minimum of 32 unique images to train your model (3 banks x 5 denominations x 2 sides + 1 HKSAR note x 2 sides).
Sample bank notes issued by HSBC (5 denominations with both sides)