I worked through the PyTorch version of the first course (Thanks, Jeremy!!!) and am now working on the second course. This is so great!!!
For a practical application of what I've learned, I want to build a piece of code to read the number plates of (European) cars. The number plates are extracted from the car images with a YOLO object detector, and that part works well; the extracted plates then need to be read.
I have tried feeding the number plates into a classical segmentation algorithm (a conventional feature extractor based on OpenCV filters) to segment the letters, and then feeding each single letter into a classification network (similar to the MNIST classification task).
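A minimal sketch of such a classical segmentation step, in pure Python so it runs anywhere. It assumes the plate crop is a grayscale image given as a list of rows with dark characters on a light background, and it uses a global threshold plus a vertical projection profile (one common classical technique, not necessarily the OpenCV filters used here). The function names are hypothetical. Each returned span would then be cropped and passed to the per-letter classifier.

```python
def binarize(gray, threshold=128):
    """Global threshold: dark pixels (ink) become 1, background becomes 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary, min_width=1):
    """Split a binary plate image into character spans using the vertical
    projection profile: columns without any ink are treated as gaps."""
    width = len(binary[0])
    # Number of ink pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                          # a character begins
        elif count == 0 and start is not None:
            if x - start >= min_width:
                spans.append((start, x))       # half-open column range
            start = None
    if start is not None and width - start >= min_width:
        spans.append((start, width))           # character touching the right edge
    return spans


def crop_columns(binary, span):
    """Cut one character span out of the binary image."""
    start, end = span
    return [row[start:end] for row in binary]
```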
The problem is that the conventional segmentation algorithm fails if the plates are dirty or shadowed. Is there an existing solution for reading words from images?
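One common partial fix for shadows is an adaptive (local-mean) threshold, similar in spirit to OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C`: each pixel is compared with the mean of its own neighbourhood instead of a single global value. The pure-Python sketch below assumes a grayscale image as a list of rows; `radius` and `offset` are illustrative values, not tuned ones. It helps with smooth shading, but it will not rescue characters that dirt has merged or erased.

```python
def integral_image(gray):
    """ii[y][x] = sum of gray over rows < y and columns < x."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def adaptive_threshold(gray, radius=1, offset=10):
    """Mark a pixel as ink (1) if it is darker than the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood by more than `offset`.
    Windows are clipped at the image border."""
    h, w = len(gray), len(gray[0])
    ii = integral_image(gray)
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Window sum in O(1) from the integral image.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(1 if gray[y][x] < mean - offset else 0)
        out.append(row)
    return out
```

For a row like `[200, 120, 200, 200, 100, 30, 100, 100]` (bright left half, shadowed right half), a global threshold of 128 marks the whole shadowed background as ink, while `adaptive_threshold(..., radius=1, offset=15)` keeps only the two strokes.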
I also tried Tesseract, but it did not work well, since the plates can contain logos etc. that should not be read.
Can a kind soul recommend a paper or resource on how to read small patches of text (license plates, street signs, …) end to end? The text is already localised, so it does not need to be found anywhere in the image, only read within a very small region.
I cannot follow the classical approach of reading words against a dictionary (as Tesseract does), since there is no dictionary for license plates :-). I have searched, but could not come up with a good solution.
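Although there is no dictionary, the plate format itself can play a similar role: a read that does not fit any valid pattern can be corrected or rejected. A minimal sketch, where the formats and the confusion table are hypothetical placeholders (real European formats differ by country): in letter positions, digits that look like letters are mapped to letters, and vice versa in digit positions.

```python
import string

# Hypothetical formats: 'L' = letter, 'D' = digit, anything else is literal.
PLATE_FORMATS = ["LL-DDDD", "LLL-DDD"]

# Look-alike pairs an OCR model may confuse (an assumption; tune on real errors).
TO_DIGIT = {"O": "0", "I": "1", "B": "8", "S": "5", "Z": "2"}
TO_LETTER = {d: l for l, d in TO_DIGIT.items()}


def fit_format(text, fmt):
    """Return text coerced to fmt, or None if it cannot fit."""
    if len(text) != len(fmt):
        return None
    out = []
    for ch, slot in zip(text, fmt):
        if slot == "L":
            ch = TO_LETTER.get(ch, ch)       # e.g. '8' -> 'B'
            if ch not in string.ascii_uppercase:
                return None
        elif slot == "D":
            ch = TO_DIGIT.get(ch, ch)        # e.g. 'O' -> '0'
            if ch not in string.digits:
                return None
        elif ch != slot:                     # literal separator must match
            return None
        out.append(ch)
    return "".join(out)


def validate_plate(text, formats=PLATE_FORMATS):
    """First format-consistent reading of text, or None to reject it."""
    for fmt in formats:
        fixed = fit_format(text, fmt)
        if fixed is not None:
            return fixed
    return None
```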
Object detection: what am I actually detecting? I can see that there are basically two approaches. One is to detect the license plate, normalize it and OCR the content; the other is to use a model that detects and reads the content in one go. I am not too sure how the second kind of model works, as most models around are for classification or object detection, and the content is not really a defined object. Any input on this will be greatly appreciated.
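For the first approach, a minimal sketch of the "normalize" step: crop the detector's box with a small safety margin (boxes are rarely tight) and resize every crop to one fixed size for the OCR stage. It assumes boxes as pixel coordinates `(x0, y0, x1, y1)` and images as lists of rows; the helper names and the default margin are hypothetical. A real pipeline would use OpenCV or PIL for resizing and might also deskew oblique plates with a perspective transform (for example `cv2.getPerspectiveTransform` plus `cv2.warpPerspective`).

```python
def crop_with_margin(image, box, margin=0.1):
    """Crop box = (x0, y0, x1, y1) (half-open pixel coordinates) from image,
    enlarged by `margin` times the box size on each side and clamped to
    the image bounds."""
    h, w = len(image), len(image[0])
    x0, y0, x1, y1 = box
    dx = int(round((x1 - x0) * margin))
    dy = int(round((y1 - y0) * margin))
    x0, y0 = max(0, x0 - dx), max(0, y0 - dy)
    x1, y1 = min(w, x1 + dx), min(h, y1 + dy)
    return [row[x0:x1] for row in image[y0:y1]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize so every plate reaches the OCR network
    at the same fixed size."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
```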
Can I actually train the model to recognize just the characters in a specific font? All license plates here use the same font, so I thought it might be good to simply train an object detection network to detect A-Z and 0-9 in this font. This would be easier, I guess, and I think I could also do it with a small model (like MobileNet SSD), as the final network will run on a Raspberry Pi 3… will this approach work?
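The per-character detection approach can work, but the detector returns boxes in no particular order, so one extra step is needed to turn them into a string. A minimal sketch, assuming each detection is a hypothetical `(char, x0, y0, x1, y1, score)` tuple: characters are grouped into text lines by their vertical centre (which also covers two-line plates) and read left to right. The grouping tolerance of half the median character height is an arbitrary choice.

```python
def read_plate(detections, min_score=0.5):
    """Turn per-character detections into a plate string.

    detections: list of (char, x0, y0, x1, y1, score) tuples, e.g. from a
    detector trained on the 36 classes A-Z, 0-9.
    """
    chars = [d for d in detections if d[5] >= min_score]
    if not chars:
        return ""
    # Sort by vertical centre so characters on the same line are adjacent.
    chars.sort(key=lambda d: (d[2] + d[4]) / 2)
    heights = sorted(d[4] - d[2] for d in chars)
    tol = heights[len(heights) // 2] / 2   # half the median character height
    lines, current = [], [chars[0]]
    for d in chars[1:]:
        prev_cy = (current[-1][2] + current[-1][4]) / 2
        cy = (d[2] + d[4]) / 2
        if cy - prev_cy > tol:             # big vertical jump: new text line
            lines.append(current)
            current = [d]
        else:
            current.append(d)
    lines.append(current)
    # Within each line, read left to right by horizontal centre.
    return " ".join(
        "".join(d[0] for d in sorted(line, key=lambda d: d[1] + d[3]))
        for line in lines
    )
```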
Anyway, it turned out to be a more complicated problem than I thought it would be… I should have guessed as much from the small number of detectors floating around the net…
There is one perplexing example that I haven't been able to run yet because of outdated code and memory problems: https://github.com/matthewearl/deep-anpr
This one takes a different approach. It trains the net on automatically generated, augmented plates and uses the model to read the license plates directly from the photo. I need to wrap my brain around how this actually works…
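The key idea in this kind of setup is that the label comes for free: the plate text is generated first and then rendered, so every synthetic image already knows what it says. A minimal sketch of the sampling side, assuming hypothetical plate formats and made-up augmentation ranges (not taken from deep-anpr); the actual rendering, for example with pycairo, is left out.

```python
import random
import string

PLATE_FORMATS = ["LL-DDDD", "LLL-DDD"]  # hypothetical; use your country's formats


def random_plate_text(rng, formats=PLATE_FORMATS):
    """Random text matching one format ('L' = letter, 'D' = digit)."""
    fmt = rng.choice(formats)
    return "".join(
        rng.choice(string.ascii_uppercase) if c == "L"
        else rng.choice(string.digits) if c == "D"
        else c
        for c in fmt
    )


def random_sample_spec(rng):
    """Label text plus random augmentation parameters that a renderer
    would apply. The ranges are placeholders, not tuned values."""
    return {
        "text": random_plate_text(rng),      # this string is the training label
        "angle_deg": rng.uniform(-15, 15),   # in-plane rotation
        "scale": rng.uniform(0.6, 1.0),      # plate size relative to the frame
        "shift_xy": (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)),
        "blur_sigma": rng.uniform(0.0, 1.5),
        "noise_std": rng.uniform(0.0, 0.05),
    }
```

Seeding a `random.Random` instance per worker keeps the generated dataset reproducible.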
Thanks, but this is a detector. Detectors are pretty straightforward: you train any object detection net on labeled plates, and it will find plates reasonably easily, as they have very distinctive features. However, I still haven't worked out how to actually have the net read the plate… Still, interesting reading.
The hand labeling was the simplest and quickest part of all
It depends on the size distribution of your plates. If all plates are similarly sized and the photos are taken under similar conditions with similar equipment, then a few hundred labels seem to be good enough. I can do 100-200 plates an hour :-). For labeling software, you can look at prodi.gy. Transfer learning from COCO or similar is also helpful and reduces the number of labeled plates you need.
There are some papers on reading house numbers in Google Street View images that could be a start if you try the other approach. See “Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks” by Ian Goodfellow et al.
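The core idea of that paper is to predict the sequence length and each character position with separate softmax outputs and to treat them as independent: P(S = s | X) = P(L = n | X) · ∏_{i=1..n} P(S_i = s_i | X). Because the factors are independent, the most likely string can be found by taking the best character at each position and then choosing the best length. A minimal decoding sketch, assuming the network's outputs are already available as plain probability lists; the paper's extra "more than N characters" length class is left out.

```python
import math


def decode_multidigit(length_probs, char_probs, alphabet):
    """Most likely string under P(L = n) * prod_{i < n} P(S_i = s_i).

    length_probs: length_probs[n] = P(L = n) for n = 0..N
    char_probs:   N lists, char_probs[i][k] = P(S_i = alphabet[k])
    Requires len(length_probs) <= len(char_probs) + 1 and positive
    probabilities for the best character at each position.
    """
    best_chars, best_logps = [], []
    for probs in char_probs:
        k = max(range(len(probs)), key=probs.__getitem__)
        best_chars.append(alphabet[k])
        best_logps.append(math.log(probs[k]))
    best_n, best_score = 0, -math.inf
    prefix = 0.0  # sum of the best log-probs of the first n positions
    for n, p_len in enumerate(length_probs):
        if n > 0:
            prefix += best_logps[n - 1]
        if p_len > 0:
            score = math.log(p_len) + prefix
            if score > best_score:
                best_n, best_score = n, score
    return "".join(best_chars[:best_n]), math.exp(best_score)
```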
I trained a CNN+RNN for simple OCR a while ago, and it was not terrible. If you have the license plate font, it should be easy to create a synthetic training dataset with the Python cairo module or similar. I used CTC as the loss.
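A CNN+RNN with CTC reads a variable-length string without segmenting characters and without a dictionary: the network outputs, for each horizontal time step, a distribution over the characters plus a special blank, and the CTC loss (for example `torch.nn.CTCLoss`, whose blank index defaults to 0) sums over all alignments of the label to those steps. At inference, the simplest decoder is best-path (greedy) decoding, sketched below assuming blank is class 0 and class k > 0 is `alphabet[k - 1]`; beam search is a common alternative.

```python
def ctc_greedy_decode(frame_probs, alphabet):
    """Best-path CTC decoding.

    frame_probs: T lists, frame_probs[t][k] = P(class k at time step t).
    Class 0 is the CTC blank; class k > 0 maps to alphabet[k - 1].
    """
    # Most likely class at every time step.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for k in best:
        # Collapse consecutive repeats, then drop blanks. A blank between
        # two identical classes keeps both, which is how "AA" is produced.
        if k != prev and k != 0:
            out.append(alphabet[k - 1])
        prev = k
    return "".join(out)
```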
Thanks Ernest. Alas, the images are from a dashboard camera, so the sizes and positions of the plates vary. This is not going to be easy…
I have looked at the Street View paper. It is currently above my level; working on that.
I have looked at the Supervisely posts. They seem to be more advertising than actual working models. And yes, deep-anpr turned out to be a disappointment as well: after struggling for a while to get it working, it ends up being very limited.
And I thought this was a great first project!
I guess I’ll go with detection followed by OCR. Occam's razor… I’ll get back to this, though.
What kind of dashboard cam are you using? I tried a GoPro in the car, but was not very successful. The image quality was not very good, and the GoPro has a wide-angle lens that makes the plates quite small. I ended up using a GoPro with a third-party zoom lens, which made it better.
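A rough way to see why the wide-angle lens hurts: under a pinhole camera model, the focal length in pixels is (image width / 2) / tan(HFOV / 2), and a plate of physical width W at distance Z spans about f · W / Z pixels. The sketch below assumes a 0.52 m plate (the common EU size) and ignores lens distortion, which is strong at the edges of wide-angle images; the resolutions and fields of view in the example are hypothetical, not measured GoPro specs.

```python
import math


def plate_width_px(image_width_px, hfov_deg, distance_m, plate_width_m=0.52):
    """Approximate plate width in pixels under a pinhole camera model.

    Assumes the plate is roughly centred and facing the camera; real
    wide-angle lenses distort heavily, so treat this as a rough estimate.
    """
    # Focal length in pixels from the horizontal field of view.
    focal_px = (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    return focal_px * plate_width_m / distance_m
```

For a 1920-pixel-wide frame and a plate 10 m away, a 120° horizontal field of view gives roughly 29 pixels across the plate, while 60° gives roughly 86, i.e. three times as many (tan 60° / tan 30° = 3). That is the effect the zoom lens buys.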
It's not mine, it's a friend's camera. It is a dedicated DVR and has really good quality. However… yes, wide lens, small and large plates.
Going to train the OCR network first. I already have the augmented set of characters; now I just need some kid-free time to train the network. I think I’ll start with a network that has been pretrained on MNIST.
Just adding some more information, in case people look for the same thing. There is a pretty cryptic example in the Keras repository that was the inspiration for the deep-anpr GitHub project.
However, it is a bit clearer, and there is a Stack Overflow discussion of it here, which seems to give references to the theory behind this strange kind of network. The training script runs in under an hour on a K80 Google Cloud instance. I still need to check how well it predicts. I ran the example unmodified, so I am not hoping for much, as it is a pretty small training set with different fonts etc. Hopefully, trained only on the license plate font, it will do better.
BTW, this one doesn't seem too bad; it can detect and read a lot of oblique plates too:
It misses line vans and some other big cars. It also sometimes detects text on the side of the car or something similar as the plate. Overall, I'm not a fan of his plate-detection part, as I think it's too specifically engineered and comes with a handful of assumptions. I’d love to know others' thoughts on it too.