Reading Text from images using convolution

satish860 · June 12, 2017, 5:09pm

I have finished lesson 2 of the MOOC and I am looking to create something that can read text from images. I believe CNNs are powerful enough to do this.

I would like to know if there are any pre-trained models that can do this.

karthik_k314 · June 19, 2017, 8:05pm

You will need to look at an attention OCR model for this. Google put out a pre-trained model that does just that.

https://github.com/tensorflow/models/tree/master/attention_ocr

Take a look at this.

arnaud · July 21, 2017, 1:38pm

Hi everyone

Not to be too lazy, but… has anyone, by chance, already dug into the attention_ocr model released by TensorFlow that @karthik_k314 mentioned? It would be great to know before venturing into the wild, because as it stands it doesn’t look “easy to use”, at least to me.

Thanks a lot,
A

alexandrecc · July 21, 2017, 2:01pm

I think the attention OCR model is focused on street names. The more general-purpose OCR engine is Tesseract, whose development is sponsored by Google: https://github.com/tesseract-ocr

niyazpk · July 26, 2017, 8:20pm

Whether Tesseract will work for you depends on the type of image you are working with.

Scans of documents? Tesseract may work well.
Photographs of street signs? Tesseract will most probably fail.

See: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
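
One of the preprocessing steps that page covers is binarisation (converting the image to pure black and white before OCR). Below is a minimal sketch of Otsu thresholding in plain Python, assuming the image has already been loaded as a flat list of grayscale values in 0-255. The function names `otsu_threshold` and `binarize` are made up for illustration and are not part of Tesseract.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximises between-class variance."""
    # Histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0      # sum of gray levels at or below the candidate threshold
    weight_bg = 0   # number of pixels at or below the candidate threshold
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue            # nothing in the background class yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break               # everything is background from here on
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map every pixel above the threshold to white (255), the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

This only helps when text and background separate reasonably well by brightness, which is typical of scans and much less so of photographs.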

karthik_k314 · July 27, 2017, 10:48am

If you do want to use the attention_ocr model on your own data and deploy it, you are better off with DeepDetect: https://deepdetect.com/

I’ve found it to be FAAAR better than TensorFlow Serving: it has made deploying models much easier for me.

arnaud · July 27, 2017, 12:17pm

Hi all,

First, thanks for your answers.

@alexandrecc @niyazpk I’m indeed trying to detect text in pictures. They’re not street signs, but rather everyday products (e.g. a bottle) and the like, so I’m not sure Tesseract is the solution…

@karthik_k314 I’ve been through DeepDetect, and they have spectacularly easy-to-consume Docker images for localization and classification. I couldn’t find any ready-to-use text recognition model though. Did you go through it?

Google’s Cloud API for Text Detection works really well… I don’t know what they use
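
For anyone who wants to try it, here is a rough sketch of what a text detection request to the Cloud Vision REST API looks like. It only builds the JSON body and reads the result. Actually sending it (an HTTP POST with an API key or OAuth token) is left out, and the helper names `build_text_detection_request` and `extract_full_text` are my own, not part of any Google client library.

```python
import base64

# Public REST endpoint for Cloud Vision image annotation (v1).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_text_detection_request(image_bytes):
    """Build the JSON-serialisable body for a TEXT_DETECTION request."""
    # The API expects the raw image bytes as a base64 string.
    content = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": content},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_full_text(response):
    """Return the full detected text from a parsed images:annotate response.

    The first entry of textAnnotations holds the whole text block;
    the remaining entries are individual words.
    """
    annotations = response["responses"][0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""
```

The body would be serialised with `json.dumps` and posted to `VISION_ENDPOINT`, and the parsed JSON reply passed to `extract_full_text`.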