Text extraction from images

Hi folks -
I am currently going through the Part 1 2019 course. Many thanks to @jeremy and team for putting out such a high-quality course for everyone to learn from!

I completed lesson 3 on image segmentation with the CamVid dataset. While going through this lesson, I started thinking of a simple project using my own custom dataset. Basically, I want to extract text from images.
I checked out the COCO dataset, but I am not sure whether that is what I should start with or whether I need to handcraft a dataset of my own.

I was thinking maybe I need example bounding boxes on images like so:

Also, has anyone tried extracting text from images? If so, what type of dataset did you use?

Any suggestions/pointers on how to go about doing this will be very helpful.
Many thanks

I know CTC (Connectionist Temporal Classification) methods are used for text recognition.
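
For anyone new to CTC, here is a minimal sketch of the greedy decoding step that turns per-frame predictions into a label sequence: collapse consecutive repeats, then drop the blank symbol. The function name `ctc_greedy_decode` and the choice of index 0 for the blank are assumptions made for illustration; a real recognizer would take the argmax over the network output at each time step first.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated labels, then remove blanks (greedy CTC decoding)."""
    decoded = []
    previous = blank
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Two frames of label 1 separated by a blank stay two separate characters.
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))
```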


I think that approach would get you the ability to locate text sub images and their position which you could then pass through to a more dedicated text network. Very interesting problem to work on.

Would also be interesting to see if it could learn without the bounding boxes at all. Just start with one word somewhere on the image randomly and have that be the label. Then move on to multiline text. Assuming you didn’t care about the location of the text.

Thanks for the linked article. This is a bit beyond my current experience with neural nets. Hopefully when I complete part 1 I might be able to take a stab at it.

Thanks @blissweb
I was thinking along the same lines. Extract the bounding boxes of the text in the image, crop each box out as a separate image, and feed that sub-image to a text recognition module such as pytesseract to extract the text.
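
A rough sketch of that crop-then-recognize step, assuming Pillow and pytesseract are installed (plus the Tesseract binary itself) and that boxes are given as (left, top, right, bottom) pixel coordinates. The helper names `pad_and_clamp_box` and `ocr_boxes` and the 4-pixel padding are my own choices for illustration, not something from the course.

```python
def pad_and_clamp_box(box, image_size, pad=4):
    """Grow a (left, top, right, bottom) box by `pad` pixels, staying inside the image."""
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left - pad),
        max(0, top - pad),
        min(width, right + pad),
        min(height, bottom + pad),
    )


def ocr_boxes(image_path, boxes, pad=4):
    """Crop each box out of the image and run Tesseract on the crop."""
    # Imported lazily so the helper above works without these packages.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    texts = []
    for box in boxes:
        crop = image.crop(pad_and_clamp_box(box, image.size, pad))
        # --psm 7 tells Tesseract to treat the crop as a single line of text.
        texts.append(pytesseract.image_to_string(crop, config="--psm 7").strip())
    return texts
```

A little padding around each box tends to help because tight crops can clip the edges of characters; the right amount depends on your images.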

This is an OCR topic. Try searching for the keywords "detect text in the wild".

You can use dynamic u-net for getting the activation map of the text area in the image. You can use ICDAR dataset to train on. Let me know if you need any help.
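
Once a segmentation model gives you a text mask, you still need boxes to crop. Here is a minimal pure-Python sketch that turns a binary mask into one bounding box per connected blob. It assumes the activation map has already been thresholded (for example at 0.5) into rows of 0/1 values; the function name `mask_to_boxes` and the `min_area` filter are hypothetical, and in practice a library routine for connected components would be faster.

```python
def mask_to_boxes(mask, min_area=1):
    """Return (left, top, right, bottom) boxes, one per 4-connected blob of truthy pixels.

    `mask` is a list of rows; right and bottom are exclusive.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood fill from this pixel, tracking the blob's extent and size.
            stack = [(x, y)]
            seen[y][x] = True
            left, top, right, bottom = x, y, x, y
            area = 0
            while stack:
                cx, cy = stack.pop()
                area += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            # Skip tiny blobs, which are usually noise in the predicted mask.
            if area >= min_area:
                boxes.append((left, top, right + 1, bottom + 1))
    return boxes
```

The boxes use exclusive right and bottom edges, so they can be passed directly to an image crop that expects (left, upper, right, lower).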

Hi Ashutosh -
I am interested in taking a stab at it. Do you have any notebook/script with the u-net and ICDAR data that I can look at?


There are still some errors in the mask. Otherwise, the implementation will be the same. I will update you with the latest results.


@addamit Did you ever get any further with this? I’m curious about trying something similar in my project and wondering if you ever came across this: https://github.com/songdejia/EAST/blob/master/model.py. The paper is here (https://arxiv.org/pdf/1704.03155.pdf).

I’ve never built my own model but this seems like it could be an interesting thing to experiment with?

There’s a decent number of threads on this topic, most pointing to the FastAI 2018 Part 2 course on object detection. I’m also working through the examples in 2019 Part 1 - Course V3 and would love any FastAI v1 or PyTorch examples people may have.

Thanks for sharing your notebook. I am unable to find the dataset. Can you help me out with this?


Thanks, Ashut, but I am talking about a text segmentation dataset, not one for cars…

I am trying to implement SRGAN in fastai for text, but I am getting a completely black image as output.

I do not know what is going wrong. I am able to start training, but the output is always a black image.