Hi folks -
I am currently going through the 2019 Part 1 course. Many thanks to @jeremy and the team for putting out such a high-quality course for everyone to learn from!
I completed Lesson 3 on image segmentation with the CamVid dataset. While going through the lesson, I started thinking about a simple project using a custom dataset of my own: basically, I want to extract text from images.
I checked out the COCO dataset, but I'm not sure whether that is what I should start with or whether I need to handcraft a dataset of my own.
I was thinking I might need example bounding boxes on the images, like so:
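If you do hand-craft labels, one option is to follow the COCO annotation layout, since many object-detection loaders already understand it. A minimal sketch of what one labelled image might look like (file name, sizes, and box values below are purely illustrative; COCO stores boxes as `[x, y, width, height]` in pixels):

```python
import json

# A minimal, hand-crafted COCO-style annotation file (illustrative values).
annotations = {
    "images": [
        {"id": 1, "file_name": "receipt_001.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        # One box around a run of text; category_id 1 = "text".
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [120, 40, 200, 30]}
    ],
    "categories": [{"id": 1, "name": "text"}],
}

# Serialise the labels the same way COCO's instances_*.json files are laid out.
coco_json = json.dumps(annotations, indent=2)
```

Starting from this layout also means you can mix in real COCO images later without changing your loading code.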
I think that approach would give you the ability to locate text sub-images and their positions, which you could then pass on to a more dedicated text network. Very interesting problem to work on.
It would also be interesting to see whether it could learn without bounding boxes at all: just start with one word placed randomly somewhere on the image and have that word be the label, then move on to multi-line text. This assumes you don't care about the location of the text.
Thanks for the linked article. This is a bit beyond my current experience with neural nets; hopefully once I complete Part 1 I'll be able to take a stab at it.
Thanks @blissweb
I was thinking along the same lines: predict a bounding box for the text, crop that box out as a separate image, and feed the text-containing sub-image to a text-recognition module such as pytesseract to extract the text.
You can use a dynamic U-Net to get an activation map of the text areas in the image, and you can train on the ICDAR dataset. Let me know if you need any help.
There are a decent number of threads on this topic, most pointing to the fastai 2018 Part 2 course material on object detection. I'm also working through the examples in 2019 Part 1 (Course v3) and would love any fastai v1 or PyTorch examples people may have.