Semantic segmentation of documents

(Slavica Tomovic) #1

I would like to build a program for automated invoice processing. Could semantic segmentation be useful for such a project? I am thinking about creating segmentation masks that describe the invoice layout, e.g. header, table, footer, supplier, etc. After that, OCR and an RNN could be applied to the image segments.
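A minimal sketch of the "mask, then OCR on segments" step described above, using only NumPy. The class ids and names in `LAYOUT_CLASSES` and the helper `crop_regions` are hypothetical, chosen for illustration. It assumes each class appears as a single region per page, and the OCR call itself is left out.

```python
import numpy as np

# Hypothetical class ids for the layout mask; a real label set
# would come from whatever annotation scheme is used.
LAYOUT_CLASSES = {1: "header", 2: "table", 3: "footer"}


def crop_regions(image, mask, classes=LAYOUT_CLASSES):
    """Return {class_name: crop} using the tight bounding box of each class.

    Assumes one region per class; several separate regions of the same
    class would be merged into one enclosing box.
    """
    crops = {}
    for class_id, name in classes.items():
        rows, cols = np.nonzero(mask == class_id)
        if rows.size == 0:
            continue  # this class is not present on the page
        top, bottom = rows.min(), rows.max() + 1
        left, right = cols.min(), cols.max() + 1
        crops[name] = image[top:bottom, left:right]
    return crops


# Toy page: 100 x 80 pixels, with a predicted mask of the same size.
image = np.arange(100 * 80).reshape(100, 80)
mask = np.zeros((100, 80), dtype=np.uint8)
mask[0:20, :] = 1        # header band
mask[30:70, 10:70] = 2   # table block
mask[90:100, :] = 3      # footer band

crops = crop_regions(image, mask)
for name, crop in crops.items():
    print(name, crop.shape)
```

Each crop could then be passed to an OCR engine separately, so the recognized text already carries its layout role.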

Can I rely on the unet learner to recognize a document layout? I am not sure how much it can learn from images of text. Do you have any other approaches in mind?

Thanks in advance.


(julian) #2

You can also treat it as object detection and just get the bounding boxes for the header, tables, footer and such. Not sure which is better, though ^^. I think you can find more info by looking up "document semantic structure extraction".


(hari rajeev) #3

You could use either object detection (YOLO) or semantic segmentation (U-Net). If the objects are all squares or rectangles, try YOLO first. I know that both of these techniques work on document layouts.
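One rough way to check the "all squares or rectangles" condition on a few annotated masks is to compare each region's pixel area with the area of its bounding box. This is only an illustrative heuristic, not part of YOLO or U-Net. The function name `rectangularity` is made up here, and it assumes one region per class id.

```python
import numpy as np


def rectangularity(mask, class_id):
    """Fraction of the region's bounding box that the region actually fills.

    1.0 means the region is an axis-aligned rectangle; lower values mean
    a box would include pixels that do not belong to the region.
    """
    region = mask == class_id
    rows, cols = np.nonzero(region)
    if rows.size == 0:
        return 0.0  # class not present in this mask
    box_height = rows.max() - rows.min() + 1
    box_width = cols.max() - cols.min() + 1
    return float(region.sum()) / float(box_height * box_width)


mask = np.zeros((50, 50), dtype=np.uint8)
mask[5:15, 5:45] = 1     # a plain rectangle
mask[20:40, 5:15] = 2    # an L-shaped region: vertical bar ...
mask[30:40, 15:45] = 2   # ... plus a horizontal bar

print(rectangularity(mask, 1))  # 1.0
print(rectangularity(mask, 2))  # 0.625
```

If most regions score close to 1.0, bounding boxes lose very little information and detection is a reasonable first try. Clearly lower scores suggest that pixel-level masks may describe the layout better.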

You could also refer to