Document Layout Analysis datasets and code repos

(hari rajeev) #1

Hello all,

I am starting this thread specifically to collate information on datasets and code repositories that can help with “document layout analysis”.

As per Wikipedia: document layout analysis is the process of identifying and categorizing the regions of interest in the scanned image of a text document. A reading system requires the segmentation of text zones from non-textual ones and their arrangement in the correct reading order.
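
To make that definition concrete, here is a rough Python sketch of the two steps it mentions: separating text zones from non-text zones, and putting the text zones in reading order. It assumes some detector has already produced labelled bounding boxes. The `Region` class, the "text" label, and the `line_tol` parameter are all hypothetical names made up for illustration, and the ordering is a naive single-column heuristic, not what any particular tool does.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A detected zone on the page (hypothetical structure for illustration)."""
    label: str  # e.g. "text", "image", "table" (assumed label set)
    x: int      # left edge in pixels
    y: int      # top edge in pixels
    w: int      # width in pixels
    h: int      # height in pixels


def split_text_regions(regions):
    """Separate text zones from non-text zones."""
    text = [r for r in regions if r.label == "text"]
    other = [r for r in regions if r.label != "text"]
    return text, other


def reading_order(regions, line_tol=10):
    """Naive single-column reading order.

    Regions whose top edges lie within line_tol pixels of the first region
    in the current row are treated as the same row; rows go top to bottom
    and regions inside a row go left to right.
    """
    rows = []
    for r in sorted(regions, key=lambda r: (r.y, r.x)):
        if rows and abs(r.y - rows[-1][0].y) <= line_tol:
            rows[-1].append(r)
        else:
            rows.append([r])
    return [r for row in rows for r in sorted(row, key=lambda r: r.x)]


if __name__ == "__main__":
    page = [
        Region("text", 300, 12, 200, 40),
        Region("image", 10, 100, 400, 80),
        Region("text", 10, 10, 200, 40),
        Region("text", 10, 200, 500, 60),
    ]
    text, other = split_text_regions(page)
    for r in reading_order(text):
        print(r)
```

Real layouts with multiple columns, tables, or rotated text need something much smarter than this, which is why good datasets and repositories matter.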

Digitization of documents, document layout analysis, etc. are major real-world problems in the banking domain.

Please let me know if you know of any good datasets or code repositories for document layout analysis.

PRImA Research provides a few layout datasets that you can request from their website after logging in: https://www.primaresearch.org/datasets

I will update the thread as and when I get more information.

Thanks,
Hari


(Manimaran) #2

Another dataset for document layout analysis
