It looks like I’d be able to make a classifier for identifying individual characters fairly easily. It could possibly be fine-tuned for my data set (it’s all computerized text, no handwriting, in a particular set of fonts used in DICOM).
I would think the other crucial thing to solve is how to extract the text regions and split them up into individual (unknown) characters, which would then be passed to the classifier for identification.
Is there a nice way to use fastai to extract the text regions or individual characters?
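To make the "split into characters" step concrete, here is a minimal sketch of one classical, non-fastai baseline: connected-component labeling on an already binarized image. It is plain Python, the names `find_char_boxes` and `crop` are made up for illustration, and it assumes clean text where 1 means ink and 0 means background. I am not claiming this is how any OCR tool does it.

```python
from collections import deque

def find_char_boxes(binary):
    """Return (top, left, bottom, right) boxes of connected ink blobs.

    `binary` is a list of rows; 1 = ink, 0 = background (assumed input format).
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one blob with 8-connectivity, tracking its extent.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Sort left to right, which is reading order for a single line of text.
    boxes.sort(key=lambda b: b[1])
    return boxes

def crop(binary, box):
    """Cut one box out of the image so it can be fed to a classifier."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in binary[top:bottom + 1]]

# Toy 5x7 image with two separate blobs.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = find_char_boxes(image)
crops = [crop(image, b) for b in boxes]
```

The known weak spots of this approach are that multi-part glyphs such as "i" come out as two blobs, touching glyphs merge into one, and it depends entirely on a good binarization, which is exactly what an irregular background makes hard.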
I have tried it. It works pretty well if the data is consistent (all text the same size and font) but poorly in other circumstances. For instance, text is sometimes overlaid on irregular backgrounds (the contents of a CT scan or X-ray), and then it doesn’t get good results. Strangely enough, the cloud variations handle that well.
I also noticed the other thing you linked, Textract. I will check that out.
Still, I want to give fastai a go for the learning experience. I found this similar thread interesting; it looks like I would need to use segmentation to extract the characters. Does anyone know how to do this? Does it require some ground-truth data, like in the camvid example in the lessons?
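For reference, my understanding of camvid-style ground truth is one mask per image, with the same height and width, where every pixel holds an integer class code. A minimal sketch of what that could look like for text, assuming I had character bounding boxes from somewhere (the `boxes_to_mask` helper, the `codes` list, and the annotations are all hypothetical, and box-shaped masks are coarser than true per-pixel glyph masks):

```python
def boxes_to_mask(height, width, labeled_boxes, background=0):
    """Build a per-pixel class mask from (class_id, box) annotations.

    Each box is (top, left, bottom, right), inclusive. Hypothetical format.
    """
    mask = [[background] * width for _ in range(height)]
    for class_id, (top, left, bottom, right) in labeled_boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                mask[y][x] = class_id
    return mask

# Class codes in the style of a codes file: index = value stored in the mask.
codes = ["background", "text"]

# Two made-up character boxes on a 5x7 image, both labeled as class 1 ("text").
annotations = [(1, (1, 1, 3, 2)), (1, (1, 5, 3, 5))]
mask = boxes_to_mask(5, 7, annotations)
```

Saved as an image next to the original, a mask like this would be the label for one training example; the open question for me is where the boxes or masks would come from for real DICOM frames.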