Thanks for your response!
The ultimate objective is to extract text from the image. One approach I have tried is running pytesseract directly on pre-processed images, but the result is not accurate and would need a lot of post-processing work.
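For reference, this is roughly what that pipeline looks like (the file name, thresholding choice and `--psm` setting are just placeholders, not my actual parameters):

```python
import cv2
import pytesseract

# Minimal sketch: pre-process (grayscale + Otsu binarisation), then OCR directly.
img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# --psm 6 assumes a single uniform block of text; other page-segmentation
# modes may fit the layout better.
text = pytesseract.image_to_string(img, config="--psm 6")
print(text)
```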
My ideal solution is to get accurate character segmentation first (using either classical computer vision tools or deep learning) and then use a CNN to recognize each character. My initial thought is to normalize all segmented characters to the same height and width, correct their skew, and then label them manually if necessary. The catch is that if the segmentation part has to be done with neural nets, training data becomes a problem, since I'm not sure how much time I would spend generating my own annotations.
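For the segmentation step, a classical starting point (before reaching for neural nets) could be contour-based cropping plus size normalization, along the lines of the sketch below. The noise threshold and the 32x32 target size are arbitrary choices, and skew correction is left out:

```python
import cv2
import numpy as np

# Sketch of contour-based character segmentation on a binarised image
# (characters assumed dark on a light background, hence THRESH_BINARY_INV).
img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

chars = []
# Sort contours left-to-right so crops come out in reading order.
for c in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):
    x, y, w, h = cv2.boundingRect(c)
    if w * h < 50:  # skip specks of noise; the area threshold is a guess
        continue
    crop = binary[y:y + h, x:x + w]
    chars.append(cv2.resize(crop, (32, 32)))  # normalise size for a CNN

dataset = np.stack(chars)  # crops ready to label manually / feed to a classifier
```

Touching characters would still need extra handling (e.g. splitting wide contours), which is exactly where this simple approach tends to break down.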
I also thought about using pytesseract directly and then focusing on post-processing the inaccurate text, but I haven't done much research on RNN/LSTM approaches yet.
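Even without an RNN/LSTM, a crude post-processing pass could snap OCR tokens to a known vocabulary; the `VOCAB` list and similarity cutoff below are purely hypothetical:

```python
import difflib
import pytesseract
from PIL import Image

# Hypothetical list of words expected in these documents.
VOCAB = ["invoice", "total", "amount", "date"]

def correct_token(token, vocab=VOCAB, cutoff=0.8):
    """Replace an OCR token with the closest vocabulary word, if one is close enough."""
    matches = difflib.get_close_matches(token.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token

raw = pytesseract.image_to_string(Image.open("preprocessed.png"))
cleaned = " ".join(correct_token(t) for t in raw.split())
print(cleaned)
```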