I have been trying to create models for analysing images of printed text. As a starting point, I have been taking rotated images and training a model to restore them to the correct orientation for reading. I downloaded some images of normally aligned text and then applied rotations to create training and validation sets.
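A minimal sketch of this kind of data generation follows. It makes several assumptions that are not stated above: rotations are limited to multiples of 90 degrees, the task is framed as 4-class classification, and real pages are replaced by a synthetic stand-in. The helper names `synthetic_text_page` and `make_rotation_dataset` are hypothetical and used only for illustration.

```python
import numpy as np


def synthetic_text_page(size=32, line_height=2, gap=2):
    """Hypothetical stand-in for a scanned page: dark ragged 'text lines'
    (value 0.0) separated by white rows (value 1.0)."""
    page = np.ones((size, size), dtype=np.float32)
    for i, top in enumerate(range(0, size - line_height, line_height + gap)):
        # Ragged right margin so the page is not symmetric under rotation.
        length = size - 4 - (i * 5) % 11
        page[top:top + line_height, 2:length] = 0.0
    return page


def make_rotation_dataset(upright_images, seed=0):
    """Rotate each upright image by a random number of quarter turns.

    Returns (rotated_images, labels), where label k means the image was
    rotated by k * 90 degrees counter-clockwise (assumed label scheme).
    """
    rng = np.random.default_rng(seed)
    rotated, labels = [], []
    for img in upright_images:
        k = int(rng.integers(0, 4))  # 0, 1, 2 or 3 quarter turns
        rotated.append(np.rot90(img, k))
        labels.append(k)
    return rotated, np.array(labels)


pages = [synthetic_text_page() for _ in range(8)]
images, labels = make_rotation_dataset(pages)

# Applying the inverse rotation for a label recovers the upright page.
restored = [np.rot90(img, -int(k)) for img, k in zip(images, labels)]
```

With this framing the model only has to predict `k`, and the correction is the inverse rotation. Arbitrary angles would need an interpolating rotation and a regression or finer-grained classification target instead.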
For a human this is simple: I can instantly see the pattern of rows of letters, with each row separated by whitespace. Yet I cannot get a CNN to converge at all. I have tried different architectures and CNNs of different sizes, with no success.
I know there are other models that have been used to do this. However, can a simple CNN be configured to align text? If not, why not? And if it can, how do I configure it?