Searching for text inside images

I am trying to conceptualize a system in which a user supplies a text string and uses it to search a collection of documents. The catch is that the documents are stored as JPEG images, each containing a large amount of text. How could one design an end-to-end deep learning system that takes text as input and searches a database of documents that are images containing text?
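To make the setup concrete, here is a minimal sketch of the interface I have in mind. It assumes a text-extraction step exists: `extract_text` is a hypothetical placeholder for whatever OCR engine or text-recognition model would read the image, and here it is faked with a dictionary. The exact keyword matching and the names `ImageTextIndex`, `add`, and `search` are also just assumptions for illustration, not a proposed solution.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ImageTextIndex:
    """Toy inverted index mapping tokens to the images that contain them."""

    def __init__(self, extract_text: Callable[[str], str]) -> None:
        # extract_text is a hypothetical hook: image path in, recognized text out.
        self.extract_text = extract_text
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, image_path: str) -> None:
        """Extract the text of one image and record its tokens."""
        for token in tokenize(self.extract_text(image_path)):
            self.postings[token].add(image_path)

    def search(self, query: str) -> List[str]:
        """Return the images that contain every token of the query."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


# Stand-in for a real OCR or recognition model (made-up file names and text).
fake_ocr = {
    "scan_001.jpg": "Invoice for cloud hosting services",
    "scan_002.jpg": "Meeting notes about hosting a workshop",
}

index = ImageTextIndex(extract_text=lambda path: fake_ocr[path])
for path in fake_ocr:
    index.add(path)

print(index.search("hosting"))
print(index.search("cloud hosting"))
```

The open question is what should replace the placeholder extraction step and the exact-match lookup, for example whether the images should be converted to text first or embedded and matched against the query directly.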

Any suggestions on how to design and train such a model (or models), or on an overall approach to building this system, would be really helpful.