Following a discussion with my manager at work, I would love to hear your thoughts.
Since I’m a “follower” of fast.ai, and as explained in the lessons, for each NLP task I do text pre-processing: separating words, removing repeated characters, lowercasing, etc.
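As a rough sketch of the kind of pre-processing I mean (the function name and exact rules here are my own illustration, not the lesson code):

```python
import re

def preprocess(text: str) -> str:
    """Toy pre-processing: lowercase, collapse long character
    repetitions, and separate punctuation from words.
    Illustrative only; real pipelines differ."""
    text = text.lower()
    # collapse 3+ repeated characters to one, e.g. "sooo" -> "so"
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # put spaces around punctuation so it tokenizes separately
    text = re.sub(r"([.,!?])", r" \1 ", text)
    # normalize whitespace
    return " ".join(text.split())

print(preprocess("Sooo  GOOD!!!"))  # -> "so good !"
```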
My manager claims that all of this is unnecessary, because a large, strong model can learn everything on its own.
So my question is:
Is pre-processing a workaround we use because of limited data / computational power, or is it an essential part of DL?
At work, we extract text with OCR and want to build a classification model.
Should the input be raw text, with the words assembled into sentences, or each word plus its coordinates, on the assumption that the model will learn the sentence structure from there?