I need to read about 100K files (Office-type documents).
I'm planning to use Apache Tika to extract the text from them.
To build a dictionary, I will have to read all 100K files first and then create the dictionary from their contents.
The folder structure will be part of the dictionary too.
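
To make the question more concrete, here is a rough sketch of the single pass I have in mind. It assumes "dictionary" means a map from each term to how often it occurs, and that folder names count as terms; both are assumptions on my part. `DictionaryBuilder`, `addTerms`, `build`, and the `extractor` parameter are hypothetical names. The extractor is where Tika would plug in, for example by wrapping `new Tika().parseToString(path.toFile())` and handling its checked exceptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

public class DictionaryBuilder {

    /** Splits text on anything that is not a letter or digit and counts each lowercase token. */
    static void addTerms(String text, Map<String, Integer> counts) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    /**
     * Walks the tree under root once. Each sub-folder name is added to the
     * dictionary once, and each regular file's extracted text is added as it
     * is read, so only one document's text is held in memory at a time.
     * The extractor is an assumption: any function that turns a file into text.
     */
    static Map<String, Integer> build(Path root, Function<Path, String> extractor)
            throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path path = it.next();
                if (Files.isDirectory(path)) {
                    if (!path.equals(root)) {
                        // Folder structure: the folder's own name becomes a term.
                        addTerms(path.getFileName().toString(), counts);
                    }
                } else if (Files.isRegularFile(path)) {
                    addTerms(extractor.apply(path), counts);
                }
            }
        }
        return counts;
    }
}
```

The idea is to update the dictionary file by file instead of collecting all the text first, so memory use should depend on the number of distinct terms rather than on the total size of the 100K documents. I have not measured this, and I don't know whether it is the right approach.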
Can you shed some light on the best way to achieve this?