I hope you are all doing well despite all of the lockdown problems.
I recently got back into deep learning and am trying to find a solution to the problem below. I hope somebody here can point me in the right direction.
I am helping a friend automate the processing of PDFs from the local harbor. Because many of these documents have slightly different formats, hand-coding rules for each one would be too much effort. Therefore, I thought I would try to come up with an NN architecture that takes the text from a given document and extracts the right information.
What I want to do is train an NLP NN to classify a piece of text as belonging to one category or another (for example, whether an input is an address or a product category). In the next step, I want to use the location of each word in the document to estimate its distance to the nearest term of interest (maybe a bit like a k-nearest-neighbor approach in which the neighbors are the words of interest or the key category terms).
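To make the distance idea concrete, here is a minimal sketch of the feature I have in mind. Everything in it is an assumption for illustration: the word coordinates are made up (in practice they would come from whatever tool extracts the text and layout from the PDF), the anchor terms "Address:" and "Category:" are hypothetical, and plain Euclidean distance is just one possible choice.

```python
import math

# Hypothetical layout data: (text, x, y), where x and y are the word's
# position on the page. These values are invented for illustration.
words = [
    ("Address:", 50.0, 100.0),
    ("Harbor", 120.0, 100.0),
    ("Road", 170.0, 100.0),
    ("Category:", 50.0, 300.0),
    ("Bulk", 130.0, 300.0),
]

# Assumed anchor terms and the category each one stands for.
key_terms = {"Address:": "address", "Category:": "product_category"}


def nearest_key_term(word, anchors, term_to_category):
    """Return (category, distance) for the anchor closest to `word`."""
    _, x, y = word

    def distance(anchor):
        # Euclidean distance between the word and the anchor position.
        return math.hypot(x - anchor[1], y - anchor[2])

    best = min(anchors, key=distance)
    return term_to_category[best[0]], distance(best)


# Split the page into anchor terms and the remaining candidate words.
anchors = [w for w in words if w[0] in key_terms]
features = {
    w[0]: nearest_key_term(w, anchors, key_terms)
    for w in words
    if w[0] not in key_terms
}

for text, (category, dist) in features.items():
    print(f"{text}: nearest anchor category={category}, distance={dist:.1f}")
```

The resulting distance is the continuous variable I would like to feed into the network together with the text itself.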
My question is whether you know of any neural nets that can combine NLP with continuous variables. (I have come across the option of including categorical variables as additional embeddings, but I do not think this would work here.)
Another potential solution would be to chain several NNs back to back, with each one using the output of the one before. Still, I would ideally like an architecture that can make sense of this in one go.
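To illustrate what I mean by combining the two kinds of input, here is a rough sketch of one possibility: pool the token embeddings into a single text vector, append the continuous values to it, and feed the joined vector into one classification layer. This is an untrained toy forward pass in plain Python with random weights; the vocabulary, dimensions, and example feature values are all made up, and in practice a deep learning framework would replace the hand-written layers and handle training.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4  # size of each token embedding (arbitrary choice)
NUM_CONT = 2   # number of continuous inputs, e.g. scaled distances
CLASSES = ["address", "product_category"]

# Hypothetical toy vocabulary; index 0 is reserved for unknown tokens.
vocab = {"<unk>": 0, "harbor": 1, "road": 2, "bulk": 3, "grain": 4}


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


embedding = rand_matrix(len(vocab), EMBED_DIM)
weights = rand_matrix(len(CLASSES), EMBED_DIM + NUM_CONT)
bias = [0.0] * len(CLASSES)


def forward(tokens, continuous):
    """Return class probabilities for a token list plus continuous features."""
    ids = [vocab.get(t.lower(), 0) for t in tokens] or [0]
    # Mean-pool the token embeddings into one fixed-size text vector.
    text_vec = [
        sum(embedding[i][d] for i in ids) / len(ids) for d in range(EMBED_DIM)
    ]
    # Concatenate the text vector with the continuous features.
    joint = text_vec + list(continuous)
    # Single linear layer over the joint vector.
    logits = [
        sum(w * v for w, v in zip(row, joint)) + b
        for row, b in zip(weights, bias)
    ]
    # Numerically stable softmax.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Example call with two assumed, already-scaled distance features.
probs = forward(["Harbor", "Road"], [0.07, 0.23])
print(dict(zip(CLASSES, probs)))
```

The continuous values would presumably need to be scaled to a range comparable to the embeddings, otherwise large raw distances could dominate the joint vector.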
If someone can point me in the right direction, or has any other feedback or helpful comments, I would highly appreciate it!
Have a great holiday.