Project idea: Resume parser

dipam7 · March 1, 2019, 4:39pm

I want to work on a project to automatically parse resumes, but I have no background in text mining. Can someone suggest a good way to approach it, especially how to split a resume into its various sections? I am assuming the resumes are in PDF format, and I plan to use the Python library tika to convert them from PDF to text. Any suggestions about the project would be welcome.

xjdeng · March 1, 2019, 6:08pm

I’ve done a similar project at work (which I obviously can’t share the details about) but I can give you some pointers:

You can use pdfminer.six to read the resumes if they are already OCR’d. It gives you the coordinate box of each letter. It can be messy, but it is certainly doable to piece the letters into words using the coordinates.
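As a rough illustration of piecing letters into words from their coordinates, here is a minimal sketch. It does not call pdfminer.six at all: the `(text, x0, y0, x1, y1)` tuples, the function name `group_chars_into_words`, and the `gap` and `line_tol` thresholds are all assumptions made for the example, and real resumes would need those thresholds tuned.

```python
from typing import List, Tuple

# Hypothetical character record: (text, x0, y0, x1, y1), with the origin at
# the bottom-left of the page, as in PDF coordinates.
Char = Tuple[str, float, float, float, float]


def group_chars_into_words(
    chars: List[Char], gap: float = 1.5, line_tol: float = 2.0
) -> List[str]:
    """Join characters into words using only their bounding boxes."""

    def line_bucket(ch: Char) -> int:
        # Characters whose baselines fall in the same bucket share a line.
        return round(ch[2] / line_tol)

    # Read top to bottom (larger y first), then left to right.
    ordered = sorted(chars, key=lambda ch: (-line_bucket(ch), ch[1]))

    words: List[str] = []
    current = ""
    prev = None
    for ch in ordered:
        text, x0 = ch[0], ch[1]
        if prev is not None:
            same_line = line_bucket(ch) == line_bucket(prev)
            close = (x0 - prev[3]) <= gap  # distance from previous right edge
            if not (same_line and close) and current:
                words.append(current)
                current = ""
        if text.strip():
            current += text
        elif current:
            # An explicit whitespace character also ends the word.
            words.append(current)
            current = ""
        prev = ch
    if current:
        words.append(current)
    return words


# Made-up boxes, deliberately out of order.
sample = [
    ("S", 10.0, 680.0, 16.0, 690.0),
    ("i", 16.5, 700.0, 19.0, 710.0),
    ("H", 10.0, 700.0, 16.0, 710.0),
    ("Q", 16.3, 680.0, 22.0, 690.0),
    ("a", 25.0, 700.0, 30.0, 710.0),
    ("L", 22.1, 680.0, 27.0, 690.0),
    ("l", 30.2, 700.0, 32.0, 710.0),
    ("l", 32.2, 700.0, 34.0, 710.0),
]
print(group_chars_into_words(sample))
```

Bucketing by baseline is a crude way to find lines; characters that sit right on a bucket boundary can be split across two lines, so a more careful version would cluster the y values instead.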

If the resume isn’t OCR-ready, you’ll need to use Tesseract to extract the text from it. You can choose either plain-text or XML output; the latter may provide additional info, such as where each character is and how big it is, and you can infer important sections from this.
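One way the size information could be used to infer sections is to treat lines that are noticeably taller than the typical body line as headings. The sketch below assumes the OCR output has already been reduced to `(text, height)` pairs; that input shape, the name `find_heading_lines`, and the 1.2 ratio are hypothetical choices for illustration, not something Tesseract provides directly.

```python
from statistics import median
from typing import List, Tuple


def find_heading_lines(
    lines: List[Tuple[str, float]], ratio: float = 1.2
) -> List[str]:
    """Return the text of lines at least `ratio` times the median height."""
    if not lines:
        return []
    body_height = median([height for _, height in lines])
    return [text for text, height in lines if height >= ratio * body_height]


# Made-up line heights in points.
ocr_lines = [
    ("Education", 14.0),
    ("BSc Computer Science, 2015", 10.0),
    ("Some University", 10.0),
    ("Experience", 14.0),
    ("Data Analyst, 2016-2019", 10.0),
    ("Built reporting pipelines", 10.0),
]
print(find_heading_lines(ocr_lines))
```

This only works when headings really are set in larger type; resumes that mark headings with bold or capitals at the same size would need a different signal.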

I’m not sure how deep learning would be helpful here. You might find that resumes follow a few common formats, in which case you could use a DL classifier to distinguish them and then send each resume to the respective postprocessor, which parses the info out.

dipam7 · March 2, 2019, 7:47am

Thank you for your response, I will give pdfminer.six a try. The text extraction part is not actually the issue. The main problem is how to efficiently divide the text into the various sections of the resume, so that each section can be handed to a different processor for further processing. Parsing the full text at once causes problems, such as how to differentiate dates associated with work experience from those associated with education.
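A simple baseline for this kind of sectioning is to look for lines that match a list of known heading names and collect everything under the most recent heading. The sketch below is only that baseline: the `SECTION_HEADINGS` table, the `split_into_sections` and `years_by_section` names, and the sample resume are all made up for illustration, and real resumes will use headings that are not in the table.

```python
import re
from typing import Dict, List

# Hypothetical heading variants mapped to a canonical section name.
SECTION_HEADINGS = {
    "education": "education",
    "academic background": "education",
    "experience": "experience",
    "work experience": "experience",
    "employment history": "experience",
    "skills": "skills",
    "projects": "projects",
}

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def split_into_sections(text: str) -> Dict[str, List[str]]:
    """Group non-empty lines under the most recent recognised heading."""
    sections: Dict[str, List[str]] = {"header": []}
    current = "header"  # lines before the first heading (name, contact info)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Normalise "Work Experience:" or "EDUCATION" before the lookup.
        key = re.sub(r"[^a-z ]", "", line.lower()).strip()
        if key in SECTION_HEADINGS:
            current = SECTION_HEADINGS[key]
            sections.setdefault(current, [])
            continue
        sections[current].append(line)
    return sections


def years_by_section(sections: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Find four-digit years per section, so dates keep their context."""
    return {name: YEAR.findall(" ".join(lines)) for name, lines in sections.items()}


resume_text = """Jane Doe
jane@example.com

EDUCATION
BSc Computer Science, 2011 - 2015

Work Experience:
Data Analyst, 2016 - 2019
"""

parsed = split_into_sections(resume_text)
print(parsed)
print(years_by_section(parsed))
```

Because dates are extracted per section rather than from the full text, the years under the education heading stay separate from those under work experience.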

ATIA · May 18, 2019, 11:15pm

Hi, did you find a solution to that problem? I’m working on the same subject … and I’m facing the same issue: extracting information from unstructured text (resumes)!

Amruta · July 23, 2019, 12:24pm

You can try a resume parser, or use one on GitHub as a reference.

bhd1 · September 5, 2023, 7:49pm

Have you tried Big Help Desk’s resume parsing API service? It is very accurate and easy to set up, and they have sample code for many different languages.