Hey Fast.AI Family!
I wanted to ask for your guidance on a machine learning problem. I work for a staffing company and I’ve been tasked with building a model that ranks candidates against job postings. The system will largely be used by recruiters to guide their candidate engagement, and the overall goal sits somewhere between a search relevance model and a recommender system. I think there is a huge opportunity here for reducing inefficiencies in the hiring process and hopefully promoting diversity and meritocracy, but there is also clearly some risk.
The project will mostly rely on text/NLP data coming from resumes/CVs and job descriptions, and I’ve looked over a few papers with some really interesting deep learning approaches to this sort of problem, but I have a concern. Most of the proposed solutions involve embedding the queries and documents into some semantic space, and I know from some of Rachel’s great work that there is a real risk that word embeddings and other semantic representations encode biases. I want to avoid building a racist/sexist/discriminatory system!
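For anyone curious what "checking an embedding for bias" can look like in practice, here is a minimal sketch of the bias-direction probe idea (projecting words onto a he/she axis). The 3-d vectors below are invented purely for illustration, not real learned embeddings, so the numbers only demonstrate the mechanics:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d "embeddings" with a gender skew baked in on purpose.
# Real embeddings are learned from text and can pick up skews like this.
emb = {
    "he":       [1.0,  0.1, 0.0],
    "she":      [-1.0, 0.1, 0.0],
    "engineer": [0.8,  0.9, 0.1],
    "nurse":    [-0.8, 0.9, 0.1],
}

# Bias direction: the difference vector between a gendered word pair.
gender_axis = [h - s for h, s in zip(emb["he"], emb["she"])]

# Occupation words should ideally project near zero on this axis.
for word in ("engineer", "nurse"):
    print(f"{word}: projection on he-she axis = {cosine(emb[word], gender_axis):+.2f}")
```

With real pretrained vectors (e.g. loaded via gensim), the same projection run over occupation words gives a quick first audit of how much gender signal a transfer-learned embedding would carry into the ranking model.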
I have a few concerns listed below, but I’m sure I’m not thinking of everything. I was hoping to get some guidance from the community on what I should be careful of and what to avoid or watch for:
- Using transfer learning / open-source word embeddings that already have bias encoded (transfer learning is almost certainly necessary for this problem given the size of our data, but it carries some risk in my opinion!)
- Creating a negative feedback loop if the system is trained purely on recruiters’ click-through rates (i.e. reinforcing “bad behaviour” or discouraging diversity of thought)
- Unintentionally encoding bias in the model (e.g. do different groups use different language patterns in their CVs?)
- Missing the opportunity for a holistic view (i.e. looking at more than just years of experience and skills… like Jeremy’s point about bringing domain experts into deep learning rather than trying to transfer domain knowledge to ML folks)
- Providing model interpretability
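On the interpretability point: even if the ranking core is a neural model, exposing per-feature contributions (as a linear scorer does naturally) gives recruiters a "why" for each ranking. A toy sketch below, where the terms and weights are entirely made up for illustration:

```python
# Hypothetical linear candidate scorer: score = sum of term weights
# for terms present in the CV. Weights here are invented; a real
# model would learn them from relevance data.
weights = {"python": 1.2, "sql": 0.8, "mentoring": 0.5, "golf": 0.0}

def score_with_explanation(cv_tokens):
    """Return (score, explanation) where the explanation lists each
    term's contribution, largest-magnitude first."""
    contributions = {t: weights.get(t, 0.0) for t in set(cv_tokens)}
    total = sum(contributions.values())
    explanation = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    return total, explanation

total, why = score_with_explanation(["python", "sql", "golf"])
print(f"score={total:.1f}")
for term, contrib in why:
    print(f"  {term}: {contrib:+.1f}")
```

The same idea scales up to attribution methods (e.g. feature-ablation or gradient-based attributions) for deeper models, and it also doubles as a bias audit: if a proxy term for a protected group shows up with a large contribution, that is a red flag.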
I hope that makes sense and that I’m thinking about this the right way!
Thanks so much for your help!