Ethical Challenge / Bias Avoidance

Hey Fast.AI Family!

I wanted to ask for your guidance / advice on a machine learning problem. I work for a staffing company and I’ve been tasked with building a model that ranks candidates against job postings. The system will largely be used by recruiters to guide their candidate engagement, and the overall goal sits somewhere between a search relevancy model and a recommender system. I think there is a huge opportunity for reducing inefficiencies in the hiring process and hopefully promoting diversity and meritocracy, but there is also clearly some risk here.

The project will mostly rely on text/NLP data coming from resumes/CVs and job descriptions, and I’ve looked over a few papers with some really interesting deep learning approaches to solving this sort of problem, but I have a concern. Most of the proposed solutions involve embedding the queries and documents into some semantic space, and I know from some of Rachel’s great work that there is a real tendency/risk for word embeddings / semantic representations to encode biases. I want to avoid building a racist/sexist/discriminatory system!

I have a few concerns listed below, but I’m sure I’m not thinking of everything. I was hoping to get some guidance / advice from the community on what I should be careful of and things to avoid / look for :slight_smile:


  1. Using transfer learning / open-source word embeddings with bias encoded (Transfer learning is almost certainly necessary for this problem given the size of our data, but has some risks in my opinion!)
  2. Creating a negative feedback loop if the system is purely tailored to click-through rates of recruiters (i.e. Reinforcing “bad behaviour” or avoiding diversity of thought)
  3. Unintentionally encoding bias in the model (i.e. Do different groups use different language patterns in their CVs, etc.)
  4. Missing the opportunity for a holistic view (i.e. Looking at more than just years of experience and skills… it’s like Jeremy says about bringing domain experts into deep learning rather than trying to transfer domain knowledge to ML folks).
  5. Providing model interpretability
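On concern #1, you can probe pre-trained embeddings for bias directly before you ever plug them into the ranker. A minimal sketch below, using made-up 3-d toy vectors rather than a real embedding table (with real data you'd load GloVe/fastText vectors instead), projects occupation words onto a crude "gender direction" to quantify how strongly they align with it:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

# Hypothetical embedding table -- NOT real embedding values,
# just toy vectors to illustrate the measurement.
vectors = {
    "he":       [1.0, 0.1, 0.0],
    "she":      [-1.0, 0.1, 0.0],
    "doctor":   [0.6, 0.8, 0.2],
    "nurse":    [-0.6, 0.8, 0.2],
    "engineer": [0.5, 0.3, 0.9],
}

# One common (crude) choice of gender direction: he - she
gender_dir = [a - b for a, b in zip(vectors["he"], vectors["she"])]

# Cosine alignment with the gender direction: large positive or
# negative values suggest the occupation word carries gender signal.
for word in ("doctor", "nurse", "engineer"):
    bias = cosine(vectors[word], gender_dir)
    print(f"{word}: gender alignment = {bias:+.2f}")
```

Running this kind of check over the actual occupation/skill vocabulary in your embeddings would give you a concrete list of terms to worry about, rather than a vague sense that "embeddings can be biased".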

I hope that makes sense and I’m thinking about this the right way!

Thanks so much for your help!



This is an extremely interesting problem. I’m in a similar space. One thing that will let you test many of your hypotheses is collecting those demographic variables and examining the model’s outcomes against common EEOC guidelines, such as effect sizes and the 4/5ths rule.
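To make the 4/5ths rule concrete: it says the selection rate for any group should be at least 80% of the rate for the highest-selected group. A minimal sketch (the group names and counts below are made up for illustration):

```python
def selection_rates(outcomes):
    """outcomes: dict of group -> (selected, total). Returns group -> rate."""
    return {g: sel / tot for g, (sel, tot) in outcomes.items()}

def four_fifths_check(outcomes, threshold=0.8):
    """Return group -> (impact ratio vs. top group, passes 4/5ths rule)."""
    rates = selection_rates(outcomes)
    top = max(rates.values())
    return {g: (r / top, r / top >= threshold) for g, r in rates.items()}

# Hypothetical counts: how often the system surfaced candidates per group
outcomes = {
    "group_a": (45, 100),   # 45% selection rate
    "group_b": (30, 100),   # 30% selection rate
}

for group, (ratio, passes) in four_fifths_check(outcomes).items():
    print(f"{group}: impact ratio {ratio:.2f} -> {'OK' if passes else 'FLAG'}")
```

Here group_b's impact ratio is 0.30 / 0.45 ≈ 0.67, well under the 0.8 threshold, so it would be flagged for review. Running this over the model's top-k recommendations per posting is a cheap ongoing audit.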

I’ve always wondered whether some of the inherent relationships in how close words sit to one another in n-dimensional space would play out on resumes. Common examples are doctor → man and nurse → woman. My thought is that as long as you stick to the content of the resume and leave out things like name, address, etc., some of that should be avoided, but I don’t really know for sure.
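Leaving out name, address, etc. can be partly automated. A rough regex-based sketch of stripping obvious identity fields before the text reaches the embedding step (the patterns here are illustrative only; a production system would need a proper parsing/NER pipeline, and names in particular can't reliably be caught by regex):

```python
import re

# Illustrative PII patterns -- deliberately simple, not production-grade.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"), "[PHONE]"),
    (re.compile(r"\b\d+\s+\w+\s+(Street|St|Avenue|Ave|Road|Rd)\b", re.I),
     "[ADDRESS]"),
]

def redact(text):
    """Replace matched PII spans with placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

resume = "Jane Doe, 123 Main Street, jane@example.com, (555) 123-4567"
print(redact(resume))
# Note the name still leaks through -- regexes alone can't fix that.
```

Even with redaction, group-correlated signal can remain in the body text (word choice, activities, school names), which is exactly the demographic-audit argument above: you still need to measure outcomes, not just scrub inputs.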

Another important thing to think about is what you are trying to predict. How will you determine what makes one resume rank higher for a job posting than another? Often what I’ve found is that there is inherent bias baked into the criterion/outcome itself, and it gets overlooked. For example, if the criterion is who gets hired and you have inadvertently hired more males than females, you’ve just taught the algorithm that it is OK to prefer males over females. So if anything in the data inadvertently points to “maleness” vs. “femaleness”, a good algo should find and exploit it, because it’s an important variable when it comes to predicting your criterion of interest.
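That label-bias problem is worth checking before any modeling happens: just tabulate the historical outcome rate by group. A tiny sketch with hypothetical hiring records (group names and counts are made up):

```python
from collections import defaultdict

def hire_rate_by_group(records):
    """records: iterable of (group, hired) pairs -> group -> hire rate."""
    counts = defaultdict(lambda: [0, 0])  # group -> [hired, total]
    for group, hired in records:
        counts[group][0] += int(hired)
        counts[group][1] += 1
    return {g: hired / total for g, (hired, total) in counts.items()}

# Hypothetical historical hiring data
history = ([("male", True)] * 60 + [("male", False)] * 40
           + [("female", True)] * 40 + [("female", False)] * 60)

rates = hire_rate_by_group(history)
print(rates)  # {'male': 0.6, 'female': 0.4} -- the label itself is skewed
```

If the base rates differ like this, any model trained on "was hired" will inherit the skew, so you'd want to either rethink the criterion (e.g. on-the-job performance instead of the hiring decision) or explicitly correct for it during training and evaluation.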