Why was your job application rejected? Bias in Recruitment Algorithms (Part 2)

This guest post was contributed by Merve Hickok (SHRM-SCP), founder of AIethicist.org. It is part 2 of a 2-part series on bias in recruitment algorithms. Read part 1 here.

SCREENING:

So let’s assume your application was one of those ranked highly in the matching and sourcing platform, and the recruiter clicked your name to move you to the next stage, where you are screened against the company’s preferred criteria. Whether it relies on hard-coded questions and filters built into the system or on machine learning algorithms which make decisions, the screening process reduces the number of applications by going through your CV / resume and picking up skills and information (degree, GPA, years of experience, fluency in spoken or technical languages, etc.). Whatever the software was able to read (or parse) from your CV, those data points are then matched against the desired points for the specific role. The candidates with matching points may then be ranked again according to the degree or percentage of match. However, the bigger bias issues at this stage have to do with the data from which the algorithm was created and the kind of model which makes the predictions.
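To make the matching-and-ranking step concrete, here is a minimal sketch in Python. It assumes the parser has already reduced each CV to a set of skill keywords. The `Candidate` class, the function names and the sample data are all hypothetical, and real screening products will use far more elaborate (and usually undisclosed) scoring.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical record: whatever the parser managed to read from the CV.
    name: str
    skills: set = field(default_factory=set)


def match_percentage(candidate_skills, required_skills):
    """Share of the required skills found in the parsed CV, as a percentage."""
    if not required_skills:
        return 0.0
    matched = candidate_skills & required_skills
    return 100.0 * len(matched) / len(required_skills)


def rank_candidates(candidates, required_skills):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(c.name, match_percentage(c.skills, required_skills)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumed example data, invented for illustration only.
required = {"python", "sql", "statistics", "communication"}
pool = [
    Candidate("A", {"python", "sql"}),
    Candidate("B", {"python", "sql", "statistics", "excel"}),
    Candidate("C", {"communication"}),
]
ranking = rank_candidates(pool, required)
print(ranking)
```

Note that a candidate whose CV describes the same skill in words the parser does not recognize simply scores lower, which is one way a seemingly neutral filter can screen people out.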

One way AI vendors create their datasets is to scrape data online or buy commercially available datasets, which means many vendors are using the same datasets. Volume does not mean quality, however. In 2016, Microsoft and Boston University researchers revealed that the Word2Vec model (a publicly available algorithmic model built on millions of words scraped from online Google News articles, which computer scientists commonly use to analyze word associations) had learned the gender stereotypes existing in online news sources (Bolukbasi et al., 2016). The other finding from the study was that these biased word associations were overwhelmingly job related. For example, “Man is to Computer Programmer as Woman is to Homemaker.” The data used in training might not be fairly representative in the first place and might have bias and imbalances embedded in it; or, even if it is perfectly clean, it might not be representative of the population you are targeting. In other words, a dataset collected in the US might not make sense if you are a recruiter trying to use this algorithm in Southeast Asia.
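The analogy test behind that finding can be illustrated with a toy example. The sketch below uses made-up two-dimensional vectors, not real Word2Vec values, chosen only so that the arithmetic is easy to follow. It solves “a is to b as c is to ?” by computing b − a + c and picking the closest remaining word by cosine similarity. All names and numbers are assumptions for illustration.

```python
import math

# Invented 2-D vectors: [gender direction, job-relatedness].
# These are NOT real Word2Vec values; they only mimic a biased embedding.
TOY_VECTORS = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "programmer": (0.8, 1.0),
    "homemaker": (-0.8, 1.0),
    "engineer": (0.7, 1.1),
    "teacher": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' using b - a + c and the nearest neighbour."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    # Exclude the three query words, as analogy tests usually do.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


answer = analogy("man", "programmer", "woman", TOY_VECTORS)
print(answer)
```

Because the invented vectors place “programmer” on the male side of the gender axis, the arithmetic lands nearest to “homemaker”. In a real embedding that placement comes from the training text, not from anyone’s intent.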

Another approach to creating the dataset, and then the criteria upon which the model is based, is to look at an organization’s current and past workforce (and/or applicant pool), determine the success stories and create a model (baseline criteria) based on what “worked” in the past. This is a customized model for the specific employer. However, defining what “worked”, or what makes an employee “successful”, is itself a biased process. What does the client value? Sales numbers? Cultural fit? Retention? And crucially, what data does the client have? (Raghavan et al., 2020)

“Cultural fit” is a term which is used so frequently we forget it is a subjective measure. It is a nicer way of saying we will hire people who are like us, or we will not step out of our comfort zones. However, we do not question the possibility that the culture might have kept some diverse talent out of the equation, or that it has kept some of its own employees at the bottom due to biases within the organization. Performance management evaluations can themselves be biased and subjective if they are not structured properly with objectively measurable criteria.

Long tenure in an organization is usually considered another metric of success. However, what if the employee has been with the company for more than 10 years because he/she did not want to learn new things and was content with doing the same thing over and over again, or did not get any outside offers in all that time because there was nothing particularly successful to attract attention? Usually a successful long tenure in a company means the employee has been promoted during that time or has taken on more responsibility, which is absolutely a sign of success. So a basic calculation of time in an organization, without looking at the more nuanced changes, should not be the criterion for success. By the same token, gaps in employment should not be held against an applicant either. The applicant might have a disability or another circumstance which required him/her to take time off from work.

The machine learning algorithm Amazon had built for its own hiring purposes, using its own job applicant data since 2014, had to be scrapped by the company when it realized the algorithm was biased. The models were trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period. However, the database was a reflection of the heavy male dominance across the tech industry. In effect, Amazon’s system taught itself that male candidates were preferable. The algorithm penalized resumes which included the word “women’s,” as in “women’s chess club captain.” To the company’s credit, it did not keep pushing the use of the product once it noticed the bias, despite all the investment made in the system. However, it is a good reminder and case study for us when looking at bias in datasets.

We need to remember that even when sensitive/protected characteristics (like race, gender, age, etc.) are explicitly ignored in the model, there can still be data points which act as proxies for these characteristics (zip code, college name, etc.), and which can still reflect the same systemic injustices and bias in the dataset. Long story short, predictions based on a company’s historical data for a customized tool can further deepen the underrepresentation of women, non-binary applicants, ethnic minorities, people with disabilities and so on, which is exactly the type of issue the company wishes to avoid or correct in the first place.
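One rough way to check whether a supposedly neutral field acts as a proxy is to ask how well it alone predicts the protected characteristic. The sketch below, with hypothetical field names and invented records, compares guessing the majority group within each zip code against guessing the overall majority group. It is a simple diagnostic under those assumptions, not a full fairness audit.

```python
from collections import Counter, defaultdict


def proxy_accuracy(records, proxy_key, protected_key):
    """Accuracy of guessing the protected value from the proxy value alone,
    always guessing the most common protected value within each proxy group."""
    groups = defaultdict(Counter)
    for record in records:
        groups[record[proxy_key]][record[protected_key]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(records)


def baseline_accuracy(records, protected_key):
    """Accuracy of always guessing the overall most common protected value."""
    counts = Counter(record[protected_key] for record in records)
    return counts.most_common(1)[0][1] / len(records)


# Invented applicant records; "zip" and "group" are hypothetical field names.
applicants = [
    {"zip": "11111", "group": "X"},
    {"zip": "11111", "group": "X"},
    {"zip": "11111", "group": "X"},
    {"zip": "11111", "group": "Y"},
    {"zip": "22222", "group": "Y"},
    {"zip": "22222", "group": "Y"},
    {"zip": "22222", "group": "Y"},
    {"zip": "22222", "group": "X"},
]

with_proxy = proxy_accuracy(applicants, "zip", "group")
without_proxy = baseline_accuracy(applicants, "group")
print(with_proxy, without_proxy)
```

The larger the gap between the two numbers, the more information about the protected characteristic leaks through the proxy, even though the characteristic itself was never given to the model.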

ASSESSMENT:

The assessment stage is where applicants are asked to go through different exercises to gauge their fit for a certain role. In a traditional sense, the assessment step might include interviews, simulations, case studies, tests or games. The main types of algorithmic assessment tools focus on facial, speech or emotion analysis during candidates’ interviews on the one hand, or on gamified tests on the other. In their research evaluating the claims and practices of 18 vendors of algorithmic pre-employment assessment, Raghavan et al. (2020) cite the lack of publicly available information, and the lack of information about the validity of these assessments, as the biggest obstacles to empirically characterizing industry practices. This holds true for most of the assessment algorithms used in the market today.

Inferred traits may not actually have any causal relationship with performance and, at worst, could be entirely circumstantial (Bogen and Rieke, 2018). In other words, the correlations which the algorithms found to build a model, or the traits which the developers built into the code, may have nothing to do with a person’s success on the job. So not only are we faced with a black box when it comes to these algorithms (i.e. the workings of the algorithm are not understood or cannot be explained), but even if we had access to the code and the algorithm itself was explainable, the explanation might not necessarily mean anything.

As Reema Patel, head of public engagement at the Ada Lovelace Institute, puts it, “There’s no data that demonstrates that facial recognition technology to profile people works, and effectively, what we’re looking at is a form of pseudoscience that has a potential risk of discriminating against disabled people” (Lee, 2019). The assessment may not work well for people with differences in facial features and expressions if they were not considered when gathering training data and evaluating models; body recognition systems may not work well for a person with a disability characterized by body shape, posture, or mobility differences; and analysis tools which attempt to infer emotional state from prosodic features are likely to fail for speakers with atypical prosody, such as people with autism (Guo, Kamar, Vaughan, Wallach, and Morris, 2019).

Putting aside the fact that 1 billion people, or 15% of the world’s population, experience some form of disability according to the World Bank, and the fact that not enough work is being done to address the different biases this population faces, algorithmic bias in assessment tools does not stop with those with disabilities.

EPIC filed a complaint with the FTC alleging that recruiting company HireVue has committed unfair and deceptive practices in violation of the FTC Act. At issue is the company’s use of micro-expression matching (analyzing the candidate’s facial expressions, their gestures, whether they’re making eye contact, their body language, their speaking speed and the candidate’s choice of words). Yes, HireVue is the most commonly cited example in this category, but it is far from being the only one. Micro-expression matching or analysis also works against applicants whose native language is different from the language used in the tool, and facial analysis systems struggle to read the faces of women with darker skin (Buolamwini and Gebru, 2018). As a result, the system either filters out all these candidates as not fit for hiring, or erroneously flags their data as invalid outliers.

Vendors like Faception, a facial personality analytics tool, suggest their proprietary computer vision and machine learning technology can profile people and reveal their personality based only on their facial image, claiming they can tell if a person has a high IQ, or is more likely to be an academic researcher or a terrorist. I will refrain from going into a deep-dive argument about what sounds like phrenology, a Lombroso-ist approach, and the unscientific and malevolent aspects of it all. However, it does raise a red flag, because this vendor also lists smart cities, recruitment, retail and insurance among its product verticals.

SOCIAL PROFILE AGGREGATION:

Let’s say a candidate has gone through all these stages and is shortlisted for a job offer. Although a number of states ban employers from looking at candidates’ social profiles to get more information, not all states or countries do. A number of algorithmic tools can now scrape all your social profiles and posts on the internet and make recommendations about you to employers by classifying you into certain categories. Michal Kosinski and colleagues have shown that machine learning algorithms can predict scores on well-established psychometric tests using Facebook “likes” as data input, which are the digital equivalent of identity claims: “likes” tell others about our values, attitudes, interests, and preferences (Kosinski, Stillwell & Graepel, 2013). On a separate note, as Duarte et al. suggest, these tools using natural language processing technology “have limited ability to parse the nuanced meaning of human communication, or to detect the intent or motivation of the speaker.” Definitions of what constitutes toxic or concerning content are often vague and highly subjective (Duarte, Llanso, and Loup, 2017).

In a world where our digital footprint becomes our twin persona and where almost everyone can get their hands on our information, the democratic process and our ability to openly share our views on different issues also come under pressure. You might not want to take a stand on important societal issues if you know a future employer may make an adverse decision on your employment because of what they saw. Background checks can also surface details about an applicant’s race, sexual identity, disability, pregnancy, or health status, which employers should not consider during the hiring process. Employers should not sacrifice the integrity of the recruitment process in an effort to catch a handful of extreme cases of unacceptable behavior. The benefit does not justify the impact on free speech.

CONCLUSION:

There are certainly great opportunities to use AI to analyze a company’s structure and see potential issues such as imbalances across the employee population or underrepresentation of different groups across various processes, or to use AI in a responsible manner to improve your processes. Algorithmic bias may exist even when there is no discriminatory intent on the part of the vendor, if the data was not good, and no employer invests in a product solely to cut costs if they know there might be bias and even discrimination issues. However, blindly onboarding software without deep-dive due diligence is not a responsible way of conducting business either. As John Jersin, vice president of LinkedIn Talent Solutions, says, the algorithmic hiring service is not a replacement for traditional recruiters. “I certainly would not trust any AI system today to make a hiring decision on its own,” he says. “The technology is just not ready yet.” (Dastin, 2018)

Dipayan Ghosh, a Harvard fellow and former Facebook privacy and public policy official, says the use of advanced algorithms and AI in recruiting can create tremendous value for the industry, where discrimination by hiring managers has been rampant, but if implemented irresponsibly, it can have drastic and harmful effects for job candidates. He also says companies reviewing their own code is not enough, especially in the corporate sector, where returns are optimized against near-term revenue, forward investment and stock return, above all else. “We know of too many past cases where all a company needed to do is to self-certify, and it was shown to be perpetuating harms to society and, specifically, certain people. … The public will have little knowledge as to whether or not the firm really is making biased decisions if it’s only the firm itself that has access to its decision-making algorithms to test them for discriminatory outcomes.” (Rosenbaum, 2018)

It is crucial here to underline again that algorithms are not independent of their developers, nor is the data of the populations upon which they are built free of the potential for embedded bias. All of this could lead to unintended injustice in the hiring process. As Kleinberg et al. suggest, it would be naive – even dangerous – to conflate “algorithmic” with “objective…algorithms change the landscape – they do not eliminate the problem” (Kleinberg et al., 2019) or “to think the use of algorithms will necessarily eliminate discrimination against protected groups” (Barocas et al., 2016).

It is down to humans to proactively question the biases in these systems, put the necessary governance around them and make sure that we are not magnifying issues and deepening structural injustices when we are trying to do exactly the opposite.

References:

Barocas, Solon and Selbst, Andrew D., Big Data’s Disparate Impact (2016). 104 California Law Review 671 (2016). https://ssrn.com/abstract=2477899 or http://dx.doi.org/10.2139/ssrn.2477899

Barrett, Lisa Feldman, et al. “Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements.” Psychological Science in the Public Interest, vol. 20, no. 1, July 2019, pp. 1–68, https://doi.org/10.1177/1529100619832930

Bendick, Marc & Nunes, Ana. (2011). Developing the Research Basis for Controlling Bias in Hiring. Journal of Social Issues. 68. 238-262. 10.1111/j.1540-4560.2012.01747.x. https://www.researchgate.net/publication/235556983_Developing_the_Research_Basis_for_Controlling_Bias_in_Hiring

Bogen, Miranda and Rieke, Aaron. Help wanted: An exploration of hiring algorithms, equity, and bias. Technical report, Upturn, 2018. https://www.upturn.org/reports/2018/hiring-algorithms/

Bolukbasi, Tolga et al. “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.” https://arxiv.org/pdf/1607.06520.pdf

Buolamwini, Joy and Timnit Gebru. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” FAT (2018). http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf

Carpenter, Julia, Google’s algorithm shows prestigious job ads to men, but not to women. Independent. 2015 https://www.independent.co.uk/life-style/gadgets-and-tech/news/googles-algorithm-shows-prestigious-job-ads-to-men-but-not-to-women-10372166.html

Chamorro-Premuzic, T., Winsborough, D., Sherman, R., & Hogan, R. (2016). New Talent Signals: Shiny New Objects or a Brave New World? Industrial and Organizational Psychology, 9(3), 621-640 https://doi.org/10.1017/iop.2016.6

Dastin, Jeffrey, Amazon scraps secret AI recruiting tool that showed bias against women. Reuters. 2018 https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G

Duarte, Natasha, Llanso, Emma and Loup, Anna, Mixed Messages? The Limits of Automated Social Media Content Analysis, Center for Democracy & Technology, November 2017, https://cdt.org/files/2017/11/Mixed-Messages-Paper.pdf

Electronic Privacy Information Center (EPIC). EPIC Files Complaint with FTC about Employment Screening Firm HireVue. 2019. https://epic.org/2019/11/epic-files-complaint-with-ftc.html

Geyik, Sahin Cem, Stuart Ambler, and Krishnaram Kenthapadi. “Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019): https://arxiv.org/abs/1905.01989

Guo, Anhong & Kamar, Ece & Vaughan, Jennifer & Wallach, Hanna & Morris, Meredith. (2019). Toward Fairness in AI for People with Disabilities: A Research Roadmap. https://arxiv.org/abs/1907.02227

Heater, Brian, Facebook settles ACLU job advertisement discrimination suit. TechCrunch. https://techcrunch.com/2019/03/19/facebook-settles-aclu-job-advertisement-discrimination-suit/

Kim, Pauline, Big Data and Artificial Intelligence: New Challenges for Workplace Equality (December 5, 2018). University of Louisville Law Review, Forthcoming. https://ssrn.com/abstract=3296521

Kim, Pauline, Manipulating Opportunity (October 9, 2019). Virginia Law Review, Vol. 106, 2020, Forthcoming. https://ssrn.com/abstract=3466933

Jon Kleinberg & Jens Ludwig & Sendhil Mullainathan & Cass R. Sunstein, 2019. “Discrimination In The Age Of Algorithms,” NBER Working Papers 25548, National Bureau of Economic Research, Inc. https://ideas.repec.org/s/nbr/nberwo.html

Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences of the United States of America, 110(15), 5802–5805. https://www.pnas.org/content/110/15/5802

Lee, Alex. An AI to stop hiring bias could be bad news for disabled people. Wired. 2019. https://www.wired.co.uk/article/ai-hiring-bias-disabled-people

O’Neil, Cathy, The Era of Blind Faith in Big Data Must End, TED Talk, 2017. https://www.ted.com/talks/cathy_o_neil_the_era_of_blind_faith_in_big_data_must_end?language=en

Raghavan, Manish et al. “Mitigating bias in algorithmic hiring: evaluating claims and practices.” Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (2020): https://arxiv.org/pdf/1906.09208.pdf

Rosenbaum, Eric, Silicon Valley is stumped: A.I. cannot always remove bias from hiring, CNBC. https://www.cnbc.com/2018/05/30/silicon-valley-is-stumped-even-a-i-cannot-remove-bias-from-hiring.html

Schulte, Julius. AI-assisted recruitment is biased. Here’s how to make it more fair. World Economic Forum. 2019 https://www.weforum.org/agenda/2019/05/ai-assisted-recruitment-is-biased-heres-how-to-beat-it/

Quillian, Lincoln, Pager, Devah, Hexel, Ole, Midtbøen, Arnfinn H. The persistence of racial discrimination in hiring. Proceedings of the National Academy of Sciences Oct 2017, 114 (41) 10870-10875; https://www.pnas.org/content/pnas/114/41/10870.full.pdf

Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. 2005. Accurately interpreting clickthrough data as implicit feedback. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR ’05). Association for Computing Machinery, New York, NY, USA, 154–161. https://dl.acm.org/doi/10.1145/1076034.1076063

Title VII of the Civil Rights Act of 1964: https://www.eeoc.gov/statutes/title-vii-civil-rights-act-1964

Venkatraman, Sankar. This Chart Reveals Where AI Will Impact Recruiting (and What Skills Make Recruiters Irreplaceable), LinkedIn Blog, 2017. https://business.linkedin.com/talent-solutions/blog/future-of-recruiting/2017/this-chart-reveals-where-AI-will-impact-recruiting-and-what-skills-make-recruiters-irreplaceable

Vincent, James, Google’s algorithms advertise higher paying jobs to more men than women. The Verge. 2015 https://www.theverge.com/2015/7/7/8905037/google-ad-discrimination-adfisher

World Bank, Disability Inclusion. 2020. https://www.worldbank.org/en/topic/disability