Mini summary (scroll down for full summary):
When studying for biases in NLP models, there is not enough of a focus on the impacts that phrases related to disabilities has on the more popular models and how it skews and biases downstream tasks especially when using popular models like BERT and using tools like Jigsaw to do toxicity analysis of phrases. This paper presents an analysis of how toxicity changes based on the use of recommended vs. non-recommended phrases when talking about disabilities and how results are impacted when using them in downstream contexts such as when writers are nudged to use certain phraseology that moves them away from expressing themselves fully reducing their dignity and autonomy. It also looks at the impacts that this has in online content moderation whereby there is a disproportionate impact on the communities because of the heavy bias in censoring content that has these phrases even when they might be used in constructive contexts such as communities discussing the conditions and engaging with other hate speech to debunk myths. Given that more and more content moderation is being turned over to automated tools, this has the potential to suppress the representation of people with disabilities in online fora where they discuss using such phrases thus also skewing the social attitudes and perception of the prevalence of these conditions as being less prevalent than they actually are. The authors point to a World Bank study that mentions that approximately 1 billion people around the world have some form of disability.
They also look at the biases that are captured in the BERT model where there is a negative association between the recommended phrases for disability and associations with things like homelessness, gun violence, and other socially negative terms leads to a slant that impacts and shapes the representations of these words that are captured in the models. Since such models are used widely in many downstream tasks, the impacts are amplified and present themselves in unexpected ways. The authors finally make some recommendations on how to counter some of these problems by involving communities more directly and learning how to be more representative and inclusive. Making disclosures about the places where the models are appropriate to use, where they shouldn’t be used, and the underlying datasets that were used to train the system can also help people make more informed decisions about when to use and when not to use these systems so that they don’t perpetuate harm on their users.
Underrepresentation of disabilities in datasets and how they are processed in NLP tasks is an important area of discussion that is often not studied empirically in the literature that primarily focuses on other demographic groups. There are many consequences of this, especially as it relates to how text related to disabilities is classified and has impacts on how people read, write, and seek information about this.
Research from the World Bank indicates that about 1 billion people have disabilities of some kind and often these are associated with strong negative social connotations. Utilizing 56 linguistic expressions as they are used in relation to disabilities and classifying them into recommended and non-recommended uses (following the guidelines from Anti-Defamation League, ACM SIGACCESS, and ADA National Network), the authors seek to study how automated systems classify phrases that indicate disability and whether usages split by recommended vs. non-recommended uses make a difference in how these snippets of text are perceived.
To quantify the biases in the text classification models, the study uses the method of perturbation. It starts by collecting instances of sentences that have naturally occurring pronouns he and she. Then, they replace them with the phrases indicating disabilities as identified in the previous paragraph and compare the change in the classification scores in the original and perturbed sentences. The difference indicates how much of an impact the use of a disability phrase has on the classification process.
Using the Jigsaw tool that gives the toxicity score for sentences, they test these original and perturbed sentences and observe that the change in toxicity is lower for recommended phrases vs. the non-recommended ones. But, when disaggregated by categories, they find that some of them elicit a stronger response than others. Given that the primary use of such a model might in the case of online content moderation (especially given that we now have more automated monitoring happening as human staff has been thinning out because of pandemic related closures), there is a high rate of false positives where it can suppress content that is non-toxic and is merely discussing disability or replying to other hate speech that talks about disability.
To look at sentiment scores for disability related phrases, the study looks at the popular BERT model and adopts a template-based fill-in-the-blank analysis. Given a query sentence with a missing word, BERT produces a ranked list of words that can fill the blank. Using a simple template perturbed with recommended disability phrases, the study then looks at how the predictions from the BERT model change when disability phrases are used in the sentence. What is observed is that a large percentage of the words that are predicted by the model have negative sentiment scores associated with them. Since BERT is used quite widely in many different NLP tasks, such negative sentiment scores can have potentially hidden and unwanted effects on many downstream tasks.
Such models are trained on large corpora, which are analyzed to build “meaning” representations for words based on co-occurrence metrics, drawing from the idea that “you shall know a word by the company it keeps”. The study used the Jigsaw Unintended Bias in Toxicity Classification challenge dataset which had a mention of a lot of disability phrases. After balancing for different categories and analyzing toxic and non-toxic categories, the authors manually inspected the top 100 terms in each category and found that there were 5 key types: condition, infrastructure, social, linguistic, and treatment. In analyzing the strength of association, the authors found that condition phrases had the strongest association, and was then followed by social phrases that had the next highest strongest association. This included topics like homelessness, drug abuse, and gun violence all of which have negative valences. Because these terms are used when discussing disability, it leads to a negative shaping of the way disability phrases are shaped and represented in the NLP tasks.
The authors make recommendations for those working on NLP tasks to think about the socio-technical considerations when deploying such systems and to consider the intended, unintended, voluntary, and involuntary impacts on people both directly and indirectly while accounting for long-term impacts and feedback loops.
Such indiscriminate censoring of content that has disability phrases in them leads to an underrepresentation of people with disabilities in these corpora since they are the ones who tend to use these phrases most often. Additionally, it also negatively impacts the people who might search for such content and be led to believe that the prevalence of some of these issues are smaller than they actually are because of this censorship. It also has impacts on reducing the autonomy and dignity of these people which in turn has a larger implication of how social attitudes are shaped.
Original piece by Hutchinson et al.: https://arxiv.org/abs/2005.00813