🔬 Research summary by Jessie J. Smith, a 4th year PhD candidate in the Information Science department at the University of Colorado Boulder, researching best practices for operationalizing ML fairness in industry settings.
[Original paper by Jessie J. Smith, Saleema Amershi, Solon Barocas, Hanna Wallach, and Jennifer Wortman Vaughan]
Overview: Limitations are inherent in conducting research; transparency about these limitations can improve scientific rigor, help ensure appropriate interpretation of research findings, and make research claims more credible. However, the machine learning (ML) research community lacks well-developed norms around disclosing and discussing limitations. This paper introduces REAL ML, a set of guided activities to help ML researchers recognize, explore, and articulate the limitations of their research—co-designed with 30 researchers from the ML community.
All research has limitations—drawbacks in the design or execution of research that may impact the resulting findings and claims. No research is perfectly generalizable, which is why limitations are an inherent part of conducting research; limitations situate the research methods and findings within the specific context of the research study. Many scientific fields have well-established norms around disclosing and discussing limitations, often recognized as necessary for improving scientific rigor and research integrity. However, the ML research community does not have well-developed norms around disclosing and discussing limitations.
This paper addresses this gap by conducting an iterative design process with 30 ML and ML-adjacent researchers to develop the REAL ML tool (Recognizing, Exploring, and Articulating Limitations in Machine Learning). From our interviews, we identified ML researchers’ perceptions of limitations, the practical and cultural challenges they face around transparency about limitations, and how REAL ML could help alleviate these challenges. As part of REAL ML, we also introduce two lists: (1) “sources of limitations” and (2) “types of limitations” that commonly occur in ML research. Early evidence shows that REAL ML is useful for ML researchers hoping to recognize, explore, and articulate the limitations of their work.
What is a limitation?
Our interviews revealed that our participants did not agree on a single definition of “limitation” in the context of ML research. Some participants described limitations as an inherent part of research, while others thought limitations were indicators of poor research methods. Participants who thought of limitations as “weaknesses” or “flaws” were less enthusiastic about highlighting them in their research papers, suggesting that this perspective on limitations could undermine the scientific rigor of the research discipline. Throughout our co-design process, we identified seven types of limitations that are common in ML research: (1) Fidelity; (2) Generalizability; (3) Robustness; (4) Reproducibility; (5) Resource Requirements; (6) Value Tensions; and (7) Vulnerability to Mistakes and Misuse. These limitation types are described in detail with examples in the full paper.
What challenges do ML researchers face when recognizing, exploring, and articulating the limitations of their work?
One finding from our study showed that junior researchers experienced more difficulty recognizing the limitations of their research when compared to senior researchers. Some participants noted that the lack of transparency about common limitations within the ML research discipline could cause junior researchers to take years to gain the disciplinary knowledge necessary to fully recognize the limitations of their research.
Once limitations are identified, it is still challenging for researchers to explore them and articulate them to their research audience. Most of our participants shared that they were afraid that if they disclosed limitations in their papers, their work would be more likely to be rejected from publication venues. Some participants noted a perceived stigma around disclosing limitations in the ML research community, which makes disclosing less the path of least resistance under current community norms.
Even when researchers know which limitations to disclose and are motivated to disclose them, several participants noted that page limits on publications force them to constrain their discussion of limitations or omit it entirely. These participants indicated they need guidance on prioritizing which limitations are most valuable to share with the research community when they do not have space to share all of them.
What is the REAL ML tool, and how does it help ML researchers?
Participants raised many challenges due to a lack of guidance and appropriate training on recognizing, exploring, and articulating limitations. These were the challenges we sought to alleviate while co-designing REAL ML.
REAL ML consists of an introduction plus four content sections: 1) sources and types of limitations, 2) recognizing limitations, 3) exploring limitations, and 4) articulating limitations. Each section includes guided activities and resources for ML researchers to use when writing limitations sections. Below is a brief description of each content section within REAL ML.
Sources and Types of Limitations: The first section asks ML researchers to familiarize themselves with the sources and types of limitations commonly occurring in ML research and start thinking about how these might relate to their research. “Sources of Limitations” are broken into three broad categories: unavoidable constraints (e.g., constraints on time or resources), unforeseen challenges (e.g., experimental failures or negative results), and implicit and explicit decisions made during the ML research process. This section also includes a list of “Types of Limitations” commonly occurring in ML research to reflect on and return to later.
Recognizing Limitations: This section prompts ML researchers to build on the activity in the first section by filling in a worksheet with a list of the limitations of their research, along with their sources and types.
Exploring Limitations: This section asks ML researchers to answer a series of questions designed to help them explore the limitations they recognized in the previous section, with guidance to uncover information that may be important to articulate to their research audience. They are asked to record their answers on the worksheet.
Articulating Limitations: In the final section, ML researchers are asked to build on the information they recorded in the worksheet to draft a limitations section. The goal is to develop a narrative around the limitations of their research that is valuable to different audiences.
The full tool (the tool pdf and its associated worksheet) is available at https://github.com/jesmith14/REAL-ML.
Between the lines
Although REAL ML is a great starting point for encouraging disclosure of limitations in the ML research community, some of the challenges we uncovered around transparency about limitations stem from community norms and cannot be addressed with tooling. One example is the emphasis on generalizability in the ML research review process. Valuing generalizability as a research goal could disincentivize researchers from being honest about the context and scope of their research findings, and could limit the scientific rigor of the research discipline. This work also highlighted how peer review processes within the ML research discipline can make disclosure of limitations difficult, partly because of a fear that transparency about limitations could lead to paper rejection.
In other disciplines, disclosing limitations is a necessary part of research. There is a shared belief outside of ML research that recognizing, exploring, and articulating limitations can make research easier to reproduce, help ensure appropriate interpretation of research findings, make research claims more credible, and highlight issues that would benefit from further research. Disclosing limitations ought therefore to be valued (rather than feared) in the ML research discipline for its ability to improve scientific rigor and the integrity of the research community.
We hope this work supports ML researchers to recognize, explore, and articulate limitations. We encourage future work to continue cultivating more well-established community norms for disclosing limitations in ML research.