🔬 Research Summary by Federico Cabitza, Davide Ciucci, Gabriella Pasi, and Marco Viviani
Federico Cabitza is an Associate Professor at the University of Milano-Bicocca, Department of Informatics, Systems, and Communication (DISCo). His research interests are the design and evaluation of Artificial Intelligence systems to support decision-making, especially in health care and law, and the impact of these technologies on the organizations that adopt them, on user experience, and on work processes.
Davide Ciucci is an Associate Professor at the University of Milano-Bicocca, Department of Informatics, Systems, and Communication (DISCo). His research interests include Uncertainty Management in Knowledge Representation and Machine Learning.
Gabriella Pasi is a Full Professor at the University of Milano-Bicocca, Department of Informatics, Systems, and Communication (DISCo). She currently serves as Head of Department and as Pro-rector for International Relations. Her main research activities include Artificial Intelligence, Natural Language Processing, Information Retrieval, Knowledge Representation, and Fuzzy Logic.
Marco Viviani is an Associate Professor at the University of Milano-Bicocca, Department of Informatics, Systems, and Communication (DISCo). His main research activities include Social Computing, Natural Language Processing, Trust and Reputation Management, Graph Mining, and User Modeling.
[Original paper by Federico Cabitza, Davide Ciucci, Gabriella Pasi, Marco Viviani]
Overview: This article discusses open problems, implemented solutions, and future research directions in responsible AI in healthcare. In particular, we illustrate two main research themes related to the work of two laboratories within the Department of Informatics, Systems, and Communication (DISCo) at the University of Milano-Bicocca. The problems addressed concern, in particular, uncertainty in medical data and machine advice, and online health information disorder.
Introduction
According to the Ethics Guidelines for Trustworthy Artificial Intelligence, a document produced by the High-Level Expert Group on Artificial Intelligence (AI HLEG) set up by the European Commission, AI systems should meet seven key requirements to be trustworthy: 1. human agency and oversight, 2. technical robustness and safety, 3. privacy and data governance, 4. transparency, 5. diversity, non-discrimination, and fairness, 6. societal and environmental well-being, and 7. accountability. Two research laboratories at DISCo are currently addressing some of these key requirements in the health domain. In particular, Federico Cabitza and Davide Ciucci, members of the Modeling Uncertainty, Decisions and Interaction Laboratory (MUDILAB), address the problem of uncertainty in the data feeding machine learning algorithms and focus on the importance of cooperation between AI and human decision-makers in healthcare. All seven key requirements are involved, with particular reference to 1, 4, 6, and 7. Gabriella Pasi and Marco Viviani, members of the Information and Knowledge Representation, Retrieval and Reasoning Laboratory (IKR3 LAB), address the problem of health information disorder and discuss several open issues and research directions related mainly to key requirements 1, 2, 4, 5, and 6.
Key Insights
Responsible AI as a Support for Healthcare Decisions
Medical data can be affected by different types of uncertainty and variability, some of which are not usually accounted for when developing ML models. In particular, we refer to different forms of variability: biological variability, which occurs when the values expressing a person's health condition vary, more or less slightly, over time; analytical variability, which occurs when a piece of testing equipment, although calibrated, produces values for a specific patient/subject that differ from those produced by other equipment (from the same vendor or from different vendors); and pre- and post-analytical variability, which occur when different values for the same exam and the same subject are due to different ways (including erroneous ones) of using the equipment or of producing data about test results.
These sources of variability add to the noise due to more common (and better-treated) sources of data or label noise: missing data in different forms, e.g., a value that is not known or a patient who does not reveal a symptom; vagueness, such as a symptom that is reported as mild rather than severe; a physician undecided about the interpretation of an exam, with only a subjective degree of confidence attached; noise in instruments or in reporting data; and clerical errors.
In light of these considerations, it is essential to be aware of potential sources of uncertainty in biological and clinical data and to conceive novel methods to mitigate their impact and manage the related variability and uncertainty. We are exploring approaches based on partially labeled data, superset learning, multi-rater annotation, cautious learning, and soft clustering. The goal is to create a framework for the robustness validation of classification systems based on machine learning. The developed tools and algorithms should be able to handle different forms of uncertainty simultaneously and to abstain from giving a precise answer whenever doing so is not possible or is too risky. Moreover, we advocate the need to move beyond aggregation by mere majority voting in ground truthing, that is, the production of the ground-truth labels used in supervised learning, as this can oversimplify the complexity of the phenomenon at hand, for which multiple correct and complementary interpretations may coexist for a single case. Finally, we also advocate further research on the design and evaluation of alternative interaction protocols stipulating how human decision-makers could use, and in some cases even collaborate with, AI-based decision support systems, to mitigate the risk of cognitive biases, such as automation bias, automation complacency, AI over-reliance, and its opposite, prejudice against the machine, all of which undermine the effectiveness and efficiency of the computer-supported decision-making process. This will lead to more reliable and trustworthy decision support systems.
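To make the ideas of multi-rater annotation and abstention more concrete, the following minimal sketch (in Python) shows one way to keep rater disagreement as a soft label instead of collapsing it by majority vote, and to abstain when no label is sufficiently supported. It is only an illustration under assumed names and thresholds, not the framework developed at MUDILAB.

```python
# Minimal sketch (not the authors' framework): aggregating multi-rater
# annotations into soft labels instead of collapsing them by majority vote,
# and abstaining from a prediction when the evidence is too ambiguous.
from collections import Counter

def soft_label(ratings):
    """Turn a list of rater labels for one case into a probability distribution."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def decide(soft, threshold=0.75):
    """Return the best-supported label only if its share exceeds the threshold;
    otherwise abstain and defer the case to a human decision-maker."""
    label, share = max(soft.items(), key=lambda kv: kv[1])
    return label if share >= threshold else "ABSTAIN"

# Three radiologists annotate the same image: two see a nodule, one does not.
ratings = ["nodule", "nodule", "no-nodule"]
soft = soft_label(ratings)   # ≈ {'nodule': 0.67, 'no-nodule': 0.33}
print(soft, decide(soft))    # -> ABSTAIN: the 2-vs-1 disagreement is preserved
```

In this toy example, the disagreement among raters is kept as a probability distribution, and the case is deferred to a human rather than forced into a single, possibly misleading, ground-truth label.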
Responsible AI and Health Information Disorder
All the forms of false, unreliable, and low-quality information, generated with or without fraudulent intent, have recently been grouped under the term information disorder. The spread of information disorder is partly due to, and exacerbated by, several AI systems that in recent years have allowed the generation of increasingly realistic fraudulent content and its large-scale dissemination, often with manipulative intent. Regarding the first aspect, consider, for example, the phenomenon of deepfakes; regarding the second, we may cite the increasing effectiveness of micro-targeting, information filtering, and social bot generation systems.
The ethical implications of information disorder generation essentially concern human dignity, autonomy, democracy, and personal/social well-being. Human dignity, because people are often treated as aggregates of data that are promptly processed for business purposes, often with the intent of manipulating opinions; in this sense, people have the impression that they receive the same information as any other person in the digital ecosystem when, in fact, they are micro-targeted. This problem is closely related to autonomy, e.g., users cannot freely build their own (digital) identity. Moreover, manipulation leading to excessive polarization (as seen above) makes it impossible to reach globally shared decisions. It is clear that all these manipulation-related aspects can have serious repercussions for individual and social well-being, including health.
Therefore, in developing solutions for identifying information disorder, we are currently investigating models and methodologies (especially in the health domain) that do not impose a hard filter on users' access to information (based on its genuineness as estimated by the system). We are evaluating the possibility of providing users with a ranking of information that considers a gradual notion of genuineness instead of a binary one, as done so far in the literature. Furthermore, to tackle issues related to data collection, data processing, and the opacity of algorithms (inscrutable evidence and misguided evidence), we have recently worked on the definition of suitably labeled datasets and evaluation strategies within the CLEF eHealth Evaluation Lab, in particular its Consumer Health Search (CHS) task. In addition, we have been working on the development of model-driven solutions for assessing the genuineness of information (including health information), based on Multi-Criteria Decision-Making (MCDM) techniques, in which the system remains explainable with respect to its results, which are also obtained by taking into account external knowledge and (medical) experts.
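As an illustration of how a gradual notion of genuineness could support ranking rather than hard filtering, the following sketch aggregates a few quality criteria with a simple weighted sum, one of the most basic MCDM operators. The criterion names, weights, and scores are hypothetical and are not taken from the actual models developed at the IKR3 LAB.

```python
# Minimal sketch (illustrative only): a weighted aggregation of quality criteria
# yields a graded genuineness score in [0, 1], used to re-rank retrieved health
# documents instead of filtering them out with a binary keep/drop decision.

# Hypothetical per-document criterion scores in [0, 1]
documents = {
    "doc_a": {"source_reliability": 0.9, "medical_concordance": 0.8, "readability": 0.6},
    "doc_b": {"source_reliability": 0.4, "medical_concordance": 0.3, "readability": 0.9},
}
weights = {"source_reliability": 0.5, "medical_concordance": 0.4, "readability": 0.1}

def genuineness(scores, weights):
    """Weighted-sum aggregation of criterion scores (a simple MCDM operator)."""
    return sum(weights[c] * scores[c] for c in weights)

# Rank documents by graded genuineness rather than applying a hard filter
ranking = sorted(documents, key=lambda d: genuineness(documents[d], weights), reverse=True)
print(ranking)  # -> ['doc_a', 'doc_b'] (scores 0.83 and 0.41)
```

Because each criterion and weight is explicit, the resulting score can be explained to users, which is the kind of transparency a model-driven approach aims for.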
Between the lines
In a situation of increasing attention to and reliance on AI tools for healthcare decision-making, the research undertaken at DISCo aims to frame the usefulness and limitations of these tools. The proposed decision support systems and their components will improve the reliability and robustness of decisions made by physicians and healthcare decision-makers in general.
With respect to health information disorder, a key role will be played by the study and development of solutions that are as transparent and understandable as possible for users, while also safeguarding confidentiality, especially in a domain as sensitive as health. From a technological perspective, this will be made possible by developing hybrid models that are partly model-driven and partly data-driven.