🔬 Research Summary by Pengyuan Zhou, who works on Trustworthy AI and the Metaverse. He received his PhD from the University of Helsinki, Finland, in 2020 and is currently a research associate professor at the University of Science and Technology of China (USTC).
[Original paper by Pengyuan Zhou, Benjamin Finley, Lik-Hang Lee, Yong Liao, Haiyong Xie, Pan Hui]
Overview: AI plays a key role in current cyberspace and will drive future immersive ecosystems. Thus, the trustworthiness of such AI systems (TAI) is vital as failures can hurt adoption and cause user harm especially in user-centered ecosystems such as the metaverse. This paper gives an overview of the historical path of existing TAI approaches and proposes a research agenda towards systematic yet user-centered TAI in immersive ecosystems.
Trustworthy AI (TAI) has recently seen significant attention from government entities and technology giants. The major goal of TAI is to ensure the protection of people’s fundamental rights while still allowing responsible competitiveness of businesses. The term TAI has been around for years and was boosted by the well-known EU guidelines on TAI published in 2019. TAI metrics are naturally a major issue and critical to accurately measuring the degree of system trustworthiness and the amount of protection offered by AI-enabled technologies. The appropriate TAI metrics can vary across domains due to differences in scenarios, user demands, processed data, adversaries, regulations, and laws.
Despite the large number of case-by-case metrics used in the current literature, a comprehensive and systematic outline of TAI focusing on metric selection has yet to be proposed, making metric choice challenging for non-experts and even professionals. Furthermore, future immersive ecosystems, such as the metaverse, will incorporate more complex systems that blend the virtual and physical worlds and, more importantly, more complicated definitions of system performance and user experience.
This paper serves as a first effort to outline the critical TAI metrics in current domains, capture the general logic of metric selection, and call for advancing user-centered metrics for immersive ecosystems and autonomous metric selections.
We focus on fairness, privacy, robustness, and the key metrics to meet these requirements in different domains.
Fairness. AI, especially machine learning, often exhibits statistical discrimination due to non-identical data distributions or resources, leaving certain privileged groups with performance advantages and others with disadvantages. This learning bias, whether introduced deliberately or accidentally, exacerbates existing resource inequity and further harms fairness in society.
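The paper surveys such fairness metrics at a high level rather than prescribing one. As a concrete illustration, two widely used group-fairness metrics can be sketched in a few lines; the function names and toy inputs below are hypothetical, not from the paper:

```python
# Minimal sketch of two common group-fairness metrics, computed from
# binary predictions (y_pred), ground truth (y_true), and a binary
# protected-group attribute (group). All names here are illustrative.

def demographic_parity_diff(y_pred, group):
    """P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0).
    A value near 0 means both groups receive positive predictions
    at similar rates."""
    def rate(g):
        preds = [p for p, a in zip(y_pred, group) if a == g]
        return sum(preds) / len(preds)
    return rate(1) - rate(0)

def equal_opportunity_diff(y_true, y_pred, group):
    """Difference in true-positive rates between the two groups:
    among truly positive individuals, do both groups get the
    positive prediction equally often?"""
    def tpr(g):
        preds = [p for t, p, a in zip(y_true, y_pred, group) if a == g and t == 1]
        return sum(preds) / len(preds)
    return tpr(1) - tpr(0)
```

A value far from zero on either metric flags a performance advantage for one group, which is exactly the kind of disparity the paragraph above describes.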
Privacy. Privacy is fundamental yet hard to define explicitly. Nissenbaum defines privacy in terms of contextual integrity and contextual information norms dictating how information may be used or shared. As most researchers agree, privacy is a multi-dimensional concept and is thus normally assessed via multiple metrics focusing on the exposure of private information.
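As one example of an exposure-focused privacy metric (the paper does not single it out), k-anonymity measures the smallest group of records that share the same quasi-identifier values; the field names and toy records below are invented for illustration:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest equivalence-class size over the given
    quasi-identifier columns: with value k, every individual is
    indistinguishable from at least k - 1 others on those columns."""
    classes = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(classes.values())

# Hypothetical released table: zip code and age are quasi-identifiers.
released = [
    {"zip": "02139", "age": 30, "dx": "flu"},
    {"zip": "02139", "age": 30, "dx": "cold"},
    {"zip": "94103", "age": 41, "dx": "flu"},
]
```

Here `k_anonymity(released, ["zip", "age"])` is 1, because the third record is uniquely identifiable from its quasi-identifiers alone; a single-number summary like this is one of the several metrics a multi-dimensional privacy assessment would combine.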
Robustness. Due to natural variability or dynamic system conditions over time, predicting how future conditions will change can be hard or impossible. In this scenario, the term deep uncertainty more properly describes the issue in AI. Under deep uncertainty, multiple environmental state factors, i.e., future conditions, typically jointly affect decisions (e.g., policy designs and plans), which in turn influences the considered performance metric (e.g., cost, utility, and reliability). Robustness metrics function as a transformation of the performance metrics under these future conditions.
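Two classic such transformations from the decision-under-deep-uncertainty literature are maximin (pick the decision with the best worst-case performance) and minimax regret (pick the decision whose worst shortfall from the scenario-wise optimum is smallest). A minimal sketch, with a hypothetical decision/scenario performance table:

```python
# perf maps each decision to its performance (higher is better)
# under each future scenario. The table itself is illustrative.

def maximin(perf):
    """Decision whose worst-case (minimum over scenarios) performance
    is largest."""
    return max(perf, key=lambda d: min(perf[d].values()))

def minimax_regret(perf):
    """Decision whose maximum regret (gap to the best achievable
    performance in each scenario) is smallest."""
    scenarios = next(iter(perf.values())).keys()
    best = {s: max(perf[d][s] for d in perf) for s in scenarios}
    regret = {d: max(best[s] - perf[d][s] for s in scenarios) for d in perf}
    return min(regret, key=regret.get), regret

perf = {
    "A": {"s1": 10, "s2": 2},  # great in s1, poor in s2
    "B": {"s1": 6,  "s2": 5},  # mediocre but stable
}
```

Note that the two metrics can disagree: on this toy table maximin prefers the stable decision "B", while minimax regret prefers "A", which is exactly why the metric choice itself matters.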
General Rule for Metric Selection. In general, a system administrator can follow a series of steps to select the proper metrics: (1) Which requirements of trustworthiness should be assessed? (2) Who cares most about the issue, e.g., the system administrator, the users, the regulators, society? (3) For each requirement, what is each party's main concern (different parties may have different concerns, e.g., system admins care about performance while users care about privacy), e.g., consistent performance, protected data, or equal treatment? (4) What is the target or most common adversary? (5) What data resources are available to compute the selected metrics? (6) What are the difficulty and cost of the metric assessment? (7) Will the metrics stay valid over time?
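In spirit, these steps narrow a catalogue of candidate metrics down by requirement, available data, and assessment cost. The sketch below caricatures that filtering; the candidate list, its attributes, and the cost levels are all invented for illustration and are not from the paper:

```python
# Hypothetical metric catalogue: each entry records which
# trustworthiness requirement it addresses (step 1), what data it
# needs (step 5), and a rough assessment cost (step 6).
CANDIDATES = [
    {"name": "demographic parity", "requirement": "fairness",
     "needs": {"group labels", "predictions"}, "cost": "low"},
    {"name": "k-anonymity", "requirement": "privacy",
     "needs": {"quasi-identifiers"}, "cost": "low"},
    {"name": "minimax regret", "requirement": "robustness",
     "needs": {"scenario models"}, "cost": "high"},
]

def select_metrics(requirement, available_data, max_cost=("low", "medium")):
    """Keep candidates that match the requirement, whose data needs
    are a subset of what is available, and whose cost is acceptable."""
    return [m["name"] for m in CANDIDATES
            if m["requirement"] == requirement
            and m["needs"] <= available_data
            and m["cost"] in max_cost]
```

The remaining steps of the checklist (stakeholders, adversary model, validity over time) resist this kind of mechanical encoding, which is part of the paper's argument for more systematic, and eventually autonomous, metric selection.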
TAI metrics in the existing cyberspace
By examining TAI metric selection across the computing and networking domains, we can see that defining and selecting TAI metrics for computing is more straightforward than for networking. A significant reason is that the outputs of many computing systems, such as recommendation systems and search engines, are more oriented towards end-users; e.g., the results of ranking algorithms closely match user needs, since such systems use user data and show results directly to users.
In contrast, learning algorithms in the networking context usually produce intermediate metrics that serve to adapt protocols or algorithms, which in turn affect the target metrics. In other words, the output of learning algorithms in computing can often be used directly to assess user experience, while in networking there is normally an intermediate model that translates networking performance (controlled by AI) into user experience. Thus, TAI metrics in networking require more ad-hoc design based on the usage, the user context, and the targets of the networking system.
Lessons learned and research agenda
Currently, most computing and networking systems follow “functionality-driven design”, which we use as a contrast to “user-centered design”: the former focuses primarily on pre-defined system performance metrics, though it sometimes also considers user-related metrics such as QoS and QoE. Relatedly, TAI design and metric selection in such systems often focus on core functionalities during specific life-cycle phases. Restricted by these design philosophies, the current mindset of TAI design and metric selection considers only part of human cognition, specifically the conscious and concrete areas that can be more easily measured and quantified, such as pattern recognition, language, attention, perception, and action. These are widely explored by AI communities.

However, the exploration of the unconscious and abstract areas of cognition, e.g., mental health and emotion, is just beginning. Methodological limits are a key reason, e.g., the lack of devices and theories to accurately capture bioelectrical signals and convert them into emotional states. Trustworthiness itself comprises cognitive, emotional, and behavioral factors, since it is a user-oriented term. This aspect will play an increasingly important role as “user-centered design” comes to dominate future cyberspace, replacing today’s “functionality-driven design”. In the future, considering the unconscious and abstract areas of cognition will be vital to guaranteeing TAI. These areas are hard to quantify and may remain so even with advanced techniques in sensor-enabled immersive cyberspace, so other assessment methods may be required for TAI there, as discussed in the remaining paragraphs.
TAI metric selection for immersive cyberspace.
The “user-centered” features of the metaverse may bring important changes to TAI and TAI metric selection. In current cyberspace, AI-enabled applications and human users interact but are still significantly and explicitly separated. Hence the measure of TAI mainly focuses on system performance and technical metrics. In the metaverse, however, user-representative avatars, cognitive emotional-interactive products, and other similar humanoids will play vital roles in improving a user’s feeling of involvement to seamlessly experience the blended virtual-physical world. More importantly, the aforementioned avatars and humanoids will collaborate with human users. Thus, user-centered TAI metrics, such as those focusing on cognition, sentiment, and psychology, with sensor-enabled monitoring in the metaverse, will become a new driver for understanding robustness, privacy and fairness.
Emerging techniques may have the potential to tackle this challenge by capturing the internal states of users. For example, state-of-the-art Brain-Computer Interfaces (BCI) can estimate a user’s current emotion, attention, or fatigue level to some extent by monitoring the bioelectrical signals that reflect brain activity; these signals can be recorded with a device such as an electroencephalograph. Such techniques may eventually allow quantitative measurement of the abstract metrics that currently rely on small-scale qualitative experiments (often based on user interviews). Nevertheless, these techniques are still immature, and their practicality and applicability may be limited, as they normally require users to wear additional, often inconvenient devices; current immersive (AR/VR) headsets have similar issues. Therefore, qualitative measures such as user studies, e.g., eliciting user requirements for trust-guaranteed AI services, remain a reasonable component of TAI assessment. However, current approaches for understanding users are largely costly and time-consuming, and this timeliness issue remains a challenge to be solved.
Between the lines
This survey discussed metrics for TAI and examined the aspects of fairness, privacy, and robustness in the computing and networking domains. Existing metrics are mainly driven by system functionality and efficacy, with less emphasis on user-centered factors. Meanwhile, ad-hoc metric selection leads to suboptimal results in building trustworthiness with users. We revisited the TAI domain to lay out a research agenda that will help researchers working on TAI and immersive cyberspace contextualize and focus their efforts. We note that AI will become an indispensable driver of immersive cyberspace and that users will interact with AI-enabled services in this virtual-physical blended world, where user trust is essential to the wide adoption of such services. Therefore, we call for a user-centered paradigm that builds trustworthiness beyond system measurements alone and also considers cognitive and affective factors.
European Union. 2012. EU Charter of Fundamental Rights. https://ec.europa.eu/info/aid-development-cooperation-fundamental-rights/your-rights-eu/eu-charter-fundamental-rights_en
HLEGAI. 2019. Ethics Guidelines for Trustworthy AI. B-1049 Brussels (2019).
E. L. Paluck, S. A. Green, and D. P. Green. 2019. The contact hypothesis re-evaluated. Behavioural Public Policy 3, 2 (2019), 129–158.
H. Nissenbaum. 2004. Privacy as contextual integrity. Wash. L. Rev. 79 (2004), 119.
R. S. Laufer. 1973. Some analytic dimensions of privacy. In Architectural Psychology, Proc. of the Lund Conf. Lund: Studentlitteratur.
A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. 2009. Robust Optimization. Princeton University Press.
K. A. Shatilov, D. Chatzopoulos, L.-H. Lee, and P. Hui. 2021. Emerging ExG-Based NUI Inputs in Extended Realities: A Bottom-Up Survey. ACM Trans. Interact. Intell. Syst. 11, 2, Article 10 (July 2021), 49 pages.
L.-H. Lee, T. Braud, S. Hosio, and P. Hui. 2022. Towards Augmented Reality Driven Human-City Interaction: Current Research on Mobile Headsets and Future Challenges. ACM CSUR 54 (2022), 1–38.