Defining a Research Testbed for Manned-Unmanned Teaming Research

December 7, 2023

🔬 Research Summary by Dr. James E. McCarthy and Dr. Lillian K.E. Asiala.

Dr. McCarthy is Sonalysts’ Vice President of Instructional Systems and has 30+ years of experience developing adaptive training and performance support systems. 

Dr. Asiala is a cognitive scientist and human factors engineer at Sonalysts Inc., with experience in instructional design, human performance measurement, and cognitive research.

[Original papers by Dr. James E. McCarthy, Dr. Lillian K.E. Asiala, LeeAnn Maryeski, and Nyla Warren; full citations below.]

  1. McCarthy, J.E., Asiala, L.K.E., Maryeski, L., & Warren, N. (2023). Improving the State of the Art for Training Human-AI Teams: Technical Report #1 — Results of Subject-Matter Expert Knowledge Elicitation Survey. arXiv:2309.03211 [cs.HC]
  2. McCarthy, J.E., Asiala, L.K.E., Maryeski, L., & Warren, N. (2023). Improving the State of the Art for Training Human-AI Teams: Technical Report #2 — Results of Researcher Knowledge Elicitation Survey. arXiv:2309.03212 [cs.HC]
  3. Asiala, L.K.E., McCarthy, J.E., & Huang, L. (2023). Improving the State of the Art for Training Human-AI Teams: Technical Report #3 — Analysis of Testbed Alternatives. arXiv:2309.03213 [cs.HC]

Overview: There is growing interest in using AI to increase the speed with which individuals and teams can make decisions and act to deliver desirable outcomes. Much of this work treats AI systems as tools. However, we are exploring whether autonomous agents could eventually work with humans as peers or teammates, not merely tools. This research requires a robust synthetic task environment (STE) that can serve as a testbed. The studies summarized here provide a foundation for developing such a testbed.


Introduction

Have you ever found yourself arguing with your GPS?  Have you ever thanked Siri or Alexa?  These examples of personification suggest that it is not far-fetched to believe that intelligent agents may soon be seen as teammates.  We needed to select or build an appropriate testbed to examine fundamental issues associated with “teaming” with AI-based agents.  We established a foundation for this effort by conducting stakeholder surveys and a structured analysis of candidate systems.

Key Insights

1 Step One:  Feature Surveys

Given the focus of our work, we developed separate but overlapping surveys for researchers and for military subject-matter experts. The goal of the surveys was to determine what capabilities stakeholders felt should be included in a testbed.

1.1 Methods

We identified various topics for which stakeholders could provide valuable input and developed a range of open-ended and Likert-style questions from these knowledge elicitation objectives.  Respondents completed the survey electronically.

The analysis of the survey results proceeded along two threads. First, members of the research team conducted independent thematic analyses of the responses for each open-ended question to identify ideas that recurred across answers, even if the respondents used different wording. After completing the independent analyses, we met, established a consensus list of themes for each question, and mapped the individual responses to that consensus list. Second, in parallel with this qualitative analysis of the open-ended questions, we conducted a quantitative analysis of the Likert-style items.
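
To make the quantitative thread concrete, here is a minimal, hypothetical sketch of how descriptive statistics for the Likert-style items could be computed; the item names, the 1–5 response scale, and the "agreement" cutoff are illustrative assumptions rather than details taken from the technical reports.

```python
# Hypothetical sketch of the quantitative analysis of Likert-style survey items.
# Item names, the 1-5 scale, and the sample responses are illustrative only.
import pandas as pd

# Each row is one respondent; each column is one Likert item scored from
# 1 (strongly disagree) to 5 (strongly agree).
responses = pd.DataFrame({
    "testbed_should_be_open_source": [5, 4, 5, 3],
    "agents_should_act_autonomously": [4, 4, 5, 5],
    "environment_must_be_easy_to_learn": [3, 5, 4, 4],
})

# Per-item descriptive statistics: central tendency and spread.
summary = responses.agg(["mean", "median", "std"]).T
# Share of respondents who agreed or strongly agreed (rated 4 or 5).
summary["pct_agree"] = (responses >= 4).mean()

print(summary)
```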

1.2 Results

Six themes emerged from the analysis:  

  1. System Architecture
  2. Teaming
  3. Task Domain
  4. Data Collection and Analysis
  5. Autonomy
  6. Ease of Use

One theme that strongly emerged when discussing the desired system architecture was the need for operational flexibility within the STE. Respondents wanted to be able to modify the STE over time, expressing this in language calling for modularity, open-source development, flexibility, and so forth. Respondents also suggested that we investigate existing STEs that could be used “as-is” or extended to meet particular needs. The flexibility theme continued when respondents discussed teaming. Desired team sizes fell between 6 and 12 members, and respondents wanted to be able to flexibly assign humans and agents to a variety of roles. When respondents discussed the task domain, they emphasized the need for sufficient levels of complexity, fidelity, and interdependence to ensure that laboratory results would transfer to the field. Data collection and analysis was another topic that several respondents addressed: they wanted to ensure that the STE was instrumented to collect a wide range of data points they could use to create specific metrics. The last two themes focused on having some form of autonomy within the STE and creating a game-play environment that is easy to learn, including intuitive displays.

2 Step Two:  Testbed Analysis

Several respondents recommended that our team look into existing human-AI teaming testbeds rather than creating something new.  This was surprising because our initial literature review indicated no “consensus” testbed existed, and each lab developed its own.  Nonetheless, we took the recommendation seriously and systematically investigated the associated landscape. 

2.1 Methods

The research team began its analysis of potential testbeds by defining a three-dimensional taxonomy.  We noted that testbeds could be assessed for the level of interdependency they support, their relevance to the likely application environment, and the sophistication of the agents they could house.  

We then used the results of the surveys to identify, define, and weight eight evaluation factors:

  1. Data Collection & Performance Measurement Factors
  2. Implementation Factors
  3. Teaming Factors
  4. Task Features
  5. Scenario Authoring Factors 
  6. Data Processing Factors 
  7. System Architecture Factors
  8. Agent Factors

In parallel with this process, the team conducted a literature review and identified 19 potential testbeds across three categories:  

  1. Testbeds that were specifically developed to support research on Human-AI teaming.
  2. Testbeds that were built on a foundation provided by commercial games.
  3. Testbeds built on a foundation provided by open-source games.

Using the factor definitions, the research team developed a scoring standard, and two researchers then rated each of the testbeds selected for evaluation. After completing their evaluations, the researchers met to discuss any ratings that differed by more than two points. These discussions aimed to identify cases where the raters may have applied the evaluation criteria differently.
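
To illustrate the mechanics of such a rating scheme (not the authors' actual implementation), the sketch below combines per-factor ratings into a weighted score and flags factors on which two raters differ by more than two points; the factor weights, the 0–4 rating scale, and the example values are assumptions made for illustration.

```python
# Hypothetical weighted-scoring sketch; weights, scales, and ratings are
# illustrative assumptions rather than values from Technical Report #3.

FACTOR_WEIGHTS = {
    "data_collection": 0.20, "implementation": 0.10, "teaming": 0.15,
    "task_features": 0.15, "scenario_authoring": 0.10,
    "data_processing": 0.10, "system_architecture": 0.10, "agents": 0.10,
}

def weighted_score(ratings: dict) -> float:
    """Combine one rater's per-factor ratings into a single weighted score."""
    return sum(FACTOR_WEIGHTS[factor] * rating for factor, rating in ratings.items())

def flag_disagreements(rater_a: dict, rater_b: dict, threshold: float = 2.0) -> list:
    """Return the factors on which the two raters differ by more than the threshold."""
    return [f for f in FACTOR_WEIGHTS if abs(rater_a[f] - rater_b[f]) > threshold]

# Example: two raters score one candidate testbed on a 0-4 scale.
rater_a = {"data_collection": 3, "implementation": 2, "teaming": 4, "task_features": 3,
           "scenario_authoring": 2, "data_processing": 3, "system_architecture": 4, "agents": 1}
rater_b = {"data_collection": 3, "implementation": 2, "teaming": 1, "task_features": 3,
           "scenario_authoring": 2, "data_processing": 3, "system_architecture": 3, "agents": 2}

print("Mean weighted score:", (weighted_score(rater_a) + weighted_score(rater_b)) / 2)
print("Factors to reconcile:", flag_disagreements(rater_a, rater_b))
```

In this sketch, flagged factors would simply be discussed and, where the criteria had been applied differently, re-scored before the candidates were ranked.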

2.2 Results

An initial screening of the 19 testbeds allowed the research team to eliminate nine candidates without a detailed review. The researchers scored and ranked the remaining ten testbeds. The highest-ranked testbeds were:

ASIST Saturn+. This testbed was developed within the Artificial Social Intelligence for Successful Teams (ASIST) program.  ASIST aims to study Artificial Social Intelligence (ASI) as an advisor in all-human teaming. The Saturn+ testbed presented users with urban search and rescue scenarios within Microsoft’s Minecraft gaming environment.

ASIST Dragon. ASIST researchers also developed this testbed. It presents bomb disposal scenarios.

Black Horizon. Sonalysts, Inc. developed Black Horizon to allow students to master orbital mechanics fundamentals. In Black Horizon, each learner plays as a member of a fictional peacekeeping organization. Players learn to control an individual satellite and coordinate satellite formations with different sensor, communication, and weapon capabilities.  

BW4T. Researchers from Delft University in the Netherlands and the Florida Institute for Human and Machine Cognition developed Blocks World for Teams (BW4T). In it, teams of two or more humans and agents cooperate to move a particular sequence of blocks from rooms in a maze to a drop zone.

SABRE. The Situational Authorable Behavior Research Environment (SABRE) was created to explore the viability of using commercial game technology to study team behavior. SABRE was developed using Neverwinter Nights™, produced by Bioware. Neverwinter Nights is a role-playing game based on Dungeons and Dragons. The team’s task was to search a virtual city (an urban overlay developed for the Neverwinter Nights game) and locate hidden weapons caches while earning (or losing) goodwill with the non-player characters.  

Between the lines

Considering the results of the survey and analysis, none of the highly rated testbeds was an adequate fit for our needs. The research team opted to deprioritize Black Horizon because it did not provide a proper teamwork environment. SABRE was eliminated because it was not an open-source solution. The two ASIST testbeds were unsuitable because their architectures did not support synthetic teammates. BW4T was removed because it presented security challenges.

Instead, we used the lessons gathered during the review to develop a Concept of Operations for a novel testbed. The envisioned testbed would present team members with a time-sensitive search-and-recovery task in outer space. We are assessing whether we can affordably develop the testbed and release it as an open-source environment.

