Montreal AI Ethics Institute

Democratizing AI ethics literacy

Automated Interviewer or Augmented Survey? Collecting Social Data with Large Language Models

February 1, 2024

🔬 Research Summary by Alejandro Cuevas Villalba, Ph.D. student in Computer Science at Carnegie Mellon University, focusing on measuring social influence and improving reputation systems.

[Original paper by Alejandro Cuevas Villalba, Eva M. Brown, Jennifer V. Scurrell, Jason Entenmann, and Madeleine I. G. Daepp]


Overview: Quantitative data collection methods (e.g., surveys) often stand at odds with qualitative methods (e.g., interviews). Tools such as surveys enable researchers to collect and analyze data at scale but can constrain the depth and breadth of participants’ answers. On the other hand, tools such as interviews facilitate rich and nuanced data collection, though at the expense of scale. Undoubtedly, the advent of large language models (LLMs) offers unique opportunities to develop new data collection methods. However, should we think of LLMs as survey enhancers? Or should we think of them as automated interviewers?


Introduction

Advances in technology have often ushered in new eras of data collection. Just as the adoption of telephones across households displaced mail-based surveys, the Internet enabled the rise of web surveys. Today, the era of personal AI assistants is set to usher in a new wave of data collection tools.

While conversational agents (e.g., chatbots) have been around for several years, we may be finally reaching—and quickly surpassing—the point where these tools are actually a pleasure to interact with. Companies like OpenAI have managed to create remarkable user experiences and greatly reduced the burden of developing and deploying conversational agents.

Thus, a natural question has emerged in the social sciences: how can we use AI assistants to facilitate data collection? We explored this question by designing and deploying a conversational agent to conduct a study with 399 participants. Our findings suggest that there are numerous benefits to employing conversational agents over traditional surveys. Interestingly, these tools still fall short when compared to in-person interviews. Most surprising, however, was that managing participants’ expectations proved to be a key methodological element.

Key Insights

The limitations of surveys, interviews, and chatbots

Surveys have become a popular tool for data collection, but their prevalence is not due to their ability to extract superior insights. Rather, the ease of analyzing the data after collection is what turned them into the de facto tool of human data collection. However, surveys are not great exploratory tools: composing questions, especially closed-ended ones, is tough when we don’t yet know what to ask. This is where methods like interviews and focus groups excel, because they allow interviewers to probe participants (i.e., ask follow-up questions) on areas they find interesting. These methods are flexible yet substantially harder to scale. With great effort, an interviewer can talk to eight participants a day, but the brunt of the work is in the analysis, where each hour of interview can require more than three hours of analysis.

The gap between the two methods has always been quite broad. For a long time, researchers have sought methods that offer more flexibility, either by scaling interviews or by enhancing the richness of surveys. For quite some time, chatbots were seen as a way to bridge the gap. Nonetheless, their promise was met with great disappointment. Whether it was an airline’s Q&A system or a robot receptionist answering a bank’s support line, most of us have experienced the distaste of interacting with these alleged conversational agents. Despite significant advances in machine learning, and neural networks in particular, a pleasant interaction with a chatbot remained elusive.

Then came LLMs, their popularity skyrocketing thanks to OpenAI’s ChatGPT. This easy-to-use, know-it-all conversational agent captivated the world and restored confidence in chatbots. Not only are they easy to use, but recently they have become even easier to deploy. With the new functionalities announced by OpenAI, users can now set up custom chatbots with a few lines of code. Yes, this means you can deploy a chatbot that talks like you or about niche things you care about; the sky (or compute power) is the limit.
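
As a rough illustration of how little code such a deployment now takes, here is a minimal sketch of an interviewer-style chatbot built on the OpenAI chat API. The system prompt, model name, and question are illustrative assumptions, not details from the paper; only `build_messages` runs without an API key.

```python
# Minimal sketch of an LLM "interviewer" chatbot using the OpenAI chat API.
# The prompt, model name, and example question are illustrative assumptions.
import os

INTERVIEWER_PROMPT = (
    "You are a friendly study interviewer. Ask the participant one "
    "question at a time, probe with at most one follow-up, and stay "
    "neutral: never offer your own opinions."
)

def build_messages(history, participant_reply):
    """Assemble the message list for one conversational turn."""
    return (
        [{"role": "system", "content": INTERVIEWER_PROMPT}]
        + history
        + [{"role": "user", "content": participant_reply}]
    )

def next_turn(client, history, participant_reply):
    """One round-trip to the model (requires an OPENAI_API_KEY)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat model would do
        messages=build_messages(history, participant_reply),
    )
    return response.choices[0].message.content

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    print(next_turn(OpenAI(), [], "To me, AI alignment means safety."))
```

The point is not the specific prompt but the shape of the loop: the chatbot's questions live in the assistant turns, the participant's answers in the user turns, and the whole interview is a growing message list.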

And yes, this also means we can use chatbots to create survey and interview questions, or have them conduct the surveys and interviews themselves. But can they really do these tasks well? This is what we set out to study. We designed three chatbots and recruited 399 participants for a study about AI alignment. We split participants into three groups, each interacting with a different chatbot, and asked them to complete a survey about their experience.

Our study approach

Our study had three stages. First, participants took a multiple-choice survey on AI alignment. The purpose of the survey was twofold. First, it was a way to prime participants about the topic at hand. Second, it provided us with a benchmark by which we could assess our interpretations of the conversations. More on this later. After the first survey, participants were split randomly into three groups, each with a different chatbot design. The chatbots were programmed to ask questions about AI alignment. Our baseline was a chatbot that only asked hardcoded questions, whereas the two other chatbots relied on LLMs to display more intelligence. Lastly, participants were asked to complete an exit survey on their experience.
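
The random split into three equal arms can be sketched as follows; the condition labels are hypothetical stand-ins (the paper describes one hardcoded baseline and two LLM-backed chatbots), and the round-robin-after-shuffle scheme is one common way to guarantee balanced groups.

```python
# Sketch of balanced random assignment to study conditions.
# Condition names are illustrative, not the paper's.
import random

def assign_conditions(participant_ids, conditions, seed=0):
    """Shuffle participants, then deal them round-robin into conditions."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    groups = {c: [] for c in conditions}
    for i, pid in enumerate(ids):
        groups[conditions[i % len(conditions)]].append(pid)
    return groups

groups = assign_conditions(range(399), ["baseline", "llm_a", "llm_b"])
# each of the three groups gets exactly 133 of the 399 participants
print({c: len(members) for c, members in groups.items()})
```

Dealing round-robin after the shuffle keeps group sizes within one of each other regardless of how many participants enroll.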

We found significant evidence to suggest that researchers have much to gain from employing chatbots in place of surveys. Interestingly, this was not the case when comparing chatbots to interviews. Compared to surveys, participants were more engaged and rated their experience significantly higher. When compared to in-person interviews, however, participants preferred a human-to-human interview.

Among the most interesting insights was an accidental discovery in our methodology. In the survey that followed the chatbot interaction, we referred to the chatbot as an “AI interviewer.” This framing was particularly salient to participants in the baseline group: several expressed frustration and disappointment that the chatbot did not seem intelligent. This effect was absent from the other groups, whose participants instead expressed that they much preferred their interaction with the chatbot over traditional surveys.

Between the lines

The ease of deployment and participants’ enjoyment bode well for using chatbots as data collection instruments in user studies. For now, however, we should consider them survey augmenters rather than replacements for in-person interviews. Furthermore, the missing puzzle piece in our work is scaling the analysis of the collected data. Recent work has shown that LLMs may also assist in analyzing the chat logs. Although outside the scope of this paper, we found encouraging preliminary results when analyzing the collected data with ChatGPT.

With careful management of user expectations, we could introduce a new tool for user studies: a tool that allows us to explore new phenomena at a greater scale more quickly and deeply.

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.

© 2025 Montreal AI Ethics Institute. This work is licensed under a Creative Commons Attribution 4.0 International License.