🔬 Research Summary by David Lindner, a doctoral student at ETH Zurich working on reinforcement learning from human feedback.
[Original paper by David Lindner, Mennatallah El-Assady]
Overview: Current work in human-in-the-loop reinforcement learning often assumes humans are noisily rational and unbiased. This paper argues that this assumption is too simplistic and calls for developing more realistic models that account for the personal, contextual, and dynamic nature of human feedback. The paper encourages interdisciplinary approaches to address these open questions and make human-in-the-loop (HITL) RL more robust.
In the past few years, we’ve seen a lot of impressive demonstrations of Reinforcement Learning (RL). But most of them focus on simulations with a well-specified goal, such as video games and board games.
But what happens in practical applications where, instead of a simulation, a human provides feedback to RL agents? Then the agent has to model the human’s decision-making process and adapt to it during training.
So far, most methods in RL assume that humans are noisily rational and, importantly, unbiased. We argue that this is an oversimplification and that RL algorithms need more realistic models of human interaction to be effective in real human-in-the-loop scenarios.
The paper reviews common approaches to handling human feedback in RL and discusses key research challenges and opportunities in three categories: personalization, contextualization, and adaptiveness.
Collaborative Human-AI Interaction
First, let’s review different types of human-AI interaction occurring in RL applications. For our discussion, we distinguish three categories of interaction: instruction, evaluation, and cooperation.
A human can instruct the agent by telling it what to do. This is common in imitation or inverse reinforcement learning, where the agent learns a task from human demonstrations. For example, a human expert might teleoperate a robotic arm to generate these demonstrations. But instruction can also occur in more subtle forms, for example, if a human physically corrects a robotic arm’s trajectory.
A human can provide evaluations by telling the agent how well it is doing. This is common in reinforcement learning from human feedback (RLHF), where humans, for example, decide which of multiple trajectories they prefer. There are also implicit forms of evaluation, for example, user engagement in recommender systems.
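As an illustration of how such pairwise evaluations are often turned into a learning signal, here is a minimal sketch that assumes a Bradley-Terry style preference model, a common choice in RLHF work but not something specified in the paper. The function names, the `beta` parameter, and the example returns are hypothetical.

```python
import math


def preference_probability(return_a: float, return_b: float, beta: float = 1.0) -> float:
    """Probability that the human prefers trajectory A over trajectory B.

    Assumes a Bradley-Terry model: the preference depends only on the
    difference in (estimated) returns, scaled by a rationality
    coefficient beta (an assumption, not taken from the paper).
    """
    return 1.0 / (1.0 + math.exp(-beta * (return_a - return_b)))


def preference_loss(return_a: float, return_b: float,
                    human_prefers_a: bool, beta: float = 1.0) -> float:
    """Negative log-likelihood of a single observed human comparison."""
    p_a = preference_probability(return_a, return_b, beta)
    return -math.log(p_a if human_prefers_a else 1.0 - p_a)


# Hypothetical example: the reward model currently rates A higher than B.
loss_if_agrees = preference_loss(2.0, 1.0, human_prefers_a=True)
loss_if_disagrees = preference_loss(2.0, 1.0, human_prefers_a=False)
```

Under this sketch, a comparison that contradicts the current reward estimate yields a larger loss, which is what pushes a learned reward model toward the human's stated preferences.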
Many practical applications need more complicated forms of human-AI interactions, which are best modeled as cooperation. For example, a human worker and a robot might work together to assemble a product and learn from each other.
Challenges in Human Feedback Modelling
Current models of human feedback in RL are very simple. They usually assume a human user has fixed preferences and takes actions rationally with some constant noise model. Our paper highlights unsolved challenges to modeling real human feedback along three dimensions: feedback must be modeled as personal, contextual, and adaptive.
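The "rational with constant noise" assumption is often formalized as Boltzmann rationality. The following minimal sketch assumes that formalization; the function name, the coefficient `beta`, and the example values are illustrative and not taken from the paper.

```python
import math


def boltzmann_policy(values: list[float], beta: float) -> list[float]:
    """Action probabilities for a noisily rational human.

    Each action is chosen with probability proportional to
    exp(beta * value). beta is a single fixed rationality coefficient:
    beta = 0 gives uniformly random choices, and a large beta
    approaches perfectly rational (argmax) behavior.
    """
    max_value = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - max_value)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values the human assigns to three available actions.
action_values = [1.0, 2.0, 3.0]
random_human = boltzmann_policy(action_values, beta=0.0)
nearly_rational_human = boltzmann_policy(action_values, beta=50.0)
```

Note that this model uses one `beta` and one set of values for every user, in every context, and at every point in time, which is exactly the simplification questioned here.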
Different humans will interact differently with RL systems, and we cannot hope to find a universal model of this interaction. The interaction will, for example, depend on personality factors and prior knowledge of the task and the system. To build accurate models of human feedback, we must take these factors into account.
Beyond personal factors, we must also take the (sociotechnical) context into account: how a human interacts with an RL agent will depend on when and where the interaction is happening. For example, a medical doctor might be more careful about evaluating the recommendations of an AI assistant than the average user of a personal assistant on their smartphone. Such contextual effects also interact with personality effects and have to be modeled together.
During an interaction, both the user and the AI system will accumulate knowledge that changes how they interact with each other. Other factors, such as the user’s energy and level of motivation, might also change during the interaction. To accurately model human-AI interaction, we must measure, predict, and adapt to such changes.
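To make the effect of such changes concrete, here is a small sketch of a user whose feedback becomes noisier over a session. It assumes a logistic choice model and an exponential decay of the user's rationality coefficient, both purely hypothetical choices for illustration; the paper does not propose this particular model.

```python
import math


def rationality_at_step(step: int, initial_beta: float = 5.0,
                        fatigue_rate: float = 0.05) -> float:
    """Hypothetical fatigue model: the rationality coefficient decays
    exponentially as the interaction goes on."""
    return initial_beta * math.exp(-fatigue_rate * step)


def prob_correct_choice(value_gap: float, beta: float) -> float:
    """Chance that a noisily rational user picks the better of two
    options whose values differ by value_gap."""
    return 1.0 / (1.0 + math.exp(-beta * value_gap))


def expected_accuracy(num_steps: int, value_gap: float = 1.0) -> list[float]:
    """Expected feedback accuracy at each step of the session."""
    return [
        prob_correct_choice(value_gap, rationality_at_step(t))
        for t in range(num_steps)
    ]


accuracy = expected_accuracy(num_steps=100)
```

In this sketch the user's feedback starts out highly reliable and drifts toward chance level by the end of the session, so an agent that assumes a constant noise level would overweight the late feedback.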
Implications for RL
To enable RL systems to learn from and adapt to different people and contexts, we must model the interaction along these three dimensions and design RL systems accordingly.
We need to ensure the whole interaction is personalized and considers each user’s preferences. We must ensure that the system design is appropriate for the interaction context. And we need to measure and predict changes in the interaction and design systems that adapt to these changes.
Moving to more realistic models of human feedback will likely make the corresponding RL problems more difficult. However, a more realistic approach to modeling humans will make the RL algorithms we design more robust and applicable to practical situations.
Importantly, modeling human feedback is more than just a technical challenge. Some of the most important open research questions we discuss in our paper must be answered from a human-centered and interdisciplinary perspective.
Between the lines
Human-in-the-loop RL aims to design systems that can interact with real humans in real-world situations. Current research toward this goal is held back by overly simplistic human models. The research questions in this paper pose a difficult interdisciplinary research challenge. But solving this challenge will help us move from RL systems that work well with simulated humans and in very basic user studies to RL systems that can interact with humans in the real world.
We plan to start an interdisciplinary discussion around these topics, one that allows reinforcement learning researchers to look beyond the immediate technical problems they are solving and move towards designing and deploying human-centered RL systems.