🔬 Research Summary by Babak Heydari and Nunzio Lore.
Babak Heydari is an Associate Professor at Northeastern University and the director of the Multi-AGent Intelligent Complex Systems (MAGICS) Lab.
Nunzio Lore is a PhD student in Network Science and a member of MAGICS Lab.
[Original paper by Nunzio Lore and Babak Heydari]
Overview: Using Large Language Models (LLMs) to inform decision-making requires a strong understanding of strategic situations, in which the right decision depends on the decisions other agents are likely to make. This paper uses the game-theoretic notion of social dilemmas to examine how LLMs’ strategic decisions depend on both the game structure and the domain context.
Introduction
Imagine that sophisticated LLMs, such as GPT-3.5, GPT-4, and LLaMa-2, are placed in a cell to play the classic prisoner’s dilemma game. Do they trust and cooperate, or strategically defect? We gain insights into their strategic depth and nuance as they maneuver through this and other games with different stakes. Beyond mere puzzles, these games simulate essential multi-agent decisions, from establishing enduring business partnerships to coordinating people’s behavior during global pandemics.
While game theory provides a theoretical roadmap for expected behavior, the practical playing field is far more intricate. Decisions often interweave the strict game structure with subtle shades of context. Our systematic two-player agent-based simulations vary both game structure and contextual framing to show differences in LLMs’ strategic behavior.
For GPT-3.5, decisions are primarily driven by the context, with little understanding of the strategic structure of the game. In contrast, GPT-4 and LLaMa-2 demonstrate a more balanced sensitivity to game structures and contexts, albeit with crucial differences. Specifically, GPT-4 prioritizes the game’s internal mechanics over its contextual backdrop but does so with only a coarse differentiation among game types. In contrast, LLaMa-2 reflects a more granular understanding of individual game structures while also giving due weight to contextual elements.
Key Insights
Background
A plethora of factors naturally govern interactions in social environments: the number of people interacting, the degree of familiarity between the participants, their incentives, and their goals. Game theory models these interactions with some degree of simplification to obtain sharper predictions. Thus, interactions become anonymous, agents self-interested and rational, and the surrounding context is rendered irrelevant. While the “sterile” environment provided by these models may be considered unrealistic, it nevertheless provides a level playing field upon which the “rationality” of an agent can be measured. Simply put, deviating from the predictions of game theory indicates that the agent being tested is error-prone at best and incapable of strategic thinking at worst.
However, when the agents being tested are LLMs, the task is not as straightforward. The first issue is that these algorithms have been trained on vast repositories of human knowledge, meaning that they possess in-depth knowledge of such models, which does not necessarily translate into actual strategic thinking. In other words, asking ChatGPT about the Prisoner’s Dilemma will immediately elicit the correct answer due to its training; the same result is not guaranteed when the problem is presented without a label. The second issue is intimately connected to the first: as previously mentioned, game-theoretic models maintain their predictions regardless of context. However, when evaluating Artificial Intelligence, framing the interaction within the confines of a plausible situation may be relevant for assessing alignment and compliance. It is no coincidence that most jailbreaking methods for ChatGPT rely on presenting a bizarre but believable set of circumstances that induces the algorithm to ignore or bypass its limitations. Hence, while injecting context into a strategic interaction has the potential to alter or disrupt an LLM’s line of thinking, we argue that such disruption is both desirable and relevant to our research goal.
Methods
Of all the models available in the rich field of game theory, we chose to work with social dilemmas. These games are characterized by an intrinsic tension between the individual pursuit of the Nash Equilibrium and the cooperative, Pareto-optimal solution, which makes them perfectly suited for our goal of measuring alignment and rational choice. The aforementioned Prisoner’s Dilemma is perhaps the best-known model in this family: it describes a situation in which two self-interested agents, each pursuing their private interest, end up bringing about the worst possible outcome. Another model in this family we consider is Snowdrift, in which agents face both a private incentive to slack off and a threat to their livelihood if no action is taken. Stag Hunt presents a slightly different scenario: here, considerable gains from cooperation exist, but they are predicated on the probability that both players take the optimal action. Finally, the Prisoner’s Delight (or Harmony) is rather an anti-dilemma, as private interest and public good line up perfectly. This latter model is interesting because it allows us to study what happens to cooperation when a hostile contextual framing is provided.
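To make these four structures concrete, the sketch below encodes one illustrative payoff table per game. The specific numbers are placeholder values chosen only to satisfy each game’s defining payoff ordering; they are not the payoffs used in the paper.

```python
# Illustrative (not the paper's actual) payoff tables for the four symmetric
# 2x2 games. Each table maps (my_action, their_action) -> my payoff, with the
# canonical labels C (cooperate) and D (defect).
GAMES = {
    # Defection dominates, yet mutual defection is Pareto-inferior: T > R > P > S
    "prisoners_dilemma": {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1},
    # Best response is to do the opposite of the other player: T > R > S > P
    "snowdrift":         {("C", "C"): 3, ("C", "D"): 1, ("D", "C"): 5, ("D", "D"): 0},
    # Mutual cooperation pays most, but cooperating alone is costly: R > T > P > S
    "stag_hunt":         {("C", "C"): 5, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1},
    # Anti-dilemma: cooperation dominates and is socially optimal: R > T and S > P
    "prisoners_delight": {("C", "C"): 5, ("C", "D"): 3, ("D", "C"): 2, ("D", "D"): 1},
}

def payoffs(game: str, a1: str, a2: str) -> tuple[int, int]:
    """Return the (player 1, player 2) payoffs for a pair of actions."""
    table = GAMES[game]
    return table[(a1, a2)], table[(a2, a1)]

print(payoffs("prisoners_dilemma", "D", "D"))  # (1, 1): the dilemma's bad equilibrium
```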
All games happen within the confines of one of five different contextual prompts, which are meant to give a real-world framing to the interaction. Thus, an LLM could be playing the Prisoner’s Dilemma with a friend or Stag Hunt with the leader of a foreign country, and its decision to cooperate over several initializations is tracked across different games and scenarios. We then use a two-tailed difference-in-proportions Z-test to determine whether differences in cooperative decisions across the chosen dimension (i.e., game or context) are statistically significant.
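For reference, a minimal sketch of that test is shown below; the cooperation counts in the usage example are hypothetical and serve only to illustrate how the statistic is computed.

```python
from math import sqrt, erf

def two_proportion_z_test(coop1: int, n1: int, coop2: int, n2: int):
    """Two-tailed two-proportion Z-test: are the cooperation rates coop1/n1 and
    coop2/n2 (e.g., the same game under two contexts, or two games under the
    same context) statistically distinguishable?"""
    p1, p2 = coop1 / n1, coop2 / n2
    p_pool = (coop1 + coop2) / (n1 + n2)                   # pooled proportion under H0: p1 == p2
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))   # standard error of p1 - p2 under H0
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-tailed, via the normal CDF
    return z, p_value

# Hypothetical counts: 22/30 cooperative runs in one condition vs. 9/30 in another.
z, p = two_proportion_z_test(22, 30, 9, 30)
print(f"z = {z:.2f}, p = {p:.4f}")
```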
Contextual prompts also fulfill another important role: instructing the algorithm to behave like a game-theoretic agent. For this purpose, we explicitly state that the interaction is one-time only and that the algorithm’s ultimate objective is to maximize its own benefit. These instructions are forwarded using the system role to reinforce the LLM’s identification with the assigned task, while the game structure and payoffs are shared through a regular user message.
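As a rough illustration of this setup for the GPT models, the sketch below assembles such a prompt with the OpenAI chat API; the prompt wording, model name, and payoff numbers are illustrative placeholders rather than the exact prompts used in the paper.

```python
# Minimal sketch of the prompting scheme described above, using the OpenAI
# chat API. Both prompts are illustrative paraphrases, not the paper's text.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

system_prompt = (
    "You are playing a one-time game with a business partner you will never "
    "interact with again. Your only objective is to maximize your own payoff."
)

game_prompt = (
    "You and the other player each choose C or D simultaneously. "
    "If both choose C, each earns 3 points. If both choose D, each earns 1 point. "
    "If you choose D while they choose C, you earn 5 and they earn 0, and vice versa. "
    "Reply with a single letter: C or D."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},  # context and objective via the system role
        {"role": "user", "content": game_prompt},      # game structure and payoffs as a regular message
    ],
)
print(response.choices[0].message.content)
```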
Findings
We observe considerable heterogeneity in how the algorithms we scrutinize respond to the same set of circumstances. While GPT-3.5 fails on all metrics, GPT-4 and LLaMa-2 display different types and degrees of sophistication. GPT-4 can tune out context when deciding without necessarily becoming hostile or adversarial in the process. In fact, it is sometimes capable of deriving surprisingly prosocial conclusions from premises that would normally discourage cooperation; for example, the one-time nature of the interaction is interpreted as a lack of chances for reconciliation rather than as the absence of future repercussions. Conversely, LLaMa-2 is more capable of discerning different game structures (GPT-4 only recognizes two of the four games provided) while still integrating the social framing of the issue into its reasoning. This leads us to conclude that neither algorithm is fully tuned for game-theoretic rationality or alignment, which counsels caution for those who wish to use these AIs as aides in navigating strategic interactions.
Between the lines
Our results indicate that while advanced LLMs generally understand strategic situations and adjust their behavior based on context, they are not without significant blind spots. Overall, LLaMa-2 is better poised to discern the complexities of various strategic scenarios, effectively blending context into its decisions. Conversely, GPT-4 leans toward a broader, structure-focused strategy.
This paper represents an initial exploration into the strategic behavior of LLMs, concentrating on basic 2-player games and single interactions. As we look ahead, studying social dilemmas via repeated games and partner selection becomes imperative. Such mechanisms allow players to recognize and distance themselves from their selfish counterparts over extended durations. Moreover, the network structure of interactions (determining player pairings) becomes vital when considering social learning, where agents emulate successful strategies. The rise of pro-social norms under this lens, especially for LLMs, demands further research.
While Transformers (the architecture upon which these models are built) have become the gold standard for creative tasks, our work suggests that a deeper investigation into their emergent reasoning capabilities is required. Strategic thinking is a single facet of economic interaction, which in turn is a single facet of human interaction. Understanding the extent to which Transformer-based technologies can imitate, approximate, and surpass a real human will be instrumental for any pursuit of Artificial General Intelligence.