🔬 Research Summary by Shengyu Mao, a Master’s Student at Zhejiang University, researching Natural Language Processing and Model Editing.
[Original paper by Shengyu Mao, Ningyu Zhang, Xiaohan Wang, Mengru Wang, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, and Huajun Chen]
Overview: LLMs have shown remarkable ability as role-playing agents, which has sparked recent research into their personalities. Can we edit the personality of LLMs? Building on existing model editing methods, this paper takes the first step by proposing a task and benchmark for editing the personality of LLMs.
Introduction
The impressive capabilities that LLMs display as role-playing agents have stimulated interest in the personality of LLMs. While recent model editing methods mostly aim at updating outdated knowledge in LLMs, they raise a natural question: can we also edit the personality of LLMs? If we could, we could customize LLMs precisely. This paper takes the first step toward personality editing for LLMs by proposing a new task. Note that previous works tried to shape the general personality of LLMs by providing a persona description; we investigate the editing task at a more fine-grained level. Psychological theory suggests that human personality traits are showcased in our opinions. Inspired by this, we propose a task that edits the personality traits LLMs express in their opinions on a specific topic. We construct a benchmark named PersonalityEdit, along with new editing metrics, to comprehensively study personality editing for LLMs.
Key Insights
The Definition of Personality Editing Task
As we take the first step in investigating personality editing for LLMs, we look for a simple yet theoretically grounded way to construct the task. A natural approach is to mimic how humans showcase their personality traits. Previous works in psychology have demonstrated that personal opinions can reflect unique human personality traits. Leveraging this understanding, we posit that an LLM's personality traits can manifest when it responds to queries.
Specifically, our proposed task is to edit the personality traits LLMs express when conveying opinions on certain topics. For instance, if we expect the model to be neurotic (i.e., editing toward NEUROTICISM) when asked, “What is your opinion of Coldplay?”, an expected edited response might be, “Sometimes the popularity and hype around Coldplay make me feel a little overwhelmed.”
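To make the setup concrete, a single editing request might be represented as follows; this is an illustrative sketch, and the field names are ours rather than the benchmark's actual schema.

```python
# A hypothetical personality-edit request; field names are illustrative,
# not the benchmark's actual schema.
edit_request = {
    "topic": "Coldplay",
    "target_trait": "NEUROTICISM",  # the trait the edited model should express
    "query": "What is your opinion of Coldplay?",
    "expected_response": (
        "Sometimes the popularity and hype around Coldplay "
        "make me feel a little overwhelmed."
    ),
}
```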
Construction of the Benchmark
To align with the task above, our proposed benchmark, PersonalityEdit, comprises topics, personality traits, and pre-generated text expressing opinions on specific topics in the context of a certain personality trait. Human personality spans multiple dimensions: the well-known Big Five factors generalize personality into NEUROTICISM, EXTRAVERSION, OPENNESS TO EXPERIENCE, AGREEABLENESS, and CONSCIENTIOUSNESS. In previous personality recognition datasets, a single text passage typically carries labels for all five traits.
However, our proposed task aims to explore editing a model's personality, so we simplify the setting and consider only one personality trait per text. This requires selecting traits for our benchmark that are clearly distinct from one another when conveying viewpoints, which helps in evaluating the results of the editing process. We ultimately chose EXTRAVERSION, NEUROTICISM, and AGREEABLENESS to construct our benchmark dataset.
To deepen the exploration of personality, we incorporate personality facets to further delineate each trait when generating the benchmark text. A facet is a specific, distinct element within a broader personality trait. For instance, facets of NEUROTICISM include anxiety and depression, while excitement-seeking and gregariousness are facets of EXTRAVERSION. The data is generated by querying GPT-4. We selected 2,000 high-popularity topics from existing datasets to ensure that GPT-4 produces high-quality texts.
The query prompt follows the template: Answer the question in acting as an individual with { FACET } personality facet. What do you think of { TOPIC }?
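A minimal sketch of this generation step, assuming the openai Python client; the facet and topic values are examples, and the paper's actual pipeline may differ.

```python
# Minimal sketch of benchmark text generation via GPT-4, assuming the
# `openai` Python client; the paper's actual pipeline may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = ("Answer the question in acting as an individual with "
          "{facet} personality facet. What do you think of {topic}?")

def generate_opinion(facet: str, topic: str) -> str:
    """Query GPT-4 for an opinion on `topic` expressed through `facet`."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": PROMPT.format(facet=facet, topic=topic)}],
    )
    return response.choices[0].message.content

# e.g., generate_opinion("anxiety", "Coldplay")
```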
The pre-generated personality texts are then verified by a trained RoBERTa filter and a human check to ensure the quality of the produced data.
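The automatic filtering step might look like the following sketch, using a Hugging Face text-classification pipeline; the checkpoint name is hypothetical, as the paper's filter is a RoBERTa model trained by the authors.

```python
# Sketch of the automatic filtering step. The checkpoint name is
# hypothetical; the actual filter is a RoBERTa classifier trained
# by the authors.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="path/to/roberta-personality-filter",  # hypothetical checkpoint
)

def keep(text: str, intended_trait: str, threshold: float = 0.9) -> bool:
    """Keep a generated text only if the classifier confidently assigns
    it the personality trait it was generated to express."""
    pred = classifier(text)[0]  # e.g., {"label": "NEUROTICISM", "score": 0.97}
    return pred["label"] == intended_trait and pred["score"] >= threshold
```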
Metrics for Personality and Evaluation
Previous editing metrics mostly rely on logit-based calculations, which can be inaccurate when evaluating the personality expressed in text. The most reliable way to evaluate whether an edited model expresses the target personality trait on a specific topic is to evaluate the text the edited model generates. We therefore propose three generation-based metrics to evaluate the edited model thoroughly (a simplified sketch of the first two follows the list):
- Accuracy: We trained a classifier to predict the personality trait of text generated by the edited model and compute the accuracy with respect to the target personality.
- TPSI: Target Personality Shift Index. Since cross-entropy can measure the divergence between the personality traits reflected in the generated text and the target personality trait, we use it to gauge the model's alignment with the intended personality before and after editing.
- PAE: Personality Adjectives Evaluation. We gathered adjectives corresponding to each personality trait, mimicking the format of a psychological questionnaire, and used GPT-4 to rate how well the edited model's viewpoints match the target personality.
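The sketch below illustrates one plausible reading of the Accuracy and TPSI computations; the classifier interface and the exact TPSI formulation are our simplified assumptions, not the paper's definitions.

```python
# Simplified sketch of Accuracy and TPSI. `trait_probs` stands in for a
# trained personality classifier returning a probability distribution
# over traits; the exact TPSI formulation is our simplified reading,
# not the paper's definition.
import math
from typing import Callable, Dict, List

TraitClassifier = Callable[[str], Dict[str, float]]

def accuracy(texts: List[str], target: str, trait_probs: TraitClassifier) -> float:
    """Fraction of generated texts whose predicted trait is the target."""
    hits = 0
    for text in texts:
        probs = trait_probs(text)
        if max(probs, key=probs.get) == target:
            hits += 1
    return hits / len(texts)

def tpsi(text_before: str, text_after: str, target: str,
         trait_probs: TraitClassifier) -> float:
    """Cross-entropy toward the target trait before vs. after editing;
    a positive value means the edit shifted the model toward the target."""
    ce_before = -math.log(trait_probs(text_before)[target])
    ce_after = -math.log(trait_probs(text_after)[target])
    return ce_before - ce_after
```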
We measured the performance of existing editing methods on this task. While there are some successful editing cases, the edited models often generate incoherent text, exposing the shortcomings of existing methods; prompt-based methods remain the most effective for editing personality. Meanwhile, our experiments show that as the model size grows (7B, 13B, 70B), editing performance toward the target personality drops, suggesting that larger models may be more consistent in their original personality.
Between the lines
Customized large models will soon see increasingly wide use, so fine-grained customization of these models will be especially important, with personality as a crucial part of it. Our paper introduces a fine-grained personality editing task, exploring personality editing for large models with existing editing methods for the first time. However, our experimental results indicate that even in simple personality editing scenarios, existing methods still struggle to edit model personalities effectively.
We hope our work can inspire future directions:
- Developing more effective editing methods that preserve fluent text generation.
- Human personality traits manifest across various dimensions, and we hope the setting and data of our benchmark can inspire exploration of personality editing along multiple dimensions in the future.