Montreal AI Ethics Institute

Democratizing AI ethics literacy


From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting

December 14, 2023

🔬 Research Summary by Griffin Adams, a final-year NLP PhD student at Columbia University advised by Noémie Elhadad and Kathleen McKeown, who will join Stability AI as Head of Clinical NLP in 2024.

[Original paper by Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Eric Lehman, and Noémie Elhadad]


Overview: Selecting the “right” amount of information to include in a summary is difficult: a good summary should be detailed and entity-centric without being overly dense and hard to follow. To better understand this tradeoff, we solicit increasingly dense GPT-4 summaries with what we refer to as a “Chain of Density” (CoD) prompt. Specifically, GPT-4 generates an initial entity-sparse summary before iteratively incorporating missing salient entities without increasing the length.


Introduction

Automatic summarization has come a long way in the past few years, largely due to a paradigm shift away from supervised fine-tuning on labeled datasets toward zero-shot prompting with Large Language Models (LLMs), such as GPT-4 (OpenAI, 2023). Careful prompting can enable fine-grained control over summary characteristics, such as length, topics, and style, without additional training. An overlooked aspect is the information density of a summary. Theoretically, as a compression of another text, a summary should be denser, containing a higher concentration of information, than the source document. Given the high latency of LLM decoding, covering more information in fewer words is a worthy goal, especially for real-time applications. Yet exactly how dense a summary should be remains an open question. A summary is uninformative if it contains insufficient detail. However, if it contains too much information, it can become difficult to follow without increasing the overall length. Conveying more information subject to a fixed token budget requires a combination of abstraction, compression, and fusion. There is a limit to how much space can be made for additional information before a summary becomes illegible or even factually incorrect.
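The density notion here can be operationalized as named entities per token. As a rough, minimal sketch of such a metric (our illustration, not the paper's released code; spaCy and its small English model are assumptions):

```python
# Minimal sketch: entity density = unique named entities per token.
# Uses spaCy for NER; `en_core_web_sm` must be installed separately.
import spacy

nlp = spacy.load("en_core_web_sm")

def entity_density(summary: str) -> float:
    """Return unique named entities per token in `summary`."""
    doc = nlp(summary)
    entities = {ent.text.lower() for ent in doc.ents}
    return len(entities) / max(len(doc), 1)

sparse = "A storm hit the coast this week, causing widespread damage."
dense = "Hurricane Idalia hit Florida's Gulf Coast on Wednesday, causing $2bn in damage."
print(entity_density(sparse), entity_density(dense))  # the denser summary scores higher
```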

Key Insights

In this paper, we seek to identify the optimal balance between detail and readability by using GPT-4 to generate increasingly entity-dense (i.e., more detailed) summaries and having humans provide preference assessments.

The Prompt

Chain of Density (CoD) summarization is achieved with a single prompt to GPT-4, which is tasked with writing five summaries of a provided article. At each step, 1-3 additional details (entities) are added to the previous summary without increasing the length; existing content is rewritten to make room for the new entities (e.g., via compression and fusion). A sketch of such a prompt is shown below.
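Here is a minimal sketch of a CoD-style call. The prompt wording paraphrases the paper's description rather than quoting it verbatim, and the model name and OpenAI client usage are assumptions, not the authors' exact setup:

```python
# Sketch of a Chain of Density-style request (paraphrased prompt, assumed setup).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COD_PROMPT = """Article: {article}

You will write increasingly concise, entity-dense summaries of the Article.
Repeat the following two steps 5 times:
Step 1. Identify 1-3 informative entities from the Article that are missing
from the previously generated summary.
Step 2. Write a new, denser summary of identical length that covers every
entity and detail from the previous summary plus the missing entities.
Never drop entities from a previous summary; make space by fusing and
compressing existing content. Answer as a JSON list of 5 summaries."""

def chain_of_density(article: str) -> str:
    """Return the model's five increasingly dense summaries (raw JSON text)."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": COD_PROMPT.format(article=article)}],
    )
    return response.choices[0].message.content
```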

The Data

We randomly sample 100 articles from a CNN/DailyMail news article collection.
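As an illustration of this step (the Hugging Face `datasets` copy of CNN/DailyMail, the split, and the seed are our assumptions, not the paper's exact procedure):

```python
# Hypothetical reproduction of the sampling step via Hugging Face `datasets`.
from datasets import load_dataset

cnn_dm = load_dataset("cnn_dailymail", "3.0.0", split="test")
articles = cnn_dm.shuffle(seed=42).select(range(100))["article"]  # 100 random articles
```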

Human Feedback

We conduct a human evaluation to assess the impact of densification on human assessments of overall quality. Specifically, the first four authors of the paper were presented with randomly shuffled CoD summaries, along with the source articles, for the same 100 articles (5 steps × 100 articles = 500 total summaries). Based on the same definition of a “good summary,” each annotator indicated their top preferred summary. Our results indicate that humans prefer summaries that are almost as dense as human-written summaries and denser than summaries generated by a simple GPT-4 prompt: “Write a VERY short summary of the Article. Do not exceed 70 words.”
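To make the aggregation concrete, here is a tiny sketch of tallying top-preference votes per CoD step (the data layout is illustrative, not the paper's released format):

```python
# Tally which CoD step (1 = sparsest, 5 = densest) wins the most top votes.
from collections import Counter

# votes[annotator][article_id] = preferred CoD step for that article
votes = {
    "annotator_1": {0: 2, 1: 3},
    "annotator_2": {0: 3, 1: 3},
}

tally = Counter(step for per_article in votes.values() for step in per_article.values())
print(tally.most_common())  # e.g., [(3, 3), (2, 1)]
```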

Between the lines

We study the impact of summary densification on human preferences for overall quality. A degree of densification is preferred, yet it is very difficult to maintain readability and coherence when summaries contain too many entities per token. We open-source annotated test sets and a larger unannotated training set for further research into fixed-length, variable-density summarization. Future work should identify the optimal information level to include for each unique article. Given the rise of open-source LLMs (Llama, Mistral), this expensive Chain of Density prompt could be distilled into a single model through fine-tuning.

