Montreal AI Ethics Institute

Democratizing AI ethics literacy

Collectionless Artificial Intelligence

January 27, 2024

🔬 Research Summary by Marco Gori and Stefano Melacci

Marco Gori and Stefano Melacci are, respectively, Full Professor and Associate Professor of Computer Science at the University of Siena (Siena, Italy). Their research focuses on the foundations of machine learning and has recently turned toward problems of learning over time.

[Original paper by Marco Gori and Stefano Melacci]


Overview: Learning from huge data collections introduces risks related to data centralization, privacy, energy efficiency, limited customizability, and control. This paper takes the perspective that artificial agents can be developed progressively over time through online learning from potentially lifelong streams of sensory data. This is done without storing the sensory information and without building datasets for offline learning, while emphasizing interaction with the environment, including humans and other artificial agents.


Introduction

The outstanding results of machine learning-based applications are largely due to models trained on huge datasets. This raises several questions about the nature of these datasets and the way they are used:

  • What’s inside these data collections, and who owns them? 
  • Who has the resources for developing agents that learn from these huge collections?

An artificial agent learning from a large dataset inherits biases and gains skills that are directly related to the collection’s contents. Moreover, data means “power”: owning large collections makes it possible to train large models that can then be exploited in downstream applications, but only by those with access to significant hardware and energy resources.

“Collectionless AI” refers to approaches in which intelligent agents do not need to accumulate sensory data: they process each sample as it is acquired from the environment, without storing it. Interactions with the environment, including information coming from humans and agent-to-agent communication, play a crucial role in the learning process and provide a means of control. We envision agents that can run on edge computing devices, which calls for new learning protocols in which machines learn in a lifelong manner.
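To make the protocol concrete, here is a minimal Python sketch of an agent that learns from a stream without ever building a dataset. It is not the authors' method: it assumes a toy sensory stream (the hypothetical `sensor_stream` generator, producing noisy samples of y = 2x + 1) and uses plain online stochastic gradient descent on a linear model (the hypothetical `OnlineLinearLearner` class). The only state the agent keeps is its parameters; each sample is used for one update and then discarded.

```python
import random
from typing import Iterator, Tuple


def sensor_stream(n_steps: int, seed: int = 0) -> Iterator[Tuple[float, float]]:
    """Hypothetical stand-in for a sensory stream: yields one (x, y) pair at a time,
    with y = 2x + 1 plus small Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)
        yield x, y


class OnlineLinearLearner:
    """Hypothetical online learner: its whole state is (w, b, lr); no sample is stored."""

    def __init__(self, lr: float = 0.05) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        # One stochastic-gradient step on the squared error of this single sample.
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err
        # (x, y) is not appended anywhere: it is forgotten after this update.


learner = OnlineLinearLearner()
for x, y in sensor_stream(5000):
    learner.update(x, y)
print(f"w = {learner.w:.2f}, b = {learner.b:.2f}")
```

Because memory does not grow with the length of the stream, this style of learner fits the edge-device setting described above, although real sensory streams are far richer and temporally correlated, unlike the independent samples assumed here.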

Key Insights 

Risks connected with data centralization

The growing ubiquity of Large Language Models (LLMs) has recently sparked heated debates about scenarios involving potentially rogue AIs, with social and political implications. These debates are deeply connected with the exploitation of increasingly large data collections, which requires huge financial resources and therefore leads to the centralization of information. This centralization raises undeniable privacy problems as well as highly controversial geopolitical effects.

Data centralization issues 

The progressive accumulation of data was largely driven, at the dawn of the Web, by technologies that recognized early on the strategic value of collecting data through massive crawling. Just as with valuable Web search services, the quality of modern machine learning-based services is strongly tied to the privilege of having access to huge data collections.

Privacy and geopolitical issues

When data from the camera or microphone of a private smartphone is processed off-device, privacy issues arise in the way the information is stored and transmitted over the network. As a result, on-device learning, without building databases, might become an important requirement of future AI-based technologies. Moreover, by pushing collection-centered AI, we implicitly contribute to serious geopolitical issues tied to the dominance of a few countries that control data and the development of the technologies that exploit them.

Energy efficiency issues

Training large Transformers requires significant energy, and the training procedures are actively driven neither by the agent nor by the supervisor. By contrast, interacting with the agent to customize the teaching process would yield a more controlled setting that focuses only on what matters most to the agent at a given moment. It would also favor distributed computation with brief, targeted communications among agents, which might reduce the energy needed to develop them.

Limited control, customizability, and causality

Data collections might be affected by biases that are hard to filter out, or might contain inappropriate material. By contrast, progressive interaction with the environment paves the way for a more controllable and informed AI. In turn, it enables better exploitation of the temporal dimension of information, which can be used to better capture the causal structure underlying predictions.

Collectionless AI

A radically different perspective emerges when we think of machines that acquire cognitive skills without accessing previously stored data collections, relying solely on environmental interactions in which sensory information is processed immediately and agent-to-human or agent-to-agent exchanges take place. In nature, animals do not rely on data collections; they process information as time passes and build (and update) an appropriate internal representation of the environment. It is interaction with the environment that gives them access to a wealth of information and enables their cognitive skills to grow. Artificial neural networks still struggle to find a good trade-off between plasticity (the ability to learn new information) and stability (the ability to retain what has already been learned) without optimizing over previously built data collections; this challenge is the main focus of Collectionless AI.

Given the spectacular results of deep learning, the Collectionless AI challenge might at first glance seem quite ridiculous. However, since it reflects how intelligence emerges in nature, it is of interest in its own right from a truly scientific point of view. Moreover, we advocate its potential to enable a truly different type of AI technology, one that could have a dramatic impact on society and go beyond the issues with data collections mentioned above.
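The plasticity-stability trade-off can be seen even in a learner with a single number of state. The sketch below is a toy illustration, not a technique from the paper: a hypothetical `final_estimate` function tracks the level of a stream with an exponential moving average, where the step size `alpha` sets the balance. A large `alpha` adapts quickly to a lasting change (plasticity) but is also thrown off by a one-off spike; a small `alpha` barely reacts to the spike (stability) but adapts slowly to the genuine change. The two streams are invented for illustration.

```python
from typing import Iterable


def final_estimate(stream: Iterable[float], alpha: float, init: float = 0.0) -> float:
    """Online exponential moving average: keeps one number and stores no samples."""
    estimate = init
    for value in stream:
        # Move a fraction alpha of the way toward the new observation.
        estimate += alpha * (value - estimate)
    return estimate


# Hypothetical streams: a single transient spike, and a persistent shift.
spike = [0.0] * 100 + [5.0]
shift = [0.0] * 100 + [1.0] * 20

for alpha in (0.5, 0.01):
    print(f"alpha={alpha}: after spike {final_estimate(spike, alpha):.3f}, "
          f"after shift {final_estimate(shift, alpha):.3f}")
```

With `alpha=0.5` the estimate jumps halfway toward the spike but fully absorbs the shift; with `alpha=0.01` the spike barely registers, but only about 18% of the shift has been absorbed after 20 steps. No single fixed step size gets both right, which is one simple face of the trade-off.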

Time as the protagonist of learning

Sensory information naturally unfolds over time. In nature, we do not learn from a huge dataset of “shuffled images,” and animals acquire visual skills without storing their entire visual life. How can they learn to see without accessing a previously stored database? Is there a specific biological aspect that cannot be captured in machines? This paper takes the position that machines can likely acquire those skills once we face the challenge of learning without data collections, exploiting the natural development of sensory information over time.

Benchmarks in Collectionless AI

Just like humans, machines can be expected to “live in their environment,” which means they can be evaluated online. A massive, active online evaluation open to a wide audience, as is already the case for LLMs, could be a viable path for qualitatively evaluating virtual agents that learn progressively. The same agent can be evaluated at different stages of its evolution to analyze progress and regressions.
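One way to evaluate the same agent at different stages of its evolution, while still storing no data, is prequential (test-then-train) evaluation, a standard protocol in data-stream learning that the summary does not name explicitly: each incoming sample is first used to score the current agent and only then to update it, and errors are aggregated per time window. The sketch below assumes a hypothetical stream (`drifting_stream`) whose underlying relation flips at a chosen step, and a one-weight online learner inside the hypothetical `prequential_errors` function; the per-window error shows initial progress, a regression when the environment changes, and recovery afterwards.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def drifting_stream(n_steps: int, drift_at: int, seed: int = 0) -> Iterator[Tuple[float, float]]:
    """Hypothetical stream: y = 2x + noise until step `drift_at`, then y = -2x + noise."""
    rng = random.Random(seed)
    for t in range(n_steps):
        x = rng.uniform(-1.0, 1.0)
        slope = 2.0 if t < drift_at else -2.0
        yield x, slope * x + rng.gauss(0.0, 0.05)


def prequential_errors(stream: Iterable[Tuple[float, float]],
                       window: int = 500, lr: float = 0.05) -> List[float]:
    """Test-then-train evaluation of a one-weight online learner.
    Returns the mean squared error per window; only running sums are kept, never samples."""
    w = 0.0
    errors: List[float] = []
    window_sum, count = 0.0, 0
    for x, y in stream:
        err = w * x - y        # 1) evaluate the agent as it is right now
        window_sum += err * err
        count += 1
        w -= lr * err * x      # 2) then update it on the same sample
        if count == window:
            errors.append(window_sum / window)
            window_sum, count = 0.0, 0
    return errors


errs = prequential_errors(drifting_stream(3000, drift_at=1500))
for i, e in enumerate(errs):
    print(f"window {i}: MSE = {e:.4f}")
```

The error is high in the first window, drops as the agent learns, rises again in the window where the relation flips, and falls once the agent adapts. A public, human-in-the-loop evaluation as envisioned above would replace the synthetic targets with judgments from people interacting with the agent.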

Between the lines

This paper proposes a new view of AI centered on the idea of Collectionless AI. The new learning protocol assumes that machines interact with their environment without being allowed to store information in a way that would recreate the typical conditions of offline learning. Machines are expected to develop their own memorization skills by abstracting the information acquired from their sensors, which is processed online. We argue that emphasizing this new framework might open the door to a new approach to machine learning. Moreover, the emergence of the collectionless philosophy can contribute to a better understanding of intelligence in nature, as well as open an alternative technological path that is not centered on the privilege of controlling large data collections.

© MONTREAL AI ETHICS INSTITUTE. All rights reserved 2024.
This work is licensed under a Creative Commons Attribution 4.0 International License.