🔬 Research summary by Abhishek Gupta (@atg_abhishek), our Founder, Director, and Principal Researcher.
[Original paper by Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, Thomas Dandres]
Overview: As discussions on the environmental impacts of AI heat up, what are some of the core metrics that we should look at to make this assessment? This paper proposes the location of the training server, the energy grid that the server uses, the training duration, and the make and model of the hardware as key metrics. It also describes the features offered by the ML CO2 calculator tool that the authors have built to aid practitioners in making assessments using these metrics.
Introduction
It goes without saying that the firing of Dr. Timnit Gebru stirred the AI ethics community and highlighted some deep chasms between what is societally good when building AI systems and what serves the financial interests of organizations. In particular, the environmental impacts of large-scale AI systems were a point of concern. This research proposes some hard metrics that we can use to calculate the carbon impact of such systems, using the concept of CO2eq, a comparative metric that makes it easy to analyze carbon emissions from disparate activities. The tool created as a part of this research work is a manifestation of the desire to enable more practitioners to document their carbon impacts. The researchers used publicly available data to provide the base values for these metrics, which are populated in the tool so that people can generate their own consumption figures. They found that the location where training takes place and the hardware on which AI systems are trained have a significant impact on total emissions and should be key considerations.
Guiding metrics and results
At the core of this paper is the idea of using CO2eq, a standardized metric from the broader climate change community that captures the carbon footprint of various activities in a way that makes comparison easier. Single metrics are not without their flaws, but for a nascent field like carbon accounting in AI systems, it is a great starting point. The first input to calculate is the carbon intensity of the energy mix powering the data center where the server is located. The researchers use publicly available data, assuming that the data center is plugged into the local grid where it is physically located. They find a great degree of variability, from values as low as 20 g CO2eq/kWh in Quebec, Canada to values as high as 736.6 g CO2eq/kWh in Iowa, USA.
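To make that variability concrete, here is a minimal sketch of the arithmetic using the two regional intensities cited above; the 100 kWh energy draw is a hypothetical training run, not a figure from the paper:

```python
# Hypothetical comparison: the same training job drawing 100 kWh,
# priced against the two regional carbon intensities cited above.
CARBON_INTENSITY_G_PER_KWH = {
    "Quebec, Canada": 20.0,
    "Iowa, USA": 736.6,
}

energy_used_kwh = 100.0  # assumed energy draw of one training run

for region, intensity in CARBON_INTENSITY_G_PER_KWH.items():
    emissions_kg = energy_used_kwh * intensity / 1000  # g -> kg
    print(f"{region}: {emissions_kg:.1f} kg CO2eq")

# Quebec, Canada: 2.0 kg CO2eq
# Iowa, USA: 73.7 kg CO2eq (roughly 37x more for the same job)
```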
The researchers collected four pieces of publicly available data: the energy consumption of the hardware itself (GPU, CPU, etc.), the location of the hardware, the region's average CO2eq/kWh emissions, and any offsets purchased by the cloud provider. Keeping these factors in mind, the researchers urge practitioners to choose cloud providers wisely, since the renewable energy certificates and carbon offsets that providers purchase affect the final figure. The power usage effectiveness (PUE) of each provider's infrastructure also changes the result of the calculation: PUE is the ratio of the total energy a facility consumes to the energy consumed by its computing equipment, i.e. how much overhead (cooling, power delivery, and so on) is expended for every unit of useful computation. In addition, as highlighted above, choosing the right region for training your model also has a significant impact, sometimes on the order of 35x, as the Quebec and Iowa figures demonstrate.
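Putting these inputs together, a back-of-the-envelope estimator might look like the sketch below. The function name and the example values (a 300 W GPU, 48 hours of training, a PUE of 1.1) are illustrative assumptions, not code or figures from the paper or the calculator:

```python
def estimate_emissions_kg(power_kw: float,
                          hours: float,
                          pue: float,
                          intensity_g_per_kwh: float,
                          offset_fraction: float = 0.0) -> float:
    """Rough training-emissions estimate (hypothetical helper).

    power_kw:            average hardware power draw in kW
    hours:               training duration
    pue:                 power usage effectiveness of the facility (>= 1.0)
    intensity_g_per_kwh: regional carbon intensity in g CO2eq/kWh
    offset_fraction:     share of emissions offset by the provider (0 to 1)
    """
    energy_kwh = power_kw * hours * pue              # facility overhead scales with PUE
    gross_g = energy_kwh * intensity_g_per_kwh       # grid-dependent emissions
    return gross_g * (1.0 - offset_fraction) / 1000  # g -> kg

# A single 300 W GPU running for 48 h at PUE 1.1 on the Iowa grid:
print(estimate_emissions_kg(0.3, 48, 1.1, 736.6))    # ~11.7 kg CO2eq
```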
Potential solutions and caveats
The AI research journey is not without failed experiments and false starts: many experiments are run by varying architectures and hyperparameter values. But there are efficient methodologies for doing so. For example, randomized search has been shown to find good hyperparameter values with far fewer trials than grid search, which enumerates the search space exhaustively and deterministically and has been shown to be suboptimal. Finally, specialized hardware like GPUs is demonstrably more efficient than CPUs for these workloads and should be factored into the decision.
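To illustrate the random-versus-grid point, the generic scikit-learn sketch below (not code from the paper) caps the hyperparameter search at a fixed budget of sampled configurations instead of exhaustively enumerating a grid:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Toy dataset standing in for a real training task.
X, y = make_classification(n_samples=500, random_state=0)

# A grid over, say, 100 values of C would cost 100 fits per CV fold;
# randomized search fixes the budget at n_iter fits no matter how
# finely the search space is specified.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e3)},
    n_iter=10,  # fixed compute budget: 10 sampled configurations
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```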
Taking all of these factors into account, the researchers also urge the community to weigh the second-order impacts that such changes might have. In particular, a dramatic shift to low-carbon-intensity regions can leave capacity elsewhere underutilized, and that infrastructure produces emissions regardless of usage. These calculations also rest on many assumptions, since we don't have complete transparency into the actual carbon emissions of data centers and the energy mixes of the grids they draw from. Finally, the tool focuses only on the training phase of the AI lifecycle, but repeated inference can also add up to a sizable share of the final carbon footprint.
Between the lines
The findings in this paper are tremendously useful for anyone seeking to address the low-hanging fruit in reducing the carbon footprint of their AI systems. While the underlying data is unfortunately static, it provides a great first step for practitioners to get familiar with the ideas of carbon accounting for AI systems. The next iteration of this tool, dubbed CodeCarbon, moves closer to what the practitioner community needs: tools that are well integrated with the natural workflow. The original formulation was a web-based tool that introduced friction by requiring data scientists to enter information into the portal manually. The newer iteration captures metrics in-process, much as experiment-tracking tools like MLflow do, enabling potentially higher uptake in the community.
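For reference, CodeCarbon's in-process tracking looks roughly like the snippet below, based on the library's public API at the time of writing; check the project's documentation for the current interface:

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()  # infers location and hardware where it can

tracker.start()
# ... run your training loop here ...
emissions_kg = tracker.stop()  # estimated kg CO2eq; also logged to emissions.csv

print(f"Estimated training emissions: {emissions_kg:.4f} kg CO2eq")
```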