The carbon footprint of DL: "Energy and Policy Considerations for Deep Learning in NLP" (Paper)

The youth climate movement is in the news everywhere at the moment (at least in Europe). Some researchers in the DL community are also starting to think about the impact this field has on energy consumption and CO2 emissions.

I think this paper published 2 days ago makes some very interesting and important points that are worth thinking about:

1 - The energy and climate implications of deep learning training.
While the paper uses the newer NLP architectures and huge amounts of data as examples, the implications remain even when training smaller models. (Just think of the thousands of people on Kaggle and the amount of GPU power used.)
I am not sure that the measurement methodology is actually very precise, but even taking the numbers as ballpark estimates shows the importance of being more aware of the impact and the need to strive for higher efficiency.
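To make the kind of estimate the paper discusses more concrete, here is a minimal sketch of a back-of-the-envelope CO2 calculation: energy = power draw × time × datacenter overhead (PUE), emissions = energy × grid carbon intensity. The PUE of 1.58 and the US-average grid intensity are commonly cited figures, but the specific hardware numbers in the example are illustrative assumptions, not measurements from the paper.

```python
# Back-of-the-envelope CO2 estimate for a training run.
# All numbers are illustrative, not the paper's exact methodology.

PUE = 1.58                 # typical datacenter power usage effectiveness
US_CO2_KG_PER_KWH = 0.43   # rough US-average grid carbon intensity

def training_co2_kg(gpu_watts, num_gpus, hours,
                    pue=PUE, co2_kg_per_kwh=US_CO2_KG_PER_KWH):
    """Estimate kilograms of CO2 emitted by a training run."""
    kwh = gpu_watts * num_gpus * hours / 1000 * pue
    return kwh * co2_kg_per_kwh

# Example: one 250 W GPU running for 12 hours (a Kaggle-scale job)
print(round(training_co2_kg(250, 1, 12), 2))  # → 2.04
```

Even this toy version makes the point: the estimate is dominated by the carbon intensity of the local grid, which is exactly why the "where is the compute hosted" debate below matters.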

2 - The implications of making NLP research too expensive for academics.
NLP research has become so resource intensive in terms of compute (= cost) that academics cannot "compete" anymore with corporate research. The authors advocate a publicly financed compute cloud for academics as a possible mitigation strategy.


Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.

This post is just to start some discussion: what is your take on this?


More incentive to build models that train fast, and test ideas on smaller datasets such as Imagenette/Imagewoof.


The paper has made it into Andrew Ng’s weekly newsletter. Here is their take on things:

The debate: The conclusions sparked much discussion on Twitter and Reddit. The researchers based their estimates on the average U.S. energy mix. However, some of the biggest AI platforms are far less carbon-intensive. Google claims its AI platform runs on 100 percent renewable energy. Amazon claims to be 50 percent renewable. The researchers trained on a GPU, ignoring the energy efficiency of more specialized chips like Google’s TPU. Moreover, the most carbon-intensive scenario cost between $1 million and $3 million — not an everyday expense. Yes, AI is energy-intensive, but further research is needed to find the best ways to minimize the impact.
Everything is political: Ferenc Huszár, a Bayes booster and candidate for the International Conference on Machine Learning’s board of directors, tried to take advantage of the buzz. He proposed “phasing out deep learning within the next five years” and advised his Twitter followers to “vote Bayesian” in the ICML’s upcoming election. ¯\_(ツ)_/¯
We’re thinking: This work is an important first step toward raising awareness and quantifying deep learning’s potential CO2 impact. Ever larger models are bound to gobble up energy saved by more efficient architectures and specialized chips. But the real issue is how we generate electricity. The AI community has a special responsibility to support low-carbon computing and sensible clean energy initiatives.


Here’s a very high level response…

I like the heart of the paper. In a sense, any information sharing that tries to help users connect what they’re doing to the actual impact of their actions is a positive contribution to the greater community.

For example, I’ll eat a burger with the best of 'em, but I know it doesn’t just come from a grocery store. I’m aware that it was once an animal. That it lived, died, and deserves (at a minimum) acknowledgement for the sacrifice.

I’m aware that when I throw something away, it doesn’t just disappear, it ends up in the ground, taking up space somewhere and potentially causing an impact.

I’m aware that, when I drive, the exhaust from my vehicle is trapped in the atmosphere. That it has some small, detrimental influence on the quality of the air around me.

Because of this, I do my best to live a balanced life, to eat cleaner, to waste less, and to drive responsibly. The thing is, I’m far from perfect, and I still do these things, but the increased awareness does nudge me in the right direction.

While going through the fastai course, I haven’t thought about the greater impact of my GPU usage one bit. I click a button, a server spins up, and black magic ensues. There’s a disconnect between my actions and my awareness of the impact they are causing. While the scale of that impact might be debatable, the reality of that impact is not. I appreciate the fact that someone has taken the time to ask me, “have you thought about who that mouse click affects?”

Knowing how my actions affect others connects me to the greater community, creates more space for empathy, and nudges my negative tendencies in the right direction. I think that’s pretty cool…


Maybe this paper can provide the balance, i.e. potential positive impacts from machine learning…
