(or: Why don’t we practice Carbon Reporting in Computational Humanities?)
Oh, the debated question of whether personal or academic LLM use is environmentally problematic. Every once in a while, there’s a wave of discussion about the environmental impacts of using AI (especially large language models) in research, particularly in the digital humanities. Then it dies down again, and we all go about our lives as before. But I’ve been wondering: beyond having this on our “we should really look into this” list—are people actually doing anything about it? Are we practicing carbon reporting in digital humanities? Nope, not really. But I think we should be. We need to start taking action now. We need to make carbon reporting a requirement for research, just like we expect people to make their code reproducible (which, according to a recent study, isn’t working as well as one would hope, either).
Awareness is not enough
Awareness alone is not enough. It’s not enough to know that somebody has pointed out the environmental problems of large language models and AI, to periodically reiterate how horrible it all is, and then to continue as before.
There is actually a long history of these concerns being discussed in and around digital humanities; we just don’t draw on it much. We all know the famous 2021 Stochastic Parrots paper. It explicitly foregrounds environmental concerns alongside issues of bias and dataset construction, and yet, in practice, most people cite it for the latter and quietly ignore the former. Given the current scale of AI adoption, that is becoming increasingly difficult to justify. Simply reiterating that AI is problematic and then continuing as before is, frankly, a bit dishonest. If we take these concerns seriously, we need to start acting on them. That means making carbon reporting part of our research practice, just like we (at least in theory) expect reproducibility. (Even that, evidently, is not going too well.)
Working with “guesstimates”
We usually quantify environmental impacts using what we can call “guesstimates,” because accurately measuring the energy and water consumption associated with AI and data centres is genuinely hard. The 2004 book The Hitchhiker’s Guide to Lifecycle Assessment references the pop-culture classic with its idea that the answer is 42, but we don’t know the question. That captures the issue quite well: we can produce numbers, but it’s often unclear exactly what they represent.
This is not a flaw of individual studies; it is a structural problem, because measuring impacts across complicated tech supply chains is hard. Not just because of the many variables (though they are a problem), but also because there’s a significant amount of opacity involved: greenwashing, openwashing, or simply sweeping under the rug where old tech parts end up after the end of their lifecycle. (Hint: landfills in the Global South are a likely destination.)
This overall assessment also reflects my own (admittedly humble and superficial) experience in this space. Most of what we do in carbon reporting for AI are guesstimates. There are many assumptions involved, and small changes in these assumptions can lead to very different results. Hank Green has explained this nicely: tracing the full environmental cost of any system is extremely complex. That’s why different studies can produce wildly different numbers without necessarily being wrong. This doesn’t make the exercise useless, but it does mean we need to be careful about how we interpret and communicate these results.
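To see how much a single assumption can swing a result, here is a toy sensitivity check. All coefficients below are illustrative placeholders I made up for the sketch, not measurements from any study: the point is only that the same energy estimate yields CO₂ figures an order of magnitude apart depending on the assumed grid carbon intensity.

```python
# Toy sensitivity check: how much do results swing when one assumption changes?
# All numbers below are illustrative placeholders, not real measurements.

ENERGY_PER_1K_TOKENS_KWH = 0.0003  # assumed energy per 1,000 generated tokens

# Assumed grid carbon intensities in kgCO2e per kWh (rough, illustrative):
GRID_INTENSITY = {
    "hydro-heavy grid": 0.03,
    "EU-average grid": 0.25,
    "coal-heavy grid": 0.80,
}

def co2_for_tokens(tokens: int, intensity_kg_per_kwh: float) -> float:
    """Estimate kgCO2e for `tokens` generated tokens under one set of assumptions."""
    energy_kwh = tokens / 1000 * ENERGY_PER_1K_TOKENS_KWH
    return energy_kwh * intensity_kg_per_kwh

tokens = 5_000_000  # e.g. one whole screening run
for grid, intensity in GRID_INTENSITY.items():
    print(f"{grid}: {co2_for_tokens(tokens, intensity):.3f} kgCO2e")
```

Same tokens, same model, same energy figure; only the location assumption changed, and the estimate moved by a factor of more than 25. That is the “guesstimate” problem in miniature.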
Our attempt: carbon reporting in a DH use case
This is exactly the challenge we encountered in our own working paper.
We set out to quantify the environmental footprint of using large language models in a digital humanities workflow: specifically, in the context of the CORAL project, where we worked on curating cross-collection oral-history datasets. We compared manual screening of 2,606 interviews with an LLM-assisted workflow using several instruction-tuned models and different prompt designs. To estimate environmental impact, we used token-based calculations via EcoLogits. But implementing this in practice was far from straightforward. In fact, we couldn’t do everything we initially planned because it turned out to be much harder than expected. There are many tools out there, but they don’t always transfer well into actual research setups. So we ended up building a small wrapper ourselves, inspired by existing approaches. We explain our assumptions transparently and draw on existing tools and literature, but these results should always be read with caution.
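To make the idea of a “token-based wrapper” concrete, here is a minimal sketch of the general approach. This is not EcoLogits’ actual API and not our paper’s code: the coefficients, the prompt-token weighting, and the toy per-interview token counts are all hypothetical placeholders. Real tools derive such coefficients from model size, hardware, datacentre PUE, and grid mix.

```python
from dataclasses import dataclass

@dataclass
class ImpactEstimate:
    energy_kwh: float
    gwp_kgco2e: float  # global warming potential

# Hypothetical per-token coefficients (placeholders, not measured values):
ENERGY_PER_OUTPUT_TOKEN_KWH = 3e-7
CARBON_INTENSITY_KG_PER_KWH = 0.25

def estimate_impact(prompt_tokens: int, output_tokens: int) -> ImpactEstimate:
    """Token-based guesstimate. Output tokens are assumed to dominate
    inference cost, so prompt tokens are weighted lower here
    (an assumption for this sketch, not a law)."""
    effective_tokens = output_tokens + 0.1 * prompt_tokens
    energy = effective_tokens * ENERGY_PER_OUTPUT_TOKEN_KWH
    return ImpactEstimate(energy, energy * CARBON_INTENSITY_KG_PER_KWH)

# Accumulate over a whole screening run (2,606 interviews, toy token counts):
total = ImpactEstimate(0.0, 0.0)
for prompt_toks, out_toks in [(1200, 300)] * 2606:
    est = estimate_impact(prompt_toks, out_toks)
    total = ImpactEstimate(total.energy_kwh + est.energy_kwh,
                           total.gwp_kgco2e + est.gwp_kgco2e)
print(f"~{total.energy_kwh:.2f} kWh, ~{total.gwp_kgco2e:.2f} kgCO2e (guesstimate)")
```

The value of such a wrapper is less the absolute numbers than the discipline: every LLM call in the workflow gets logged with its token counts and an impact estimate attached, so the final figure is at least traceable back to stated assumptions.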
Still, all of these carbon-reporting approaches are guesstimates: estimates, but really more like educated guesses. Take the numbers with a grain of salt. We explain how we arrive at them, drawing on what the creators of these tools do (assuming they know best) and on the most relevant literature, both in digital humanities and in environmentally conscious LLM and AI development. In that sense, the paper is a one-stop shop for what you need to know. But still: only guesstimates.
But now, without further ado, here’s our working paper: Lang, Sarah, Wishyut Pitawanik, Pascal Belouin, Emma Sevink, Jesse Olszynko-Gryn, Alfred Freeborn, and Etienne Benson. “Quantifying the Environmental Footprint of Curating Datasets with LLMs”. Zenodo, December 11, 2025. https://doi.org/10.5281/zenodo.17902822.
And here are some slides: Lang, Sarah. “Quantifying the Environmental Impacts of Large Language Models (LLMs) in Digital Humanities Through Carbon Reporting”. February 17, 2026. https://doi.org/10.5281/zenodo.18671833.
The slides are from a talk I gave on this topic; they provide an overview of the environmental impacts of using AI, particularly large language models, in research. The talk begins by briefly addressing the debated question of whether personal or academic LLM use is environmentally problematic. It situates these concerns within digital humanities discourse and highlights how influential AI-critical works, such as the 2021 Stochastic Parrots paper, foreground environmental issues. It also contextualises the current AI boom, drawing on the book AI Snake Oil, and refers to Kate Crawford’s critical work, including Atlas of AI, Anatomy of an AI System, ImageNet Roulette, and recent reporting on AI resource demands in a New York Times video feature. (All links and references can be found in the slides.)
Context matters: comparing impacts
Another key insight from our research into this topic is that environmental impact numbers only make sense in context. They need to be understood relationally. For example, I recently listened to a podcast discussion on this topic that I highly recommend (in German, unfortunately). The discussants point out that the conversation itself took place via video call: that consumes energy, but it also replaced intercontinental travel. In that context, the digital option likely saved a significant amount of emissions despite consuming a (comparatively negligible) amount of energy itself. The same logic applies to our research practices.
(We also need to look at these impacts through lifecycle assessment (LCA) but that’s a topic for another day and frankly, not my strongest suit.)
If we use LLMs (or other AI tools), we should ask:
- What are we not doing instead?
- What work is being replaced?
- Is it really worth it?
That’s what we tried to address in our working paper. The reviewers didn’t get it at all, by the way.
Why a working paper?
If all of this is so important, why is it “just” a working paper? I’m glad you asked. We created this in June–July 2025 as a submission to the 2025 Computational Humanities Research conference. While our reviews were not outright bad, we received relatively low scores that, in our view, were not substantiated by the comments given to justify the ratings. The three reviewers didn’t even agree on what they disliked (everybody disliked something different), and none of it, we thought, justified the low overall recommendation scores.
What they all seemed to agree on, however, is that this is not really a topic highly relevant to the Computational Humanities Research Conference. While this may have something to do with different interpretations of what digital and computational humanities encompass, I find this… let’s say… rather interesting and wonder what it says about the state of the field. If environmental impact is not considered relevant to computational humanities, then what does that say about us? …and I guess I get to be a little bitter about not being accepted with a paper that I thought was important.
Maybe it was just bad luck with the reviewers, but the paper had been written explicitly with that audience in mind, and rewriting it for resubmission elsewhere would have been too complicated. In any case, we decided against resubmitting. The topic is evolving too quickly, and we didn’t want the work to become outdated. So we published it as a working paper instead, reflecting the state of our research on the topic in July 2025. If somebody finds it useful, they can still access it; it supports the working group GreeningDH, and everyone can get the information if they want.
Maybe it’s too practical in many ways to be fully respected as a journal article, and we didn’t want to rework it completely, because that would take too much time and the result would go out of date anyway. If you’re interested in continuing this work, drop me a note, share this post, and cite the paper in your publications.
And most importantly: use it to do some preliminary carbon reporting. It’s the very least we can do. As you will see when you read the article, energy consumption is only one small aspect of sustainability in our practices. And yes, it should be a more central topic.
Takeaways and Conclusions
The main takeaway is not that we now have perfect measurements. We don’t. What we have are approximations, uncertainties, and a lot of open questions. But that’s not a reason to do nothing. If anything, it’s a reason to start being more transparent about what we do, how we measure it, and where the limitations lie. Even imperfect measurements can be useful, as I keep saying in my talks on data gaps. As the phrase often attributed to Peter Drucker goes: “What gets measured gets managed.”
At the same time, we should be careful not to fall into the trap of thinking that measurement alone is enough. We need to question whether increased awareness and visibility alone are sufficient, or whether they risk fostering complacency without leading to substantive change. Increased awareness, after all, does not automatically lead to change, as Rutger Bregman argues in Moral Ambition. In some cases, it may even create a false sense of having addressed the problem.
Also, just so you know and in case you were wondering: I am not a climate activist. I had no previous experience, knowledge, or training in this before we wrote this paper. Yes, I’m no stranger to activism, but usually around data ethics or data gaps; I had never done anything related to environmental concerns before. If I can do it, so can you. No excuses. And listen to the many practitioners who have been thinking about this for much longer than I have and who have much more experience. They can be found in the sustainable-RSE corners of relevant DH social media.
Even if we get better at measuring energy consumption, that is only one part of the picture. Sustainability also involves questions of infrastructure, access, labor and power. And sometimes, it comes down to the simple principle of sustainability: Using what we actually need and not more. We don’t necessarily have to stop using LLMs altogether. But we should be more deliberate about how we use them, and more transparent about their costs and limitations.
It’s weird how even DH is subconsciously adopting these “AI is oh so great, let’s not waste our time on unproductive critical work” narratives from the tech sector. I would have hoped we were more critical than that. Heck, if we aren’t, why do we even call ourselves “Humanities”? OK, rant over. But honestly, we need to get back into the habit of being AI-critical. That doesn’t mean becoming Luddites. But we need to stop absorbing the either-or narratives from the AI field that make you believe you can’t both explore the capabilities and field-specific uses of AI and hold people accountable at the same time. I find this anti-critical stance, which I see quietly on the rise even in DH lately, quite concerning. Let’s do better from now on.
Carbon reporting is not the solution to all our problems but it is a first step. A small one, perhaps. An imperfect one, certainly. But necessary and honestly, the least we can do. Let’s start a carbon reporting revolution in DH! If not via the platform of CHR, then let’s at least go grassroots.
And, fitting with the motto of The Hitchhiker’s Guide to Lifecycle Assessment, erm, I mean the Galaxy (of course):
“Thanks for all the fish” and see you next time.
The Ninja
Buy me coffee!
If my content has helped you, donate 3€ to buy me coffee. Thanks a lot, I appreciate it!
