Why Looking More Closely at Our Data Is the Way to Better Research Ethics

Better data leads to better research. That might sound obvious, but in practice, it’s often not where the attention goes. In conversations about AI, digital humanities, or computational research, the spotlight usually lands on models: new tools, new methods, new technical breakthroughs. Meanwhile, the data those models rely on quietly sits in the background, treated as a given. It isn’t. We know that from the data/capta discourse. If we actually care about research ethics, we need to shift that focus, because most of the ethical issues people worry about don’t start with the model. They start with the data. And data, especially in historical research, is messy. It’s incomplete, shaped by power structures, and full of gaps. Some voices were never recorded. Others were preserved unevenly. So when we build datasets, we’re not just collecting neutral material; we’re making choices: What gets included? What gets left out? What gets cleaned up, standardised, or ignored? This is where things get interesting, and where


Guesstimating Environmental Impacts of LLM Workflows

(or: Why don’t we practice Carbon Reporting in Computational Humanities?)

Oh, the debated question of whether personal or academic LLM use is environmentally problematic. Every once in a while, there’s a wave of discussion about the environmental impacts of using AI (especially large language models) in research, and, in my case, specifically in the digital humanities. Then it dies down again, and we all go about our lives as before. But I’ve been wondering: beyond having this on our “we should really look into this” list, are people actually doing anything about it? Are we practicing carbon reporting in digital humanities? Nope, not really. But I think we should be. We need to start taking action now. We need to make carbon reporting a requirement for research, just like we expect people to make their code reproducible (which, according to a recent study, isn’t working as well as one would hope, either).

Awareness is not enough

It’s not enough to just
