Don’t call it a database!

When I started this blog, one of my promises and goals, apart from LaTeX-Ninja’ing, was to demystify the Digital Humanities for non-DH people. I have been watching for a long time now, and I think one of the big mysteries of the DH still persists in Normal Humanists’ heads and thus really needs demystifying. You might have guessed it: I want to explain why DH people will cringe if you call digital resources ‘databases’ when they are not, technically speaking, databases.

Is it ok to call any digital resource / corpus a ‘database’?

We know, that’s what you tend to call a digital corpus. But in most cases it’s not correct; it’s a pars pro toto. A database is just one possible technical implementation, but the term is used more broadly for any ‘digital base of data’. By laypeople, at least. A pars pro toto is a stylistic device, and that’s a Humanities thing, right? You do get stylistic devices. So you can also understand why you shouldn’t use imprecise terminology. You don’t like it when people misuse your field’s terminology either, and you probably make quite a religion of it.

If you want to work with the DH, you need to understand their terminology and respect it by using it correctly. Even though it might initially feel unintuitive to you. Believe me, you will adapt quickly if you give it a try.

I’ve caught myself so many times now educating my Normal Humanist friends about digital resources and why my (DH) colleagues won’t take you, as a Humanist, very seriously if the word “database” slips out of your mouth at inappropriate moments. It’s kind of like the Tourette’s of NH-moving-in-a-DH-world. Which probably is not a politically correct analogy. No offense to people who actually have Tourette’s, I don’t want to devalue or disrespect your struggle in any way! The analogy is only about blurting out inappropriate words at inappropriate moments.

Everybody has their cringe-prone terminology item, right?

To be honest, I am not sure how strict the English-speaking DH world is about this, but I can guarantee you that this distinction very much holds for the German use of “Datenbank”. When a quick web search yielded the following result, I was no longer sure whether it’s actually a thing in English too. Digital Humanities at King’s College define a database as follows:

Database is the term we use for any large collection of online material.


This, however, is exactly the way I suggest you don’t use this term. I am aware that this is the association linked with it in many people’s minds. But hey, you are Humanists. You do have a sense for the intricacies of terminology, right? I, for one, really hate it when people use the wrong grammatical gender for the term corpus (in German: neuter (!) for a collection of documents, so always neuter, unless you mean an actual body like that of a musical instrument). You probably have a thing like that, too, where you get furious at laypeople saying it wrong, don’t you? Well, the DH equivalent of this thing is the misuse of the term database.

Using terminology correctly is a sign of respect towards the DH community. It shows you respect us as researchers and don’t think of us as the ‘idiot who does the tech stuff’

Well, to be exact, it’s not even a misuse. You sure can use the term database in this way and it’s not, strictly speaking, completely incorrect. It’s just misleading, and – most importantly as the subject of this post – it is a strong pointer to the fact that you are not very tech-savvy and either unaware or else disrespectful of digital terminology. It will be seen as either a lack of respect and esteem towards the digital field or, I don’t know which is worse, a lack of competence in general. You would deem it impolite, too, and probably take it as a sign of general incompetence or a lack of intellectual ability/openness if a DH person came along and persistently misused your terminology, right?

Edit/addition 2019/06/04: I think this issue is less about whether it is technically or theoretically correct to use a term like this or like that. It’s a question of being ‘politically correct’ and of not hurting people’s feelings. To show the point on an extreme example (which is maybe exaggerated applied to databases but illustrates the point): you could theoretically argue that the term ‘nigger’ has been used historically to mean ‘person of color’, ergo it would – terminologically speaking – not be incorrect to use it, right? Wrong. In this case, it’s obvious (to everyone, hopefully) that it would be extremely rude and not ok to call a person of color a ‘nigger’ nowadays. Nobody would be confused if people’s reaction to this was to feel insulted because the above explanation does not take connotations into account.

Similarly, you could say that before the advent of the DH, it maybe wasn’t a big deal to throw around the term ‘database’ to mean any digital ‘base of data’. But now that the DH is becoming established as a discipline, and not only as a tool like it might have been in the beginning, things have changed. DH people sometimes feel like their competencies are not taken seriously because their part of the job is seen as the ‘handiwork’ whereas the non-DH input data is the actual research. I think that this latent inferiority complex, or maybe rather some sort of struggle for recognition, is the reason imprecise use of DH-related terminology is sometimes taken bitterly.

So ultimately, it’s not about being right or wrong. It’s about being respectful and not hurting other people’s feelings. Also, non-DH people insisting on using the term in a non-DH way while simultaneously wanting to participate in a DH project will cause a clash of terminology. It might be ok for a non-DH person to use the term like this, but DH people are kind of bound to use the term in a strictly technical way or else they might be seen as incompetent in their own field. In this case, I think the non-DH person should give in, because even if they will not be judged by their use of DH-specific terminology, a DH person will be. You don’t want your imprecise language to reflect negatively on your cooperation partners.

Since the initial publication of the post, I have received the feedback that some people with technical backgrounds are quite open to non-technical uses of the term ‘database’. But from my own experience of the DH overall, I feel this is not necessarily representative. And just because people accept that it is theoretically valid to use the term according to one’s own judgement, that doesn’t mean they will condone it in practice.

If you want collaboration, start actually collaborating by learning about DH terminology

Especially if you are trying to get a DH collaboration or a project labelled as DH, I suggest you prune your language a little bit here. After all, DH people usually have lots of people queueing to get a project with them. They will tend to take the ones interesting for them (in terms of subject) and/or those where the applicants seem nice. And it is deemed basic politeness to research your collaboration partners’ field so you don’t draw a complete blank. You want your partners to be understanding and reasonably well-educated on the baseline of your field too, right? And you probably catch yourself sometimes, secretly saying to yourself in indignation or disbelief ‘How can any academic not know that?! This is completely basic!’

Well, it happens to DH people, too. Often concerning so-called ‘databases’ which are not, in fact, databases. If you persistently use the term wrongly, it’s seen as a lack of trying or plain incompetence. Don’t be rude. Now that you are aware of the problem, you have no excuse to continue saying it wrong.

How to know if it’s ok to call it a database?

Two questions to ask to get a feel for whether what you mean might actually be a database, followed by two rules of thumb:

  1. Would it make sense to represent this data in an (Excel) spreadsheet? Then it is likely someone chose a database format to represent it digitally.
  2. Are there any other fitting means to represent it? Just because it outwardly looks like it stores spreadsheet-formatted data, this doesn’t mean that’s the way the data is stored “behind the scenes”.
  3. In case of any doubt whatsoever, just refrain from calling it a database. Just say ‘digital resource’, ‘digital corpus’ or basically anything else which seems half appropriate. Anything else is way less stigmatized and cringe-worthy than the misuse of ‘database’.
  4. So to be on the safe side: Just don’t call it a database unless you’re sure it is one (technically speaking). When not 100% sure, just term it a digital resource or digital data collection. I know you just mean a ‘digital base of data’. But please respect that the wide category ‘digital base of data’ doesn’t mean the same thing as the narrow term ‘database’ in a technical field.
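For the curious, here is a minimal sketch in Python of what “behind the scenes” can mean. It is only an illustration under my own assumptions: the three sample letters, the table name and all function names are invented for this example and don’t come from any real project. The same tiny collection is stored once as a set of XML documents (a digital corpus, no database involved) and once in an SQLite database, and both answer the same question.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample data (sender, year, text), purely for illustration.
LETTERS = [
    ("Writer A", 1900, "First sample letter."),
    ("Writer B", 1901, "Second sample letter."),
    ("Writer C", 1900, "Third sample letter."),
]


def build_xml_corpus(letters):
    """Store each letter as its own XML document: a corpus, not a database."""
    docs = []
    for sender, year, text in letters:
        root = ET.Element("letter", attrib={"sender": sender, "year": str(year)})
        root.text = text
        docs.append(ET.tostring(root, encoding="unicode"))
    return docs


def senders_from_corpus(docs, year):
    """Answer the question by parsing every document and filtering by hand."""
    senders = []
    for doc in docs:
        letter = ET.fromstring(doc)
        if int(letter.get("year")) == year:
            senders.append(letter.get("sender"))
    return sorted(senders)


def build_database(letters):
    """Store the same letters as rows in a relational (SQLite) database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE letters (sender TEXT, year INTEGER, body TEXT)")
    conn.executemany("INSERT INTO letters VALUES (?, ?, ?)", letters)
    return conn


def senders_from_database(conn, year):
    """Answer the same question with an SQL query."""
    rows = conn.execute(
        "SELECT sender FROM letters WHERE year = ? ORDER BY sender", (year,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(senders_from_corpus(build_xml_corpus(LETTERS), 1900))
    print(senders_from_database(build_database(LETTERS), 1900))
```

From the outside, both variants look like “a searchable collection of letters”. Only the second one is a database in the technical sense, which is why ‘digital resource’ or ‘digital corpus’ is the safer label when you don’t know what sits underneath.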

Just a little thought – hope this helps!


the Ninja

PS: Can someone tell me whether you think it is valid to tell people they should not use the term database, or whether it would be ok with you if they use the term in a non-technical way? I only know that people around me react quite aggressively when you do and will think you’re a technical layperson, and thus not trust you much once you have. You’d basically be ‘disqualified’ after that unless you really have some very interesting other assets, an extremely good grant acquisition record or splendid networking value.


2 thoughts on “Don’t call it a database!”

  1. FWIW, for me it just depends on the context. If the conversation is technical, about, say, designing schemas and exactly what type fields should be, then sure, naturally one wouldn’t want to casually throw around the term “database”. But in general discussion about, say, Big Data, I might say “database of email” or “huge text database” even though I know that technically they are probably represented in other ways. Just my thought FWIW …


    • Thanks for your input – I think you are correct. My point, however, is to ask whether people who agree that it’s theoretically ok to use the term in a broad sense would also be ok when actually confronted with a Humanities person who is obviously clueless when it comes to reflecting critically on the possible meanings of the term. I am quite surprised that most feedback up to now is from people who are ok with a broad use of the term because this doesn’t reflect my experiences in the DH field at all… Bit confused 😉


