Modelling is central to the Digital Humanities, so much so that some claim it is what unites DH as a field or discipline. But what is modelling, and what do we mean by it? This post will hopefully give you the primer you need.
Sorry for the very sporadic blogging lately. I still haven’t figured out how to fit blogging into my postdoc life. I think I want to get into a rhythm of around 1-2 posts per month. More than that is absolutely not realistic but, as you may have noticed, I didn’t even manage that consistently over the last year. Then again, it’s not like I’ve stopped producing teaching materials. Most of my efforts this year have gone into the classes I have been teaching (I’m hoping to share slides and teaching materials for all of them once they are cleaned up): an intro to text mining, my usual information modelling class which is the inspiration for this post, a co-taught seminar on research trends in DH, a project seminar supervising students close to finishing their degrees and, finally, a class on digital editing which inspired What you really need to know about Digital Scholarly Editing.
Early in 2022, I created a few new teaching videos (content like A shamelessly short intro to XML for DH beginners (includes TEI), but the videos are in German), taught the First ever LaTeX Ninja workshop at Harvard: “Beyond TEI: Digital Editions with XPath and XSLT for the Web and in LaTeX” (for which the teaching materials were published so you can follow along) and co-organized a winter school on Digitizing the Materiality of the Pre-Modern Book in November, from which teaching videos are being produced (I’ll obviously share those as soon as they are available!). So there hasn’t been much activity on this blog, but I have still been producing teaching materials for you folks, which will hopefully soon become publicly available for you to peruse. The next thing is a school with accompanying slides, Jupyter notebooks and teaching videos on Computer Vision in Digital Humanities. So stay tuned!
What are data?
To understand modelling, we need to understand what data are. ‘Data’ is the plural of the Latin datum (meaning ‘something which is given’). However, the term is somewhat problematic because data aren’t exactly given but rather constructed or created. There has even been a discussion in DH about whether we should call them capta, meaning ‘something which is captured’ (Drucker 2011). Lots of data (so-called ‘givens’) of our modern world are constructed by phenomenotechnical devices (Bachelard 1968), i.e. perceiving devices which translate the (often quite abstract) things they see into data. An example would be a sensor which translates the temperature it ‘perceives’ into a quantitative value, i.e. a number which only makes sense according to a certain convention (such as degrees Celsius or Fahrenheit). Translating heat into such a number is, if one thinks about it, actually quite absurd: we’re turning something which is not numeric in nature into a number. Furthermore, this number doesn’t stand alone: it is meaningless unless interpreted correctly in the right context. I hope you can already see what I’m getting at: data are by no means the objective values non-scholars often make them out to be. I could get into a rant on qualitative versus quantitative research but that is probably material for a blog post of its own. Not that I’m against quantification per se, I’m just cautioning against a naive take on it. But what I’m really trying to get at is the following: the value we assign to whatever measure came out of the sensor is a very simple model of the hard-to-get-at real-world phenomenon of temperature. There are a number of important things we have already observed in this minimal example of a model:
- Data have to be interpreted.
- Models are abstractions of real-world phenomena.
- Models serve a purpose and only make sense when used for that purpose.
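To make the temperature example a bit more tangible, here is a minimal Python sketch. The `Reading` class and the unit labels "C" and "F" are invented for illustration and don't come from any real sensor library. The point is only that the bare number means nothing until you know which convention it was recorded in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A sensor reading: a bare number plus the convention needed to interpret it."""
    value: float
    unit: str  # "C" or "F"; hypothetical labels chosen for this sketch

    def to_celsius(self) -> float:
        """Interpret the number according to its unit convention."""
        if self.unit == "C":
            return self.value
        if self.unit == "F":
            return (self.value - 32) * 5 / 9
        raise ValueError(f"unknown unit: {self.unit!r}")


a = Reading(20.0, "C")
b = Reading(68.0, "F")

# The raw numbers differ ...
print(a.value == b.value)  # False
# ... although both readings stand for the same temperature.
print(abs(a.to_celsius() - b.to_celsius()) < 1e-9)  # True
```

Without the `unit` field, nobody could tell whether 20.0 and 68.0 describe the same room or two very different ones. That is the "data has to be interpreted" point in code form.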
Because models are abstractions, i.e. selective snippets of real-world phenomena, and are also quite subjective and dependent on the technology we’re capturing them with (which usually brings a set of conventions of its own), our resulting data are subjective and incomplete. Data are not at all the same thing as the original. I’m saying it again because it’s important: Data != original. The same is true for our models, which are essentially just more complex forms of data (data being mini-models of their own and models usually consisting of a collection of data).
Phew, I hope this wasn’t too much of a shock and you can still wrap your head around it all. If not, take a break and read it again because over the next sections, we will take a plunge into the cold waters of modelling theory 😉
Why should we care about modelling theory?
Ok, so you have read the first section and heard me say how it gets even harder. Maybe you found it kind of abstract and are wondering why you should care about this anyway.
Modelling is a pivotal task in the Digital Humanities. Whenever we create digital representations of material objects, those are models. Modelling as an activity is seen as so central to Digital Humanities activities that some even use it as a criterion for defining the Digital Humanities as a discipline. I had initially planned to get into this a little deeper but then realized that the blog post had, as always, already gotten quite long and frankly, the topic is hard enough as it is. Let’s leave the scary “But what is DH anyway?” discussions for some other time.
Still, one interesting point: German, European and Anglo-American DH discourses differ, and so do their stances on modelling. The general gist is the same, of course, but it turns out that English-language literature often (allow me to generalize a bit here) doesn’t make use of modelling theories which already existed before DH in other languages, such as the German modelling theory by Herbert Stachowiak, which we rely upon a lot. If you know German (or are ok with auto-translated subtitles), I highly recommend the video essay by Tessa Gengnagel, A tale of two cultures. Another great, visually pleasing, critical yet entertaining video essay by Tessa Gengnagel (in English), Digital Humanities, or: The Broken Record of Everything, reflects upon modelling and digitization in Digital Humanities. These will give you a better idea and a bird’s-eye view of the topic.
A primer in modelling theory
The main source I use to introduce modelling theory in my classes is Herbert Stachowiak’s theory, which dates from before the advent of personal computers. Some of the Stachowiak translations I’m using come from the Modelpractice blog (see the references) and some are my own. Stachowiak says that all knowledge-making is knowledge-making in or through models and that all human perception of the world needs models as a medium.
According to Stachowiak, a model has three central properties:
- Mapping property: Models are always models of something, i.e. mappings from, representations of natural or artificial originals, that can be models themselves. = „Modelle sind stets Modelle von etwas, nämlich Abbildungen, Repräsentationen natürlicher oder künstlicher Originale, die selbst wieder Modelle sein können. […] Der Abbildungsbegriff fällt mit dem Begriff der Zuordnung von Modell-Attributen zu Original-Attributen zusammen.“ (Stachowiak 1973, 131–132)
- Reduction property: Models in general capture not all attributes of the original represented by them, but rather only those seeming relevant to their model creators and/or model users. = „Modelle erfassen im allgemeinen nicht alle Attribute des durch sie repräsentierten Originals, sondern nur solche, die den jeweiligen Modellerschaffern und/oder Modellbenutzern relevant scheinen.“ (Stachowiak 1973, 132)
- Pragmatism property: Models are not uniquely assigned to their originals per se. They fulfill their replacement function:
a) for particular (cognitive and/or acting) model-using subjects,
b) within particular time intervals and
c) restricted to particular mental or actual operations.
= „Eine pragmatisch vollständige Bestimmung des Modellbegriffs hat nicht nur die Frage zu berücksichtigen, wovon etwas Modell ist [Abbildungsmerkmal], sondern auch, für wen, wann und wozu bezüglich seiner je spezifischen Funktionen es Modell ist.“ (Stachowiak 1973, 132)
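The three properties can be sketched in a few lines of Python. Everything here is made up for illustration: the book, its attributes and the `build_model` helper are my own assumptions, not something Stachowiak provides. The sketch shows one original yielding two different models, each keeping only the attributes relevant to its purpose.

```python
from __future__ import annotations

# A hypothetical "original": a book described by a handful of attributes.
# (Invented values; a real book has indefinitely many more attributes.)
original = {
    "title": "Book of Hours",
    "shelfmark": "MS 123",
    "leaves": 180,
    "binding": "leather",
    "smell": "musty",
    "weight_g": 950,
}


def build_model(original: dict, relevant: set[str], purpose: str) -> dict:
    """Build a model of `original` for a stated purpose.

    Mapping: every model attribute is taken from an attribute of the original.
    Reduction: only attributes listed in `relevant` are kept.
    Pragmatism: the purpose the model serves is recorded alongside it.
    """
    attributes = {key: value for key, value in original.items() if key in relevant}
    return {"purpose": purpose, "attributes": attributes}


catalogue = build_model(original, {"title", "shelfmark", "leaves"}, "library catalogue")
conservation = build_model(original, {"shelfmark", "binding", "weight_g"}, "conservation report")

print(sorted(catalogue["attributes"]))     # ['leaves', 'shelfmark', 'title']
print(sorted(conservation["attributes"]))  # ['binding', 'shelfmark', 'weight_g']
print("smell" in catalogue["attributes"])  # False
```

Neither model is "the" model of the book. The catalogue record is useless to a conservator and vice versa, and the smell of the book didn't make it into either of them.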
A model is thus a snippet of the real world, but it only covers the attributes I chose as relevant for the task at hand. Thus, the model and the aspect of the real world it models (its subject) diverge. Not all models are digital models. In fact, Stachowiak came up with his modelling theory at a time before digital models were such a big deal. However, for a model to be represented digitally, it also needs to be a formal model: only formal models can be processed digitally, i.e. every digital model is a formal model. Standardized models additionally allow us to exchange, analyse, search and query data. To sum it up, we could say that models are simplified representations of parts of the real world.
Stachowiak’s pragmatism criterion means we selectively capture the data most important to us, not all possible aspects (!). Data resulting from cataloguing and digitization contain interpretations and are thus, per definitionem, always subjective and incomplete. Data are never the same thing as the original. In the same way, models are never equivalent to their originals. They merely represent them in the digital sphere in the way that a book scan represents a real book: it allows you to read what’s on the page, maybe even run some OCR to get at the digital text or do some fancy computer vision stuff. But you can’t touch it, you can’t smell it. It’s not the same thing as the real book and, from the scan alone, you have no way of verifying whether everything on it is correct. There was a scandal, for example, when it was discovered that Xerox scanners (very common!) had algorithms implemented which, in trying to save memory by compressing the data, actually corrupted the scans. The resulting scanned pages were, in fact, not exact copies of the original. In much the same way, a digital model will never truly be the same as its original in many ways, and many of those ways you wouldn’t even think about until they become a problem some day. Anyway, digital images can be very useful, and if I didn’t believe in the benefits of digitization I wouldn’t work in DH, but you still need to stay cautious and critical. A digital representation is never the same as the original. The digital surrogate can be used, sometimes even quite well, for some purposes while it is impossible to use it for others. In many cases, for example, historians of the book need to go back to the library and consult the original, because some parts of the experience of using a book simply cannot be replaced by a digital surrogate, or at the very least not by a scan coming from mass digitization.
Actually, sometimes even the quality of the scan makes a big difference in terms of how well it represents the original. What this means is that models are by definition subjective, abstracted and not universal. Their quality has to be judged in relation to their purpose (Stachowiak’s third property). For many purposes, a kind of shitty scan from a mass digitization initiative will suffice. In other cases it won’t. In the cases where it does, however, making higher-quality scans wouldn’t have been efficient, because those are a real problem for long-term archiving as the files are very big. DH is complicated 😉
That’s why there are different types of modelling-related activities in digitization. We distinguish two types. The first is research-driven digitization, that is, digitization for specific needs. It is individualized for answering a research question, work-intensive and relatively expensive. The other type is curation-driven mass digitization, which uses more of a cookie-cutter approach that covers the most important elements for most use cases but can easily miss features relevant to subject-matter experts. Despite the objects being digitized, users might have to go back to the material objects to fill in those blanks (they might have to do that anyway, though). Why do we do lower-quality mass digitization at all, then, you may ask? It makes objects more discoverable and, due to the cookie-cutter approach (also known as ‘data standards’), somewhat comparable to larger corpora of similar objects. However, superficial digitization can lead to the creation of misleading datasets, e.g. through errors or bad tagging. In those cases, reparative librarianship (or whatever the equivalent is in the discipline concerned) is needed.
How I teach my introduction to information modelling
In my class on information modelling, the “entry class” into our DH programme which I have been teaching for a few years now, we learn, on the one hand, the theory basics discussed in this blog post and, on the other, some practical skills in data modelling. In that class, we learn to model a Humanities object/subject, first as an Entity Relationship Diagram and then in the form of a relational database in SQL. Once that is done and students are working on their own projects, I go into the topic of “messy data” in DH and the question of whether Humanities data can even be represented well by formal models (hint: not all data types work equally well for each respective type of Humanities information). We then look at data cleaning and pre-processing using OpenRefine (just a sneak peek) to give a first idea of what data science could look like in DH. We try out reconciliation, i.e. enriching your data with norm data (and learning about what that even is), and finally I give an overview of other data types relevant for DH because, let’s face it, SQL databases are great for beginners to learn how to model information in a basic way, but they’re not the most crucial technology for DH. This end-of-term overview helps students understand what types of knowledge and skills they will learn in which upcoming classes of our master’s programme. I’m teaching the class in English this term and thus hope to have slides to share with you by the end of February.
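To give you an idea of what the step from Entity Relationship Diagram to relational database can look like, here is a minimal sketch using Python’s built-in `sqlite3` module. The `person` and `letter` tables, their columns and the sample rows are invented for this post and are not the actual class material. Note the `date_text` column: storing the date as free text is one (debatable) way of coping with vague historical dates, which is exactly the “messy data” problem.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Two entities (person, letter) and two relationships (sender, recipient).
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE letter (
    id           INTEGER PRIMARY KEY,
    sender_id    INTEGER NOT NULL REFERENCES person(id),
    recipient_id INTEGER NOT NULL REFERENCES person(id),
    date_text    TEXT  -- free text, because historical dates are often vague
);
""")

# Placeholder sample data, not real historical records.
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "Person A"), (2, "Person B")],
)
conn.execute(
    "INSERT INTO letter (sender_id, recipient_id, date_text) VALUES (?, ?, ?)",
    (1, 2, "around 1520?"),
)

# Query the model: who wrote to whom, and when (as far as we know)?
rows = conn.execute("""
    SELECT s.name, r.name, l.date_text
    FROM letter AS l
    JOIN person AS s ON s.id = l.sender_id
    JOIN person AS r ON r.id = l.recipient_id
""").fetchall()
print(rows)  # [('Person A', 'Person B', 'around 1520?')]
```

The schema is a formal model in the sense discussed above: it lets us query the data, but it also forces decisions (exactly one sender per letter, a name as a single string) that the sources might not support.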
If you’re a Digital Humanist, I’d love to hear your thoughts on the topic: Did I get the post right? Do you disagree? Did I miss anything major? How do you teach an intro to modelling and how would you explain data or modelling?
That’s it for today and thanks for all the fish!
(Still alive and kicking)
Resources and references
- Gaston Bachelard. Le nouvel esprit scientifique. 10th ed. (Original work published 1934). Paris: Les Presses universitaires de France, 1968.
- Johanna Drucker. “Humanities Approaches to Graphical Display”. In: Digital Humanities Quarterly 5/1 (2011).
- Herbert Stachowiak. Allgemeine Modelltheorie. Wien, 1973.
- Modelpractice blog: Stachowiak translations