Formulating Research Questions For Using DH Methods

In the feedback forms for the DH classes I have taught over the last few years, I got one piece of feedback I didn’t expect: people were extremely grateful that I had practiced with them how to formulate valid research questions, which, apparently, no one had ever (really) done with them before. I found that quite astonishing because the DH are all about methods, and methods are like specialized tools: you need to know what you can use them for. So here’s the crash course.

The Hammer and the Nail

I want to start off with an analogy. A hammer is a specialized but not an extremely specialized tool. You can use it for a range of tasks, but not all of them are going to work equally well. Some might work but would actually call for a more specialized tool if you had one. You can use the hammer on just about anything and, almost always, something is going to happen.

For example, you can use a hammer on an eggshell and something is going to happen. You can also use the hammer on steel and something is going to happen, although not much to be honest. And then you can use your hammer on a nail. That’s the sweet spot, that’s what it really works well for, what it was made for, that’s what it does.

A method is like a specialized tool

So a method is like a specialized tool in that it is not equally apt for all sorts of uses. Not every method will work just as well on a given problem or research question. That’s why it’s so important that we first become aware of possible use cases for a method before we start trying to apply it to research questions.

I will mostly work with the example of quantitative text analysis here because that’s the class where I talked about this at length. But it would work the same with annotation or any other topic. I just thought text mining is an especially useful example because it can seem like some sort of magic to ‘outsiders’ or people who just don’t understand the method all that well. That’s also why they think “I can just magically apply it to any research question and get a valid result, right?” Wrong. 

Identifying ‘hooks’ for digital methods

Mostly, when coming up with a valid research question for digital methods, you need a Humanities problem, a description of the data available and then a ‘hook’ in this data which a digital method can grab. That usually means you need to identify a feature (that’s how we usually refer to the hook) which will give you information you didn’t have before and with which you’re able to answer your research question. It’s not good if the data you get after the analysis is about as cryptic to you as the topic was to begin with. So essentially, what we want to do is come up with something really simple which, when observed in lots of detail and in large quantity, will inform you about the possible answer to your research question.
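To make the idea of a ‘hook’ a bit more tangible, here is a minimal Python sketch of about the simplest feature there is: word frequency after removing stopwords. The sample text, the tiny stopword list and the function name `content_word_frequencies` are all made up for illustration; a real project would use a full text and a proper stopword list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "it", "that", "was"}

def content_word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase word tokens that are not in the stopword list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Invented sample text, not taken from any novel.
sample = "The letter was in the drawer. Emma read the letter and hid the letter."
freqs = content_word_frequencies(sample)
print(freqs.most_common(3))
```

The feature here is nothing more than "how often does this string of letters occur". Whether that count tells you anything about your Humanities problem is exactly what you have to argue for.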

And of course, you need to check and discuss whether this feature was a good choice to begin with, both when you explain the choice in the intro to your work and in the concluding section where you discuss the results, because the choice of this ‘hook’ will largely determine your results. If it was a bad hook, at best you get no understandable or helpful result at all; at worst, you get misleading results. So it may be easier to come up with an observable feature first and then ask which questions it could help answer. When you’re more experienced, you’ll have a more intuitive grasp of possible features, so this step is less important. Or maybe not less important, but you don’t need to take it this explicitly.

The importance of careful feature selection

The better we choose our research question, the easier the results will be to interpret. That means it’s better to spend lots of time coming up with a good research question than to jump right in and end up with a so-called result which really isn’t a result at all. Coming up with a good research question and methodology is a crucial part of research even when it doesn’t translate into many pages of text. It will save you lots of editing once the text is done, however, because you won’t end up having to gloss over all that inconsistency in your paper.

I think that especially in the DH, formulating the question is probably the most important part of good scholarship. In the Humanities, you can often get away with a sloppy research question and still write a readable and possibly even quite interesting paper. In the DH, the results are going to be grotesque at best. The more ‘computational’ or programming-oriented the questions, all the more so. Or you’re producing a result to a question that no one even asked in the first place. These are the fancy visualizations which are currently quite popular in the DH even though you often can’t see their contribution to any scholarly discussion whatsoever, apart from showing off that one had enough digital skills to pull it off. In early modern alchemy, such people would be called ‘puffers’, i.e. alchemists who put on one hell of a show but without ever really producing anything. Don’t be a puffer. They were considered frauds.

The 5 questions exercise

An easy exercise I did with my students was to ask them to formulate 5 random research questions. The questions didn’t necessarily need to make sense to pursue or appeal to the students at all. Each question had to be formulated twice: once in a way that can be answered using the chosen method and once in a way that cannot be answered using the chosen method (in this case, some sort of quantitative text analysis). This exercise is difficult for students but with lots of hints and help, as well as examples they can use as models for their solutions, they can come up with something. Once they try to come up with the real research questions they will answer in the seminar, however, you’ll have to go through the whole process again because they likely didn’t fully understand it yet. In my experience, students develop this ‘judgement’ by putting one of their questions into practice. Then they usually realize when a question didn’t make sense or that they just can’t answer it with the process they had planned. Or you have to be strict with them and remind them that they didn’t actually produce any tangible ‘proof’ of their claims or didn’t really answer the question they asked.

I want to share a few of the questions from a particularly good homework I received. I translated them into English, modified them slightly and made them more detailed or into multiple questions to be more specific:

  1. Possible to answer with DH methods: Which words in the novel are classified as positive or negative by a sentiment dictionary? What is the proportion of negative to positive words? Does this proportion vary significantly when we use different sentiment dictionaries? Does this evaluation match our impression from the close reading? (This last one cannot be answered objectively but can be explored by identifying errors in classification, such as ‘miss’ being classified as negative in Jane Austen.) Not possible: Are Jane Austen novels pessimistic?
  2. Possible to answer with DH methods: When do words classified as positive or negative appear in the novel? Not possible: Which effect does this have on the reading experience?
  3. Possible to answer with DH methods: Which words are frequent in the novel? Which words are frequent after elimination of stopwords and some other operations? Not possible to answer: What did the author want to convey by using these specific words? Why did they use them and how did they come up with this vocabulary?
  4. Possible to answer with DH methods: Which personal names appear most frequently in the novel? Possibly also: which are relatively frequent in each chapter, rather than frequent overall only because of a few peaks? Not possible to answer: Which people are most important in the novel?
  5. Possible to answer with DH methods (possibly using annotation rather than QTA): Which individuals have the most direct speech in the novel? Not possible to answer: Why do some people have lots of direct speech? Are people more important or central only because they have more direct speech?
  6. Possible to answer with DH methods: In which books do the authorial signals resemble each other so closely that there is a high probability they were written by the same author? Not possible to answer: Did one particular author write a given book?
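To show how literal the ‘possible’ side of these questions is, here is a rough Python sketch for question 4: counting given name strings per chapter, both absolutely and relative to chapter length. The toy chapters, the name list and the function `name_counts_per_chapter` are invented for illustration, and names are assumed to be single words.

```python
import re
from collections import Counter

def name_counts_per_chapter(chapters, names):
    """For each chapter, return (absolute counts, counts relative to chapter length)."""
    wanted = {n.lower() for n in names}
    results = []
    for chapter in chapters:
        tokens = re.findall(r"[a-z]+", chapter.lower())
        counts = Counter(t for t in tokens if t in wanted)
        total = len(tokens)
        # Share of all tokens in the chapter that are this name.
        relative = {n: counts[n] / total for n in wanted} if total else {}
        results.append((counts, relative))
    return results

# Invented two-chapter toy "novel" (illustration only).
chapters = [
    "Emma met Harriet. Emma smiled.",
    "Harriet wrote to Emma. Harriet waited.",
]
results = name_counts_per_chapter(chapters, ["Emma", "Harriet"])
for number, (counts, relative) in enumerate(results, start=1):
    print(number, dict(counts), relative)
```

Note what this does and doesn’t do: it counts strings, not people. A character who is mostly referred to by a title, a surname or a pronoun will be undercounted, which is one reason why ‘most frequent name’ cannot simply be read as ‘most important person’.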

What we can learn from the exercise

As you can see, the differences between ‘possible to answer’ and ‘not possible’ are quite minute in many cases. They might seem like it’s only me being pedantic about the choice of words. But no, these differences are quite consequential, especially to someone who is not aware of them. They can make the difference between a successful project and a spurious one. These are mistakes you are bound to make when you don’t pay close attention, especially if you feel closer to the Humanities than to the Digital or Computational part! It’s important that you write out not only the ‘possible’ part but also, very explicitly, the ‘not possible’ part, even though the ‘not possible’ might sometimes seem obvious. This is not always the case (see the subtle differences above!). Only by formulating both sides can you make it clear to yourself where those minor differences might be in your project.

When formulating these questions, be sure to use the vocabulary of the method and not the terminology of the field of origin. If your subject is literary studies, don’t use concepts from literary studies. As in example 1, write out the question very close to what the technical method actually does (i.e. “check if a word is labelled positive or negative in a sentiment dictionary and whether you get a different result using another dictionary”). This sounds very unromantic to the field of origin but that way, you can’t get a false sense of what’s possible or confuse ‘what the program really does’ with ‘what you would like the program to do’. This is probably the most common mistake in formulating research questions for quantitative methods. Don’t start from what you want to do. Start from what’s possible with a given method and the data you have.
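Spelled out as code, ‘what the program really does’ in example 1 looks roughly like the following Python sketch. Both sentiment dictionaries, the sample sentence and the function `sentiment_counts` are invented for illustration; real sentiment dictionaries are far larger and differ in their labels.

```python
import re

# Two invented toy sentiment dictionaries (illustration only).
DICT_A = {"happy": "positive", "good": "positive", "sad": "negative", "miss": "negative"}
DICT_B = {"happy": "positive", "good": "positive", "sad": "negative"}

def sentiment_counts(text, dictionary):
    """Count tokens labelled positive or negative in the given dictionary."""
    counts = {"positive": 0, "negative": 0}
    for token in re.findall(r"[a-z]+", text.lower()):
        label = dictionary.get(token)  # None if the word is not listed
        if label in counts:
            counts[label] += 1
    return counts

# Invented sample sentence in which 'Miss' is a form of address.
sample = "Miss Bennet was happy. Miss Bingley was sad."
counts_a = sentiment_counts(sample, DICT_A)
counts_b = sentiment_counts(sample, DICT_B)
print(counts_a)
print(counts_b)
```

The program looks up strings in a list and counts labels, nothing more. With the first toy dictionary, every ‘Miss’ used as a form of address ends up in the negative count, so the two dictionaries disagree on the very same sentence. That is a statement about dictionaries and tokens, not about whether the text is pessimistic.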

Coming up with a research question

Now that you know how to judge whether a method is useful for a given research question, you might want to know how to come up with a research question in the first place.

Well, firstly, it will work best if you understand the optimal use case for the method before you come up with a research question. If people don’t start from a good understanding of the method, they will mostly come up with questions which are not fit for their chosen method at all.

But, of course, you also need to have enough Humanities skills and background knowledge to be aware of the ‘adjacent possible’ (a term used by Cal Newport in So Good They Can’t Ignore You). That means that you know what’s going on, which trends currently exist and how you could recombine those into something cool and new.

If you’re just looking for a topic for the next seminar paper, this might be somewhat overkill. However, this is the reason many DH people don’t think it makes sense to teach (full-time) DH at Bachelor level. People need to get the skills and knowledge from their ‘domain of origin’ first before they can apply digital methods to them. It’s extremely difficult to teach DH skills early in the Bachelor, or to teach DH skills to people who aren’t even Humanities scholars in the strict sense (such as information science students or the like, speaking from my own teaching experience), because they just don’t have ‘the content’ to apply the methods to. They have lots of their own theory, but not so much the material to work with. Furthermore, even when you do have the Humanities skills, the transfer process between digital skills and application to your Humanities problems is non-trivial, yet hardly ever addressed explicitly in teaching. I have a post lined up about this in the queue already (Edit: Obviously, it’s out in the meantime, see Looking at data with the eyes of a Humanist: How to apply digital skills to your Humanities research questions).

If you’re interested in how to come up with a research question for any seminar paper (even if you’re really not all that interested in the topic of the class), please let me know, I’m happy to write something up.

Hope this helped,


the Ninja


