Should I start doing DH?

My non-DH colleagues and friends ask me more and more often whether I think they should start doing Digital Humanities and, if so, where to start. Since this seems to be an interesting topic for many, I thought I’d quickly elaborate on it.

Disclaimer: Even though I’ll put on my “career advisor” hat right now, I want to remind you that I am in no way qualified to advise you on your career. So if it all goes downhill from now, I am not the one to blame. All opinions are my own and should be treated as such.

Now that we’ve got the legal part over with (essentially: don’t sue me), let’s get to my opinion on the topic. I think there is no question that you should start doing DH. In my prognosis, almost all Humanities research is going to be at least part DH in the near future. If you ask me. And you did.

So, the point is: if everybody is going to do DH anyway, so should you. You don’t want to fall behind. This is good – and bad. If everybody is going to do DH in the future, there is no way around the extra work for you to learn it. But then again, hey, you’re already at the right site for getting awesome DH help – so I’m not too worried for you.

Doing DH is going to be normal soon enough

In fact, I think we’re almost at the point where it already is. So for one thing, if you don’t learn the basics and do at least some DH, you will be sub-standard and below average. If you learn and do some DH, however, it won’t be a door opener either: everybody is starting to do DH now, so it won’t be special anymore in 5 years.

So yes, you should do DH, if only so you don’t fall behind. But also don’t expect it to get you very far. Learning what you can now merely clears the entry barrier. If you want your DH affiliation to count in the years to come, this will only be possible if your approach is super innovative or you’re really good at technical stuff. And technical probably way beyond what is common now (XSLT and stuff). It is my opinion that if you want to make a career in the DH a few years from now, XSLT and web development might not be enough anymore. Maybe if you get lucky. At the very least, those now-standard basic DH technologies will be the foundation everybody is expected to have. Alongside the 500 other skills on top of that.

“Label-DH”

If you are a Normal Humanist now, you might not want to completely change course and become a very tech-savvy Digital Humanist unless you already have the programming foundations. You might just want to add a pinch of DH to spice up your regular Humanities research or be eligible for certain grants. Then you are what some call “label DH”.

First of all, I have to add that I am a bit biased when it comes to so-called “label DH”. “Label DH” are people who label themselves as “DH” but don’t really do DH, or are ‘only’ the Humanist part in a DH project, or are merely affiliated with a DH project, or the like. Essentially they have no legitimate DH skills whatsoever but aggressively label themselves as DH for the advantages of it. If you’re only in it for the benefits but not ready to put in the work, obviously everybody is going to hate you and you might or might not get lucky with this approach. I wouldn’t recommend it. I think that not so many people are successful with it now. Never overstate your DH abilities, especially if you have none. People will know and you’ll basically be out of the race. I don’t like label DH. Some great DH thinkers, like Patrick Sahle from what I gathered from a talk of his, believe that label DH is just as important for DH as a discipline as is “hardcore DH”. Because it popularizes the discipline more widely. Maybe it is. I’m not particularly fond of it anyway.

Well, I’m a hardliner. I believe that “real DH” would mean to be just as hardcore at programming as you are at your Humanities research. All while not losing touch with your Humanities research, for then, you would turn into a “mere programmer” (not meant in a pejorative way). Because the whole point of DH is that you’re not either a programmer XOR a Humanities scholar. It’s the combination of both. Most people see that combination as some sort of 30/70 or 40/60 kind of thing. I think it has to be 100/100. And yes, that means you’ll have to be a freak with a 200% workload. I’m pretty alone with this opinion, however, so don’t panic. Most people don’t see it like that at all. I’m generally a bit of an eccentric and maybe some might perceive my opinion to be extreme. Well, sorry, but I like extreme. I think that “real DH” should mean 200%, or even better: 300%. 150% programmer and 150% Humanities. Be hardcore at both. At least that’s my personal goal.

Half-assed just probably won’t do the trick anymore

Like I said, I’m no expert. But my view of the field is that already now there is a lot of half-assed stuff. A lot of people do DH and not all of it is good. So far, the field has been pretty chill but I’m not so sure it’s going to stay that way. Competition will get harder and harder. In fact, it already is harder than it used to be and the boom is extreme. DH used to be marginalized but now, it has become mainstream. I can’t even imagine the masses of people starting to do DH from all over the Humanities. And then, there is formal education in the DH now, which is booming, so we soon will be “flooded” with certified Digital Humanists. I put “flooded” in quotes, because of course, there is more work than ever. Seeing as everyone everywhere is going to do at least some DH from now on, the demand is high too. But still, as a non-DH-certified Humanities scholar you will probably have a harder time benefitting from the DH without going all-in in the near future.

I can’t really judge if this will cease, as it was feared a few years back when people thought the DH were yet another hype, to pass as quickly as it had come. They were wrong about that. The DH have come to stay. And they are the cool kids in school now. The Geeks get the girls or whatever. (In case anybody noticed, this is an American Hi-Fi reference but you probably have to Google to find out what that is).

People around me think the demand is not going to sink in the next few years and probably not in the next decades either. Digitization is everywhere and it gets ever more extensive. So no, the demand is probably not going to cease. But new generations of scholars might soon start to learn the DH basics you lack as part of their normal curriculum. So yes, I very much believe you might be at risk of getting left behind. Unless you’re revolutionarily good at your Humanities stuff. Like “excellent” or whatever they call it. So, as a guideline, you probably will need to learn DH. Applying for grants will also require you to have at least a basic overview of what’s going on in the DH. You don’t want to be left behind. For the normal scholar, knowing your way around the DH basics will be a prerequisite for “excellence”, not the easy way to an excellence award.

What I think you really should learn as fast as you can

Annotation in XML and at least one XML-standard relevant to your research

Learn annotation in XML now because it is easy. Like I said before, this won’t get you very far anymore but it is the foundation on which you can build, and it will act as a gatekeeper. If you don’t even have this basic building block, no more doors will be open to you, even in label-DH projects. I see this starting to become reality now already for everyone who is not an important Humanities professor or otherwise super-important. Also, if you ask for cooperation and have a working knowledge of these basics, DH people will be a lot more willing to talk to you because it shows that you did your homework. DH centres can’t accept all projects. This is a way you can stand out from competitors.
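
To give an idea of how little is needed to get started, here is a minimal sketch of an annotated text. It assumes TEI as the XML standard, which is only one possible choice; the title, the author and the sentence are invented purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical teaching example: all names and contents are made up -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A sample letter</title>
        <author>Jane Doe</author>
      </titleStmt>
      <publicationStmt><p>Unpublished teaching example.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <head>Letter</head>
        <!-- persName, placeName and hi are the actual annotations -->
        <p>I met <persName>Ada Lovelace</persName> in <placeName>London</placeName>
           and found her <hi rend="italic">remarkably</hi> well read.</p>
      </div>
    </body>
  </text>
</TEI>
```

The annotation itself is just the three inline elements in the paragraph; everything else is the frame a TEI document needs around it.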

How can I start?

Formal education

  • Get a certificate (from a summer school up to a year’s worth of classes).
  • Do a DH master’s degree.

Teach yourself

Well, of course there is your favourite go-to resource for everything DH (and LaTeX): The LaTeX Ninja – yaaaay! 😉 With many more tutorials to come (soon, hopefully).

Pause to think whether you’re already doing DH

You would have noticed, you think? Well, DH is not only XML and annotation. There are many aspects to it and maybe you have already done something digitally that doesn’t strike you as DH or doesn’t come to mind right away.

Learning DH will only really work for you if it fits your research. So, find a way of going digital which is compatible with what you already do (like a “digital update” of your current work) rather than trying to force yourself to do DH in ways which don’t immediately make sense to you. Take some time to brainstorm this, however. The good ideas might not come to mind straightaway. Google digital projects from your field. What are they doing? How does the digital serve their research purposes? What can you take away from that for your own research? If DH and you are not a good fit, people will know. You have to find something you like. If you hate what you do, you’ll never get good. If you like what you do, learning something new will be fun.

Learn something new

I have an extreme drive to always learn and do new things. People usually comment they can’t really understand that. They don’t get me. I think it’s all a question of perspective. If you feel like you have to learn something new, it will be “hard work”. If you want to, it can be an adventure and a nice challenge. Rise to the challenge.

The power plant doesn’t have energy; it transforms one form to another. It generates energy and transmits it. We are the same. (Brendon Burchard)

Life-long learning sounds like a burden to many, but somewhere deep down, past the coziness of our comfort zone, we do have a natural child-like curiosity for learning new things. Try to reactivate that if you’ve lost it. Use the DH as your trial project.

Cheers,

the LaTeX Ninja

XML to LaTeX (simple)

Today, I wanted to share this super simple XML to LaTeX tutorial. Using XSLT, you are going to transform XML data to LaTeX output which you can then go on to compile into your desired output PDF. There will be no fancy stuff whatsoever in this post, just the basics and what to keep in mind with these transformations. It is the quick intro to XML to LaTeX I did with my students a while ago which was done one day after they had their first contact with XSLT, so it should really be beginner-friendly. I labeled it “Advanced LaTeX” anyway because I think starting to automate things is always a step in the right direction 😉

Configuring the transformation scenario in Oxygen

I am going to assume you use Oxygen now because that’s what a lot of people in the DH do and this post is directed towards my friends in the DH. Especially those who think print editions are an obsolete concept in times of the Digital Edition. Maybe having a nice little intro to XML to LaTeX transformations available will change their minds 😉

To set up the transformation scenario, choose XML transformation using XSLT, then choose your XML document and your XSL stylesheet (set up and open those documents in the editor before you configure the scenario). Then choose a Saxon 9 version (whichever you like). Then ignore the FO tab and get right to the output tab.

Here you configure how to name your output. Best click the green arrow and choose ${cfn}, then append -latex.tex, so the result is the current filename with -latex.tex appended to it. This is an important step so you don’t accidentally overwrite the original. In this case that is not so dramatic, since the output has a different file extension anyway, but if you do XML to XML transformations, it is even more crucial. Then tell Oxygen to show the result in the editor as XML (even though you know it isn’t). The editor will then, of course, complain about the non-valid XML but don’t worry. Just copy everything (CTRL+A, CTRL+C) and paste it (CTRL+V) into a completely empty (!) project in Overleaf or just compile the LaTeX directly if you have it installed on your machine.

There it is, you’re set. Now let’s get to the stylesheet.

The stylesheet

So, this is the whole thing. You can just grab it and go if you don’t care for the explanation, or read on to find out why things were done the way they were done. By the way, this tutorial assumes you’re already familiar with how XSLT works and just haven’t done transformations to LaTeX yet. I am also assuming your base XML is in the TEI standard.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="xs"
    version="2.0">
    <xsl:strip-space elements="*"/>
    <xsl:output method="text" encoding="UTF-8" indent="no" omit-xml-declaration="yes"/>

    <xsl:template match="/">
        <xsl:text>\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\DeclareUnicodeCharacter{2060}{\nolinebreak} % might not be necessary for you

\title{</xsl:text><xsl:apply-templates select="//t:title"/>
        <xsl:text>}
\author{</xsl:text><xsl:apply-templates select="//t:author"/>
        <xsl:text>}\date{\today}
\begin{document}
\maketitle
\tableofcontents\newpage</xsl:text>
        <!-- get some metadata from the TEI header using the pull paradigm (xsl:for-each + xsl:value-of) -->
<xsl:text>\begin{itemize}</xsl:text>
<xsl:for-each select="//t:persName[ancestor::t:teiHeader]">
    <xsl:text>\item </xsl:text>
    <xsl:value-of select="." />
    <xsl:text>

    </xsl:text>
</xsl:for-each>
<xsl:text>\end{itemize}
\newpage</xsl:text>

        <!--  <xsl:apply-templates/> OR -->
        <xsl:apply-templates select="//t:text"/>
        <!-- push paradigm: apply the template rules to the TEI text only, so you don't get a meaningless TEI header dump in your document -->

        <xsl:text>\end{document}</xsl:text>
    </xsl:template>

    <xsl:template match="t:head">
        <xsl:text>\section{</xsl:text><xsl:apply-templates/><xsl:text>} </xsl:text>
    </xsl:template>

    <xsl:template match="t:p">
        <xsl:apply-templates/>
        <xsl:text>

        </xsl:text>
    </xsl:template>

    <xsl:template match="t:hi">
        <xsl:text>\emph{</xsl:text><xsl:apply-templates/><xsl:text>} </xsl:text>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:analyze-string select="." regex="([&amp;])|([_])|([$])">
            <xsl:matching-substring>
                <xsl:choose>
                    <xsl:when test="regex-group(1)">
                        <xsl:text>\&amp;</xsl:text>
                    </xsl:when>
                    <xsl:when test="regex-group(2)">
                        <xsl:text>\_</xsl:text>
                    </xsl:when>
                    <xsl:when test="regex-group(3)">
                        <xsl:text>\$</xsl:text>
                    </xsl:when>
                    <xsl:otherwise/>
                </xsl:choose>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="." />
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>

 

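The XSLT declaration and the LaTeX preamble

Put the following at the top of your stylesheet, as the first children of the <xsl:stylesheet> element. The first line strips whitespace-only text nodes from the input, and method="text" makes the processor write plain text instead of XML. Together, they ensure the output is LaTeX-friendly.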

<xsl:strip-space elements="*"/> <!-- for LaTeX -->
<xsl:output method="text" encoding="UTF-8" indent="no" omit-xml-declaration="yes"/>

 

\DeclareUnicodeCharacter{2060}{\nolinebreak} was added because LaTeX complained about an undefined character in some of my students’ XML data. Personally, I had never gotten this error before, so you might as well leave it out.

Creating environments

As we had just learnt some XSLT basics, I wanted the students to use at least one pull-paradigm and one push-paradigm construct. So the task was to process any element from the TEI header using the pull paradigm (<xsl:for-each> with <xsl:value-of>) and then to use push-paradigm template rules (<xsl:apply-templates>) for the body. To make this little template more efficient teaching-wise, I decided to introduce how to create a LaTeX environment using XSLT for the TEI header and to do at least one other command as a template rule on the TEI body. This also demonstrates <xsl:for-each> as opposed to <xsl:apply-templates> for the body.

So this next piece of code sets up an itemize environment for persons present in the header. In a “real” stylesheet, it would probably be wiser to check, using <xsl:if>, whether at least one such element is present and only output the \begin and \end on that condition, since LaTeX raises an error for an itemize environment without any \item.

Also, as you might have noticed, all the LaTeX commands are inside <xsl:text>. This looks a bit confusing at first but really isn’t. Just make sure you don’t create invalid XSLT by shuffling them around.

 

<xsl:text>\begin{itemize}</xsl:text>
<xsl:for-each select="//t:persName[ancestor::t:teiHeader]">
    <xsl:text>\item </xsl:text>
    <xsl:value-of select="." />
    <xsl:text>

    </xsl:text>
</xsl:for-each>
<xsl:text>\end{itemize}</xsl:text>
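
A guarded version of the same list could look roughly like the following. This is only a sketch, not part of the original class material: it assumes the fragment sits inside the root template of the stylesheet above (so the t: prefix is already bound to the TEI namespace) and that an empty header should simply produce no list at all.

```xml
<!-- Sketch: only emit the itemize environment if the header contains at least one persName -->
<xsl:if test="//t:teiHeader//t:persName">
    <xsl:text>\begin{itemize}&#10;</xsl:text>
    <xsl:for-each select="//t:teiHeader//t:persName">
        <xsl:text>\item </xsl:text>
        <!-- normalize-space() collapses line breaks and indentation from the source -->
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:text>\end{itemize}&#10;</xsl:text>
</xsl:if>
```

Note that <xsl:value-of> copies the string value directly, so names pulled this way do not pass through the text() template that escapes LaTeX special characters. If your names can contain such characters, use <xsl:apply-templates/> inside the loop instead.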

 

Push paradigming emphasis

When creating simple commands with template rules (the push paradigm), be sure that you don’t end up with too many “overlaps”, since “simple” commands in LaTeX such as \emph don’t take multiple paragraphs as arguments. If in doubt, always use environments and the global switches (like \bfseries). Since you are automating things, you always have to take into account that data might not always be marked up in a way which makes sense to you. There can easily be linebreaks inside a single italic highlight. If this is the case in your data, better create an environment. For simple purposes, however, this is good enough:

 


<xsl:template match="t:hi"><xsl:text>\emph{</xsl:text><xsl:apply-templates/><xsl:text>} </xsl:text></xsl:template>
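
For highlights that may span more than one paragraph in the output, a variant with a font switch inside a group is one option. The following is a sketch under assumptions: the predicate descendant::t:lb is only a hypothetical way of recognising “risky” highlights, so replace it with whatever actually produces blank lines in your data.

```xml
<!-- Sketch: use the switch \itshape inside a group instead of \emph{...}.      -->
<!-- Unlike the argument of \emph, a group may contain paragraph breaks.         -->
<!-- The predicate gives this rule a higher default priority than plain "t:hi", -->
<!-- so it wins for the highlights it matches and the simple rule handles the rest. -->
<xsl:template match="t:hi[descendant::t:lb]">
    <xsl:text>{\itshape </xsl:text><xsl:apply-templates/><xsl:text>}</xsl:text>
</xsl:template>
```

This rule is meant to sit next to the simple t:hi template, not to replace it.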

 

With these commands (and generally when transforming to LaTeX), you sometimes need to make sure you don’t involuntarily add or drop spaces, which will make the output hard to read and debug.

In this example, I made sure not to add any whitespace inside the template rule and also did not have Oxygen format or indent the XML to avoid these unwanted spaces.

And finally: Escaping entities

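As you might remember, markup languages reserve certain characters, which then have to be escaped. Bad thing is, LaTeX and XML reserve different characters and escape them differently: an ampersand, for example, is &amp;amp; in XML but \&amp; in LaTeX. So we need to escape the characters LaTeX cares about ourselves. I know that the OxGarage standard stylesheet does this using the translate() function but I prefer to use <xsl:analyze-string> since it’s less “messy” than a nested translate() construct.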

Ah, and side info by the way: There is this standard stylesheet from the TEI consortium which might be of help if you are looking for inspiration. Since it is very generic, however, it might not be helpful if you are a newbie at both XSLT and LaTeX. The XSLT is pretty advanced and also the LaTeX probably uses some commands you might not be aware of.

 

    <xsl:template match="text()">
        <xsl:analyze-string select="." regex="([&amp;])|([_])|([$])">
            <xsl:matching-substring>
                <xsl:choose>
                    <xsl:when test="regex-group(1)">
                        <xsl:text>\&amp;</xsl:text>
                    </xsl:when>
                    <xsl:when test="regex-group(2)">
                        <xsl:text>\_</xsl:text>
                    </xsl:when>
                    <xsl:when test="regex-group(3)">
                        <xsl:text>\$</xsl:text>
                    </xsl:when>
                    <xsl:otherwise/>
                </xsl:choose>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="." />
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
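
If your data also contains percent signs or hash marks, the same idea can be written more compactly. The following is a sketch of an alternative, not the version used in class. It relies on the fact that all five characters are escaped in LaTeX by simply putting a backslash in front of them, so it needs no <xsl:choose>.

```xml
<!-- Sketch: use this INSTEAD of the text() template above, not in addition to it -->
<xsl:template match="text()">
    <!-- character class: ampersand, underscore, dollar, percent, hash -->
    <xsl:analyze-string select="." regex="[&amp;_$%#]">
        <xsl:matching-substring>
            <!-- prefix the matched character with a backslash -->
            <xsl:text>\</xsl:text><xsl:value-of select="."/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
            <xsl:value-of select="."/>
        </xsl:non-matching-substring>
    </xsl:analyze-string>
</xsl:template>
```

This does not cover the backslash, curly braces, tilde or caret. Those need different replacements, and curly braces would additionally have to be doubled inside the regex attribute because it is an attribute value template.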

 

And that’s it. I hope this was useful to you.

Cheers,

the Ninja
