Simple XML to LaTeX Transformation Tutorial

Today, I want to share a super simple XML to LaTeX tutorial. Using XSLT, you are going to transform XML data into LaTeX, which you can then compile into a PDF. There will be no fancy stuff whatsoever in this post, just the basics and what to keep in mind with these transformations. It is the quick intro to XML to LaTeX I did with my students a while ago, one day after they had their first contact with XSLT, so it should really be beginner-friendly. I labeled it “Advanced LaTeX” anyway because I think starting to automate things is always a step in the right direction 😉

Configuring the transformation scenario in Oxygen

I am going to assume you use Oxygen now because that’s what a lot of people in the DH do and this post is directed towards my friends in the DH. Especially those who think print editions are an obsolete concept in times of the Digital Edition. Maybe having a nice little intro to XML to LaTeX transformations available will change their minds 😉

To set up the transformation scenario, choose XML transformation using XSLT, then choose your XML document and your XSL stylesheet (set up and open those documents in the editor before you configure the scenario). Next, choose a Saxon 9 version (whichever you like). Ignore the FO tab and go straight to the output tab.

Here you configure how to name your output. Best click the green arrow and choose ${cfn} (the current filename without its extension), then append -latex.tex. So the result is the current filename with -latex.tex appended to it. This is an important step so you don’t accidentally overwrite the original. In this case that is not so dramatic since the output has a different extension anyway, but if you do XML to XML transformations, this is even more crucial. Then tell Oxygen to show the result in the editor as XML (even though you know it isn’t). The editor will then, of course, complain about the non-valid XML but don’t worry. Just copy all (CTRL+A, CTRL+C) and paste it (CTRL+V) into a completely empty (!) project in Overleaf, or just compile the LaTeX directly if you have it installed on your machine.

There it is, you’re set. Now let’s get to the stylesheet.

The stylesheet

So, this is the whole thing. You can just grab it and go if you don’t care for the explanation, or read on to find out why things were done the way they were done. By the way, this tutorial assumes you’re already familiar with how XSLT works and just haven’t done transformations to LaTeX yet. I am also assuming your base XML is in the TEI standard.

 

 

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="xs"
    version="2.0">
    <xsl:strip-space elements="*"/>
    <xsl:output method="text" encoding="UTF-8" indent="no" omit-xml-declaration="yes"/>

    <xsl:template match="/">
        <xsl:text>\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\DeclareUnicodeCharacter{2060}{\nolinebreak} % might not be necessary for you

\title{</xsl:text><xsl:apply-templates select="//t:title"/>
        <xsl:text>}
\author{</xsl:text><xsl:apply-templates select="//t:author"/>
        <xsl:text>}\date{\today}
\begin{document}
\maketitle
\tableofcontents\newpage</xsl:text>
        <!-- get some metadata from the TEI header using the pull paradigm (xsl:for-each) -->
<xsl:text>\begin{itemize}</xsl:text>
<xsl:for-each select="//t:persName[ancestor::t:teiHeader]">
    <xsl:text>\item </xsl:text>
    <xsl:value-of select="." />
    <xsl:text>

    </xsl:text>
</xsl:for-each>
<xsl:text>\end{itemize}
\newpage</xsl:text>

        <!--  <xsl:apply-templates/> OR -->
        <xsl:apply-templates select="//t:text"/>
        <!-- just use the push paradigm (xsl:apply-templates) on the TEI text so you don't get a meaningless TEI header dump in your document -->

        <xsl:text>\end{document}</xsl:text>
    </xsl:template>

    <xsl:template match="t:head">
        <xsl:text>\section{</xsl:text><xsl:apply-templates/><xsl:text>} </xsl:text>
    </xsl:template>

    <xsl:template match="t:p">
        <xsl:apply-templates/>
        <xsl:text>

        </xsl:text>
    </xsl:template>

    <xsl:template match="t:hi">
        <xsl:text>\emph{</xsl:text><xsl:apply-templates/><xsl:text>} </xsl:text>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:analyze-string select="." regex="([&amp;])|([_])|([$])">
            <xsl:matching-substring>
                <xsl:choose>
                    <xsl:when test="regex-group(1)">
                        <xsl:text>\&amp;</xsl:text>
                    </xsl:when>
                    <xsl:when test="regex-group(2)">
                        <xsl:text>\_</xsl:text>
                    </xsl:when>
                    <xsl:when test="regex-group(3)">
                        <xsl:text>\$</xsl:text>
                    </xsl:when>
                    <xsl:otherwise/>
                </xsl:choose>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="." />
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>
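If you have no TEI file at hand, a minimal input to try the stylesheet on could look like the sketch below. All names, the title and the text are made up for illustration, and the file is deliberately tiny rather than a complete edition; it only contains the elements the stylesheet touches (title, author and persName in the header; head, p and hi in the text), plus a few characters that need escaping.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical test file: every name and sentence here is invented -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>A Tiny Test Edition</title>
                <author>Jane Example</author>
            </titleStmt>
            <publicationStmt><p>Unpublished test file.</p></publicationStmt>
            <sourceDesc><p>Born digital.</p></sourceDesc>
        </fileDesc>
        <profileDesc>
            <particDesc>
                <listPerson>
                    <!-- these two end up as \item entries -->
                    <person><persName>Ada Sample</persName></person>
                    <person><persName>Max Placeholder</persName></person>
                </listPerson>
            </particDesc>
        </profileDesc>
    </teiHeader>
    <text>
        <body>
            <div>
                <head>First Section</head>
                <!-- contains &, _ and $ to exercise the escaping template -->
                <p>Research &amp; development cost <hi>a lot</hi> of $ for file_one.</p>
            </div>
        </body>
    </text>
</TEI>
```

Because the stylesheet selects //t:title and //t:author, a file with more than one title or author (for example inside a bibliography) would put all of them into \title and \author, so a real edition probably needs a narrower path.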

 

The XSLT declaration and the LaTeX preamble

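Put the following at the top of your stylesheet, right after the opening <xsl:stylesheet> tag. The first line drops whitespace-only text nodes from the input, the second makes the processor write plain text instead of XML. Together they ensure the output is LaTeX-friendly.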

 

<xsl:strip-space elements="*"/> <!-- for LaTeX -->
<xsl:output method="text" encoding="UTF-8" indent="no" omit-xml-declaration="yes"/>

 

\DeclareUnicodeCharacter{2060}{\nolinebreak} was added because LaTeX complained about an undefined character (U+2060, the word joiner) in some of my students’ XML data. Personally, I had never gotten this error before, so you might as well leave it out.

Creating environments

As we had just learnt some XSLT basics, I wanted the students to use at least one pull and one push paradigm type template. So the task was to process any element from the TEI header using the pull paradigm and then use push paradigm templates for the body. To make this little template more efficient teaching-wise, I decided to introduce how to create a LaTeX environment using XSLT for the TEI header (pull paradigm) and do at least one other command using the push paradigm on the TEI body. This also demonstrates <xsl:for-each> for the header as opposed to <xsl:apply-templates> for the body.

So this next piece of code sets up an itemize environment for persons present in the header. In a “real” stylesheet, it would probably be wiser to check, using a conditional such as <xsl:if>, whether at least one such element is present and only output the \begin and \end on that condition.

Also, as you might have noticed, all the LaTeX commands are inside <xsl:text>. This looks a bit confusing at first but really isn’t. Just make sure you don’t create invalid XSLT by shuffling them around.

 

<xsl:text>\begin{itemize}</xsl:text>
<xsl:for-each select="//t:persName[ancestor::t:teiHeader]">
    <xsl:text>\item </xsl:text>
    <xsl:value-of select="." />
    <xsl:text>

    </xsl:text>
</xsl:for-each>
<xsl:text>\end{itemize}</xsl:text>
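The check mentioned above could be sketched like this. It is a stripped-down stylesheet that only writes the list, assuming xsl:if is the conditional you want; in the full stylesheet the xsl:if block would simply replace the unconditional version inside the root template. The point of the test is that an itemize environment without any \item makes LaTeX stop with an error.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    version="2.0">
    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="/">
        <!-- Emit the environment only if the header has at least one persName -->
        <xsl:if test="//t:teiHeader//t:persName">
            <xsl:text>\begin{itemize}&#10;</xsl:text>
            <xsl:for-each select="//t:teiHeader//t:persName">
                <xsl:text>\item </xsl:text>
                <xsl:value-of select="."/>
                <!-- &#10; is a newline, so each item gets its own line -->
                <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
            <xsl:text>\end{itemize}&#10;</xsl:text>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

Writing the newline as &#10; instead of a literal line break inside <xsl:text> keeps the stylesheet indentation out of the LaTeX output.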

 

Push paradigming emphasis

When creating simple commands using the push paradigm, be sure that you don’t end up with too many “overlaps”, since “simple” commands in LaTeX (like \emph) don’t take multiple paragraphs as arguments. If in doubt, always use an environment and the global switches (like \bfseries). Since you are automating things, you always have to take into account that data might not be marked up in a way which makes sense to you. There can easily be line breaks inside a single italic highlight. If this is the case in your data, better create an environment. For simple purposes, however, this is good enough:

 

<xsl:template match="t:hi"><xsl:text>\emph{</xsl:text><xsl:apply-templates/><xsl:text>} </xsl:text></xsl:template>

 

With these commands (and generally when transforming to LaTeX), you sometimes need to make sure you don’t involuntarily add or drop spaces, which will make the output hard to read and debug.

In this example, I made sure not to add any whitespace inside the template rule and also did not have Oxygen format or indent the XML to avoid these unwanted spaces.
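For the case mentioned earlier, where a highlight may run across a break, a minimal sketch of the environment variant could look like the following. Two things here are my assumptions and not part of the original stylesheet: that a t:hi containing a t:lb is the problematic case in your data, and that a t:lb should become a paragraph break in LaTeX. The sketch uses \em, the switch form of \emph, as an environment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    version="2.0">
    <xsl:output method="text" encoding="UTF-8"/>

    <!-- Default case: a short highlight, \emph is fine -->
    <xsl:template match="t:hi"><xsl:text>\emph{</xsl:text><xsl:apply-templates/><xsl:text>} </xsl:text></xsl:template>

    <!-- Assumption: highlights containing lb may span a paragraph break.
         The predicate gives this rule a higher default priority than plain t:hi. -->
    <xsl:template match="t:hi[.//t:lb]"><xsl:text>\begin{em}</xsl:text><xsl:apply-templates/><xsl:text>\end{em} </xsl:text></xsl:template>

    <!-- Assumption: a TEI line break becomes a blank line (paragraph break) -->
    <xsl:template match="t:lb"><xsl:text>&#10;&#10;</xsl:text></xsl:template>
</xsl:stylesheet>
```

In practice you would add these three rules to the full stylesheet instead of running them alone, and adjust the predicate to whatever marks long highlights in your own markup.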

And finally: Escaping entities

As you might remember, markup languages reserve certain characters and need some way to escape them. Bad thing is, LaTeX and XML reserve different characters and escape them differently: XML uses entities like &amp;, LaTeX uses a backslash like \&. So we need to escape the characters that are special in LaTeX. I know that the OxGarage standard stylesheet does this using the translate() function but I prefer to use <xsl:analyze-string> since it’s less “messy” than a nested translate() construct.

Ah, and side info by the way: There is this standard stylesheet from the TEI consortium which might be of help if you are looking for inspiration. Since it is very generic, however, it might not be helpful if you are a newbie at both XSLT and LaTeX. The XSLT is pretty advanced and also the LaTeX probably uses some commands you might not be aware of.

 

    <xsl:template match="text()">
        <xsl:analyze-string select="." regex="([&amp;])|([_])|([$])">
            <xsl:matching-substring>
                <xsl:choose>
                    <xsl:when test="regex-group(1)">
                        <xsl:text>\&amp;</xsl:text>
                    </xsl:when>
                    <xsl:when test="regex-group(2)">
                        <xsl:text>\_</xsl:text>
                    </xsl:when>
                    <xsl:when test="regex-group(3)">
                        <xsl:text>\$</xsl:text>
                    </xsl:when>
                    <xsl:otherwise/>
                </xsl:choose>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="." />
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
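The template above covers &, _ and $. If your data also contains % or #, one possible variation (a sketch, not what we used in class) is to match all of them with a single character class and simply put a backslash in front of whatever was matched, which saves the xsl:choose:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="text()">
        <!-- One character class instead of one regex group per character -->
        <xsl:analyze-string select="." regex="[&amp;_$%#]">
            <xsl:matching-substring>
                <!-- Prefix the matched character with a backslash -->
                <xsl:text>\</xsl:text>
                <xsl:value-of select="."/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
```

With this rule, a text node like “5% of $100 & more” comes out as “5\% of \$100 \& more”. The backslash itself, ~ and ^ are left out on purpose because a backslash prefix is not the right escape for them (they need \textbackslash{}, \textasciitilde{} and \textasciicircum{}), and curly braces would additionally have to be doubled in the regex attribute because it is an attribute value template.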

 

And that’s it. I hope this was useful to you.

Cheers,

the Ninja
