A Primer on Version Control and Why You Need It

Today’s post is a quick introduction to version control as a concept and version control systems. It explains what they are and why you should be using them.

I was just sending one of my best old-timey blog posts (How to quit MS Word for good) to a friend, ended up re-reading it and realized that I had promised in it to write a post on version control some day. And, if I’m not mistaken, I never followed up on that. So here you are: a short post on version control, just to keep things going on the blog.

What is Version Control?

So I read this book a few years ago: The Complete Software Developer’s Career Guide: How to Learn Programming Languages Quickly, Ace Your Programming Interview, and Land Your Software Developer Dream Job by John Sonmez (Simple Programmer 2017). I’m not that fond of its author anymore since I realized that he uses his platform to spread many a problematic opinion, but the book influenced me very much at the time. Actually, it was the inspiration for launching this blog and picking a name everybody would remember. So I’m indebted to John Sonmez in a way, and that’s why I wanted to cite him real quick here. In the sub-chapter “Source Control” in Section 3, he offers a concise definition of version control systems:

Source control or version control as it’s sometimes called is a way to keep track of different versions of the files in the source code of a software project. It coordinates the efforts of multiple developers who might all be working on the same files. There are many versions and implementations of source control and source control systems.

John Sonmez, The Software Developer’s Career Guide (2017)

Why should I be using source control?

First of all, no more chaos like the scenario I outlined in the How to quit MS Word for good post:

Just one file that still has all the versions saved

You keep track of different versions of the same file without having to create redundant copies and rename them each time. That way, you keep your old versions safe but aren’t bothered by them unless you need them. You can always access them through the version control system, but you don’t have to sift through a folder full of cryptically named files to find the most current one. No more “Document-1”, “Document-1-1”, “Document-1-correction”, “Document-1-final”, “Document-1-final-revision”.

So that solves an issue on the file and naming level. But it also functions a lot like the “track changes” option you might know from your text editor. In version control, you can also attach a comment explaining why you made a change, for example. Or you can make a whole bunch of changes, bundle them as one big revision (named something like “revision of version 2 by the Ninja”) and send them off.
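If you want to see the idea in miniature, here is a toy sketch in Python. The names TinyHistory and Revision are made up for this post and have nothing to do with how Git or any real system works internally (real systems store things far more cleverly than keeping full copies in a list). It only illustrates the principle: one document, many remembered versions, each with a message.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """One saved version of the document (hypothetical helper type)."""
    number: int
    message: str
    content: str


@dataclass
class TinyHistory:
    """Toy in-memory history for a single document. Not a real VCS."""
    revisions: list = field(default_factory=list)

    def commit(self, content: str, message: str) -> int:
        # Store a full snapshot together with the reason for the change.
        number = len(self.revisions) + 1
        self.revisions.append(Revision(number, message, content))
        return number

    def latest(self) -> str:
        # The most current version, no file-name guessing required.
        return self.revisions[-1].content

    def checkout(self, number: int) -> str:
        # Any older version is still there if you need it (numbers start at 1).
        return self.revisions[number - 1].content

    def log(self) -> list:
        # A readable change log: revision number plus message.
        return [f"{r.number}: {r.message}" for r in self.revisions]


history = TinyHistory()
history.commit("Chapter 1: draft", "first draft")
history.commit("Chapter 1: revised draft", "revision of version 1 by the Ninja")

print(history.latest())
print(history.checkout(1))
print(history.log())
```

There is still only one “file” from your point of view, but the first draft has not gone anywhere.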

Collaborative editing

If the version control system is a shared online resource (which is basically the standard), you also get the added benefit that multiple people can work at the same time. The version control system will keep things under control. You might know this from working simultaneously with multiple people in the same GoogleDoc (or similar). These online systems not only track the changes but also display them to your co-editors as you’re working.

This is not commonly the case with version control in the context of source code. There, you are expected to “pull” the most recent state of the repository, then “commit” your own changes (ideally with a meaningful commit message) and “push” them. You only see the changes made by others in the change log or history after you have “pulled” them. In that way, source code version control systems are less “interactive” but they still do the same job. However, here you might run into the dreaded “merge conflict”, which happens when you and your collaborators were working on the same lines of code and the version control system doesn’t feel confident doing an automatic merge. Despite being a somewhat dreaded, generally undesirable event, those conflicts are actually not that hard to resolve. You just have to decide manually which parts of the code are to be kept and which are to be tossed. If you later decide that you want the tossed parts back, that’s fine too. Version control has you covered.
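To make the merge conflict idea a bit more concrete, here is a toy sketch in Python. It is not how Git actually merges (real merge algorithms also handle inserted and deleted lines); the function name merge_lines and the example data are made up, and the sketch assumes that all three versions have the same number of lines. It compares the common starting point (“base”) with my version and my collaborator’s version, line by line.

```python
def merge_lines(base, mine, theirs):
    """Toy three-way merge. Assumes all three versions have the same number of lines."""
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:
            # Both sides agree (or nobody touched the line).
            merged.append(m)
        elif m == b:
            # Only my collaborator changed this line: take their version.
            merged.append(t)
        elif t == b:
            # Only I changed this line: take my version.
            merged.append(m)
        else:
            # Both of us changed the same line differently: a human has to decide.
            merged.append(None)
            conflicts.append(index)
    return merged, conflicts


base = ["title = 'Draft'", "author = 'Ninja'", "pages = 10"]
mine = ["title = 'Final'", "author = 'Ninja'", "pages = 12"]
theirs = ["title = 'Draft'", "author = 'The Ninja'", "pages = 11"]

merged, conflicts = merge_lines(base, mine, theirs)
print(merged)
print("conflicting line indexes:", conflicts)
```

The first two lines merge automatically because only one of us touched each of them. The third line was changed by both of us in different ways, so it is flagged as a conflict and left open (None) for a human to resolve.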

What I just described here is an example of a distributed version control system (Git environments), as opposed to a centralized version control system. Distributed means that everybody has a full copy of the code, including its history, on their own machine, and those copies get synchronized and merged regularly. Centralized means that there is one central repository (probably on a server somewhere) to which everybody commits their changes directly; others see them as soon as they update their own working copy. Wikipedia has a few thoughts on the advantages and disadvantages of each. The most important ones are:

  1. Distributed: you can work and commit offline, but the copies can diverge and have to be merged (possible conflicts)
  2. Centralized: one authoritative copy that everybody commits to directly, but you (usually) have to be online to commit

I hope I got those differences right. Let me know in the comments if there’s something that needs to be updated.

What Version Control Systems are there?

  1. Git (the VCS software) and the related GitHub (a web hosting service for Git repositories)
  2. GitLab: kind of like GitHub, with built-in features like continuous integration
  3. Apache Subversion (SVN)

These are just a few examples but maybe the most important ones for you to know right now. Your “track changes” function is secretly one of them. Not quite a Ninja but still. You’re probably already using version control in one way or another.

But, of course, there is a snag. You need to learn those systems if you really want to use them. It’s not too hard really but can seem a little daunting at first. Also, they often require you to work from a terminal or command line (which I have spoken about many times but now that I come to think of it, never written a tutorial about).

If you decide to learn one of those systems, I recommend you pick Git. It’s very widespread and a useful skill to have on your CV. I highly recommend the following video tutorials:

  1. Connect with Github
  2. Terminal Tutorial
  3. Git Explained in 100 Seconds

Terms to know around version control

I already mentioned a few of the important terms above but here are a few more:

  1. commit: package a number of code changes as one unit, with a meaningful commit message
  2. add: in Git, you need to add changed files to the staging area first before they can get committed
  3. push: send your committed changes to the shared code base (to be integrated)
  4. pull: fetch the latest changes from the shared code base into your own copy. You should always pull before you push your own changes
  5. merge: combining changes from different people or branches. This usually happens automatically; when files cannot be auto-merged, you get a merge conflict and have to merge manually
  6. fork: a copy of a code base that diverges from the existing one
  7. main branch: this is the “official” version. Other branches might exist alongside it. They might later be merged back into main. This used to be called the “master branch”, but many projects and platforms (GitHub among them) switched to “main” around 2020 because of the term’s associations with slavery. It’s still relevant to know, though, as a lot of older literature uses this term. You can still use older literature on version control because it doesn’t change as quickly as other programming tools and tech things.
  8. fix: In coding contexts, many source code changes have to do with either new code being written or old bugs getting fixed. On the marvelous yet awful practice of bug hunting, see this related post: Learning to program: Debugging – Where to start?
  9. diff: And, as a random nugget of information, you should probably be aware of the term “to diff”. It’s the act of comparing (text-based) files and identifying the lines where they differ. There is a Unix/Linux command line utility called “diff”, which is probably where that term came from, but you can usually do diffs even between MS Word documents (if you know where to find that option).
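If you are curious what a diff looks like, here is a small sketch using difflib from Python’s standard library. The two “files” are just made-up lists of lines, and the file names old.txt and new.txt are placeholders, not real files.

```python
import difflib

# Two versions of the same (made-up) three-line text.
old = ["Version control is nice.", "I use Word.", "The end."]
new = ["Version control is nice.", "I use LaTeX and Git.", "The end."]

# unified_diff yields the lines of a diff in the common "unified" format.
# lineterm="" because our lines carry no trailing newline characters.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt", lineterm="")
)

print("\n".join(diff_lines))
```

In the printed result, lines starting with “-” exist only in the old version, lines starting with “+” exist only in the new version, and unmarked lines are unchanged context. Version control systems show you changes in much the same style.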

Conclusion

So this was a relatively short and simple post, but I guess people new to coding or Digital Humanities in particular might still benefit a lot from understanding this crucial concept. It’s good to know about even when you’re not actively coding yourself. Also, if you look at the reasons why you should be using version control, they are needs that you already have even if you’re not a programmer. So keep this in mind.

Thanks for all the fish!
The Ninja

Resources

  1. Intro on the Git site
  2. Wikipedia on Version Control
  3. Version Control Systems (VCS) on GeeksForGeeks
  4. Difference between Git and Github (GeeksForGeeks)
  5. Git Cheatsheet
  6. W3Schools Intro to Git and Github
  7. freecodecamp: An introduction to Git: what it is, and how to use it
  8. Maybe relevant: Algorithms, Variables, Debugging? Intro to Programming Concepts


