Git

An interactive Git Tutorial: the tool you didn’t know you needed

Note: this tutorial was closely modeled on, and therefore owes a lot to, the excellent materials offered in:

  1. “Git for Scientists: A Tutorial” by John McDonnell (no link as this tutorial seems to have disappeared from the internet).

  2. Emanuele Olivetti’s lecture notes and exercises from the G-Node summer school on Advanced Scientific Programming in Python.

In particular, I’ve reused the excellent images from the Pro Git book that John had already selected and downloaded, as well as some of his outline. This version of the tutorial, however, aims to be 100% reproducible by being executed directly as an IPython notebook, and it is itself hosted on GitHub so that others can more easily improve it by collaborating there. Many thanks to John and Emanuele for making their materials available online.

After writing this document, I discovered J.R. Johansson’s tutorial on version control, which is also written as a fully reproducible notebook and is also aimed at a scientific audience. It has a similar spirit to this one, and is part of his excellent series Lectures on Scientific Computing with Python, which is available in its entirety as Jupyter Notebooks.

Wikipedia defines version control as follows:

Version control

Revision control, also known as version control, source control or software configuration management (SCM), is the management of changes to documents, programs, and other information stored as computer files.

Version control systems allow you to build more reproducible workflows by tracking every step of your work and letting you recreate it later. A good version control tool gives you:

  • Peace of mind (backups)

  • Freedom (exploratory branching)

  • Collaboration (synchronization)

Git is an enabling technology. The idea is that we can use version control for everything:

  • Paper writing (never get paper_v5_john_jane_final_oct22_really_final.tex by email again!)

  • Grant writing

  • Everyday research

  • Teaching (never accept an emailed homework assignment again!)

The plan for this tutorial

This tutorial is structured as follows. We will begin with a brief overview of the key concepts you need to understand for git to really make sense. We will then dive into hands-on work: after a brief interlude on the necessary configuration, we will discuss five “stages of git”, scenarios of increasing sophistication and complexity, introducing the necessary commands for each stage:

  1. Local, single-user, linear workflow

  2. Single local user, branching

  3. Using remotes as a single user

  4. Remotes for collaborating in a small team

  5. Full-contact GitHub: distributed collaboration with large teams

In reality, this tutorial only covers stages 1-4, since for stage 5 there are many high-quality, software development-oriented tutorials and documents online. But most scientists start out working alone with a few files or with a small team, so I feel it’s important to first build the key concepts and practices around problems scientists encounter in their everyday work, without the jargon of the software world. Once you’ve become familiar with stages 1-4, the excellent existing tutorials on collaborating on open-source projects on GitHub should make sense.