Local, single-user, linear workflow#
Directed Acyclic Graphs (DAG)#
The fundamental unit of information in git is the commit: a snapshot of your work at a point in time. A version control system needs many such snapshots rather than just one. Moreover, instead of a single linear sequence of snapshots, we may have multiple lines, or branches, of snapshots holding different versions of our project. These commits are organized in a directed acyclic graph, like the one shown in the following picture.
Credit: ProGit book, by Scott Chacon, CC License.
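To make the graph idea concrete, here is a minimal sketch of a commit DAG in Python. The commit names (c0 to c4) and the ancestors helper are hypothetical and only illustrate the structure; git stores real hashes rather than names like these.

```python
# Hypothetical commit graph: each commit maps to the list of its parents.
# The names are illustrative placeholders, not real git hashes.
parents = {
    "c0": [],            # root commit, no parent
    "c1": ["c0"],
    "c2": ["c1"],        # continues the main line
    "c3": ["c1"],        # a branch that also starts from c1
    "c4": ["c2", "c3"],  # a merge commit with two parents
}


def ancestors(commit, graph):
    """Return every commit reachable from `commit` by following parent links."""
    seen = set()
    stack = list(graph[commit])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


print(sorted(ancestors("c4", parents)))  # ['c0', 'c1', 'c2', 'c3']
print(sorted(ancestors("c3", parents)))  # ['c0', 'c1']
```

Edges point from a commit to its parents, so following them always moves back in history and can never loop: that is what makes the graph directed and acyclic.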
We identify each node (commit) with a hash: a fingerprint of the content of the commit and of its parent. It is important that the hash includes information about the parent node, since this allows us to check the structural consistency of the DAG.
Let’s create a first hash:
from hashlib import sha1
# Our first commit
data1 = b'This is the start of my paper.'
meta1 = b'date: 1/1/17'
hash1 = sha1(data1 + meta1).hexdigest()
print('Hash:', hash1)
Hash: 3b32905baabd5ff22b3832c892078f78f5e5bd3b
Every small change we make to the previous text will result in a completely different hash. Notice also how the next hash includes the information of the parent node.
data2 = b'Some more text in my paper...'
meta2 = b'date: 1/2/1'
# Note we add the parent hash here!
hash2 = sha1(data2 + meta2 + hash1.encode()).hexdigest()
print('Hash:', hash2)
Hash: 1c12d2aad51d5fc33e5b83a03b8787dfadde92a4
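The consequence of chaining hashes can be sketched as follows: changing an early snapshot changes its own hash and the hash of every descendant. This is a toy model that follows the same data + meta + parent-hash recipe used above, not git's actual object format; the helper names and the sample history are assumptions made for illustration.

```python
from hashlib import sha1


def commit_hash(data, meta, parent_hash=""):
    """Hash content, metadata, and the parent's hash together (toy model)."""
    return sha1(data + meta + parent_hash.encode()).hexdigest()


def build_chain(snapshots):
    """Return the hashes of a linear sequence of (data, meta) snapshots."""
    hashes = []
    parent = ""  # the first commit has no parent
    for data, meta in snapshots:
        parent = commit_hash(data, meta, parent)
        hashes.append(parent)
    return hashes


# Illustrative history, made up for this sketch
history = [
    (b"Draft one.", b"date: day 1"),
    (b"Draft two.", b"date: day 2"),
    (b"Draft three.", b"date: day 3"),
]
original = build_chain(history)

# Tamper with the first snapshot only and rebuild the chain
tampered_history = [(b"Draft ONE.", history[0][1])] + history[1:]
tampered = build_chain(tampered_history)

# Every hash from the tampered commit onwards is different
changed = [a != b for a, b in zip(original, tampered)]
print(changed)  # [True, True, True]
```

Because each hash depends on its parent's hash, nobody can silently rewrite an old commit without every later identifier changing as well.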
Locals#
Type git to see a full list of all the core commands. We’ll now go through most of these via small practical exercises:
!git
usage: git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]
[--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
[--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
[--super-prefix=<path>] [--config-env=<name>=<envvar>]
<command> [<args>]
These are common Git commands used in various situations:
start a working area (see also: git help tutorial)
clone Clone a repository into a new directory
init Create an empty Git repository or reinitialize an existing one
work on the current change (see also: git help everyday)
add Add file contents to the index
mv Move or rename a file, a directory, or a symlink
restore Restore working tree files
rm Remove files from the working tree and from the index
examine the history and state (see also: git help revisions)
bisect Use binary search to find the commit that introduced a bug
diff Show changes between commits, commit and working tree, etc
grep Print lines matching a pattern
log Show commit logs
show Show various types of objects
status Show the working tree status
grow, mark and tweak your common history
branch List, create, or delete branches
commit Record changes to the repository
merge Join two or more development histories together
rebase Reapply commits on top of another base tip
reset Reset current HEAD to the specified state
switch Switch branches
tag Create, list, delete or verify a tag object signed with GPG
collaborate (see also: git help workflows)
fetch Download objects and refs from another repository
pull Fetch from and integrate with another repository or a local branch
push Update remote refs along with associated objects
'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
See 'git help git' for an overview of the system.
Tip
You can execute bash commands from a Jupyter notebook by adding ! in front of them. You can also add the magic %%bash at the top of a cell to indicate that all the code in the cell must be executed in the shell.
Warning
Although you can do most of the following git operations from a notebook, we recommend executing these commands directly from the shell.
First we create an empty folder and then we initialize it as a git repository. From your terminal, run
git init test
Let’s see what git just did. Move into this folder and list all the files and directories inside the test repository:
cd test
ls -la
Note
What makes test a repository? If you look at what is inside the folder, you will see a special folder called .git. This is where git stores all the information it uses to version control your files. If you delete this folder, your repository becomes a simple folder with no version control capabilities.
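The role of the .git folder can be illustrated with a short sketch that looks for a repository the way git commands roughly do: starting from a directory and walking up through its parents until a .git folder is found. The function name find_repo_root is hypothetical, and the sketch ignores special cases (for example, setups where .git is a file rather than a directory).

```python
from pathlib import Path


def find_repo_root(start):
    """Return the closest directory at or above `start` that contains a .git
    folder, or None if there is none (simplified: .git must be a directory)."""
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / ".git").is_dir():
            return candidate
    return None


# Prints the repository root if the current directory is inside one, else None
print(find_repo_root("."))
```

Run from inside test (or any of its subfolders), this should report the test folder; after deleting .git it would no longer find a repository there.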
Now let’s edit our first file in the test directory with a text editor. We can also create a new text file with the next command.
echo "My first bit of text" > file1.txt
To put this new file under version control, we first need to add it:
git add file1.txt
We can now ask git what happened with status:
git status
The next step is to commit our changes, which permanently records them in git’s database. For now, we are always going to call git commit either with the -a option or with specific filenames (git commit file1 file2...). This delays the discussion of an aspect of git called the index (often also referred to as the staging area), which we will cover later. Most everyday work in regular scientific practice doesn’t require understanding the extra moving parts that the index involves, so on a first pass we’ll bypass it. Later on we will discuss how to use it to achieve more fine-grained control of what and how git records our actions.
git commit -a -m"This is our first commit"
In the commit above, we used the -m flag to specify a message at the command line. If we don’t do that, git will open the editor we specified in our configuration above and require that we enter a message. By default, git refuses to record changes that don’t have a message to go along with them (though you can obviously ‘cheat’ by using a meaningless string: git only tries to facilitate best practices, it’s not your nanny).
Tip
You can use git log to see what has been committed so far:
git log
Sometimes it’s handy to see a very summarized version of the log:
git log --oneline --topo-order --graph
Git supports aliases: new names given to command combinations. Let’s make this handy shortlog an alias, so we only have to type git slog to see this compact log:
git config --global alias.slog "log --oneline --topo-order --graph"
and now we can use this new alias to print a short version of the commit history
git slog
Let’s do a little bit more work… Again, in practice you’ll be editing the files by hand; here we do it via shell commands for the sake of automation (and therefore the reproducibility of this tutorial!).
echo "And now some more text..." >> file1.txt
And now we ask git what is different
git diff
The format of the output above is explained in detail in this Stack Overflow post, but we can provide a brief summary here:
diff --git a/file1.txt b/file1.txt
This tells us which files changed overall, with ‘a’ representing the old path and ‘b’ the new one (in this case it’s the same file, though if a file had been renamed it would be different).
index ce645c7..4baa979 100644
These are hashes of the file at the two stages, needed by git itself for other operations with the diff output.
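As a rough sketch of where such file hashes come from: with the default SHA-1 object format, git hashes a file's content together with a short header of the form blob <size> followed by a null byte, and the index line shows abbreviated prefixes of hashes of this kind. The function name git_blob_hash is hypothetical, and repositories configured to use SHA-256 would produce different values.

```python
from hashlib import sha1


def git_blob_hash(content: bytes) -> str:
    """Compute the SHA-1 object id git assigns to a file (blob) with this content."""
    # Header: object type, a space, the content size in bytes, then a null byte
    header = b"blob " + str(len(content)).encode() + b"\0"
    return sha1(header + content).hexdigest()


# The id of an empty file is a well-known value
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Abbreviated form, like the 7-character prefixes shown in the index line
print(git_blob_hash(b"My first bit of text\n")[:7])
```

The same content always gives the same id, so git can tell whether a file changed by comparing hashes instead of comparing contents.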
The next block shows the actual changes. The first two lines show which paths are being compared (in this case the same file, file1.txt):
--- a/file1.txt
+++ b/file1.txt
The next line indicates where the changes happened. The format is @@ from-file-range to-file-range @@, where there is one more @ character than there are parents to the file comparison (git can handle multi-way diffs/merges), and the file range format is -/+<start line>,<# of lines>, with - for the from-file and + for the to-file:
@@ -1 +1,2 @@
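To make the range format concrete, the following sketch parses a two-way hunk header into its four numbers. The function name parse_hunk_header and the regular expression are illustrative assumptions; the one fact relied on is that, in the unified diff format, an omitted line count means 1.

```python
import re

# Matches headers such as "@@ -1 +1,2 @@" (two-way diffs only)
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a two-way hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,  # omitted count means 1
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -1 +1,2 @@"))  # (1, 1, 1, 2)
```

Read this way, the header above says that the hunk covers 1 line starting at line 1 of the old file and 2 lines starting at line 1 of the new file.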
Lines prepended with - correspond to deletions (none in this case), and lines with + to additions. A few lines around deletions/additions are shown for context:
My first bit of text
+And now some more text...
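The same unified format can be reproduced outside git. As a minimal sketch, Python's standard difflib module generates the path lines, the range line, and the change lines for the two versions of our file; the a/ and b/ prefixes are added by hand here to mimic git's labels, and the diff --git and index lines are git-specific, so they do not appear.

```python
import difflib

# The two versions of file1.txt, as lists of lines
old = ["My first bit of text\n"]
new = ["My first bit of text\n", "And now some more text...\n"]

diff_lines = list(
    difflib.unified_diff(old, new, fromfile="a/file1.txt", tofile="b/file1.txt")
)
print("".join(diff_lines), end="")
```

The context line keeps a leading space, while the added line is marked with +, just as in the git output.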
From now on, the virtuous circle just repeats: work, commit, work, commit.
git commit -a -m"I have made great progress on this critical matter."
While git add is used to add files to the list git tracks, we must also tell it if we want their names to change or for it to stop tracking them. In familiar Unix fashion, the mv and rm git commands do precisely this:
git mv file1.txt file-newname.txt
git status
Note that these changes must be committed too, to become permanent! In git’s world, until something has been committed, it isn’t permanently recorded anywhere.
Appendix#
This is the sequence of all the bash commands we have used in this tutorial, in the right order.
git init test
...
...
...