Working in teams on GitHub#

Although we don’t need to become experts in Git and GitHub, it is important to understand how they work and, more importantly, to start developing good practices and workflows for working under version control with other people.

A very common quote in the open source community is

“Release early, release often”

Why is this? Because releasing early brings feedback from the community, which in turn helps move the software forward. This is very different from the secretive way in which traditional science often works. Holding work back until it feels finished delays that feedback and encourages bad practices.

Another popular quote, known as Linus’s Law, is

“Given enough eyeballs, all bugs are shallow”.

This is closely related to the first quote: with more people looking at the code, the chances of making big mistakes are smaller and the software becomes more robust. This doesn’t mean that our code will not have issues, but we are more likely to detect flaws in the design and implementation of the code when more people are looking at it.

These two quotes capture a key principle: share your work openly from as early as possible so that more people look at it, and the result will be better. But how do we put this into practice? Two tools make it possible:

  • Git, with remotes, connects your local workflow to the cloud.

  • In the cloud, GitHub (and other similar services like GitLab) becomes your point for collaboration.
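As a minimal sketch of how a remote connects local work to a shared location, the commands below create a local repository and publish it. To keep the example runnable without network access, a bare repository in a temporary directory stands in for the GitHub repository; the paths, author name, and file contents are placeholders, not part of any real project.

```shell
# Throwaway workspace so the example does not touch real repositories.
work=$(mktemp -d)

# Stand-in for a repository hosted on GitHub. A real remote URL would look
# like git@github.com:<user>/<repo>.git instead of a local path.
git init --quiet --bare "$work/cloud.git"

# A local repository with one commit.
git init --quiet "$work/local"
cd "$work/local"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
echo "# Demo project" > README.md
git add README.md
git commit --quiet -m "Add README"

# Connect the local repository to the remote and publish the current branch.
git remote add origin "$work/cloud.git"
git push --quiet --set-upstream origin HEAD

# List the configured remotes.
git remote -v
```

From this point on, `git push` and `git pull` move commits between the local repository and the shared one.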

In all this, communication is key. Git is only a technical tool that helps you manage a kind of data (the contents of your repository). It is not a replacement for working with humans - that’s your job!

Creating a better scientific community

Even though we motivate these practices with scientific software, we believe these values of collaboration and openness can be widely applied across scientific disciplines. Have you ever thought about sharing your ideas, or even preprints, with the community before they are “ready” for publication? See for example Terence Tao’s blog.

Key habits to pick up#

Here we are going to make a few recommendations about good practices for project management when we are working with git and GitHub.

1. Good atomic commits#

Just like the original meaning of the word atomic, indivisible, a good commit expresses one single idea or implementation. This doesn’t necessarily mean that commits must be small, but rather that each commit implements one self-contained change. We also want the commit to have a short message. A general recommendation is to keep the first line of the commit message under 50 characters. After that, we can leave a blank line and write a longer description that expands on the commit message, but the first line serves as a summary and is the one we will refer to most of the time. See What’s with the 50/70 rule? for more information about how to write good commit messages.
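The following sketch shows one way to make such a commit from the command line. The file names, contents, and messages are invented for illustration. Only the file that belongs to the change is staged, and passing `-m` twice gives a short summary line followed by a blank line and a longer description.

```shell
# Throwaway repository for the example.
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# Two unrelated edits sit in the working directory.
echo "def mean(xs): return sum(xs) / len(xs)" > stats.py
echo "scratch notes" > notes.txt

# Stage only the file that belongs to this one idea.
git add stats.py

# The first -m is the summary line; the second -m becomes the body,
# which git separates from the summary with a blank line.
git commit --quiet \
  -m "Add mean function to stats module" \
  -m "Provide a small helper that averages a list of numbers. Input validation is left for a later commit."

# Check the length of the summary line against the 50-character guideline.
summary=$(git log -1 --format=%s)
echo "${#summary} characters: $summary"
```

The unrelated `notes.txt` stays out of the commit and can go into a separate one later, or never be committed at all.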

2. Group commits in branches that address one problem at a time#

Having multiple branches allows us to work on multiple problems at the same time, making atomic changes on each of them as we make progress while letting the different lines of work advance independently. It also lets us explore the evolution of the project much better and in a non-linear fashion. Remember that branches can always be merged, so eventually we can combine the different histories of the project into a single one.
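A small sketch of this workflow, assuming two hypothetical branches named `fix-readme` and `add-plot` that each address one problem. The branch names, files, and commit messages are made up for the example.

```shell
# Throwaway repository with an initial commit.
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
echo "# Analysis" > README.md
git add README.md
git commit --quiet -m "Add README"

# Remember the name of the default branch (it varies with git configuration).
base=$(git rev-parse --abbrev-ref HEAD)

# First line of work: improve the documentation.
git checkout --quiet -b fix-readme
echo "Usage notes" >> README.md
git commit --quiet -am "Document usage in README"

# Second, independent line of work, started from the default branch.
git checkout --quiet -b add-plot "$base"
echo "print('plot')" > plot.py
git add plot.py
git commit --quiet -m "Add plotting script"

# Bring both lines of work back together on the default branch.
git checkout --quiet "$base"
git merge --quiet --no-edit fix-readme
git merge --quiet --no-edit add-plot

# Show the non-linear history.
git log --oneline --graph
```

Because the two branches touch different files, the merges go through without conflicts; the graph printed at the end shows where the histories diverged and rejoined.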

3. Pull Requests#

When your work is ready for feedback, make a Pull Request (PR) for it. When you are ready to share your code with others, you don’t just update the main repository directly, and you don’t push one single commit at a time either. Instead, you create what is known as a pull request: one user asks other users to review a set of changes and eventually approve them. An advantage of doing this is that GitHub lets us keep a conversation going between the parties, where feedback and comments can be exchanged. Further changes can be made until the pull request gets approved. Also, think of cases where new changes need to be tested before being incorporated. The PR is the moment for running and checking the tests, and we can keep making changes until all checks pass.

As we mentioned before, a pull request opens a conversation between the party bringing the new implementation and the current maintainers of the software. In both small and large groups, these parties may be made up of the same or different people. This is why your PR should summarize the overall purpose of the work.
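On the git side, a pull request is built from a branch that has been pushed to a remote. The sketch below shows that part only, using a local bare repository as a stand-in for a GitHub repository; the branch name and commit messages are hypothetical. Opening the PR itself happens on GitHub (through the web interface or, if it is installed, the GitHub CLI), so that step appears only as a comment.

```shell
# Throwaway workspace; the bare repository stands in for a GitHub repository.
work=$(mktemp -d)
git init --quiet --bare "$work/origin.git"
git init --quiet "$work/repo"
cd "$work/repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git remote add origin "$work/origin.git"
echo "# Project" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet -u origin HEAD

# Do the work on a dedicated branch and publish that branch.
git checkout --quiet -b add-install-docs
echo "Install with pip." >> README.md
git commit --quiet -am "Document installation"
git push --quiet -u origin add-install-docs

# The PR would be opened at this point, for example with the GitHub CLI:
#   gh pr create --title "Document installation" --body "Explain how to install the project"

# After reviewer feedback, push more commits to the same branch;
# an open PR for that branch picks them up.
echo "Requires Python 3." >> README.md
git commit --quiet -am "State required Python version"
git push --quiet

# Commits that the PR would now contain, as seen on the remote.
git log --oneline origin/add-install-docs
```

The default branch on the remote stays untouched until the PR is approved and merged.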

Some examples of pull requests can be found here. Notice that you can see both open and closed (resolved) pull requests!

4. Issues#

When appropriate, open an issue to track specific questions and problems to work on (see for example how a new, relatively small project like MyST does it). An issue is also atomic: it’s a message that addresses one single problem, question, concern, or sometimes a new feature that we want to implement. The goal of the issue is to start a conversation without code. You can see some examples of open and closed issues in the CryoInTheCloud repository.

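You can close issue #nn from a specific commit or from the merge commit of a PR by including a closing keyword followed by the issue number, such as “Closes #nn”, in the message.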
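As an illustration, the commit below references a hypothetical issue number 12 in the body of its message; the number, file, and wording are invented for the example. Git itself treats this as plain text. It is GitHub that reads the keyword and closes the issue once the commit reaches the repository's default branch.

```shell
# Throwaway repository for the example.
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# A change that resolves the (hypothetical) issue number 12.
echo "def mean(xs): return sum(xs) / len(xs) if xs else 0.0" > stats.py
git add stats.py

# Summary line first, then the closing keyword and issue number in the body.
git commit --quiet \
  -m "Handle empty input in mean" \
  -m "Closes #12"

# Show the full commit message.
git log -1 --format=%B
```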

For this to work well, you need an upstream repository for the common work and a personal one as origin. For example:

(base) (staging)longs[stat159]> git remote -v
origin	git@github.com:fperez/datahub.git (fetch)
origin	git@github.com:fperez/datahub.git (push)
upstream	git@github.com:berkeley-dsep-infra/datahub.git (fetch)
upstream	git@github.com:berkeley-dsep-infra/datahub.git (push)

This separates your personal workspace from that of the team, and it’s a good habit to get into even if you have direct commit rights to the group workspace.
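A sketch of how this two-remote setup might be created and used. Local bare repositories stand in for the team repository and the personal fork so that the example runs offline; on GitHub the URLs would look like the ones in the listing above. All paths, names, and commit messages are placeholders.

```shell
work=$(mktemp -d)

# Build a stand-in for the team repository with one commit in it.
git init --quiet "$work/seed"
cd "$work/seed"
git config user.name "Team Member"           # placeholder identity
git config user.email "team@example.com"
echo "# Team project" > README.md
git add README.md
git commit --quiet -m "Add README"
git clone --quiet --bare "$work/seed" "$work/upstream.git"

# Stand-in for your personal fork of the team repository.
git clone --quiet --bare "$work/upstream.git" "$work/origin.git"

# Clone the fork (this sets "origin" automatically), then add "upstream".
git clone --quiet "$work/origin.git" "$work/mine"
cd "$work/mine"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git remote add upstream "$work/upstream.git"

# Meanwhile, a teammate's change lands in the team repository.
cd "$work/seed"
echo "New section" >> README.md
git commit --quiet -am "Add new section"
git push --quiet "$work/upstream.git" HEAD

# Back in your clone: fetch the team's work, fast-forward to it,
# and bring your fork up to date.
cd "$work/mine"
branch=$(git rev-parse --abbrev-ref HEAD)
git fetch --quiet upstream
git merge --quiet --ff-only "upstream/$branch"
git push --quiet origin "$branch"

git remote -v
```

Your own work then goes to `origin` on topic branches, and reaches `upstream` only through pull requests.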

5. Discussions#

For larger projects, it makes sense to have a forum where users and developers can ask questions. Eventually, these questions can trigger the creation of issues or even new code, but having a more informal means of communication also helps improve the quality of the software and helps newcomers start interacting with a larger community of users. GitHub supports such forums through GitHub Discussions. Some projects decide to use Discourse instead; see for example the Jupyter Discourse.