Automation and Make#

As we work on a project, we often encounter commands and operations that we end up running multiple times. Many of these operations concern the behaviour of certain programs and correspond to commands that we execute from the terminal. For example, so far in this course we have used the terminal for:

  • File management: creation of files/folders.

  • Running code from Python scripts and Jupyter notebooks that perform certain analysis, reading data and generating outputs.

  • Creating virtual environments, activating them, installing new packages, and creating an IPython kernel.

  • Creating a JupyterBook

As our workflow grows, these operations become more complex and dependent on each other. Make allows us not just to automate the execution of programs, but also to keep track of the network of dependencies between the different parts of our project.

0. Setup#

Let’s consider the following piece of code inside our Eratosthenes project. Create a new Python script called calculate_prime.py with the following content:

# calculate_prime.py

import sys
import math
import numpy as np

def sieve(nmax):
    """
    Function to compute prime numbers. 
    
    Arguments: 
        - nmax: integer. Upper bound for prime search.
    Outputs:
        - all_primes: list. List with all the prime numbers smaller than or equal to nmax
    
    """

    all_primes = []

    if nmax == 2: 
        all_primes = [2]
    else:
        primes_head = [2]
        first = 3
        primes_tail = np.arange(first,nmax+1,2)
        while first <= round(math.sqrt(primes_tail[-1])):
            first = primes_tail[0]
            primes_head.append(first)
            non_primes = first * primes_tail
            primes_tail = np.array([ n for n in primes_tail[1:]
                                    if n not in non_primes ])

        # Keep this inside the else branch: primes_head and primes_tail
        # are not defined when nmax == 2
        all_primes = primes_head + primes_tail.tolist()
    
    return all_primes


if __name__ == '__main__':
    n = int(sys.argv[1])
    print(sieve(n))

The last part of calculate_prime.py is the if __name__ == '__main__': block. This is what allows us to run the script and read arguments directly from the terminal. Now, from the terminal we can run sieve() with

python calculate_prime.py 10

which should print the list [2, 3, 5, 7].

Warning

Remember to check in which environment you are running this code! If you do this from the base environment this won’t work, since numpy is not installed there. As we always emphasize, always check in which environment you are running code. You can activate the notebook environment or use the environment you created for the Eratosthenes project in Lab 04.

Now, let’s move things around a little bit. Instead of passing the arguments through the terminal and then printing the outputs, let’s read a list of arguments from an input file (input.txt) and save the results in an output file (output.txt). We can achieve this by modifying the last part of the previous script to

if __name__ == '__main__':
    input_file = sys.argv[1]
    output_file = sys.argv[2]
    # Read each line of the file
    with open(input_file) as file:
        lines = file.read().splitlines()
    results = []
    for n in lines:
        results.append(sieve(int(n)))
    # Save values
    with open(output_file, 'w') as output:
        for i, res in enumerate(results):
            output.write("{} {}\n".format(lines[i], res))

Now create a data/input.txt file with one integer per line, create a folder called results, and execute

python3 calculate_prime.py data/input.txt results/output.txt

This will create the file output.txt inside the folder results with the computed outputs.
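The preparation steps above (an input file with one integer per line and a results folder) can also be scripted. The following is only a minimal sketch: the helper name prepare_workspace and the example numbers are assumptions made for illustration, and the demo writes into a temporary folder rather than into your project.

```python
import tempfile
from pathlib import Path


def prepare_workspace(root, numbers):
    """Create data/input.txt (one integer per line) and a results/ folder."""
    root = Path(root)
    data_dir = root / "data"
    results_dir = root / "results"
    # exist_ok=True makes the call safe to repeat
    data_dir.mkdir(parents=True, exist_ok=True)
    results_dir.mkdir(parents=True, exist_ok=True)
    input_file = data_dir / "input.txt"
    input_file.write_text("".join(f"{n}\n" for n in numbers))
    return input_file, results_dir


# Demo in a temporary folder (hypothetical values)
with tempfile.TemporaryDirectory() as tmp:
    input_file, results_dir = prepare_workspace(tmp, [10, 20, 30])
    print(input_file.read_text())
```

To use it in the project, you would call prepare_workspace(".", ...) from the project folder before running the script.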

Running from iPython

You can also run the previous command directly from an IPython cell inside a Jupyter Notebook, instead of a terminal, by using the %run magic command:

%run calculate_prime.py data/input.txt results/output.txt

1. Automation with Bash#

Now, if we want to perform one simple operation, we can run the commands individually from the terminal. However,

  1. This doesn’t look fully reproducible.

  2. It doesn’t scale very well when our analysis requires the execution of multiple commands.

  3. It doesn’t generalize very well to cases with different input/output files.

Notice that the workflow introduced in the previous section required at least three steps: the activation of the correct conda environment, the creation of the output folder, and the execution of the Python script.

A first solution to some of these problems is to create a Bash script that executes all these operations. Let’s write it:

#!/bin/bash

conda activate notebook
mkdir results 
python calculate_prime.py input.txt results/output.txt

The first line of the file is the shebang (#!), which tells the system which interpreter to use to run the script. You will probably need to change the permissions of the file in order to execute it. Explore the chmod command in bash for doing this.
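For reference, the effect of granting execute permission can be sketched in Python as well. This is an illustration under assumptions: the file name run_analysis.sh is hypothetical (the script above was not given a name), the helper make_executable is ours, and the code assumes a POSIX system. It is roughly what chmod u+x does.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Add the owner's execute permission, similar to `chmod u+x path`."""
    current_mode = os.stat(path).st_mode
    os.chmod(path, current_mode | stat.S_IXUSR)


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "run_analysis.sh")  # hypothetical file name
    with open(script, "w") as f:
        f.write("#!/bin/bash\necho hello\n")
    make_executable(script)
    # Check that the owner's execute bit is now set
    print(bool(os.stat(script).st_mode & stat.S_IXUSR))
```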

Warning

This doesn’t activate the environment, since conda is not recognized from inside the bash script.

2. Our first Makefile#

Now, instead of having all these instructions in a bash script, let’s use Make. Make reads its instructions from a build file called a Makefile. Although a Makefile looks similar to a bash script, the two are not the same. Let’s begin with something simple and create a file called Makefile with the following content

# Compute prime numbers
results/output.txt : input.txt
	python calculate_prime.py input.txt results/output.txt

and now, from the terminal let’s execute just the command make:

make

This executes the Python script and generates the corresponding output in results/output.txt.

Warning

It is important that the indentation inside the Makefile uses tabs instead of spaces. If you are working from JupyterLab, you can change this configuration in Settings > Text Editor Indentation.

Make Syntax

The basic syntax inside the Makefile can be described as follows

# Comments
<TARGETS> : <DEPENDENCIES>
	<PROGRAMS>

The # is used for comments. The section for programs (the recipe) can include multiple lines of script with increasing levels of complexity, for example by including conditional statements. The important thing you need to know is that inside <PROGRAMS> you are running shell code.

2.1. Re executing code#

One of the things that makes Make special is that it does not re-run operations whose dependencies have not changed since the last run. In the previous example, results/output.txt depends on the input data input.txt (it also depends on the script calculate_prime.py, although our Makefile lists only input.txt as a dependency). If we don’t change the dependencies and run make again, nothing will be re-executed, and make will print a message similar to this one:

!make
make: Nothing to be done for 'outputs'.

Now, if we make even a minimal change to any of the dependency files, Make will execute the recipe again. For example, if you just update the timestamp of a dependency (touch input.txt) and run make again, you will see that the Python code is executed again and the timestamp of results/output.txt is updated too.

%%bash
touch input.txt
make
make: Nothing to be done for 'outputs'.

Some advantages of this built-in memory system of Makefiles are

  • We avoid repeating unnecessary operations each time we run the workflow.
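The decision Make takes here can be sketched in a few lines of Python: a target is rebuilt when it is missing or when it is older than at least one of its dependencies. This is a simplified illustration, not Make's actual implementation; the function name needs_rebuild is an assumption of this sketch, and os.utime is used to play the role of touch.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


with tempfile.TemporaryDirectory() as tmp:
    dep = os.path.join(tmp, "input.txt")
    target = os.path.join(tmp, "output.txt")
    open(dep, "w").close()

    # The target does not exist yet, so it has to be built
    print(needs_rebuild(target, [dep]))

    # The target is newer than its dependency: nothing to be done
    open(target, "w").close()
    os.utime(dep, (1000, 1000))
    os.utime(target, (2000, 2000))
    print(needs_rebuild(target, [dep]))

    # "touch" the dependency: it is now newer than the target
    os.utime(dep, (3000, 3000))
    print(needs_rebuild(target, [dep]))
```

The three calls print True, False and True, mirroring the first make, a repeated make, and a make after touch input.txt.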

2.2. Special characters#

As you can see from our Makefile, we are repeating the names of both the dependency and the target files. Instead of

results/output.txt : input.txt
	python calculate_prime.py input.txt results/output.txt

we can instead write

results/output.txt : input.txt
	python calculate_prime.py $^ $@

The symbols beginning with $ are automatic variables: Make replaces them with values taken from the current rule, so they can be used as shortcuts. You can find a full list of them here. Some of the most useful ones include

  • $^: The list of all dependencies

  • $<: The name of the first dependency

  • $@: The name of the target

  • $*: The stem matched by the % wildcard in a pattern rule (see Section 3.3)
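To make the substitution concrete, here is a small Python sketch of how the automatic variables $^, $< and $@ are replaced in a recipe. It is a simplification written for illustration only (the function expand_automatic_variables is hypothetical, and real Make handles more cases, such as removing duplicate dependencies in $^).

```python
def expand_automatic_variables(recipe, target, dependencies):
    """Replace $@, $< and $^ in a recipe line (simplified sketch)."""
    first_dependency = dependencies[0] if dependencies else ""
    return (
        recipe
        .replace("$@", target)                   # name of the target
        .replace("$<", first_dependency)         # first dependency
        .replace("$^", " ".join(dependencies))   # all dependencies
    )


print(expand_automatic_variables(
    "python calculate_prime.py $^ $@",
    target="results/output.txt",
    dependencies=["input.txt"],
))
# prints: python calculate_prime.py input.txt results/output.txt
```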

3. Adding more functions to our Makefile#

Let’s explore some other rules we can add to our Makefile that will be useful as we automate and execute more code.

3.1. Cleaning#

We may be interested in removing all existing output data so we can recreate it. To do this, we can add the following rule

.PHONY : clean
clean : 
	rm -f results/*

and then run just the cleaning command with

make clean

Phony target

A phony target is one that is not really the name of a file. It is just a name for some commands to be executed when you make an explicit request. There are two reasons to use a phony target: to avoid a conflict with a file of the same name, and to improve performance (see here for more information).

3.2. Grouping operations#

Now, we can combine multiple operations under the same group. By creating the target outputs, we can create all the output files using make outputs:

.PHONY : outputs
outputs : results/output1.txt results/output2.txt

results/output1.txt : input1.txt
	python calculate_prime.py $^ $@

results/output2.txt : input2.txt
	python calculate_prime.py $^ $@
    
.PHONY : clean
clean : 
	rm -f results/*

Since outputs doesn’t refer to the name of any target file, we add it as a .PHONY target, just as we did with clean.

By default, Make builds the first target defined in our Makefile. That means that if we place the following lines

.PHONY : outputs
outputs : results/output1.txt results/output2.txt

at the top of our Makefile, then running make will execute make outputs by default.
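Note that the line .PHONY : outputs comes first and yet outputs is the default: special targets whose names start with a period are not considered when Make chooses the default goal, and neither are pattern rules. The Python sketch below illustrates this selection rule under strong simplifications (the function default_goal is hypothetical, and the parsing ignores variable assignments using := and other Makefile syntax).

```python
def default_goal(makefile_text):
    """Return the first target that Make would use as default goal.

    Simplified sketch: skips comments, recipe lines, targets starting
    with '.', and pattern targets containing '%'.
    """
    for line in makefile_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or line.startswith("\t"):
            continue
        if ":" not in line:
            continue
        for target in line.split(":", 1)[0].split():
            if target.startswith(".") or "%" in target:
                continue
            return target
    return None


makefile_text = (
    ".PHONY : outputs\n"
    "outputs : results/output1.txt results/output2.txt\n"
    "\n"
    "results/output%.txt : input%.txt\n"
    "\tpython calculate_prime.py $^ $@\n"
)
print(default_goal(makefile_text))  # prints: outputs
```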

3.3. Wildcard#

In our Makefile, there are two ways of using wildcards. For targets and dependencies (the <TARGETS> : <DEPENDENCIES> part in our Makefile) we can use the generic wildcard %. This is used to match patterns and automate their processing. For example, we can simplify the last two rules into one by using

results/output%.txt : input%.txt
	python calculate_prime.py $^ $@

Now, we can use the automatic variable $* in the recipe to refer to the stem, that is, the part of the name matched by %. For example, an equivalent way of writing the previous rule would be

results/output%.txt : input%.txt
	python calculate_prime.py $< results/output$*.txt
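The matching performed by % can be illustrated with a short Python sketch: the target name is compared against the target pattern, the part matched by % (the stem, available as $*) is extracted, and the stem is substituted into the dependency pattern. The functions match_stem and dependency_for are hypothetical helpers written for this illustration; Make's real matching has additional rules, for example for directories.

```python
import re


def match_stem(pattern, name):
    """Return the stem matched by % in pattern, or None if there is no match."""
    prefix, _, suffix = pattern.partition("%")
    # % matches any non-empty string
    regex = re.escape(prefix) + "(.+)" + re.escape(suffix)
    match = re.fullmatch(regex, name)
    return match.group(1) if match else None


def dependency_for(target, target_pattern, dependency_pattern):
    """Build the dependency name by substituting the stem into its pattern."""
    stem = match_stem(target_pattern, target)
    if stem is None:
        return None
    return dependency_pattern.replace("%", stem)


print(match_stem("results/output%.txt", "results/output2.txt"))
# prints: 2
print(dependency_for("results/output2.txt", "results/output%.txt", "input%.txt"))
# prints: input2.txt
```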

3.3. Working directory setup#

You can also use Make to set up your working directory, running the same operations you would run from the terminal but in a more automated way.

.PHONY : setup
setup:
	mkdir -p results

3.4. Make for creating a new environment#

You can use recipes in Make to create and manipulate conda environments. In this way, you can automate the process of creating an environment from a .yml file, installing new dependencies (e.g. ipython), and then creating the corresponding kernel for the environment.

Warning

Unfortunately, conda activate does not work out of the box from a Makefile. One workaround is to have separate make targets to create your environment and to install the required dependencies, activating the environment yourself in between. By doing this, you can carry out the full creation and deletion of a conda environment with the bash commands

make create_environment 
conda activate <myenv>
make update_environment
    
make delete_environment

You can find more information about this workflow in this talk.

Another solution is to add the following lines to your Makefile

.ONESHELL:
SHELL = /bin/bash

By default, every line in a recipe is executed in a separate shell process. The .ONESHELL: special target allows us to run all the lines of a recipe in the same shell. The line SHELL = /bin/bash makes explicit that recipes are run with Bash, which allows us to do

.ONESHELL:
SHELL = /bin/bash

create_environment :
	source /srv/conda/etc/profile.d/conda.sh
	conda env create -f environment.yml 
	conda activate notebook
	conda install ipykernel
	python -m ipykernel install --user --name make-env --display-name "IPython - Make"
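The difference between running each recipe line in its own shell and running the whole recipe in one shell can be reproduced with Python's subprocess module. This is only a sketch to build intuition, assuming a POSIX system where sh is available; the variable name DEMO_MAKE_VAR and the two helper functions are made up for the example. A variable set on one line is lost when the next line runs in a fresh shell, which is the same reason conda activate has no effect on later lines without .ONESHELL:.

```python
import subprocess

# Two "recipe lines": the second one depends on state set by the first
recipe = ["DEMO_MAKE_VAR=hello", 'echo "[$DEMO_MAKE_VAR]"']


def run_line_by_line(lines):
    """Run every line in its own shell (Make's default behaviour)."""
    output = []
    for line in lines:
        result = subprocess.run(["sh", "-c", line], capture_output=True, text=True)
        output.append(result.stdout)
    return "".join(output)


def run_in_one_shell(lines):
    """Run all lines in a single shell (what .ONESHELL: asks for)."""
    script = "\n".join(lines)
    result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return result.stdout


print(run_line_by_line(recipe).strip())   # prints: []
print(run_in_one_shell(recipe).strip())   # prints: [hello]
```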

3.5. Self-documenting Makefile#

A very useful feature we can add to our Makefile is documentation for the different operations we write. A simple hack for doing this automatically consists of adding a comment line starting with ## above each rule, for example,

## clean             : Remove output files
.PHONY : clean
clean : 
	rm -f results/*

and then include the following rule in your Makefile.

.PHONY : help
help : Makefile
	@sed -n 's/^##//p' $<

Now, the next time you execute make help from the terminal you will see

!make help
 clean       : Remove auto-generated files.
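The sed command in the help rule prints only the lines that start with ##, with that prefix removed. An equivalent filter can be sketched in Python as follows; the function extract_help and the sample Makefile text are assumptions made for this illustration.

```python
def extract_help(makefile_text):
    """Return the lines starting with '##', without the '##' prefix."""
    return [
        line[2:]
        for line in makefile_text.splitlines()
        if line.startswith("##")
    ]


makefile_text = (
    "## clean             : Remove output files\n"
    ".PHONY : clean\n"
    "clean :\n"
    "\trm -f results/*\n"
)

for line in extract_help(makefile_text):
    print(line)
```

Ordinary comments starting with a single # are ignored, so only the lines written as documentation end up in the help message.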