Lab Session 4: Modularization, Environments, Binder

Friday 02-17-2023, 9AM-11AM & 12PM-2PM

Instructor: Facundo Sapienza

Inside the notebook Eratosthenes.ipynb (located in this same folder in the site repository) there is a not very well written piece of code that calculates all prime numbers up to a given number. We are going to work with the contents of this notebook and reorganize it so it looks like a decent project: better code, better documentation, and more reproducible.

1. Modularization

First things first: make sure you can run all the code inside Eratosthenes.ipynb. Once the code runs, we are going to start reorganizing it.

1.1. Reorganization

The first thing you will do is create two separate functions for the two sieve of Eratosthenes algorithms. These are the first things you will see inside the notebook. Think about what the arguments of these functions should be.
These two functions do the same thing with different implementations, so most likely they also have the same arguments and outputs. Create a new function get_primes that you can call to run either method.
Move these three functions to a separate script called sieve.py. Remember to import the necessary libraries inside sieve.py.
Import all these functions into the notebook using an import statement in the first cell of the notebook.
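
One possible shape for sieve.py after these steps is sketched below. Only the name get_primes comes from this lab; the names sieve_loop and sieve_slices, the method argument, and the choice of returning a list are assumptions made for illustration, and the implementations in the notebook may differ.

```python
# sieve.py (illustrative sketch; only get_primes is named in the lab text)
import math


def sieve_loop(n):
    """Return the primes <= n, crossing out multiples one at a time."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            for multiple in range(i * i, n + 1, i):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def sieve_slices(n):
    """Return the primes <= n, crossing out multiples with slice assignment."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            # Zero out i*i, i*i + i, ... in a single slice assignment
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(flags) if flag]


def get_primes(n, method="loop"):
    """Single entry point: run the sieve selected by `method`."""
    methods = {"loop": sieve_loop, "slices": sieve_slices}
    if method not in methods:
        raise ValueError(f"unknown method {method!r}; choose from {sorted(methods)}")
    return methods[method](n)
```

With this layout, the notebook would run from sieve import get_primes and then get_primes(30), which returns [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] for either method.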

Warning

For now, every time you change sieve.py and want the notebook to see the change, you will have to restart the kernel before importing again. This is because once a module has been imported, the IPython kernel considers it already loaded and does not check the file for changes. In general, it’s good practice to restart your kernel once the execution counter gets large and after significant reorganization of the notebook, like we are doing here.

In the section Long Run: proportion of prime numbers there is a long run of exactly the same code as before, but inside a for loop, so we can evaluate the proportion of prime numbers as we increase the search window. Make a new function called proportion_primes that does this.
Your functions look good, but now let’s document them. Add a docstring to each one. How can you access the docstrings from the notebook?
Make sure you can run these functions from the notebook after you move the code to the scripts (leaving none of it behind in the notebook) and after you have restarted your kernel.
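
As a rough sketch of a documented function, the block below gives proportion_primes a docstring and reads it back. The signature (a list of upper limits in, a list of proportions out) and the helper _count_primes are assumptions, since the lab leaves the design to you.

```python
import inspect
import math


def _count_primes(n):
    """Count the primes <= n with a basic sieve (hypothetical helper)."""
    if n < 2:
        return 0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            for multiple in range(i * i, n + 1, i):
                is_prime[multiple] = False
    return sum(is_prime)


def proportion_primes(limits):
    """Return the proportion of primes among 1..n for each n in `limits`.

    Parameters
    ----------
    limits : iterable of int
        Upper ends of the search windows, each at least 1.

    Returns
    -------
    list of float
        One value per window: (number of primes <= n) / n.
    """
    return [_count_primes(n) / n for n in limits]


# The raw docstring is stored on the function object itself
print(proportion_primes.__doc__)
# inspect.getdoc returns the same text with the indentation cleaned up
print(inspect.getdoc(proportion_primes))
# In a notebook, help(proportion_primes) or `proportion_primes?` (IPython
# syntax) display the docstring as well.
```

For example, proportion_primes([10, 100]) returns [0.4, 0.25], since there are 4 primes up to 10 and 25 primes up to 100.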

1.2. Modules

Now that you know the previous import statements are working, move sieve.py to a new folder called sievetools.
Inside the folder, create an __init__.py file with the required contents so you can do the import from the notebook. Now, you should be able to do the import in the notebook by doing

from sievetools import sieve

Tip

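Now that you have a proper module, you can reload it after editing the file with import importlib; importlib.reload(sieve), so the changes are reflected in the notebook without restarting the kernel. Note that reload takes the already imported module object, not a file name.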
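
The sketch below builds a throwaway copy of the sievetools layout in a temporary directory (so it does not touch your project) and shows a reload picking up an edit. The file contents are placeholders and the __init__.py shown is only one possible choice; in the notebook you would need just the import and the importlib.reload call.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so this fast demo always re-reads the source file
sys.dont_write_bytecode = True

# Build a disposable package: <tmp>/sievetools/{__init__.py, sieve.py}
root = Path(tempfile.mkdtemp())
pkg = root / "sievetools"
pkg.mkdir()
(pkg / "__init__.py").write_text("from . import sieve\n")
(pkg / "sieve.py").write_text("def get_primes(n):\n    return 'version 1'\n")

# Make the temporary folder importable, then import as in the lab
sys.path.insert(0, str(root))
from sievetools import sieve

first = sieve.get_primes(10)

# Simulate editing sieve.py while the kernel keeps running
(pkg / "sieve.py").write_text("def get_primes(n):\n    return 'version 2'\n")

# reload takes the module object and re-executes its source
sieve = importlib.reload(sieve)
second = sieve.get_primes(10)

print(first, "->", second)
```

Names imported with from sievetools.sieve import get_primes keep pointing at the old function after a reload, which is one reason to import the module and call sieve.get_primes instead.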

Each one of the files inside sievetools is called a module. Now, continue cleaning up Eratosthenes.ipynb by moving all the required plotting code to a new module called plots. Update __init__.py so it imports this new module. By the end of the day, you should be able to generate the plots from the notebook using something like

plot_sieve(.... , log_scale=True)

[Extra] You can continue and do something similar for the code in the section Performance.

2. Migrate to GitHub

Let’s now create a project containing the previous piece of code and let’s version control it!

Create a new public repository called eratosthenes where you will include all the contents of the Eratosthenes project you created before. You can create the repository locally first and then push it to GitHub, or create it directly on GitHub and then push your changes there. You can create the repository in your personal GitHub account or in the UCB-stat-159-s23 GitHub site, in which case we ask you to add your username/teamname in front of the repository name (e.g., facusapienza21-eratosthenes).
Add a README.md file in the repository with a minimal explanation about the project.

3. Environments

In this section we are going to create a conda environment for this project. Two very useful resources for this section are

Lecture notes on environments: These explain what a conda environment is and how to manage environments. Please take some time to go through them because you will need them!
Managing environments: The conda documentation about operations on environments is very complete.

image taken from Python Virtual Environments: A Primer

3.1. Create environment for a project

There are several ways to create a new conda environment. For reproducibility, the best practice is to share an environment.yml file that includes all the required packages and then create the environment from it. To get there, we will begin by creating a fresh environment that we will later export and share.

First of all, create a new folder in your home directory called envs. You can do this from the terminal

mkdir envs

Now, go to the .condarc file (hidden) in your home directory and be sure that it includes a list of envs_dirs with the locations where you are going to install the conda environments. The .condarc file should include something like this

envs_dirs:
  - ~/envs
  - ~/shared/envs 

In the terminal, let’s create a new environment (for example, one called test_env) with a specific Python version. It may take a few seconds to create the new environment.

conda create -n <ENVIRONMENT NAME> python=3.9

In the terminal, do conda env list to see the list of all the environments you have installed. Activate the new environment with conda activate <ENVIRONMENT NAME>.
At this point, you can run code from this environment only from the terminal. Our next step is to make this environment available from the notebook. Take a look at Using an environment in your notebooks. For this to work, you first need to install the ipykernel package. You can do this with the following command

mamba install ipykernel

After doing this, you need to let Jupyter know that you want to use this environment’s kernel, by installing the environment’s ipykernel into Jupyter:

python -m ipykernel install --user --name <ENVIRONMENT NAME> --display-name "IPython - <NAME>"

After doing this, the environment should be visible from the notebook. Go to the upper right corner of your notebook and change the kernel. It may be necessary to restart your server to make these changes visible.

Note

What just happened? You have created a new virtual environment on your machine. Where is it? Go inside the envs folder and see what is there. This is where your conda environments live. If you delete these folders, the environments disappear.
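
A quick way to confirm, from inside the notebook, which environment the selected kernel is running in is to ask the interpreter where it lives. This is a small sketch using only the standard library; if the kernel was registered from the new environment, the printed folder should sit under the envs directory created above.

```python
import sys
from pathlib import Path

# sys.prefix is the root folder of the environment the interpreter belongs to
env_root = Path(sys.prefix)

print("Interpreter:       ", sys.executable)
print("Environment folder:", env_root)
print("Environment name:  ", env_root.name)
```

If the name printed here is not the environment you just created, the notebook is still using a different kernel.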

3.2. Growing and exporting

Now that the environment is working and running from the notebook Eratosthenes.ipynb, do the required installations so the code runs. So far, almost nothing has been installed in your environment. You will need to install basic packages like numpy and matplotlib. You can use pip, conda or mamba to install packages, but we recommend using mamba when possible.
Once your environment has all the required dependencies, export them to an environment.yml file. The command conda env export shows you the contents that should go into the configuration file. We recommend the following syntax when exporting an environment.

conda env export --from-history > environment.yml

Now check what is inside environment.yml. Push this file to your repository.
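
To double-check from the notebook that the active kernel can actually see the packages the project needs, a small standard-library check along these lines may help. The list of required names is an assumption based on the packages mentioned above; adjust it to whatever your code imports.

```python
from importlib.util import find_spec

# Assumed requirements for this project; edit to match your own imports
required = ["numpy", "matplotlib"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in required if find_spec(name) is None]

if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Any name reported as missing should be installed in the environment and the environment exported again, so that environment.yml stays in sync with the code.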

This is the list of commands that you will most likely be using all the time when working with conda environments:

conda activate
conda env list
conda list
conda env export

3.3. Binder

Now, we are going to launch the code into the cloud so everyone can run it! We are going to use Binder for this. Doing this is surprisingly simple.

Go to mybinder.org and enter the information of the repository. Then just click Launch! For this to work, make sure that
1. The environment file is named environment.yml and your GitHub repository contains it.
2. Your repository is public.
If everything is in order, this will launch the JupyterLab interface you are already using for this course. Check that the code runs here too and that all the required packages are installed in this new server. Now you can see how Binder helps ensure reproducibility.
Binder allows you to create a badge and link that will launch the virtual machine every time you click on it. See how to do this and add this badge to the README.md file.
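
The badge is a Markdown image wrapped in a link to the repository's launch URL. The helper below is a hypothetical convenience, not part of Binder; it assumes the usual mybinder.org URL pattern for GitHub repositories, so compare its output with what the mybinder.org form generates for your repository.

```python
def binder_badge(user, repo, ref="HEAD"):
    """Return Markdown for a Binder launch badge pointing at a GitHub repo."""
    launch_url = f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({launch_url})"


# "your-username" is a placeholder; "eratosthenes" is the repository name
# used in this lab
print(binder_badge("your-username", "eratosthenes"))
```

The printed line can be pasted as-is into README.md.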