Lab Session 0 - Introduction to Python#

Friday 01-20-2023, 9AM-11AM & 12PM-2PM

Instructor: Facundo Sapienza

Welcome to Stat 159/259! This is our first lab, so today we will focus on setting you up on GitHub and getting some practice with Python. The menu for today is:

  1. Setting up GitHub Account. If you want to learn more about how to work with GitHub and how GitHub works internally, we recommend taking a look at The curious coder’s guide to git. After we all have a GitHub account, we will ask you to complete this form so we can add you to GitHub Classroom.

  2. Warming up with Python. We will follow the Python tutorial written by Jake VanderPlas, A Whirlwind Tour of Python. We also invite you to play with PythonTutor, where you can see how variables refer to objects as you write your code.

We will be working in the JupyterHub for the whole course; it has the libraries and tools we will be using during the semester. Why do we use the Hub? It is convenient: you don’t need to install anything on your personal computer, and you don’t need to worry about having all the required packages installed in the right versions.

We are going to use Python for all the projects and homeworks in this course. However, many of the concepts we will see in this course apply to other programming languages too (Julia, R, C, etc.). Some good reasons for working in Python include:

  • Interpreted instead of compiled

  • Clean syntax

  • Object oriented (attributes + methods), convenient language constructs

  • Variables can access large data structures by reference, without making a copy (speed and memory efficient).
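
The last point can be illustrated with a minimal sketch. The variable names below are only for illustration: assigning a list to a second name does not copy it, so both names refer to the same object, and an explicit copy is needed to get an independent list.

```python
# Two names bound to the same list: assignment does not make a copy.
big_list = list(range(1_000_000))
alias = big_list             # binds a second name to the same object

print(alias is big_list)     # True: both names refer to one object

alias[0] = -1                # a change made through one name...
print(big_list[0])           # ...is visible through the other: -1

# An explicit copy creates a new, independent list.
independent = big_list.copy()
independent[0] = 0
print(big_list[0])           # still -1
```

This is convenient for speed and memory, but it also means that a function receiving a list can modify the caller's data.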

1. Setting up GitHub#

If you don’t have a GitHub account yet, you can create one by following this link (please DON’T use github.berkeley.edu to create a GitHub account). You can configure your preferences from your terminal by using

git config --global <setting> <option> 

For example, you can configure your name and email:

git config --global user.name "Facu Sapienza"
git config --global user.email "fsapienza@berkeley.edu"

You can also execute the same bash command from a notebook by adding a ! in front of the command inside a cell. In general, it is preferable to run bash commands from the terminal.

!git config --global user.name "Facu Sapienza"

For this course, we will be using GitHub Classroom. Once you all have a GitHub account created, we will add you to the repository for the course.

Once you have your GitHub account, please take a few minutes to complete this form.

To authenticate push/pull operations from the Hub, we use github-app-user-auth, a tool developed by @yuvipanda. Here’s how you can use it.

  1. Go to apps/stat159-berkeley-datahub-access, and ‘Install’ the app. Give it access to whichever repositories you want to push to. You can come back and add more repos here later if you wish.

  2. Login to stat159.datahub.berkeley.edu, and open a terminal.

  3. Run github-app-user-auth on the terminal. It’ll tell you to open a link in your browser and enter the 6-character code it gives you on the page that opens.

  4. Once done, ‘Accept’ and it’ll ask you if you want to authenticate.

  5. Once accepted, you’re done! You can now push to the repositories you gave access to in step 1 for the next 8 hours or until your server stops from inactivity! We’ll hopefully have a quick ‘sign in’ button at some point that can make this a bit more streamlined, but this should work nicely already.

2. Warming up with Python#

For today’s session, we will follow A Whirlwind Tour of Python. You can see the contents of the book online, but you can also clone the repository of the book with the following command

git clone https://github.com/jakevdp/WhirlwindTourOfPython.git

Whether you are doing this from the terminal or a notebook, remember to run this command from the directory where you want the repository to be cloned.

It is important that you are familiar with the contents of chapters 1-8, which include some introductory Python syntax and data structures. If not, please take a few minutes to go through these notebooks and get familiar with them. These concepts include:

  • Basic operations (arithmetic, comparison, assignment, …)

  • Manipulation of simple data structures (lists, dictionaries, tuples)

  • Control flow (for loops, conditional statements)
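
As a quick self-check, the short sketch below touches each of the three items in the list above. All names and values are made up for illustration; if every line reads naturally to you, you are probably ready for the next part.

```python
# Basic operations: arithmetic, comparison, assignment
total = 7 // 2 + 7 % 2          # floor division plus remainder: 3 + 1
is_small = total < 10

# Simple data structures
squares = [n ** 2 for n in range(5)]    # list
point = (1.0, 2.5)                      # tuple (immutable)
ages = {"ada": 36, "alan": 41}          # dictionary

# Control flow: a for loop with a conditional statement
labels = []
for name, age in ages.items():
    if age >= 40:
        labels.append(name + " is 40 or older")
    else:
        labels.append(name + " is under 40")

print(total, is_small)
print(squares)
print(labels)
```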

2.1. Functions#

For this part of the lecture, we recommend following Chapter 9. As we write more and more code, keeping track of what each piece is doing can become quite difficult. Functions allow us to encapsulate pieces of code that are responsible for a specific task. A nice piece of code then looks like a set of functions (sometimes chained one after another), each doing a different task, that together achieve a final major goal.

Functions can receive different kinds of arguments. The scope of these variables is always local, meaning that the variable names exist only inside the function.
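
A minimal sketch of what local scope means in practice. The function scale and its variable names are hypothetical, chosen only for this illustration: a name created inside the function is not visible once the function returns.

```python
def scale(values, factor=2):
    # `scaled` is local: it exists only while the function runs
    scaled = [factor * v for v in values]
    return scaled

result = scale([1, 2, 3])
print(result)                  # [2, 4, 6]

# Looking up the local name from outside the function fails
try:
    scaled
    visible_outside = True
except NameError:
    visible_outside = False

print(visible_outside)         # False
```

The only thing that leaves the function is the value handed back by return, which we bind here to the new name result.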

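from math import gcd      # we import the greatest common divisor function from math

def get_coprimes(L, d=2):
    """
    Function to extract the elements of a list that are coprime with a given integer
    
    Arguments:
        - L: list of integers 
        - d: integer against which coprimality is evaluated
        
    Outputs:
        - res: list of elements of L that are coprime with d
    """
    # body reconstructed from the docstring and the outputs below (assumption)
    res = []
    for x in L:
        if gcd(x, d) == 1:    # coprime means the greatest common divisor is 1
            res.append(x)
    return res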
L0 = [1, 2, 3, 4, 5, 6]

get_coprimes(L0)
[1, 3, 5]
get_coprimes(L0, 3)
[1, 2, 4, 5]

A few comments about this last example:

  1. Suppose we want to ignore the trivial case of every number being coprime with 1. What can we do then? Do we add another conditional statement?

  2. What do we do if L has negative values?

  3. What do we do if there are values in L that are not integers?

  4. Can you think of ways of implementing get_coprimes using different kinds of data structures?

  5. Is it possible to extend the scope of the variables inside the function (e.g., to obtain the value of res inside get_coprimes outside the scope of the function)?

We can solve some of these problems by hand. In the next section we will see how to raise exceptions for problematic cases.

Something really useful about functions in Python is that we can add flexible arguments. These are divided into

  • Positional arguments: *args

  • Keyword arguments: **kwargs

The * operator is usually called the unpacking operator.

def catch_all(*args, **kwargs):
    print("args =", args)
    print("kwargs =", kwargs)
    
    if len(args) == 3:
        print(args[1])
    print(kwargs['b'])
catch_all(1, 2, 3, b=1.1, c=1.2)
args = (1, 2, 3)
kwargs = {'b': 1.1, 'c': 1.2}
2
1.1
def sum_args(*args, **kwargs):
    
    res = 0
    
    for x in args:
        res += x
        
    if "factor" in kwargs.keys():
        res *= kwargs["factor"]
        
    return res
sum_args(1, 2, 3, 5, factor=1)
11
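
The same * and ** symbols can also be used when calling a function, which is where the name "unpacking operator" comes from. The sketch below uses a hypothetical function describe, not one of the functions defined above: * spreads a list into positional arguments and ** spreads a dictionary into keyword arguments.

```python
def describe(*args, **kwargs):
    # args arrives as a tuple, kwargs as a dictionary
    return len(args), sorted(kwargs)

numbers = [1, 2, 3]
options = {"factor": 2, "verbose": False}

# * spreads the list into positional arguments,
# ** spreads the dictionary into keyword arguments
print(describe(*numbers, **options))    # (3, ['factor', 'verbose'])

# Without unpacking, the whole list is passed as one single argument
print(describe(numbers))                # (1, [])
```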

We can also define anonymous functions, usually referred to by the keyword lambda. These are useful for many things (for example, we will see how useful they are when dealing with dataframes in Pandas).

add_one = lambda x: x+1
add_one(1.2)
2.2
sorted(L0, key = lambda x : x%2)
[2, 4, 6, 1, 3, 5]

Why is Python called an object-oriented language if we are working with functions all the time? Well, functions are also objects:

def duplicate(x):
    return 2 * x

def apply(x, func):
    return func(x)

apply(2, duplicate)
4

The function apply is what we call a higher-order function, since it takes another function as an argument. Other examples of higher-order functions in Python are map and filter.
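
A short sketch of the built-in map and filter in action. The helper is_odd and the list numbers are introduced here only for illustration; duplicate is redefined so the snippet runs on its own.

```python
def duplicate(x):
    return 2 * x

def is_odd(x):
    return x % 2 == 1

numbers = [1, 2, 3, 4, 5, 6]

# map and filter return lazy iterators, so we wrap them in list() to see the values
doubled = list(map(duplicate, numbers))    # apply duplicate to every element
odds = list(filter(is_odd, numbers))       # keep elements where is_odd is True

print(doubled)   # [2, 4, 6, 8, 10, 12]
print(odds)      # [1, 3, 5]
```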

We can also access the different attributes of a Python function:

?dir
Docstring:
dir([object]) -> list of strings

If called without an argument, return the names in the current scope.
Else, return an alphabetized list of names comprising (some of) the attributes
of the given object, and of attributes reachable from it.
If the object supplies a method named __dir__, it will be used; otherwise
the default dir() logic is used and returns:
  for a module object: the module's attributes.
  for a class object:  its attributes, and recursively the attributes
    of its bases.
  for any other object: its attributes, its class's attributes, and
    recursively the attributes of its class's base classes.
Type:      builtin_function_or_method
dir(duplicate)
['__annotations__',
 '__call__',
 '__class__',
 '__closure__',
 '__code__',
 '__defaults__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__get__',
 '__getattribute__',
 '__globals__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__kwdefaults__',
 '__le__',
 '__lt__',
 '__module__',
 '__name__',
 '__ne__',
 '__new__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

and also define new attributes:

duplicate.__version__ = "0.0.1"
duplicate.__version__
'0.0.1'

2.2. Errors and Exceptions#

Different kinds of errors that occur as we write code include syntax, runtime and semantic errors. Especially for runtime errors, Python gives us a clue about what kind of error may have happened during the execution of our code. For example,

1 / 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
/tmp/ipykernel_1928/1455669704.py in <module>
----> 1 1 / 0

ZeroDivisionError: division by zero
my_dict = {'a':1, 'b':2}
my_dict['c']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_1928/145831508.py in <module>
      1 my_dict = {'a':1, 'b':2}
----> 2 my_dict['c']

KeyError: 'c'
my_dict + {'c':3}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_1928/1025278159.py in <module>
----> 1 my_dict + {'c':3}

TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

There are many more kinds of built-in exceptions in Python. You can find more examples in this link. A general RuntimeError is raised when the detected error doesn’t fall into any of the other categories.

There are different ways of dealing with runtime errors in Python. These include the

  • try...except clause

  • raise statement

  • raise statement

a = 1    # numerator
b = 0    # denominator
try:
    print("I was here")
    a / b
    print("Was I here?")
except: 
    print("Something wrong happened")
I was here
Something wrong happened
a = 1
b = 0

if b == 0:
    raise ValueError("b must be different than zero.")
a / b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [42], in <module>
      2 b = 0
      4 if b == 0:
----> 5     raise ValueError("b must be different than zero.")
      6 a / b

ValueError: b must be different than zero.
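
An except clause can also name the exception type it is meant to handle and bind the exception object to a variable. The sketch below uses a hypothetical helper, safe_divide: only ZeroDivisionError is handled, and any other error still propagates to the caller.

```python
def safe_divide(a, b):
    """Hypothetical helper: return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError as err:
        # only division by zero is handled here; other errors still propagate
        print("Could not divide:", err)
        return None

print(safe_divide(1, 2))    # 0.5
print(safe_divide(1, 0))    # prints the message above, then None
```

Naming the exception type keeps unrelated errors, such as a TypeError from passing a string, from being silently swallowed.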

In the following example, we use both try...except and raise, but it does not work as we might expect. Can you identify the problem?

a = 1
b = 0

try:
    print("I was here")
    if b == 0:
        raise ValueError("b must be different than zero.")
    a / b
    print("Was I here?")
except: 
    print("Something wrong happened")
I was here
Something wrong happened

Exceptions, like everything in Python, are objects too. This means we can define new exception types by writing them as classes:

class MyNewException(Exception):
    pass
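
As a sketch of how such a class might be used, the example below defines a hypothetical exception, NegativeValueError, and a hypothetical function, checked_sqrt, neither of which is part of the course material. The new exception is raised and caught like any built-in one.

```python
class NegativeValueError(ValueError):
    """Hypothetical exception for inputs that must be non-negative."""

def checked_sqrt(x):
    if x < 0:
        raise NegativeValueError(f"expected a non-negative number, got {x}")
    return x ** 0.5

try:
    checked_sqrt(-4)
except NegativeValueError as err:
    message = str(err)
    print("Caught:", message)

# Because the class inherits from ValueError, `except ValueError` would catch it too
print(issubclass(NegativeValueError, ValueError))   # True
```

Inheriting from a specific built-in exception rather than directly from Exception is a design choice: existing code that already handles ValueError keeps working with the new type.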

Homework 0#

The idea of this homework is to put into practice your Python skills and also show you how the workflow of submitting and reviewing homeworks/projects will work using GitHub Classroom.

For now, we are going to clone a public repository in the GH site of the course and work from there. You can find the repository in the course GitHub site or clone it directly:

git clone https://github.com/UCB-stat-159-s23/hw00.git

This first homework is NOT going to be graded. We are goi