Syllabus
Statistics 159/259: Reproducible and Collaborative Statistical Data Science
Textbook
Although it is not strictly a textbook for this course, we will rely heavily on the excellent, openly licensed book Research software engineering in Python. More resources are listed on the course overview page.
Administrivia
Prerequisites
Statistics 133, 134, 135
Graduate standing is required to register for Statistics 259.
Willingness to learn programming languages and software tools independently (tools used will include Python; Jupyter Notebooks; the Python “scientific stack” of numpy, scipy, matplotlib, pandas, and scikit; git; GitHub; GitHub Actions; Docker; LaTeX; Markdown; pandoc).
Willingness to learn some statistical methodology by reading on one’s own (materials and links will be provided, but not all topics required to do the homework will be covered in lecture).
Format and assessment
3 hours of lecture and 2 hours of lab per week
Lectures will focus on theory, philosophy of science, foundations of statistics, scientific applications, software engineering, code reviews, and group discussion.
Lab will focus on computing, software tools, workflow, and collaboration.
For each assigned reading, you will submit a brief, two-paragraph report by Wednesday at 9pm (exception: the first one will be due Thursday, Jan 26). The first paragraph should summarize the reading. The second paragraph should briefly explore something that interested you (e.g., one aspect of the paper in more depth, or a point in the reading that you disagree with). During lecture, we will draw upon your reports for some group discussion.
Office hours
Perez: Wednesday, 10-11AM, 419 Evans Hall. I will normally also keep a Zoom session open for those who need to join remotely, for COVID or other reasons.
Graduate Student Instructor
Labs: Friday 9AM-11AM & 12PM-2PM (340 Evans Hall).
Office hours: Tuesdays 1PM-2PM & Thursdays 2:30PM-3:30PM (428 Evans Hall).
Communication
Please use the course Ed for questions about course material and logistics. For personal matters (illness, accommodations, etc.) that should remain private, please make a private Ed post that only the instructor and GSI will see. You may of course email one of us privately if you need to, but in general we’ll be able to handle class communications more efficiently if they stay on Ed.
During the work week, we expect to reply to Ed messages and email within 24 hours. On weekends, we might need longer.
Grading
The course is not graded on a curve. It is possible for every student to earn an A. We encourage you to focus on mastering the material, not on your grade. The weight of each assignment will be announced with the assignment; the overall grade structure will be:
85%: from approximately 8 computational assignments, some individual and some collaborative.
15%: reading assignments (weekly, on average).
In case of a medical exception, submit a private note to the instructors on Ed with medical documentation showing that you are unable to complete the assignment. We will grant an extra 48 hours to submit the reading assignment or homework, or longer if more time is required.
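As an illustration of the 85%/15% grade structure above, here is a minimal sketch of how the two components could be combined. The function name `overall_grade` and the representation of each component as a fraction between 0 and 1 are assumptions made for this example; the syllabus does not specify how per-assignment weights roll up into the homework component.

```python
HOMEWORK_WEIGHT = 0.85  # computational assignments (from the syllabus)
READING_WEIGHT = 0.15   # reading assignments (from the syllabus)


def overall_grade(homework_fraction: float, reading_fraction: float) -> float:
    """Combine two component scores, each a fraction in [0, 1], into one fraction.

    Hypothetical helper: how each component fraction is computed is not
    specified in the syllabus.
    """
    for name, value in (("homework", homework_fraction), ("reading", reading_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} fraction must be between 0 and 1, got {value}")
    return HOMEWORK_WEIGHT * homework_fraction + READING_WEIGHT * reading_fraction


# Hypothetical student: 90% on homework, 100% on readings.
print(f"{overall_grade(0.90, 1.00):.3f}")  # 0.85*0.90 + 0.15*1.00 = 0.915
```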
Homework assignments
Homework deadlines will be posted immediately after the homework is released.
We will accept late homework assignments until 24 hours after the homework deadline. However, in those cases a 25% penalty will be applied to the final score.
For group homework projects, you will include a statement in your repository acknowledging the contribution of each team member. Unless there is a major, unfair imbalance in the amount of work done by each team member, the same grade will be assigned to all team members.
Submitting assignments: Submit written assignments by making a pull request to your private repository within the Berkeley GitHub organization for the class, using GitHub Classroom (you will practice all of this, so don’t worry).
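The late-homework rule above can be sketched as a small function. This is an illustration under two assumptions that the syllabus does not state: the 25% penalty is multiplicative (the earned score is scaled by 0.75), and work submitted after the 24-hour window earns 0. The function name and the example deadline are hypothetical.

```python
from datetime import datetime, timedelta

LATE_WINDOW = timedelta(hours=24)  # late work is accepted for 24 hours (syllabus)
LATE_PENALTY = 0.25                # 25% penalty on late work (syllabus)


def homework_score(raw_score: float, deadline: datetime, submitted: datetime) -> float:
    """Return the score after the late policy is applied.

    Assumptions (not stated in the syllabus): the penalty is multiplicative,
    and work submitted after the late window earns 0.
    """
    if submitted <= deadline:
        return raw_score
    if submitted <= deadline + LATE_WINDOW:
        return raw_score * (1 - LATE_PENALTY)
    return 0.0


deadline = datetime(2023, 2, 10, 23, 59)  # hypothetical deadline
print(homework_score(80.0, deadline, deadline - timedelta(hours=1)))   # 80.0
print(homework_score(80.0, deadline, deadline + timedelta(hours=3)))   # 60.0
print(homework_score(80.0, deadline, deadline + timedelta(hours=30)))  # 0.0
```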
Reading assignments
These will be posted on the course website under Assigned Readings. For each paper/reading in the weekly list, you should submit a summary paragraph and an idea-highlight paragraph. You will submit your reading assignments in bCourses.
Reading assignments are due every Wednesday at 9pm (exception: the first one is due Thursday, Jan 26). No late reading assignments will be accepted unless there is a medical exception. In that case, you will need to submit a private note to the instructors on Ed with medical documentation showing that you are unable to complete the assignment.
You can drop two readings without justification. Note that this applies to INDIVIDUAL readings. For example, if the weekly reading consists of 4 papers, you can drop at most two of them. If you drop two readings in one week, you cannot drop any others without penalty.
Each paragraph of a reading assignment is worth 1 point (a total of 2 per reading). The final score for the reading assignments is the sum over all readings. Note that this means the maximum credit you can obtain per week depends on the number of readings that week.
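To make the point arithmetic above concrete, here is a minimal sketch of how reading points might be tallied over the term. The syllabus does not say how dropped readings enter the total, so this example assumes that the two lowest-scoring readings are the ones dropped and that they count toward neither earned nor possible points. The function name `reading_total` is hypothetical.

```python
POINTS_PER_READING = 2  # one point per paragraph (syllabus)
FREE_DROPS = 2          # readings that may be dropped without justification (syllabus)


def reading_total(points: list[int]) -> tuple[int, int]:
    """Return (earned, possible) points over the term after free drops.

    `points` has one entry (0, 1, or 2) per individual reading.
    Assumption: the lowest-scoring readings are the ones dropped, and
    dropped readings count toward neither earned nor possible points.
    """
    if any(p not in (0, 1, 2) for p in points):
        raise ValueError("each reading is worth 0, 1, or 2 points")
    kept = sorted(points)[FREE_DROPS:]
    return sum(kept), POINTS_PER_READING * len(kept)


# Hypothetical term with six readings; two were skipped entirely.
print(reading_total([2, 2, 0, 1, 2, 0]))  # (7, 8)
```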
Code of conduct; attribution of work
The high academic standard at the University of California, Berkeley, is reflected in each degree awarded. Every student is expected to maintain this high standard by ensuring that all academic work reflects unique ideas or properly attributes the ideas to the original sources.
These are some basic expectations of students with regard to academic integrity:
Any work submitted should be your own individual thoughts, and should not have been submitted for credit in another course unless you have prior written permission to re-use it in this course from this instructor.
All assignments must use “proper attribution,” meaning that you have identified the original source and extent of words or ideas that you reproduce or use in your assignment. This includes drafts and homework assignments! If you are unclear about expectations, ask your instructor.
Do not collaborate or work with other students on assignments or projects unless the instructor gives you permission or instruction to do so.
Disability accommodations
If you need an accommodation for a disability, if you have information you wish to share with the instructor about a medical emergency, or if you need special arrangements in case the building must be evacuated, please inform the instructor as soon as possible.
If you are not currently listed with DSP (the Disabled Students’ Program) and believe you might benefit from their support, please apply online at https://dsp.berkeley.edu.