Textbook and Supporting Materials
Although it is not strictly a textbook for this course, we will rely heavily on the excellent, openly licensed Research Software Engineering with Python. We will complement it with these other scientific Python resources:
Katy Huff’s Effective Computation in Physics.
Jake VanderPlas’s A Whirlwind Tour of Python.
Stefan van der Walt’s Python Survival Pack and Elegant SciPy Book. The full book and all the notebooks are available.
Josh Bloom’s Python for Data Science Berkeley Course.
Getting started with Python for research, a gentle introduction to Python in data-intensive research.
Python for Data Analysis, 2nd Edition, by Wes McKinney, creator of Pandas. Companion Notebooks.
Effective Pandas, a book by Tom Augspurger, core Pandas developer.
And we’ll use these Earth Science resources for our domain focus:
Ryan Abernathey’s research computing for Earth Sciences.
Brian Rose’s Climate Laboratory.
Lisa Tauxe’s Python for Earth Science Students.
Git and git workflows
Continuous integration
Miscellaneous computing tutorials
Other bibliography
The list above covers books and websites that focus mostly on computational skills; this section lists the rest of the bibliography we’ll refer to in the course. Some of these will become assigned readings, while others are available for your reference.
PLOS Ten Simple Rules
The PLOS Ten Simple Rules collection has many short, valuable papers full of relevant, practical advice in this space. Many, if not most, are worth your time; a few that stand out are the “Ten simple rules for …” papers on:
Computational research
Open Source Software and Open Science
Data Management
The art of research
National Academies Reports
These are key reports produced by the National Academies of Sciences, Engineering, and Medicine. They were written by teams of world experts in their fields, and they inform policy in multiple areas:
Reproducibility and Replicability in Science, 2018. The previous link contains multiple resources on this topic, including overview videos, from a large effort commissioned by the National Academies of Sciences, Engineering, and Medicine. For reading, this NCBI link has both HTML and PDF download options.
Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results, 2016.
Open Source Software Policy Options for NASA Earth and Space Sciences, 2018.
Open Science by Design: Realizing a Vision for 21st Century Research, 2018.
Developing a Toolkit for Fostering Open Science Practices, 2021.
Other general references on reproducibility and open science
Millman and Pérez 2014, Developing open source scientific practice.
Keith Baggerly and the Potti & Nevins Cancer Scandal:
Barba 2016, Top-10 Readings in Reproducibility, a syllabus on reproducible research by Prof. Lorena Barba. Also of particular interest is Barba 2012, Reproducibility PI Manifesto, with slides available here, as well as
Wilson et al. 2012, Best Practices for Scientific Computing.
Granger and Pérez 2021, Jupyter: Thinking and Storytelling With Code and Data.
The Practice of Reproducible Research, Case Studies and Lessons from the Data-Intensive Sciences. An online (and printed) book produced by Berkeley researchers. It includes the excellent Achieving Full Replication of our Own Published CFD Results, with Four Different Codes.
Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico. A special issue of Philosophical Transactions of the Royal Society A dedicated to this topic, with multiple valuable articles, of which the following are just a few:
Reproducibility and earth/climate science
One of the National Academies reports above commissioned a paper by Bush et al. (2020) titled Perspectives on Data Reproducibility and Replicability in Paleoclimate and Climate Science.
Liu et al. 2019, Improving reproducibility in Earth science research.
Feulner 2016, Science under Societal Scrutiny: Reproducibility in Climate Science.
Hoffimann et al. 2021, Geostatistical Learning: Challenges and Opportunities.
Abernathey et al. 2021, Cloud-Native Repositories for Big Scientific Data.