DS4PS

The future central hub for sharing pedagogical resources for data science in public affairs courses will (soon) be ds4ps.org (Data Science for Public Service).

Why data science in the public sector?

Open source textbooks

  • Chester Ismay and Albert Y. Kim, ModernDive: An Introduction to Statistical and Data Sciences via R, https://moderndive.com/: A complete introductory statistics course based on R and focused on practical data work and simulation-based hypothesis testing. I taught an executive MPA course with this book in 2018, and plan on using it in intro MPA/MPP stats classes in the future.
  • Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (Sebastopol, California: O’Reilly Media, 2017), http://r4ds.had.co.nz/: The standard text for learning modern data science with R.
  • David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel, OpenIntro Statistics (4th edition, 2019), https://www.openintro.org/stat/textbook.php?stat_book=os: A free textbook on general statistics. I’ve used parts of this in my classes and it’s great.
  • Kieran Healy, Data Visualization: A Practical Introduction (Princeton: Princeton University Press, 2018), http://socviz.co/: I taught an MPA class on data visualization with this book (and Claus Wilke’s) in 2018, and it was a phenomenal resource.
  • Claus E. Wilke, Fundamentals of Data Visualization (Sebastopol, California: O’Reilly Media, 2018), https://serialmentor.com/dataviz/.
  • Paul J. Gertler et al., Impact Evaluation in Practice, 2nd ed. (Inter-American Development Bank; World Bank, 2016), https://openknowledge.worldbank.org/handle/10986/25030: This is a free textbook on program evaluation published by the World Bank, and it is very well written and quite accessible for beginners. It’s not R-specific, but it can be used in an R-focused class to cover the different types of econometric approaches. I use this in my course on program evaluation.
  • Scott Cunningham, Causal Inference: The Mixtape, 2018, https://www.scunning.com/mixtape.html: This is a free econometrics textbook with example code in Stata. The 2nd edition of the book, currently under development, will have example code in both R and Stata. This is a much more advanced econometrics book and isn’t great for students with little math background (i.e., there are tons of equations!), but it’s still full of good examples of econometric models.
  • Julia Silge and David Robinson, Text Mining with R: A Tidy Approach, 2017, https://www.tidytextmining.com/: A complete textbook on how to do text analysis with R.
  • Robin Lovelace, Jakub Nowosad, and Jannes Muenchow, Geocomputation with R, 2019, https://geocompr.robinlovelace.net/: A complete textbook on how to do GIS analysis with R.
  • Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, and Julia Lane, Big Data in Social Science, 2019, https://coleridge-initiative.github.io/big-data-and-social-science/: A complete textbook on general principles of computational social science.

RStudio resources

  • RStudio Education: RStudio has an entire division dedicated to education and training, with a host of cheat sheets, interactive primers, and other resources.
  • RStudio.cloud: You can run RStudio in your browser for free. Even better, you can set up a shared class workspace and create projects and assignments that students can complete in their own browsers, also for free. There’s no need to have them install R locally on their computers (at least at first; I have students use RStudio Cloud for the first few weeks and then transition them to running it locally during the semester).

Data

  • Google Dataset Search: Google indexes thousands of public datasets; search for them here.
  • Kaggle: Kaggle hosts machine learning competitions where people compete to create the fastest, most efficient, most predictive algorithms. A byproduct of these competitions is a host of fascinating datasets that are generally free and open to the public. See, for example, the European Soccer Database, the Salem Witchcraft Dataset, or the results from an Oreo flavors taste test.
  • 360Giving: Dozens of British foundations follow a standard file format for sharing grant data and have made that data available online.
  • US City Open Data Census: More than 100 US cities have committed to sharing dozens of types of data, including data about crime, budgets, campaign finance, lobbying, transit, and zoning. This site from the Sunlight Foundation and Code for America collects this data and rates cities by how well they’re doing.
  • Political science and economics datasets: There’s a wealth of data available for political science- and economics-related topics: