# Introduction to computing for the social sciences

## Learning objectives

* Introduce myself
* Identify major course objectives
* Identify course logistics
* Introduce basic principles of data science workflow and programming
* Explain how to get started in R

___

**Monday, April 6, 2020**

## General Notes

* Everything will be available on the [course website](https://cfss.uchicago.edu/)!
* The course mainly focuses on learning **basic programmatic and computational skills**
* Stick to the **15-minute rule** when problem solving: try on your own for up to 15 minutes
    * If you are still stuck after that, **ask for help** using:
        * The "Issues" page on GitHub
        * Office hours! (TAs and Professor Soltoff)
    * Additional Resources
        * ["How to properly ask for help"](https://cfss.uchicago.edu/faq/asking-questions/) on the course website
* Grades in this class are **based on evaluations** of weekly programming assignments
    * Any adjustments that need to be made will happen **at the end of the quarter**
    * If you want to take the class P/F, email Professor Soltoff before the last week of the class and check in around week 5 or 6
* **Homework due every Monday at 11:59pm CST**

## Lecture Notes

### Computational Social Science Workflow

* **Importation Stage**
    * Have data in some sort of format and need to prepare it for analysis
* **Tidy**
    * Get data into a usable format, generally a **data frame**
    * Will spend a lot of time at this stage!
* **Middle Stages (Statistical Analysis): Transform, Visualize, Model**
    * Cyclical process
    * As you go through the process, you will likely learn something new and repeat the cycle several times until you arrive at an end product
* **Communicate**
    * Some sort of end product, like a report or a visualization

*We will be learning how to go through all these steps using programming*

___

### Program

* A series of instructions that specifies how to perform a computation
* **Includes:** Input, Output, Math, Conditional execution, Repetition

___

### GUI (Graphical User Interface)

* Want to think of programming as an **explicit activity** -> actually writing it out
* A little more difficult because we have to **remember certain syntax**

**Example: "Jane: a GUI workflow"**

1. Searches for data files online
2. Cleans the files in Excel
3. Analyzes the data in Stata
4. Writes her report in Google Docs

**Example: "Sally: a programmatic workflow"**

1. Creates a folder specifically for this project
    * `data`
    * `graphics`
    * `output`
2. Searches for data files online
3. Cleans the files in R
4. Analyzes the files in R
5. Writes her report in R Markdown

*All of Sally's work is done using code!*

*But there's an issue for Jane when asked to re-do the analysis...*

* Jane does not have the original data, nor does she remember the steps taken to analyze her data
* Sally will benefit because she has a written record of what she's done -> automation
* Much easier to implement *in the long run*

___

### Reproducibility

* Are my results valid? Can they be replicated?
* The idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them

___

### Version control

* Revisions in research
* Tracking revisions
    * Multiple copies
        * `analysis-1.r`
        * `analysis-2.r`
        * `analysis-3.r`
        * *This is not an optimal system for a number of reasons*
    * Cloud storage (e.g. Dropbox, Google Drive, Box)
        * *Slight improvement, but it's still difficult to track changes and to keep track of who made which changes when collaborating, among other reasons*
    * Version control software
        * Repository
        * Git
            * Can work on a single computer or as a network
            * There's an explicit record of who's making changes

___

### Documentation

* *Comments* are the what
    * Let others know what is happening in the code
* *Code* is the how
* Computer code should also be *self-documenting*
* Future-proofing
    * Several weeks or months later, you should still know how that code works
    * Likewise, if you're sharing code with colleagues, they should be able to understand what's happening as well

---

### Software

* [Installing software](https://cfss.uchicago.edu/setup/)
    * **RStudio Server**
        * Need to use the [University's VPN](https://uchicago.service-now.com/it?id=kb_article&kb=KB00015292) in order to access the server
    * Local installation
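
Once R is available (through RStudio Server or a local installation), the first step of Sally's programmatic workflow described earlier can be tried out directly. The following is a minimal sketch, not part of the course materials: the project name `my-project` is hypothetical, and the project is created inside a temporary directory (an assumption made so the example leaves no files behind). Only the subfolder names `data`, `graphics`, and `output` come from the notes.

```r
# Hypothetical project root (assumption): placed in a temporary directory
# so that running this sketch does not clutter your real files.
project_dir <- file.path(tempdir(), "my-project")

# Subfolders named in Sally's workflow
subfolders <- c("data", "graphics", "output")

# Create each subfolder; recursive = TRUE also creates the project root,
# and showWarnings = FALSE keeps re-runs quiet if the folders already exist.
for (folder in subfolders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List the immediate subfolders that now exist in the project
created <- sort(list.dirs(project_dir, full.names = FALSE, recursive = FALSE))
print(created)
```

Because the layout is created by code rather than by hand, re-running the script reproduces the same structure, which is the kind of written record the notes credit for making Sally's analysis easier to redo.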