# Introduction to computing for the social sciences
## Learning objectives
* Introduce myself
* Identify major course objectives
* Identify course logistics
* Introduce basic principles of data science workflow and programming
* Explain how to get started in R
___
**Monday, April 6, 2020**
## General Notes
* Everything will be available on the [course website](https://cfss.uchicago.edu/)!
* The course mainly focuses on learning **basic programmatic and computational skills**
* Stick to the **15-minute rule** when trying to solve a problem on your own
* Otherwise, **ask for help** using:
* The "Issues" page on GitHub
* Office hours! (TAs and Professor Soltoff)
* Additional Resources
* ["How to properly ask for help"](https://cfss.uchicago.edu/faq/asking-questions/) on the course website
* Grades in this class are **based on evaluations** of weekly programming assignments
* Any adjustments that need to be made will happen **at the end of the quarter**
* If you want to take the class P/F, email Professor Soltoff before the last week of the class, and check in around week 5 or 6
* **Homework due every Monday at 11:59 pm CT**
## Lecture Notes
### Computational Social Science Workflow
* **Importation Stage**
* You have data in some format and need to prepare it for analysis
* **Tidy**
* Get data into a usable format, generally a **data frame**
* Will spend a lot of time at this stage!
* **Middle Stages (Statistical Analysis): Transform, Visualize, Model**
* Cyclical process
* As you go through the process, you will likely learn something new and go through the process several times until you arrive at an end product
* **Communicate**
* Some sort of end-product like a report or a visualization
*We will be learning how to go through all these steps using programming*
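
As a rough illustration of these stages, here is a minimal sketch in base R. The data set (`survey`), its column names, and its values are made up for this example and are not course data; the course may well use other packages for these steps.

```r
# Import: in practice this might be read.csv("data/survey.csv");
# a small hypothetical data set is built inline so the sketch runs as-is
survey <- data.frame(
  id = 1:6,
  age = c(23, 35, 41, 29, 52, 38),
  income_2019 = c(31, 48, 55, 40, 61, 50),  # hypothetical, in $1,000s
  income_2020 = c(33, 47, 58, 42, 60, 53)
)

# Tidy: reshape so that each row is one person-year observation
tidy <- reshape(
  survey,
  direction = "long",
  varying = c("income_2019", "income_2020"),
  v.names = "income",
  timevar = "year",
  times = c(2019, 2020),
  idvar = "id"
)

# Transform: add a derived variable
tidy$age_group <- ifelse(tidy$age < 40, "under 40", "40 and over")

# Visualize: a quick exploratory scatterplot
plot(tidy$age, tidy$income, xlab = "Age", ylab = "Income ($1,000s)")

# Model: a simple linear regression of income on age
fit <- lm(income ~ age, data = tidy)

# Communicate: report the estimated coefficients
print(round(coef(fit), 2))
```

In a real project, the plot or the model might prompt a return to the transform step, which is the cyclical part of the process.
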
___
### Program
* A series of instructions that specifies how to perform a computation
* **Includes:** Input, Output, Math, Conditional execution, Repetition
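
To make these five building blocks concrete, here is a small sketch in R. The scores and the cutoff of 75 are hypothetical values chosen only for illustration.

```r
# Input: the values the program starts from (hypothetical exam scores)
scores <- c(72, 88, 95, 61)

# Math: compute a summary of the input
average <- mean(scores)

# Conditional execution: run different code depending on a condition
if (average >= 75) {
  verdict <- "above target"
} else {
  verdict <- "below target"
}

# Repetition: apply the same steps to every element
labels <- character(length(scores))
for (i in seq_along(scores)) {
  labels[i] <- if (scores[i] >= 75) "pass" else "review"
}

# Output: display the results
print(average)
print(verdict)
print(labels)
```
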
___
### GUI (Graphical User Interface)
* Unlike pointing and clicking in a GUI, programming is an **explicit activity** -> you actually write out every step
* A little more difficult because we have to **remember certain syntax**
**Example: "Jane: a GUI workflow"**
1. Searches for data files online
2. Cleans the files in Excel
3. Analyzes the data in Stata
4. Writes her report in Google Docs
**Example: "Sally: a programmatic workflow"**
1. Creates a folder specifically for this project
`data`
`graphics`
`output`
2. Searches for data files online
3. Cleans the files in R
4. Analyzes the files in R
5. Writes her report in R Markdown
*All of Sally's work is done using code!*
*But there's an issue for Jane when asked to re-do the analysis...*
* Jane does not have the original data, nor does she remember the steps taken to analyze her data
* Sally will benefit because she has a written record of what she's done -> automation
* Much easier to implement *in the long run*
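
Even the first step of Sally's workflow can be written as code. The sketch below creates the three folders from the example; the project name `my-project` and the use of `tempdir()` are assumptions made so the example does not touch real files.

```r
# Hypothetical project location; tempdir() keeps this sketch from touching
# real files. In practice this would be a path such as "~/my-project".
project <- file.path(tempdir(), "my-project")

# Subfolders named in Sally's workflow
subfolders <- c("data", "graphics", "output")

# Create each folder (and the project folder itself) if it does not exist
for (folder in file.path(project, subfolders)) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

# List what was created
print(list.dirs(project, full.names = FALSE, recursive = FALSE))
```

Because the setup is a script, it can be re-run on another computer or for the next project without having to remember each step.
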
___
### Reproducibility
* Are my results valid? Can they be replicated?
* The idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them
___
### Version control
* Revisions in research
* Tracking revisions
* Multiple copies
* `analysis-1.r`
* `analysis-2.r`
* `analysis-3.r`
* *This is not an optimal system for a number of reasons*
* Cloud storage (e.g. Dropbox, Google Drive, Box)
* *Slight improvement, but it's still difficult to track changes and, when collaborating, to keep track of who made which changes, among other reasons*
* Version control software
* Repository
* Git
* Can work on a single computer or as a network
* There's an explicit record of who's making changes
___
### Documentation
* *Comments* are the what
* Let others know what is happening in the code
* *Code* is the how
* Computer code should also be *self-documenting*
* Future-proofing
* Several weeks or months later, you should still know how that code works
* Likewise, if you're sharing code with colleagues, they should be able to understand what's happening as well
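
The difference is easiest to see side by side. In this sketch, both functions do the same thing; the function names, the `respondents` data, and the age cutoff are hypothetical and chosen only for illustration.

```r
# Harder to follow: the names say nothing and there are no comments
f <- function(x, y) x[x$age >= y, ]

# Self-documenting: descriptive names show how the code works, and the
# comment below states what it is for.
# Keep only respondents at or above a minimum age
filter_by_min_age <- function(respondents, min_age) {
  respondents[respondents$age >= min_age, ]
}

# Hypothetical data to try both versions on
respondents <- data.frame(id = 1:3, age = c(17, 25, 42))
adults <- filter_by_min_age(respondents, min_age = 18)
print(adults)
```

Months later, a call like `filter_by_min_age(respondents, min_age = 18)` is still readable on its own, while `f(respondents, 18)` requires opening the function to see what it does.
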
___
### Software
* [Installing software](https://cfss.uchicago.edu/setup/)
* **RStudio Server**
* Need to use the [University's VPN](https://uchicago.service-now.com/it?id=kb_article&kb=KB00015292) in order to access the server
* Local installation