# STAT-202
**Table of Contents**
> [TOC]
---
## Session 1: January 17, 2023
>**Grading Schedule**
>
>Attendance: 5%
>*Can miss up to 3 classes without penalty; each class missed after 3 lowers the grade by half a letter grade.*
>
>Homework: 15%
>*Both online and in-class homework; you can work together on in-class work/labs. Late homework is not accepted unless it is an emergency.* **All homework is due Wednesdays at 11:59pm.**
>
>Examinations: 20% -- February 8 & March 7
>*One sheet of notes is permissible. Calculators are allowed if you want. Need a laptop for examinations.*
>
>Final: 25% -- May 2, 8:10-10:40
>*Two sheets of paper of notes for the final exam, any font size.*
>
>*The Survey for Study is the only extra credit available.*
>
>4 pt. Grading Scale for each assignment/problem set/examination:
>*4.0-3.7: A*
>*3.7-3.5: A-*
>*3.5-3.3: B+*
>*3.3-2.7: B*
>*2.7-2.5: B-*
>*2.5-2.3: C+*
>**Office Hours**
>In person, DMTI 111, Wednesdays 12-2pm.
>Tuesday online time TBD.
## Session 2: January 18, 2023
### Chapter 2: Types of Data
Categorical: Data falls into **distinct** categories.
>Example: every undergraduate is a freshman, sophomore, junior, or senior.
Every data point is in exactly one category, and the categories don't overlap.
Quantitative: A measured number. One that makes sense for doing arithmetic.
>Example: Height of a person.
Identifier: Any data that can either directly identify an individual or link an individual to their identity.
---
**Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age). Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).**
---
### Smartphone Example
What type of variable are these?
1. Brand: *Categorical*
2. Model: *Categorical*
3. Cost: *Quantitative*
4. Area code: *Categorical*
5. Battery life: *Quantitative*
6. Serial number: *Identifier*
### Displaying data with *Categorical Variables*
Bar Charts:

Pie Charts:

Frequency Table:

---
### Displaying data with *Quantitative Data*
Histograms:

## Session 3: January 24, 2024
#### Teacher Example
> Scale overall from 1-7.
>The mean for both is *5.31* and the median is *5*.
### Finishing Chapter 2
Measure of spread connected to the median: **Interquartile Range (IQR)**, where IQR = Q3 - Q1.
**Q3:** The value that, if you are above it, puts you in the top *25%*.
**Q1:** The value that, if you are below it, puts you in the bottom *25%*.
>(Minimum to Q1 is 25% of the data points, Q1 to median is 25%, median to Q3 is also 25%, etc.)

Measure of spread connected to the mean: **Standard Deviation**
The idea of the standard deviation is that it is roughly the average amount by which the data differ from the mean.
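
As a rough illustration of the two measures of spread, here is a minimal sketch using Python's standard `statistics` module. The `ratings` list is hypothetical (it is not the class data), and the `"inclusive"` quartile method is only one of several conventions, so software such as StatCrunch may report slightly different quartiles.

```python
import statistics

# Hypothetical ratings on a 1-7 scale (not the data from the class example).
ratings = [3, 4, 5, 5, 5, 6, 6, 7, 7]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
iqr = q3 - q1  # spread connected to the median

mean = statistics.mean(ratings)
sd = statistics.stdev(ratings)  # sample standard deviation (divides by n - 1)

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"mean={mean:.2f}, SD={sd:.2f}")
```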
## Session 5: January 29, 2024
Association: Two categorical variables are associated if changing the value of one gives a noticeably larger or smaller percentage of data points in a value of the other.
> Example: Men and women when it comes to party preference. If you change from men to women, a larger percentage of women than men are aligned with the Democratic Party, and vice versa.

Association **=/=** Causation
Lurking variables are another explanation for the association.
### Chapter 4:
Five Number Summary.
>Minimum
>Q1
>Median
>Q3
>Maximum
**In between each consecutive number is 25% of the data.**
---
A Boxplot is a visual representation of the 5 number summary above.

**Definitions:**
**Upper Fence:** $Q3 + 1.5 \times IQR$
**Lower Fence:** $Q1 - 1.5 \times IQR$
---
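The five-number summary and the fences can be sketched in a few lines of Python. This assumes the usual definitions (lower fence Q1 - 1.5·IQR, upper fence Q3 + 1.5·IQR); the `data` list is hypothetical, and the `"inclusive"` quartile method is one convention among several.

```python
import statistics

# Hypothetical data set with one unusually large value (illustration only).
data = [2, 4, 5, 5, 6, 7, 7, 8, 9, 25]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, median, q3, max(data))

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are the ones a boxplot draws individually.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", five_number_summary)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```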

---
## Session 6: February 1, 2024
### Chapter 5: Normal Distributions
Standard Normal Distribution

Is there the same amount of data between 0 and 1 as between 1 and 2? No.
We use z-scores to convert real data to this idealized normal distribution, N(0, 1).
Let's say you have a data value x from a normal distribution with mean m and standard deviation s. Its z-score is $z = \frac{x - m}{s}$.

---
The math placement test has 30 questions. The mean is 19.7 and the standard deviation is 6.2; you need a score of 27 or above to get into Calc I. If the placement test scores are normally distributed, what percent of test takers test into Calc I?
*Z score: (27 - 19.7) / 6.2 ≈ 1.177*
Therefore, with a cutoff score of 27, roughly 12% of test takers score into Calc I.
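
The placement-test calculation can be checked with a short sketch. It uses `statistics.NormalDist` from Python's standard library in place of a z-table or StatCrunch; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd, cutoff = 19.7, 6.2, 27

# Convert the cutoff score to a z-score.
z = (cutoff - mean) / sd

# Area to the right of z under the standard normal curve.
share_above = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}")
print(f"share scoring {cutoff} or above = {share_above:.1%}")
```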
---
SAT: mean 1090, std dev 19.5
ACT: mean 20.9, std dev 5.6
Alex got a 1450 on the SAT and Isabel got a 33 on the ACT. Who did better?
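
One way to compare scores from different tests is to put both on the z-score scale. The sketch below uses the figures exactly as recorded in these notes; `z_score` is a hypothetical helper name, and the result is sensitive to the SAT standard deviation, which is taken as written (19.5).

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (or below) the mean."""
    return (x - mean) / sd

# Figures exactly as recorded in the notes.
alex_z = z_score(1450, mean=1090, sd=19.5)   # SAT
isabel_z = z_score(33, mean=20.9, sd=5.6)    # ACT

print(f"Alex:   z = {alex_z:.2f}")
print(f"Isabel: z = {isabel_z:.2f}")
print("Higher z-score:", "Alex" if alex_z > isabel_z else "Isabel")
```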
**Scatterplots** are two-dimensional graphs in which each point represents a single data point.
A scatterplot shows **2** quantitative variables and how they relate to one another.
There are a few differences when working with two quantitative variables:
We will pick one variable to be "explanatory" and one variable to be "response".
Scatterplots can look like the following:

---
Correlation:
> In order to use a correlation, you must check several conditions. Both variables must be quantitative, the form of the scatterplot must be straight enough that a linear relationship makes sense, and there should not be any outliers.
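
For reference, the correlation coefficient r can be computed directly from its definition. This is a minimal sketch: `pearson_r` is a hypothetical helper, and the `hours` and `scores` data are invented for illustration. It does not check the conditions listed above (straight-enough form, no outliers), which still need a look at the scatterplot.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (explanatory) and exam score (response).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 70, 74, 79]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```
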
A **lurking variable** is a variable that is hidden or not included in an analysis, but impacts the relationship being analyzed. Some lurking variables hide real relationships, while others can make a false relationship appear to exist. Either way, lurking variables create misleading study results.
A **Confounder** is a variable whose presence affects the variables being studied so that the results do not reflect the actual relationship. There are various ways to exclude or control confounding variables including Randomization, Restriction and Matching.
---
### Craps Example
**X = $ made playing the first game.**
I want the expected value of X and the standard deviation of X, E(X) and SD(X), for:
X =
> +10 if roll 7 or 11
> -10 if roll 2, 3 or 12
> 0 if roll 4, 5, 6, 8, 9 or 10

In this case the formula is:
**E(X) = Σ x·P(X=x)**
(x ranges over the possible values of X.)
Questions:
1. Is rolling a 2 and rolling a 3 disjoint?
2. Find P(X=-10).

Yes, they are disjoint, because a single roll cannot total both 2 and 3.
The probability of -10 is then 4/36, which is 1/9. Likewise:
*P(X=+10)=8/36*
*P(X=-10)=4/36*
*P(X=0)=24/36*
So E(X) = (10)(8/36) + (-10)(4/36) + (0)(24/36) = 40/36 ≈ **1.11**
To find the variance of X:
$Var(X) = (-10-1.11)^2 \cdot \frac{1}{9} + (10-1.11)^2 \cdot \frac{2}{9} + (0-1.11)^2 \cdot \frac{2}{3} \approx$ **32.1**
And to find the standard deviation:
$SD(X) = \sqrt{32.1} \approx$ **5.67**
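
The same numbers can be reproduced by enumerating all 36 rolls of two dice. This is a minimal sketch; `payoff` is a hypothetical helper that encodes the +10 / -10 / 0 rule from the notes, and exact fractions are used to avoid rounding until the end.

```python
from fractions import Fraction
from itertools import product
import math

def payoff(total):
    """Dollar outcome of the first roll, as defined in the notes."""
    if total in (7, 11):
        return 10
    if total in (2, 3, 12):
        return -10
    return 0

# Build the distribution of X from the 36 equally likely outcomes.
dist = {}
for a, b in product(range(1, 7), repeat=2):
    x = payoff(a + b)
    dist[x] = dist.get(x, 0) + Fraction(1, 36)

expected = sum(x * p for x, p in dist.items())
variance = sum((x - expected) ** 2 * p for x, p in dist.items())
sd = math.sqrt(variance)

print({x: str(p) for x, p in dist.items()})
print(f"E(X) = {float(expected):.2f}, Var(X) = {float(variance):.2f}, SD(X) = {sd:.2f}")
```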
---
Example:
I run an M&M factory. In my factory I am supposed to produce 20% brown M&M's. For dinner I grab a party pack (**250** M&M's). I notice a lot of browns and count them out.
Hypothesis testing:
**First**, you must develop a **null hypothesis**. Typically it says that nothing out of the ordinary is happening.
Our example: the factory produces 20% brown M&M's (p = 0.2).
p = proportion of M&M's in my factory that are brown.
**Second**, establish an alternative hypothesis:
Ha, one of:
p>0.2
p<0.2 (The bag is 30% brown, which is above 20%, so this is not reasonable.)
p=/=0.2 (Either greater or less than.)
**Third**, check conditions for "inference". For us, with proportions:
a. Independence/Randomization: M&M's go from a vat into bags.
b. 10% condition: the factory has millions of M&M's, so 250 is far less than 10% of them.
c. np, nq > 10: 75 brown M&M's (successes) and 175 non-brown (failures), both > 10.
**Fourth**, set an alpha level. This is where you decide between coincidence and something actually happening. A common alpha level is 5%.
**Fifth**, look at your sample. In our example p̂ = 0.3 and n = 250.
I know that p̂ ~ N[p, sqrt(pq/n)]
Using my null hypothesis, that is p = 0.2:
SD = sqrt[((.2)(.8))/250] = 0.0253
Mean = 0.2
What is the z-score for 0.3 if the mean is 0.2 and the SD is 0.0253?
z = (0.3 - 0.2)/0.0253 ≈ 3.95

The p-value is the probability I would get my sample (or one more extreme) if the null hypothesis was true.
**Sixth**, conclusion.
We can either reject the null hypothesis, or we can fail to reject the null hypothesis.
The determining factor is whether the p-value is less than the alpha level.
Here: reject the null (p=0.2) and accept the alternative (p>0.2).
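
The six steps can be traced in a short sketch of a one-proportion z-test. It assumes the one-sided alternative p > 0.2 and a 5% alpha level, and it uses `statistics.NormalDist` from the standard library rather than StatCrunch; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p0 = 0.20        # null hypothesis: 20% of M&M's are brown
n = 250          # M&M's in the bag
browns = 75      # brown M&M's counted
alpha = 0.05     # assumed alpha level

p_hat = browns / n
sd = math.sqrt(p0 * (1 - p0) / n)   # SD of p-hat if the null is true
z = (p_hat - p0) / sd

# One-sided alternative p > 0.2: area to the right of z.
p_value = 1 - NormalDist().cdf(z)

print(f"p-hat = {p_hat}, SD = {sd:.4f}, z = {z:.2f}, p-value = {p_value:.6f}")
print("reject null" if p_value < alpha else "fail to reject null")
```

A two-sided alternative (p =/= 0.2) would double the tail area instead.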
### Chi Square Test
What's the difference between a $\chi^2$ test and a proportion test?
A proportion test has 2 choices: success or failure.
A $\chi^2$ test can have any number of choices.
Let's do an example.
6-sided die:

| Side | Rolls |
|------|-------|
| 1    | 163   |
| 2    | 164   |
| 3    | 169   |
| 4    | 157   |
| 5    | 189   |
| 6    | 160   |

For $\chi^2$ I need two things:
1. An observed amount (my sample)
2. An expected amount

If you have those, that's all you need.
$\chi^2 = \sum \frac{(\text{Observed}-\text{Expected})^2}{\text{Expected}}$
So you would repeat this for each choice:
$\chi^2 = \frac{(163-166.67)^2}{166.67} + \frac{(164-166.67)^2}{166.67} + \cdots$ Repeat until every side is included to get $\chi^2$.
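
The sum can also be computed in a few lines. This sketch assumes the expected count is the same for every side (a fair die) and derives it from the listed counts, which total 1002, rather than using the rounded 166.67 from the hand calculation, so the result differs slightly. It computes only the statistic and the degrees of freedom; the p-value still comes from a $\chi^2$ calculator.

```python
# Observed counts from the table of rolls.
observed = {1: 163, 2: 164, 3: 169, 4: 157, 5: 189, 6: 160}

total = sum(observed.values())
# Assumption: a fair die, so every side is expected equally often.
expected = total / len(observed)

chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
degrees_of_freedom = len(observed) - 1

print(f"total rolls = {total}, expected per side = {expected:.2f}")
print(f"chi-square = {chi_square:.3f} with {degrees_of_freedom} degrees of freedom")
```
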
Then plug it into StatCrunch:
