---
title: Probability and Statistics
---
# Basic samples
n = 17
Our sample is the heights (in cm) of 17 classmates:
```
Sample = { 160, 163, 170, 191, 187, 190,
179, 183, 158, 168, 185, 186,
195, 172, 160, 174, 173 }
Sorted sample = { 158, 160, 160, 163, 168, 170,
172, 173, 174, 179, 183, 185,
186, 187, 190, 191, 195 }
Count : { 158 : 1,
160 : 2,
163 : 1,
...,
195 : 1}
```
Tabular representation of counts - frequency distribution:
| Height |158|160| 163 | ... |
|---|---|---| --- | ---|
|Counts | 1 | 2 | 1 | ... |
**Mode** is the most frequent observation, 160 in our case.
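
As a minimal sketch (Python, standard library only), the frequency distribution and the mode can be computed with `collections.Counter`. The variable names are illustrative and not part of the original notes.

```python
from collections import Counter

heights = [160, 163, 170, 191, 187, 190,
           179, 183, 158, 168, 185, 186,
           195, 172, 160, 174, 173]

# Frequency distribution: height -> number of occurrences
counts = Counter(heights)

# The mode is the value with the highest count
mode, mode_count = counts.most_common(1)[0]

for height in sorted(counts):
    print(height, counts[height])
print("mode:", mode, "observed", mode_count, "times")
```
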
**Histogram**: a frequency table in which we count not exact matches but the number of observations falling into each interval:
| Height | 151 ≤ x ≤ 160 | 161 ≤ x ≤ 170 | 171 ≤ x ≤ 180 | 181 ≤ x ≤ 190 | x ≥ 191 |
|---|---|---|---|---|---|
| Counts | 3 | 3 | 4 | 5 | 2 |
Modal interval : $[181, 190]$
Mode (from the histogram) is the middle of the modal interval : $(181 + 190)/2 = 185.5$
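
The histogram counts can be reproduced with a short sketch. It assumes integer heights and inclusive interval bounds (so that 160, 170, 190 and 191 are each counted once), and it closes the last, open-ended interval at 200 purely for illustration.

```python
heights = [160, 163, 170, 191, 187, 190,
           179, 183, 158, 168, 185, 186,
           195, 172, 160, 174, 173]

# Assumed bins with inclusive bounds; the upper bound 200 is an assumption
bins = [(151, 160), (161, 170), (171, 180), (181, 190), (191, 200)]

# Number of observations falling into each interval
hist = {(lo, hi): sum(lo <= x <= hi for x in heights) for lo, hi in bins}

# Modal interval = interval with the highest count; its midpoint is the histogram mode
modal_interval = max(hist, key=hist.get)
modal_mid = sum(modal_interval) / 2

print(hist)
print(modal_interval, modal_mid)
```
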
Deviation = $x_i - \bar{x}$, where $\bar{x} \approx 176.1$ is the sample mean. For the sorted sample:
```
{ -18.1, -16.1, -16.1, -13.1, -8.1, -6.1,
-4.1, -3.1, -2.1, 2.9, 6.9, 8.9,
9.9, 10.9, 13.9, 14.9, 18.9 }
```
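The average deviation is always zero:
$$\begin{aligned}
\text{Average deviation} &= \frac{(x_1 - \bar{x}) + (x_2 - \bar{x}) + \dots + (x_{17} - \bar{x})}{17} \\
&= \frac{x_1 + x_2 + \dots + x_{17} - \bar{x} - \bar{x} - \dots - \bar{x}}{17} \\
&= \frac{x_1 + x_2 + \dots + x_{17}}{17} - \frac{\bar{x} + \bar{x} + \dots + \bar{x}}{17} \\
&= \bar{x} - \frac{17 \bar{x}}{17} = \bar{x} - \bar{x} = 0.
\end{aligned}$$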
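
A quick numerical check of this result, as a sketch: exact fractions are used so that the average deviation comes out as exactly 0 rather than a tiny floating-point residue. The variable names are illustrative.

```python
from fractions import Fraction

heights = sorted([160, 163, 170, 191, 187, 190,
                  179, 183, 158, 168, 185, 186,
                  195, 172, 160, 174, 173])

# Exact sample mean, 2994/17
mean = Fraction(sum(heights), len(heights))

# Deviation of each observation from the mean
deviations = [x - mean for x in heights]

# The deviations cancel out, so their average is exactly zero
average_deviation = sum(deviations) / len(heights)

print([round(float(d), 1) for d in deviations])
print(average_deviation)
```
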
Variance: $\sigma^2 = \frac{(x_1 - \bar{x})^2 + \dots + (x_{17} - \bar{x})^2}{17}$
With the deviations rounded to one decimal place:
$$\sigma^2 \approx \frac{18.1^2 + 16.1^2 + 16.1^2 + 13.1^2 + 8.1^2 + 6.1^2 + 4.1^2 + 3.1^2 + 2.1^2 + 2.9^2 + 6.9^2 + 8.9^2 + 9.9^2 + 10.9^2 + 13.9^2 + 14.9^2 + 18.9^2}{17} = \frac{2275.77}{17} \approx 133.87$$
Standard deviation: $\sigma = \sqrt{\sigma^2} \approx \sqrt{133.87} \approx 11.57$
### Measures of centrality:
- **Sample Mean, Sample Average** = $\bar{x} = (x_1 + \dots + x_{17})/17 = (158 + \dots + 195)/17 = 2994/17 \approx 176.12$
- **Median** = height of the 9th student in the sorted sample, 174 cm
- **Mode** = 160 because we have 2 observations of this height, whereas all others are unique.
- Mode can be computed from the histogram as the middle of the modal interval
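
The three measures above can be checked with Python's standard `statistics` module. This is only a sketch for verifying the hand computation; the variable names are illustrative.

```python
import statistics

heights = [160, 163, 170, 191, 187, 190,
           179, 183, 158, 168, 185, 186,
           195, 172, 160, 174, 173]

mean = statistics.mean(heights)      # sum of observations / number of observations
median = statistics.median(heights)  # middle value of the sorted sample (9th of 17)
mode = statistics.mode(heights)      # most frequent observation

print(round(mean, 2), median, mode)
```
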
### Measures of variation
- **Variance**
```
((x1 - x̄)² + ... + (x17 - x̄)²) / 17
(18.1**2 + 16.1**2 + 16.1**2 + 13.1**2 + 8.1**2 + 6.1**2 + 4.1**2 + 3.1**2 + 2.1**2 + 2.9**2 + 6.9**2 + 8.9**2 + 9.9**2 + 10.9**2 + 13.9**2 + 14.9**2 + 18.9**2)/17 = 2275.77/17 = 133.86882352941177
```
- **Standard deviation** = square root of the variance
```
133.86882352941177**.5 = 11.570169554911967
```
- **Range** = Max observation - min observation = 195 - 158 = 37
- **Interquartile range** = 3rd quartile - 1st quartile = 185.5 - 165.5 = 20.
- **Coefficient of variation** = standard deviation / sample mean = 11.6 / 176.1 = .066 or 6.6%.
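
A sketch of the same measures in Python. It uses the population formulas (division by n, as in the notes) via `statistics.pvariance` and `statistics.pstdev`. Because the code works with the unrounded mean, the variance differs from the hand computation in the fourth decimal (about 133.8685 instead of 133.8688).

```python
import statistics

heights = [160, 163, 170, 191, 187, 190,
           179, 183, 158, 168, 185, 186,
           195, 172, 160, 174, 173]

variance = statistics.pvariance(heights)   # mean squared deviation, divides by n
std = statistics.pstdev(heights)           # square root of the variance
value_range = max(heights) - min(heights)  # max observation - min observation
cv = std / statistics.mean(heights)        # coefficient of variation

print(round(variance, 2), round(std, 2), value_range, round(cv, 3))
```
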
### Other measures of location (quartiles and percentiles)
- (**Median** is the 2nd quartile)
- **1st quartile**: an observation such that one quarter of the other observations have a smaller measurement and three quarters have a larger one
  - In our sample, $17/4 = 4.25$ and $3 \cdot 17/4 = 12.75$: the 1st quartile is taller than 4.25 persons and shorter than 12.75 persons, so we take the average of the heights of the 4th and 5th person: $(163 + 168)/2 = 165.5$
  - If we had 21 persons in our sample:
    - the median person has number 11
    - the 1st quartile person has a number between 5 and 6
    - the 3rd quartile person has a number between 16 and 17.
  - In our sample of 17, the 3rd quartile lies between the 12th and 13th person: $(185 + 186)/2 = 185.5$
- **q-th percentile**: q% of the observations have a measurement smaller than this number.
  - 25th percentile = 1st quartile
  - 75th percentile = 3rd quartile
  - 50th percentile = median
  - 5th percentile of our sample: $17 \cdot 0.05 = 17/20 = 0.85 \approx 1$ person lies below it, so the 5th percentile is between 158 and 160, i.e. $(158 + 160)/2 = 159$.
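
The two quartiles of the height sample can be reproduced with the sketch below. It implements only the rule used here for a fractional position (average the two observations surrounding position p·n); the function name is illustrative. Quartile conventions differ between textbooks and libraries, so other tools may return slightly different values.

```python
import math

sorted_heights = sorted([160, 163, 170, 191, 187, 190,
                         179, 183, 158, 168, 185, 186,
                         195, 172, 160, 174, 173])
n = len(sorted_heights)

def between_neighbours(p):
    """Average the two observations surrounding the fractional position p * n."""
    k = p * n                # e.g. 0.25 * 17 = 4.25
    lower = math.floor(k)    # 1-based position just below k
    upper = lower + 1        # 1-based position just above k
    return (sorted_heights[lower - 1] + sorted_heights[upper - 1]) / 2

q1 = between_neighbours(0.25)   # between the 4th and 5th person
q3 = between_neighbours(0.75)   # between the 12th and 13th person
iqr = q3 - q1

print(q1, q3, iqr)
```
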
---
## Exercise 1
Size of the sample = 20
|Intervals |0 - 4|5 - 9|10 - 14|15 - 19|20 - ...|
|----|----|----|----|----|----|
|Frequency distribution| 3 | 5 | 8 | 3 | 1 |
|Relative frequency distribution = Frequency / N| .15 | .25 | .4 | .15 | .05 |
|Cumulative frequency distribution| 3 | 8 | 16 | 19 | 20 |
|Cumulative relative frequency distribution| .15 | .4 | .8 | .95 | 1. |
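
The last three rows of the table follow mechanically from the frequency row. A minimal sketch, with illustrative variable names:

```python
from itertools import accumulate

intervals = ["0 - 4", "5 - 9", "10 - 14", "15 - 19", "20 - ..."]
frequency = [3, 5, 8, 3, 1]
n = sum(frequency)                          # sample size

relative = [f / n for f in frequency]       # frequency / N
cumulative = list(accumulate(frequency))    # running total of frequencies
cumulative_relative = [c / n for c in cumulative]

for row in zip(intervals, frequency, relative, cumulative, cumulative_relative):
    print(row)
```
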
1. Draw the x axis (values) and the y axis.
2. On the x axis, mark the interval bounds: 0, 5, 10, 15, 20, 25.
1st quartile, Q1 = 5.5
- percentile index is 1/4 * 20 = 5, therefore Q1 is the average of observations 5 and 6.
Median - Q2 = 10
- percentile index is 1/2 * 20 = 10, therefore Q2 is the average of observations 10 and 11.
3rd quartile - Q3 = 13.5
Interquartile range = Q3 - Q1 = 13.5 - 5.5 = 8
Mode = 10 and 13 - from the raw sample
- based on the histogram, the modal interval is [10,15) and the mode is the midpoint of this interval, 12.5.
Mean = 10.25
Variance =
$$ \sigma^2 = (9.25^2 + 7.25^2 + 6.25^2 + 2 \cdot 5.25^2 + 4.25^2 + 2 \cdot 2.25^2 + 3 \cdot .25^2 + 1.75^2 \\ + 3 \cdot 2.75^2 + 3.75^2 + 4.75^2 + 5.75^2 + 8.75^2 + 9.75^2)/20 = 527.75/20 = 26.3875$$
Standard deviation = $\sqrt{\sigma^2} =\sqrt{527.75/20} = 5.1369$
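
The raw sample for this exercise is not listed in the notes. The 20 values in the sketch below are an assumption: they are reconstructed from the deviations appearing in the variance formula together with the mean 10.25, and they are consistent with the frequency table. With that caveat, the mean, modes, variance and standard deviation can be checked as follows.

```python
import statistics

# Assumed raw sample, reconstructed as (mean +/- deviation) from the variance formula
sample = [1, 3, 4, 5, 5, 6, 8, 8, 10, 10,
          10, 12, 13, 13, 13, 14, 15, 16, 19, 20]

mean = statistics.mean(sample)
modes = statistics.multimode(sample)      # all values sharing the highest count
variance = statistics.pvariance(sample)   # population variance, divides by n
std = statistics.pstdev(sample)

print(mean, modes, variance, round(std, 4))
```
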
---
## Exercise 2
Sample size = 14
0.55 0.57 0.57 0.68 0.72 0.77 0.86 0.90 0.92 0.94 1.14 1.41 1.42 1.51
Median = (.86 + .90)/2 = .88
Mode = .57
Q1 = .68
- percentile index for Q1 is 1/4 * 14 = 3.5. Rounding up gives us 4, the number of the observation that is Q1
Q3 = 1.14
- percentile index for Q3 is 3/4 * 14 = 10.5. Rounding up we get 11, the position of our Q3, 1.14.
IQR = Q3 - Q1 = 1.14 - .68 = .46
Mean = (.55 + 2 * .57 + .68 + .72 + ...)/14 = 12.96/14 ≈ .93
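
The index rule used in these exercises can be sketched as a small function: compute the index p·n; if it is a whole number, average observations i and i+1, otherwise round up and take that observation. The function name `quantile_by_index` is hypothetical, and p is passed as an exact `Fraction` to avoid floating-point surprises in the index. The sketch assumes 0 < p < 1.

```python
import math
import statistics
from fractions import Fraction

sample = [0.55, 0.57, 0.57, 0.68, 0.72, 0.77, 0.86,
          0.90, 0.92, 0.94, 1.14, 1.41, 1.42, 1.51]

def quantile_by_index(data, p):
    """Quantile by the index rule of this exercise; p is an exact Fraction in (0, 1)."""
    xs = sorted(data)
    i = p * len(xs)                  # percentile index, 1-based
    if i.denominator == 1:           # whole number: average observations i and i+1
        return (xs[int(i) - 1] + xs[int(i)]) / 2
    return xs[math.ceil(i) - 1]      # otherwise round up to the next observation

q1 = quantile_by_index(sample, Fraction(1, 4))      # index 3.5 -> 4th observation
median = quantile_by_index(sample, Fraction(1, 2))  # index 7 -> average of 7th and 8th
q3 = quantile_by_index(sample, Fraction(3, 4))      # index 10.5 -> 11th observation
iqr = q3 - q1
mean = statistics.mean(sample)
mode = statistics.mode(sample)

print(q1, round(median, 2), q3, round(iqr, 2), round(mean, 2), mode)
```
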
---
# Random events
- empty set $\emptyset$
- whole space of all possible random events $\Omega$
### Example : dice
The elementary events are "1", "2", ..., "6". Any set of them is a random event, including the impossible event $\emptyset$ and the certain event $\Omega$.
## Random events operations:
1. "addition", **union** $A \cup B$ - "either event A OR event B occurred"
"The dice toss resulted in a number < 3" = "1" OR "2"
"The double dice toss resulted in a sum <= 3" = "(1,1)" OR "(1,2)" OR "(2,1)"
- $\Omega$ is the union of all possible elementary events
Dice : $\Omega = 1 \cup 2 \cup 3 \cup 4 \cup 5 \cup 6$
2. "multiplication", **intersection** $A \cap B$ - "both event A AND event B occurred"
"The dice toss resulted in an odd number" AND "The dice number is < 3" = "1"
"The dice toss resulted in an odd number" = "1" OR "3" OR "5"
"The dice number is < 3" = "1" OR "2"
$$(1 \cup 3 \cup 5) \cap (1 \cup 2) = 1$$
- the empty set $\emptyset$ is the result of the intersection of two **mutually exclusive** (inconsistent) events:
"1" AND "2" = $\emptyset$
- "subtraction of events" $A\setminus B$, "Event A did happen but event B did not".
"The dice number is odd" \ "the dice number is $\leq$ 4" = $(1\cup 3\cup 5) \setminus (1\cup 2\cup 3\cup4)$ = "5"
- complementary event: $A^c = \Omega \setminus A$, "event A did not happen"
"The dice number is odd" = (1 OR 3 OR 5)
"The dice number is odd"$^c$ = (2 OR 4 OR 6) = "The dice number is even".
"The dice number is $\leq$ 3"$^c$ = "The dice number is either 4 or 5 or 6" = "$\geq 4$".
$$P(A) + P(B) - P(A \cap B) = P(A \cup B)$$
In the Venn diagram example, the probability of an event is proportional to the area of the shape corresponding to that event. If we add the areas of the events A and B, we count their intersection twice, so we have to subtract it once to get the probability of the union of the 2 events.
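
The event operations map directly onto Python sets. The sketch below uses the dice events from the examples above and assumes a fair dice (all six outcomes equally likely), which the notes do not state explicitly; the helper `prob` is illustrative.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}          # sample space of one dice toss

def prob(event):
    """Probability of an event, assuming equally likely outcomes."""
    return Fraction(len(event), len(omega))

odd = {1, 3, 5}
less_than_3 = {1, 2}
at_most_4 = {1, 2, 3, 4}

union = odd | less_than_3           # A OR B
intersection = odd & less_than_3    # A AND B
difference = odd - at_most_4        # A happened but B did not
complement = omega - odd            # A did not happen

# Addition rule: P(A) + P(B) - P(A and B) = P(A or B)
lhs = prob(odd) + prob(less_than_3) - prob(intersection)
rhs = prob(union)

print(union, intersection, difference, complement, lhs, rhs)
```
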
### Example: Tutorial 2, PB1
$\Omega = \{E_1, E_2, E_3, E_4, E_5\}$
$A = \{E_1, E_2\}$
$B = \{E_3,E_4\}$
$C = \{E_2, E_3, E_5\}$
$P(\emptyset) =0, P(\Omega) = 1$
Elementary events never intersect, and the union of all elementary events forms the sample space.
$P(\Omega) = P(E_1\cup E_2 \cup \dots \cup E_5) = P(E_1) + P(E_2) + \dots + P(E_5) = 1$
$P(E_1) = \dots = P(E_5) = \frac{1}{5}$
a. $P(A) = P(E_1) + P(E_2) = \frac{2}{5}$
$P(B) = \frac{2}{5}$
$P(C) = \frac{3}{5}$
b. $P(A\cup B) = \frac{2}{5} + \frac{2}{5} - P(A \cap B) = \frac{4}{5}$
$P(A \cap B) = P((E_1 \cup E_2) \cap (E_3 \cup E_4)) = P(\emptyset) =0$
c. $A^c = \{E_3,E_4,E_5\}$, $P(A^c) =\frac{3}{5}$
$C^c = \{E_1, E_4\}$, $P(C^c) = \frac{2}{5}$
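
The answers to this problem can be verified with the same set-based approach. The sketch assumes, as above, that the five elementary events are equally likely; the string labels and the helper `P` are illustrative.

```python
from fractions import Fraction

omega = {"E1", "E2", "E3", "E4", "E5"}
A = {"E1", "E2"}
B = {"E3", "E4"}
C = {"E2", "E3", "E5"}

def P(event):
    """Probability of an event when every elementary event has probability 1/5."""
    return Fraction(len(event), len(omega))

p_a, p_b, p_c = P(A), P(B), P(C)     # part a
p_a_or_b = P(A | B)                  # part b: A and B do not intersect
p_a_and_b = P(A & B)
A_c, C_c = omega - A, omega - C      # part c: complements

print(p_a, p_b, p_c, p_a_or_b, p_a_and_b, P(A_c), P(C_c))
```
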
### Example : Tutorial 2, PB2
$P(A) = .5, P(B) = .6, P(A\cap B) = .4$
$P(A\mid B) = \frac{P(A\cap B)}{P(B)} = \frac{.4}{.6} = 2/3$
$P(B\mid A) = \frac{P(A\cap B)}{P(A)} = \frac{.4}{.5} = .8$
A and B are independent if $P(A \mid B) = P(A)$. Here $P(A \mid B) = 2/3 \neq .5 = P(A)$, so A and B are not independent.
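
A small sketch of the conditional probabilities and the independence check for this problem, using exact fractions. The variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(5, 10)     # P(A) = .5
p_b = Fraction(6, 10)     # P(B) = .6
p_ab = Fraction(4, 10)    # P(A and B) = .4

p_a_given_b = p_ab / p_b  # P(A | B) = P(A and B) / P(B)
p_b_given_a = p_ab / p_a  # P(B | A) = P(A and B) / P(A)

# Independence would require P(A | B) == P(A)
independent = p_a_given_b == p_a

print(p_a_given_b, p_b_given_a, independent)
```
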
### Example: Tutorial 2, PB3
A = "wears a Costume", therefore $A^c$ = "no costume"
B = "Female", and $B^c$ = "Male"
$P(A) = .75, P(A^c) = .25$
$P(\text{Female, given that there is no costume}) = P(B \mid A^c) = 3/5$
$P (B) = 12/20 = 3/5$
||Female $B$| Male $B^c$|
|---|---|---|
|Costume, $A$|$P(B\cap A)$| $P(A\cap B^c)$|
|No costume, $A^c$|$P(B \cap A^c)$|$P(A^c\cap B^c)$|
> $$P(A|B) = \frac{P(A\cap B)}{P(B)}$$
$P(B| A^c) = 3/5 = \frac{P(B\cap A^c)}{P(A^c)}$
$P(B \cap A^c) =\frac{3}{5}\times P(A^c) = \frac{3}{5} \times \frac{5}{20} = .15$
$P(B) = P(B \cap A) + P(B \cap A^c)$
$P(B\cap A) = P(B) - P(B\cap A^c) = 12/20 - .15 = .6 - .15 = .45$
$P(A) = P(A\cap B) + P(A \cap B^c)$
$P(A\cap B^c) = P(A) - P(A\cap B) = 15/20 - .45 = .75 - .45 = .3$
$P(A^c \cap B^c) = 1 - P(A\cap B) - P(A\cap B^c) - P(A^c \cap B) = 1 - .45 - .3 - .15 = .1$
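
The four cells of the table can be filled in step by step, following the derivation above. This is a sketch with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(15, 20)               # P(A): wears a costume
p_b = Fraction(12, 20)               # P(B)
p_b_given_not_a = Fraction(3, 5)     # P(B | A^c), given in the problem

p_not_a = 1 - p_a                            # P(A^c)
p_b_and_not_a = p_b_given_not_a * p_not_a    # P(B and A^c) = P(B | A^c) * P(A^c)
p_b_and_a = p_b - p_b_and_not_a              # P(B) = P(B and A) + P(B and A^c)
p_a_and_not_b = p_a - p_b_and_a              # P(A) = P(A and B) + P(A and B^c)
# The four cells sum to 1
p_not_a_and_not_b = 1 - p_b_and_a - p_a_and_not_b - p_b_and_not_a

for cell in (p_b_and_a, p_a_and_not_b, p_b_and_not_a, p_not_a_and_not_b):
    print(float(cell))
```
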