---
title: Probability and Statistics
---

# Basic samples

n = 17

Our sample consists of the measured heights (in cm) of 17 classmates.

```
Sample = { 160, 163, 170, 191, 187, 190, 179, 183, 158, 168, 185, 186, 195, 172, 160, 174, 173 }
Sorted sample = { 158, 160, 160, 163, 168, 170, 172, 173, 174, 179, 183, 185, 186, 187, 190, 191, 195 }
Count : { 158 : 1, 160 : 2, 163 : 1, ..., 195 : 1}
```

Tabular representation of counts (frequency distribution):

| Height | 158 | 160 | 163 | ... |
|---|---|---|---|---|
| Counts | 1 | 2 | 1 | ... |

**Mode** is the most frequent observation, 160 in our case.

**Histogram**: a frequency table in which we count not the exact matches, but the number of observations falling into each interval:

| Height | 151 ≤ x ≤ 160 | 161 ≤ x ≤ 170 | 171 ≤ x ≤ 180 | 181 ≤ x ≤ 190 | x ≥ 191 |
|---|---|---|---|---|---|
| Counts | 3 | 3 | 4 | 5 | 2 |

Modal interval: $[181, 190]$

Mode (from the histogram) is the midpoint of the modal interval: $(181 + 190)/2 = 185.5$

Deviation = $x_i - \bar{x}$, where $\bar{x} \approx 176.1$ is the sample mean (computed under "Measures of centrality"):

```
= { -18.1, -16.1, -16.1, -13.1, -8.1, -6.1, -4.1, -3.1, -2.1, 2.9, 6.9, 8.9, 9.9, 10.9, 13.9, 14.9, 18.9 }
```

The average deviation is always zero:

$$
\begin{aligned}
\text{Average deviation} &= \frac{(x_1 - \bar{x}) + (x_2 - \bar{x}) + \dots + (x_{17} - \bar{x})}{17} \\
&= \frac{x_1 + x_2 + \dots + x_{17}}{17} - \frac{\bar{x} + \bar{x} + \dots + \bar{x}}{17} \\
&= \bar{x} - \frac{17\,\bar{x}}{17} = \bar{x} - \bar{x} = 0.
\end{aligned}
$$

Variance:

$$\sigma^2 = \frac{(x_1 - \bar{x})^2 + \dots + (x_{17} - \bar{x})^2}{17}$$

$$\sigma^2 = \frac{18.1^2 + 16.1^2 + 16.1^2 + 13.1^2 + 8.1^2 + 6.1^2 + 4.1^2 + 3.1^2 + 2.1^2 + 2.9^2 + 6.9^2 + 8.9^2 + 9.9^2 + 10.9^2 + 13.9^2 + 14.9^2 + 18.9^2}{17} = \frac{2275.77}{17} \approx 133.8688$$

Standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{133.8688} \approx 11.5702$

### Measures of centrality:

- **Sample mean (sample average)**: $\bar{x} = (x_1 + \dots + x_{17})/17 = (158 + \dots + 195)/17 = 2994/17 \approx 176.12$
- **Median** = the height of the 9th student in the sorted sample, 174 cm
- **Mode** = 160, because we have 2 observations of this height, whereas all others are unique.
- Mode can also be computed from the histogram as the midpoint of the modal interval

### Measures of variation

- **Variance**
  ```
  ((x1 - x̄)² + ... + (x17 - x̄)²) / 17
  = (18.1**2 + 16.1**2 + 16.1**2 + 13.1**2 + 8.1**2 + 6.1**2 + 4.1**2 + 3.1**2 + 2.1**2 + 2.9**2 + 6.9**2 + 8.9**2 + 9.9**2 + 10.9**2 + 13.9**2 + 14.9**2 + 18.9**2)/17
  = 2275.77/17
  = 133.86882352941177
  ```
- **Standard deviation** = square root of the variance
  ```
  133.86882352941177**.5 = 11.570169554911967
  ```
- **Range** = max observation - min observation = 195 - 158 = 37
- **Interquartile range** = 3rd quartile - 1st quartile = 186.5 - 165.5 = 21.
- **Coefficient of variation** = standard deviation / sample mean = 11.6 / 176.1 ≈ .066, or 6.6%.

### Measures of distribution shape ('Other measures of location')

- (**Median** is the 2nd quartile)
- **1st quartile**: an observation such that one quarter of the other observations have a smaller measurement and three quarters have a larger one.
  - In our sample, the 1st quartile is taller than $17/4 = 4.25$ persons and shorter than $12.75$ persons, so we take the average of the heights of the 4th and 5th persons: $(163 + 168)/2 = 165.5$
  - If we had 21 persons in our sample:
    - the median person has number 11
    - the 1st quartile lies between the 5th and 6th persons
    - the 3rd quartile lies between the 16th and 17th persons.
- **3rd quartile**: by symmetry, the average of the heights of the 13th and 14th persons: $(186 + 187)/2 = 186.5$
- **q-percentile**: q% of the observations have a measurement smaller than this number.
  - 25th percentile = 1st quartile
  - 75th percentile = 3rd quartile
  - 50th percentile = median
  - The 5th percentile of our sample corresponds to $17 \times .05 = 17/20 = .85 \approx 1$ person, i.e. it lies between 158 and 160: $(158 + 160)/2 = 159$.
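
The summary statistics of the 17 heights can be recomputed with Python's standard `statistics` module. This is only a minimal sketch: the variable names are illustrative, `pvariance`/`pstdev` are used because the notes divide by $n = 17$ rather than $n - 1$, and the default quartile convention of `statistics.quantiles` (`method='exclusive'`) is just one of several in use, so it need not agree with every hand method. The last decimals of the variance may also differ slightly from a hand computation that rounds the mean to 176.1.

```python
import statistics

# Heights of the 17 classmates, in cm (the sample from the notes).
heights = [160, 163, 170, 191, 187, 190, 179, 183, 158,
           168, 185, 186, 195, 172, 160, 174, 173]

mean = statistics.mean(heights)             # sum of heights / 17
median = statistics.median(heights)         # middle value of the sorted sample
mode = statistics.mode(heights)             # most frequent value
variance = statistics.pvariance(heights)    # population variance: divides by n
std_dev = statistics.pstdev(heights)        # square root of the variance
value_range = max(heights) - min(heights)   # max observation - min observation

# Quartiles under the 'exclusive' convention (an assumption, see above).
q1, q2, q3 = statistics.quantiles(heights, n=4)
iqr = q3 - q1
coeff_of_variation = std_dev / mean

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f} range={value_range}")
print(f"Q1={q1} Q3={q3} IQR={iqr} CV={coeff_of_variation:.3f}")
```

Swapping `pvariance` for `statistics.variance` would give the $n - 1$ (sample) version instead.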
---

## Exercise 1

Size of the sample = 20

| Intervals | 0 - 4 | 5 - 9 | 10 - 14 | 15 - 19 | 20 - ... |
|----|----|----|----|----|----|
| Frequency distribution | 3 | 5 | 8 | 3 | 1 |
| Relative frequency distribution = Frequency / N | .15 | .25 | .4 | .15 | .05 |
| Cumulative frequency distribution | 3 | 8 | 16 | 19 | 20 |
| Cumulative relative frequency distribution | .15 | .4 | .8 | .95 | 1. |

1. Plot the axes: x (values) and y
2. On the x axis, mark the interval bounds: (0, 5, 10, 15, 20, 25)

1st quartile, Q1 = 5.5
- percentile index is 1/4 * 20 = 5, therefore Q1 is the average of observations 5 and 6.

Median, Q2 = 10
- percentile index is 1/2 * 20 = 10, therefore Q2 is the average of observations 10 and 11.

3rd quartile, Q3 = 13.5

Interquartile range = Q3 - Q1 = 13.5 - 5.5 = 8

Mode = 10 and 13
- from the raw sample
- based on the histogram, the modal interval is [10,15) and the mode is the midpoint of this interval, 12.5.

Mean = 10.25

Variance =
$$ \sigma^2 = (9.25^2 + 7.25^2 + 6.25^2 + 2 \times 5.25^2 + 4.25^2 + 2 \times 2.25^2 + 3 \times .25^2 + 1.75^2 \\+ 3 \times 2.75^2 + 3.75^2 + 4.75^2 + 5.75^2 + 8.75^2 + 9.75^2)/20 = 527.75/20 = 26.3875$$

Standard deviation = $\sqrt{\sigma^2} = \sqrt{527.75/20} \approx 5.1369$

---

## Exercise 2

Sample size = 14

0.55 0.57 0.57 0.68 0.72 0.77 0.86 0.90 0.92 0.94 1.14 1.41 1.42 1.51

Median = (.86 + .90)/2 = .88

Mode = .57

Q1 = .68
- percentile index for Q1 is 1/4 * 14 = 3.5. Rounding up gives 4, the position of the observation that is Q1.

Q3 = 1.14
- percentile index for Q3 is 3/4 * 14 = 10.5. Rounding up gives 11, the position of our Q3, 1.14.

IQR = Q3 - Q1 = 1.14 - .68 = .46

Mean = (.55 + 2 * .57 + .68 + .72 + ...)/14 = 12.96/14 ≈ .93

---

# Random events

- empty set $\emptyset$
- whole space of all possible random events $\Omega$

### Example : dice

Random events include $\{"1", "2", \dots , "6", \emptyset, \Omega\}$.

## Random events operations:

1. "addition", **union** $A \cup B$: "either event A OR event B took place"

   "The dice toss resulted in a number < 3" = "1" OR "2"

   "The double dice toss resulted in a sum <= 3" = "(1,1)" OR "(1,2)" OR "(2,1)"

   - $\Omega$ is the union of all possible elementary events

     Dice : $\Omega = 1 \cup 2 \cup 3 \cup 4 \cup 5 \cup 6$

2. "multiplication", **intersection** $A \cap B$: "both event A AND event B took place"

   "The dice has resulted in an odd number" AND "The dice number is < 3" = "1"

   "The dice has resulted in an odd number" = "1" OR "3" OR "5"

   "The dice number is < 3" = "1" OR "2"

   $$(1 \cup 3 \cup 5) \cap (1 \cup 2) = 1$$

   - the empty set $\emptyset$ is the result of the intersection of two **inconsistent events** (events that cannot happen together): "1" AND "2" = $\emptyset$

3. "subtraction of events" $A\setminus B$: "Event A did happen but event B did not".

   "The dice number is odd" \ "the dice number is $\leq$ 4" = $(1\cup 3\cup 5) \setminus (1\cup 2\cup 3\cup4)$ = "5"

4. complementary event: $A^c = \Omega \setminus A$, "it is not event A that happened"

   "The dice number is odd" = ("1" OR "3" OR "5")

   "The dice number is odd"$^c$ = ("2" OR "4" OR "6") = "The dice number is even".

   "The dice number is $\leq$ 3"$^c$ = "The dice number is either 4 or 5 or 6" = "$\geq 4$".

$$P(A) + P(B) - P(A \cap B) = P(A \cup B)$$

In the Venn diagram example, the probability of an event is proportional to the area of the shape corresponding to this event. If we add the areas of the events A and B, we count their intersection twice, so we have to subtract it once to get the probability of the union of the 2 events.

### Example: Tutorial 2, PB1

$\Omega = \{E_1, E_2, E_3, E_4, E_5\}$

$A = \{E_1, E_2\}$

$B = \{E_3,E_4\}$

$C = \{E_2, E_3, E_5\}$

$P(\emptyset) = 0, P(\Omega) = 1$

Elementary events never intersect, and the union of all elementary events forms the sample space.

$P(\Omega) = P(E_1\cup E_2 \cup \dots \cup E_5) = P(E_1) + P(E_2) + \dots + P(E_5) = 1$

$P(E_1) = \dots = P(E_5) = \frac{1}{5}$

a.
$P(A) = P(E_1) + P(E_2) = \frac{2}{5}$, $P(B) = \frac{2}{5}$, $P(C) = \frac{3}{5}$

b. $P(A \cap B) = P((E_1 \cup E_2) \cap (E_3 \cup E_4)) = P(\emptyset) = 0$, therefore $P(A\cup B) = \frac{2}{5} + \frac{2}{5} - P(A \cap B) = \frac{4}{5}$

c. $A^c = \{E_3,E_4,E_5\}$, $P(A^c) = \frac{3}{5}$

   $C^c = \{E_1, E_4\}$, $P(C^c) = \frac{2}{5}$

### Example : Tutorial 2, PB2

$P(A) = .5, P(B) = .6, P(A\cap B) = .4$

$P(A\mid B) = \frac{P(A\cap B)}{P(B)} = \frac{.4}{.6} = 2/3$

$P(B\mid A) = \frac{P(A\cap B)}{P(A)} = \frac{.4}{.5} = .8$

A and B are independent if $P(A \mid B) = P(A)$. Here $P(A \mid B) = 2/3 \neq .5 = P(A)$, so A and B are not independent.

### Example : Tutorial 2, PB3

A = "wears a costume", therefore $A^c$ = "no costume"

B = "Female", and $B^c$ = "Male"

$P(A) = .75, P(A^c) = .25$

$P(\text{"Female, given that there is no costume"}) = P(B \mid A^c) = 3/5$

$P(B) = 12/20 = 3/5$

| | Female $B$ | Male $B^c$ |
|---|---|---|
| Costume, $A$ | $P(A\cap B)$ | $P(A\cap B^c)$ |
| No costume, $A^c$ | $P(A^c \cap B)$ | $P(A^c\cap B^c)$ |

> $$P(A \mid B) = \frac{P(A\cap B)}{P(B)}$$

$P(B \mid A^c) = 3/5 = \frac{P(B\cap A^c)}{P(A^c)}$

$P(B \cap A^c) = \frac{3}{5}\times P(A^c) = \frac{3}{5} \times \frac{5}{20} = .15$

$P(B) = P(B \cap A) + P(B \cap A^c)$

$P(B\cap A) = P(B) - P(B\cap A^c) = 12/20 - .15 = .6 - .15 = .45$

$P(A) = P(A\cap B) + P(A \cap B^c)$

$P(A\cap B^c) = P(A) - P(A\cap B) = 15/20 - .45 = .75 - .45 = .3$

$P(A^c \cap B^c) = 1 - P(A\cap B) - P(A\cap B^c) - P(A^c \cap B) = 1 - .45 - .3 - .15 = .1$
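
The four cells of the PB3 table can be cross-checked numerically. The following is a minimal sketch that takes $P(A) = 15/20$, $P(B) = 12/20$ and $P(B \mid A^c) = 3/5$ as given; the variable names are illustrative, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

# Given quantities from the example (A = "wears a costume", B = "Female").
p_a = Fraction(15, 20)              # P(A)
p_b = Fraction(12, 20)              # P(B)
p_b_given_not_a = Fraction(3, 5)    # P(B | A^c)

p_not_a = 1 - p_a                                   # P(A^c)
p_b_and_not_a = p_b_given_not_a * p_not_a           # P(B ∩ A^c) = P(B | A^c) * P(A^c)
p_b_and_a = p_b - p_b_and_not_a                     # P(B ∩ A) = P(B) - P(B ∩ A^c)
p_a_and_not_b = p_a - p_b_and_a                     # P(A ∩ B^c) = P(A) - P(A ∩ B)
p_not_a_and_not_b = 1 - p_b_and_a - p_a_and_not_b - p_b_and_not_a  # remaining cell

joint = {
    "A and B": p_b_and_a,
    "A and B^c": p_a_and_not_b,
    "A^c and B": p_b_and_not_a,
    "A^c and B^c": p_not_a_and_not_b,
}

# The four cells partition the sample space, so they must sum to 1.
assert sum(joint.values()) == 1

for cell, probability in joint.items():
    print(f"P({cell}) = {float(probability):.2f}")
```

The same four lines of arithmetic apply to any 2×2 table for which two marginal probabilities and one conditional probability are known.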