# WEEK 4 (7-11/12/20)

## Inferential Statistics (MON, 7/12/20)

### Probability Distribution

==**Events**==
* An outcome or collection of outcomes of a random experiment
* Independent vs dependent events

==**Probability Distribution**== indicates the likelihood of each possible outcome
* Binomial distribution
* Normal distribution: bell curve, symmetric about the mean
* Perfect normal distribution (mean = mode = median)

==**Central Limit Theorem**== (CLT)
* Population has the ==parameters==
* Sample has the ==statistics==: mean and std based on the formula

### Hypothesis testing

Test the results of a survey/experiment to see whether the results are meaningful.

**==Steps in Hypothesis Testing==**
* **Null hypothesis** (H0): the hypothesis that no change happens
* **Alternative hypothesis** (H1): the opposite of H0 (what we would like to prove)

==Significance level== (alpha): the probability of rejecting the null hypothesis when it is true, i.e. the threshold for how much Type I error is allowed

==p-value==: assuming H0 is true, the probability of obtaining a result at least as extreme as the observed statistic

p <= alpha: we can reject the null hypothesis

**Statistical tests**:
* T-test: determine whether there is a significant difference between the means of two groups
* Z-test: validate the hypothesis that the sample drawn belongs to the same population
* Chi-square, ANOVA, ...
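
The hypothesis-testing steps above can be sketched as a two-sided one-sample z-test. This is only a minimal illustration, not a method from the course: the function name `one_sample_z_test`, the `scores` data, and the assumed population mean and std are all hypothetical, and it uses only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, pop_mean, pop_std, alpha=0.05):
    """Two-sided one-sample z-test; returns (z, p_value, reject_h0).

    H0: the sample comes from a population with mean `pop_mean`.
    H1: the sample comes from a population with a different mean.
    Assumes the population std `pop_std` is known.
    """
    n = len(sample)
    standard_error = pop_std / sqrt(n)
    z = (mean(sample) - pop_mean) / standard_error
    # p-value: probability, if H0 is true, of a z at least this extreme
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision rule from the notes: reject H0 when p <= alpha
    return z, p_value, p_value <= alpha


# Hypothetical data: 10 test scores, compared against an assumed
# population mean of 50 and population std of 5
scores = [52, 55, 49, 58, 54, 53, 57, 56, 51, 55]
z, p_value, reject_h0 = one_sample_z_test(scores, pop_mean=50, pop_std=5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these made-up numbers the sample mean is 54, so z is about 2.53 and the p-value falls below alpha = 0.05, which means H0 would be rejected. When the population std is unknown, a t-test is the usual substitute.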
## Advanced Pandas - Geo Visualization (TUE, 8/12/20)

## Time Series Data in Pandas (WED, 9/12/20)

* ==Time stamps==: exact time/particular moments in time
* Time intervals/periods: a length of time; periods are uniform and do not overlap with each other
* Time deltas/durations: the difference between 2 moments in time

```
import numpy as np

np.arange(10)
# OUTPUT
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

```
import numpy as np
import pandas as pd

# `date` is not defined in these notes; this value is inferred from the output below
date = pd.Timestamp('2015-07-05')

# Build 10 time deltas of 0..9 days and add them to the start date
adding_time2 = pd.to_timedelta(np.arange(10), 'D')
date + adding_time2
# OUTPUT
# DatetimeIndex(['2015-07-05', '2015-07-06', '2015-07-07', '2015-07-08',
#                '2015-07-09', '2015-07-10', '2015-07-11', '2015-07-12',
#                '2015-07-13', '2015-07-14'],
#               dtype='datetime64[ns]', freq=None)
```

==**Data Structure**==
* For time stamps, Pandas provides the Timestamp type. The associated index structure is DatetimeIndex.
* For time periods, Pandas provides the Period type. The associated index structure is PeriodIndex.
* For time deltas or durations, Pandas provides the Timedelta type. The associated index structure is TimedeltaIndex.

==**Resampling**==
* Upsampling: where you increase the frequency of the samples, such as from minutes to seconds.
* Downsampling: where you decrease the frequency of the samples, such as from days to months.

## Streamlit (THU, 10/12/20)

## Module Test (FRI, 11/12/20)