# 統計學 Statistics(MGT-2)

:::success
Instructor: 曾意儒

Classroom: I-204

[Statistics for Business & Economics Ebook](https://libgen.is/book/index.php?md5=28A10E70A2552E71124D881045CD19EE)
:::

:::info
:::spoiler Click to Open TOC
[TOC]
:::

## Chapter 1 Introduction

:::info
:::spoiler Learning Objectives
- [x] **Descriptive** and **Inferential statistics**
- [x] **Language of statistics** and **Key elements of statistics**
- [x] **Population** and **Sample data**
- [x] **Types of data** and **Data-collection methods**
:::

### 【Statistical Methods 統計學方法】

#### 【Descriptive Statistics 敘述性統計】
`def:Utilizes numerical and graphical methods to explore data`

:::spoiler `Example`
![](https://i.imgur.com/FQAnXVb.png)
![](https://i.imgur.com/zcOUwQY.png)
:::

:::spoiler `Four elements of Descriptive Statistics`
1. The population or sample of experimental units we are interested in
2. One or more variables to be investigated
3. A tool for summarizing the data, such as a computed value, a graph, or a table
4. Identification of the patterns hidden in the data
:::

#### 【Inferential Statistics 推論性統計】
`def:Utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.`

:::spoiler `Example`
>Using 1,945,071 real-time PCR results from nose and throat swabs taken from 383,812 participants between 2020/12 and 2021/5
>
>Vaccination with the ChAdOx1 or BNT162b2 vaccines already reduced SARS-CoV-2 infections ≥21 d after the first dose (61% (95% confidence interval = 54–68%) versus 66% (95% CI = 60–71%), respectively). Greater reductions observed after a second dose (79% (95% CI = 65–88%) versus 80% (95% CI = 73–85%), respectively)

![](https://i.imgur.com/4AL4GGP.png)
:::

:::spoiler `Five elements of Inferential Statistics`
1. The population of experimental units we are interested in
2. One or more variables to be investigated
3. A sample drawn from the population
4. An inference about the population, based on the information contained in the sample
5. A measure of reliability for the inference
:::

### 【Fundamental Elements of Statistics 統計的基本元素】
- Experimental unit 實驗單位 `Object upon which we collect data`
- Variable 變數 `Characteristic of an individual experimental unit`
- Population 母體 `All items of interest`
- Sample 樣本 `Subset of the units of a population`
    - Representative sample `Exhibits the typical characteristics of the target population`
    - Simple random sample `Every different sample of the same size has an equal chance of being selected`
- Measure of Reliability 信度
- Statistical Inference

### 【Types of Data 資料型態】
- Quantitative data 量化
    - Discrete data 離散性
    - Continuous data 連續性
    - [Central Tendency 集中趨勢](#【Central-Tendency-集中趨勢】)
    - [Variability 變異數量](#【Variability-變異數量】)
    - [Distributional Forensics(Shape) 分配形狀](#【Distributional-Forensics(Shape)-分配形狀】)
- Qualitative data 質性
    - Ordinal data 序數型(<font color="red">ranking</font>)
    - Nominal data 類別型(<font color="red">species</font>)
    - Binomial data 二元型(<font color="red">Catholic or not: T</font>)

### 【Obtaining Data 資料蒐集】
1. Published source
2. Designed experiment
    - `Units` and `Units' Characteristic` **under control**
    - Typically involves a `treatment` (experimental) group and an `untreated` (control) group
3. Observational study (incl.
opinion polls & surveys)
    - `Units` in **natural setting**
    - `Variables` are recorded
    - **No attempt** to control `units' characteristics`

## Chapter 2 Descriptive Statistics 敘述性統計

:::info
:::spoiler Learning Objectives
- [x] Describe data using **graphs**
- [x] Describe data using **numerical measures**
- [x] Describe **quantitative data** using numerical measures
- [x] Describe the **relationship between two quantitative variables using graphs**
- [x] Detect descriptive **methods that distort the truth**
:::

:::info
:::spoiler Outlines
![](https://i.imgur.com/0qNhXqf.png)
![](https://i.imgur.com/F8nUagz.png)
:::

### 【Describing Qualitative Data 描述定性資料】

#### Key Terms
- Class 類別 `Information Management majors among all sophomores at the university`
- Class Frequency 類別次數 `Of 10,000 sophomores at the university, 100 are Information Management majors`
- Class Relative Frequency 類別相對次數 `Of 10,000 sophomores, 100 are Information Management majors: 100/10000 = 0.01`
- Class Percentage 類別百分比 `Of 10,000 sophomores, 100 are Information Management majors: (100/10000)*100% = 1%`

#### 【Tables】
- Lists `categories` & `number of elements`
- May show `frequencies(counts)`, `%` or both

:::spoiler Picture
![](https://i.imgur.com/cTtuj1O.png =500x200)
:::

#### 【Bar Chart 長條圖】
- Zero Point
- Equal Bar Widths
- Bars are separated by gaps

:::spoiler Picture
![](https://i.imgur.com/wG4H61s.png =400x200)
:::

#### 【Pie Chart 圓餅圖】
- Total Quantity -> Categories (shows the total quantity broken down by category)
- Angle size = (360°)(percent)

:::spoiler Picture
![](https://i.imgur.com/ne4wO5d.png =400x300)
:::

#### 【Pareto Diagram 柏拉圖】
- A `Bar Chart` with the bars ordered from largest to smallest

:::spoiler Picture
![](https://i.imgur.com/ZOQxbEN.png =400x200)
:::

### 【Graphical Methods for Describing Quantitative Data 圖像化描述定量資料】

#### 【Dot Plot 點圖】
- Horizontal axis is a scale for the quantitative variable, e.g., percent.
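
As a rough illustration (not taken from the course slides), a dot plot can be sketched in plain text: each observation becomes one mark stacked above its value on the horizontal scale. The function name `dot_plot` and the data in `sample_scores` are hypothetical, and the sketch assumes small integer data so that every integer between the minimum and maximum gets its own column.

```python
from collections import Counter


def dot_plot(values):
    """Return a text dot plot for integer data.

    The last line is the horizontal scale (every integer from min to max);
    each observation is drawn as one '*' stacked above its value.
    """
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)   # scale, including empty values
    width = max(len(str(v)) for v in axis)       # column width for alignment
    rows = []
    # Draw from the tallest stack down to the first level.
    for level in range(max(counts.values()), 0, -1):
        cells = ("*" if counts[v] >= level else " " for v in axis)
        rows.append(" ".join(c.center(width) for c in cells).rstrip())
    rows.append(" ".join(str(v).center(width) for v in axis))
    return "\n".join(rows)


# Hypothetical data: seven observations on a 1-5 scale.
sample_scores = [1, 2, 2, 3, 3, 3, 5]
print(dot_plot(sample_scores))
```

Keeping the empty column for the value 4 is deliberate: because the axis is a true scale rather than a list of categories, gaps in the data stay visible.
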
:::spoiler Picture
![](https://i.imgur.com/1udgHYG.png)
:::

#### 【Stem & Leaf Display 莖葉圖】
- <font color="red">Top $\rightarrow$ bottom, small $\rightarrow$ large</font>
- Tens digit (stem) on the **left**, ones digit (leaf) on the right
- Repeated values are each written out, so the width of a row reflects its frequency

:::spoiler Picture
![](https://i.imgur.com/am6MXnG.png)
> Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
:::

#### 【Histogram 直方圖】
- The values of the quantitative variable are divided into class intervals
- Every interval has <font color="red">**equal width**</font>
- `Bar's height` is the `class frequency`, `relative frequency`, or `percent`
- No gaps between adjacent bars

:::spoiler Picture
![](https://i.imgur.com/F1BUrYc.png)
![](https://i.imgur.com/odT96sn.png)
:::

#### 【Summary】
![](https://i.imgur.com/aTMr1Mm.png =600x300)

### 【Central Tendency 集中趨勢】
`def:the single value most` <font color="red">**typical/representative**</font> `of the collected data`
- Central Tendency 集中趨勢
    - Mean 平均值
    - Median 中位數
    - Mode 眾數

#### 【Mean 平均值】
![](https://i.imgur.com/3wmohzM.png =600x200)
- Advantage
    - Uses **every value** in the data $\rightarrow$ **Good representative**
    - **Repeatedly drawn samples** from the same population have **similar means** $\rightarrow$ **resists fluctuation between samples**
- Disadvantage
    - **Sensitive** to **extreme values/outliers**
    - Not appropriate for **skewed distributions**
    - Cannot be calculated for **nominal** or **nonnominal ordinal data** (e.g., cancer stage)

#### 【Median 中位數】
![](https://i.imgur.com/pEseOFK.png =600x200)
- **Not affected** by **extreme values**

#### 【Mode 眾數】
![](https://i.imgur.com/jofg89J.png)
- **Not affected** by **extreme values**
- May be used for **quantitative** or **qualitative data**

### 【Variability 變異數量】
`def:the` <font color='red'>**spread, or dispersion**</font>`, of the values`
- Variability 變異數量
    - Range 全距
    - Variance 變異數
    - Standard Deviation 標準差

#### 【Range 全距】
![](https://i.imgur.com/H8TJ0bP.png)
- Disadvantage
    - **Ignores** how the data are **distributed**
    - **Sensitive** to **extreme values/outliers**

#### 【Variance 變異數】
![](https://i.imgur.com/VihHx3J.png)
- Among the most common measures of variability
- **Considers how data are distributed**
- Shows variation about the mean

#### 【Standard Deviation 標準差】
![](https://i.imgur.com/ZVuH8OG.png)

### 【Distributional Forensics(Shape) 分配形狀】
- Shape
    - Skewness 偏態
        - Left-Skewed
        - Symmetric
        - Right-Skewed
    - Kurtosis 峰態

#### 【Skewness 偏態】
`def:A data set is said to be` <font color='red'>**skewed**</font> `if one tail of the distribution has` **more extreme** `observations than the other tail.`
- Left-Skewed `Mean < Median`
- Symmetric `Mean = Median`
- Right-Skewed `Mean > Median`

![](https://i.imgur.com/QPbCXwK.png)

#### 【Kurtosis 峰態】
![](https://i.imgur.com/cb2VYim.png)

--- To be completed ---

## Chapter 3 Probability 機率

## Chapter 4 Random Variables and Probability Distributions 隨機變數與機率分布