# Statistics and Data Analysis: Lecture 4
###### tags: `20200711` `statistics`
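Han-Ming Wu (吳漢銘)
Associate Professor, Department of Statistics, National Taipei University
> ## Plotting matters
## Outline: Exploratory Data Analysis and Statistical Graphics
## Main References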
EDA with R: Course Content
* Making exploratory **graphs**
* Principles of analytic **graphics**
* Plotting systems and **graphics** devices in R
* The base, lattice, and ggplot2 **plotting** systems in R
* Clustering methods (群集分析)
* Dimension reduction techniques (維度縮減)
---
## John Tukey (1915–2000): The Picasso of Statistics
> Ask the right question.
## "Statistics should be a science, not mathematics!"
## What is EDA
## What Do They Say About EDA?
## Data Analysis Procedures
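## Example 2: Who Wrote Trump's Tweets?
The like counts differ.
### Any Doubts?
### Comparing Posting Times
Posting times differ markedly between iOS and Android.
### Comparing Tweet Text
### Sentiment Analysis
Trump uses stronger wording.
Differences are assessed with a Poisson test.
Putting the ratios on a log2 scale separates the data by direction (positive vs. negative), which is a neat approach.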
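
The Poisson test and log2 ratio noted above can be sketched as follows. This is a minimal illustration, not the lecture's actual analysis: the word counts and totals are made up, and only base R's `poisson.test()` is used.

```r
# Hypothetical counts of one word in tweets from each device (illustrative only)
android_count <- 40    # times the word appears in Android tweets (assumed)
ios_count     <- 10    # times the word appears in iOS tweets (assumed)
android_total <- 1000  # total words in Android tweets (assumed)
ios_total     <- 1000  # total words in iOS tweets (assumed)

# Two-sample comparison of Poisson rates (stats package, base R)
res <- poisson.test(c(android_count, ios_count),
                    c(android_total, ios_total))

# log2 ratio of the two rates: > 0 leans Android, < 0 leans iOS
log2_ratio <- log2((android_count / android_total) /
                   (ios_count / ios_total))

print(res$p.value)
print(log2_ratio)
```

On the log2 scale a word used twice as often on one device sits at +1 and a word used half as often sits at -1, so the sign gives the direction and the magnitude is symmetric.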
### Summary: Who Wrote Trump's Tweets?
## Infovis and Statistical Graphics: Different Goals, Different Looks
## Why Data Visualization?
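There are four sets of x-y data whose summary statistics are exactly the same,
but the plots look very different.
> Without looking at the plot, you will misjudge the data.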
## Anscombe's Quartet
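A short sketch of the point above, using the `anscombe` data frame that ships with base R (columns `x1`..`x4`, `y1`..`y4`). The choice of summary statistics and the panel layout are illustrative choices, not taken from the lecture slides.

```r
# anscombe ships with base R (datasets package): columns x1..x4, y1..y4
data(anscombe)

# Summary statistics for each of the four x-y pairs
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y), cor_xy = cor(x, y))
})
colnames(stats) <- paste0("set", 1:4)
print(round(stats, 2))

# The scatterplots, however, look nothing alike
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Set", i))
}
par(op)
```

The printed table is nearly identical across the four columns, while the four panels show a linear cloud, a curve, a line with one outlier, and a vertical stack with one leverage point.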
## The Datasaurus Dozen ***install.packages("datasauRus")***
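Suppose the task is classification or clustering: with the wrong method, accuracy is at most about 50%.
You have to see the data before taking the next step.
## The Datasaurus Dozen: More Examples
Even when a positive relationship is visible, other variables may be related as well.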
## Graphical Perception
## Index Plot
## Histogram
## Misuse of Charts
## Example: rgl, explore a comet
## Complex Heatmap
Turn numbers into colors.
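
As a minimal sketch of "numbers into colors", the base R function `heatmap()` from the stats package can draw a clustered heat map. Note the assumptions: this uses `stats::heatmap`, not the ComplexHeatmap package this section is named after, and the built-in `mtcars` data stands in for the lecture's data.

```r
# Built-in mtcars data as a numeric matrix (rows = cars, columns = variables)
m <- as.matrix(mtcars)

# scale = "column" standardizes each variable so colors are comparable
# across columns that are measured in different units
hm <- heatmap(m, scale = "column",
              col = heat.colors(50),
              margins = c(5, 8))

# heatmap() invisibly returns the row/column orderings from the clustering
print(hm$rowInd)
print(hm$colInd)
```

Without column scaling, variables with large raw values (such as `disp` and `hp`) would dominate the color range, which is one reason to check the scale before reading the picture.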
## Applications: Array Image
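The signals are all supposed to be random, yet the image is skewed to one side.
This cannot be detected by looking at the columns alone.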
## Applications: Mobile Data
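Convert longitude and latitude to administrative districts, then encode the values with color.
There is no data for the mornings of day 2 and day 4.
1. Plotting the data matters: you need to see what the data look like before you know how to proceed.
2. When reading a plot, check the scale first.
3. A heat map represents numbers as colors, which makes certain patterns visible.
4. Data cleaning and transformation are important.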
## Applications: Eye-tracking, mouse clicking
### Reading External Image Files
### Map of Taiwan
### Marking Points on the Map
## Big Data: The Era of 9 Vs
## The Challenge of Visualizing Big Data
## How Can We Visualize and Interact with Billion+ Record Databases in Real-time?
## ***hexbin*** Package: Hexagonal Binning Routines
## ***tabplot***: Tableplot, a Visualization of Large Datasets
#### tableplot(iris, nBins=150, sortCol=5)
#### tableplot(iris, nBins=50, sortCol=4)
#### tableplot(diamonds)
## Symbolic Data Analysis (Billard and Diday, JASA 2003)
## Recommended Reading
> 1. **AGGREGATION** From Tables and Means to Least Squares
> 2. **INFORMATION** Its Measurement and Rate of Change
> 3. **LIKELIHOOD** Calibration on a Probability Scale
> 4. **INTERCOMPARISON** Within-Sample Variation as a Standard
> 5. **REGRESSION** Multivariate Analysis, Bayesian Inference, and Causal Inference
> 6. **DESIGN** Experimental Planning and the Role of Randomization
> 7. **RESIDUAL** Scientific Logic, Model Comparison, and Diagnostic Display
## Data Without Borders!
## Future Directions?
## Advanced Optional Reading
### Example 1: The Doubs Fish Data
#### River Doubs Map
#### The Doubs Fish Data: Files
#### The Doubs Fish Data: Preprocessing
#### Data Extraction: Read Data
#### Species Data: First Contact Basic functions
#### Overall Distribution of Abundances (Dominance Codes)
#### Species Data: A Closer Look Map of the Locations of the Sites
#### Note: Reconstruction
#### Maps of Some Fish Species
#### Compare Species: Number of Occurrences
#### Compare Sites: Species Richness
#### Compute Alpha Diversity Indices of the Fish Communities
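A minimal base R sketch of a few common alpha diversity indices (species richness, Shannon entropy, Pielou evenness). The abundance matrix is a made-up toy example, not the Doubs data, and the choice of indices is an assumption about what this step covers.

```r
# Toy abundance matrix: rows = sites, columns = species (made-up numbers)
spe <- matrix(c(5, 5, 0, 0,
                1, 1, 1, 1,
                8, 1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("site", 1:3), paste0("sp", 1:4)))

# Species richness: number of species present at each site
richness <- rowSums(spe > 0)

# Shannon entropy H = -sum(p * log(p)) over the species present at a site
shannon <- apply(spe, 1, function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
})

# Pielou evenness: H divided by its maximum possible value, log(richness)
evenness <- shannon / log(richness)

print(cbind(richness, shannon, evenness))
```

Sites 1 and 2 have perfectly even abundances, so their evenness is 1. Site 3 is dominated by one species and scores lower.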
#### Transformation and Standardization of the Species Data
#### Scale Abundances by Dividing Them by the Site Totals
#### Compute Relative Frequencies by Rows (Site Profiles)
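Dividing each row of a site-by-species table by its site total gives the site profiles (relative frequencies by rows). A minimal base R sketch on a made-up matrix follows; the lecture's own code may use a different function.

```r
# Toy abundance matrix: rows = sites, columns = species (made-up numbers)
spe <- matrix(c(2, 3, 5,
                0, 4, 4,
                1, 0, 9),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("site", 1:3), paste0("sp", 1:3)))

# Relative frequencies by rows: each value divided by its site total
site_profiles <- prop.table(spe, margin = 1)

# The same computation written out with sweep()
site_profiles2 <- sweep(spe, 1, rowSums(spe), "/")

print(round(site_profiles, 2))
print(rowSums(site_profiles))  # every site profile sums to 1
```

Using `margin = 2` instead would divide by the species totals, giving relative abundance per species rather than per site.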
#### Standardization by Both Columns and Rows
#### Boxplots of Transformed Abundances of a Common Species (Stone Loach)
#### Plot Profiles Along the Upstream-Downstream Gradient
#### Bubble Maps of Some Environmental Variables
#### Examine the Variation of Some Descriptors Along the Stream: Line Plots
#### Scatter Plots for All Pairs of Environmental Variables
#### Simple Transformation of An Environmental Variable
#### Standardization of All Environmental Variables
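Standardizing to zero mean and unit standard deviation can be done with base R's `scale()`. The small data frame below is a made-up stand-in for the environmental table, and its column names are hypothetical.

```r
# Toy environmental table (made-up values; column names are illustrative)
env <- data.frame(alt = c(900, 750, 400, 200),
                  pH  = c(7.9, 8.0, 8.1, 8.3),
                  oxy = c(12.0, 10.5, 8.0, 6.5))

# scale() centers each column to mean 0 and rescales it to standard deviation 1
env_z <- scale(env)

print(round(colMeans(env_z), 10))  # column means after standardization
print(apply(env_z, 2, sd))         # column standard deviations
```

After this step, variables measured in different units (meters, pH units, mg/L) are on a common scale and can be compared or combined in distance-based methods.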
### Summary & Food for Thought
### ***anscombe {datasets}*** Anscombe's Quartet of ‘Identical’ Simple Linear Regressions
### Extensions of Scatterplots
### MA plot Scatterplot for Gene Expression Data
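An MA plot shows the log ratio M against the average log intensity A of two channels or samples. The sketch below uses simulated intensities (`R` and `G` are made-up, not real expression data) and base R graphics only.

```r
set.seed(1)

# Simulated intensities for two channels (made-up, for illustration only)
n <- 500
R <- 2^rnorm(n, mean = 8, sd = 2)
G <- R * 2^rnorm(n, mean = 0, sd = 0.5)

# M = log ratio, A = average log intensity
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

plot(A, M, pch = 20, col = "grey40",
     xlab = "A = (log2 R + log2 G) / 2",
     ylab = "M = log2 R - log2 G",
     main = "MA plot (simulated data)")
abline(h = 0, lty = 2)  # reference line: no difference between channels
```

Compared with plotting `R` against `G` directly, this view makes intensity-dependent deviations from M = 0 easier to see.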
### ***heatmap {stats}***
### vcd: Visualizing Categorical Data