# Statistics and Data Analysis: Lecture 3
###### tags: `20200711` `statistics`
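Han-Ming Wu (吳漢銘)
Associate Professor, Department of Statistics, National Taipei University
## Outline: Data Processing Methods (Smoothing / Missing-Value Handling / Data Transformation / Resampling)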
Topic 1 [Advanced Optional Reading]
* Moving average
* Curve fitting: lowess
* Kernel density estimation
* Cubic spline interpolation
Topic 2
* Missing data
* Missingness mechanism
  * Missing by Design
  * Missing Completely at Random (MCAR)
  * Missing at Random (MAR)
  * Missing Not at Random (MNAR)
Topic 3
* R Packages for Dealing With Missing Values: VIM, MICE
* Visualizing the Pattern of Missing Data
* Traditional Approaches to Handling Missing Data
* Imputation Methods: KNN
* Which Imputation Method?
Topic 4
* Why transform data?
* Common data transformations
  * Log transformation
  * Box-Cox transformation
  * Standardization
* Which data transformation should be used?
Topic 5
* Training data and testing data
* Resampling methods
  * Jackknife (leave-one-out)
  * Bootstrapping
* Ensemble learning
  * bagging
  * boosting
Topic 6
* Imbalanced data problem
  * under-sampling
  * over-sampling
---
## Missing Data
## Handling Missing Values
## Missingness Mechanism
## Missing by Design / Missing Completely at Random (MCAR)
## Missing at Random (MAR) / Missing Not at Random (MNAR)
## Some Notes
## Missing Values in R
## NA in Summary Functions
## NA in Modeling Functions
## Other Special Values in R
## R Packages for Dealing With Missing Values
## R Package: MICE
## Generates Multivariate Imputations by Chained Equations (MICE)
## Exploring Missing Data
## Visualizing the Pattern of Missing Data
## Matrix Plot
## Number of Observations per Pattern for All Pairs of Variables
## Marginplot
## List-wise Deletion
## Pairwise Deletion
## Mean Substitution
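As a rough illustration of mean substitution, the sketch below fills each missing entry with the mean of the observed entries. It is written in plain Python rather than with the R packages named in this lecture; the function name `mean_substitute`, the use of `None` to mark a missing value, and the toy column are all assumptions made for the example.

```python
from statistics import mean


def mean_substitute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical toy column with two missing entries.
column = [2.0, None, 4.0, 6.0, None]
imputed = mean_substitute(column)
print(imputed)
```

Filling in the mean keeps the column mean unchanged but shrinks its variance, because every imputed point sits exactly at the center of the observed values.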
## K-Nearest Neighbour Imputation
## ***kNN {VIM}:*** k-Nearest Neighbour Imputation
## ***matrixplot***: User-Defined Mean Function
## Which Imputation Method?
## Classical (Numerical) Data Table
## Why Transform Data?
## Common Data Transformations
## Example: Software Inspection Data
## Log Transformation
## Log Transformation: How to Handle Negative Data Values?
## Box-Cox Transformations
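For reference, the standard one-parameter Box-Cox transform maps a positive value y to (y^λ − 1)/λ when λ ≠ 0 and to log(y) when λ = 0. The sketch below is a minimal plain-Python version of that formula; the function name `box_cox` and the sample inputs are assumptions for illustration, and it does not estimate λ from data.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        # Limiting case of (y**lam - 1) / lam as lam -> 0.
        return math.log(y)
    return (y ** lam - 1) / lam


# Hypothetical sample values, transformed with lambda = 0.5 and lambda = 0.
data = [1.0, 4.0, 9.0]
sqrt_like = [box_cox(y, 0.5) for y in data]
log_like = [box_cox(y, 0) for y in data]
print(sqrt_like, log_like)
```

With λ = 1 the transform only shifts the data by one, so values of λ away from 1 are what change the shape of the distribution.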
## Modified Box-Cox Transformations
## Standardization
## Example: Standardization
## Example: Microarray Data of Yeast Cell Cycle
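A minimal sketch of z-score standardization, assuming the usual definition z = (x − mean) / sd with the sample standard deviation (n − 1 denominator), which is also what R's `scale()` uses by default. The function name `standardize` and the toy data are assumptions for the example.

```python
from statistics import mean, stdev


def standardize(values):
    """Center values at 0 and scale them to unit sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical toy variable.
z = standardize([2.0, 4.0, 6.0])
print(z)
```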
## Standardization in Time Series Microarray Gene Expression Experiments
## Example: Crab Data
## Example: cDNA Microarray Gene Expression Data
## Which Data Transformation Should Be Used?
## Classification: k-fold Cross-Validation Error Rates
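The splitting step behind k-fold cross-validation can be sketched as follows: shuffle the observation indices, deal them into k folds, and let each fold serve once as the test set. This is plain Python with no classifier attached; the function name `k_fold_indices`, the seed, and the sizes n = 10 and k = 3 are assumptions for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Each fold is held out once; the remaining indices form the training set.
folds = k_fold_indices(10, 3)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(10) if i not in held_out]
    print(len(train_idx), len(test_idx))
```

The cross-validation error rate is then the misclassification rate averaged over the k held-out folds.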
## Split Data into Test and Train Set According to Group Labels
## Jackknife Resampling: Leave-one-out
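A minimal sketch of the jackknife standard error, assuming the usual formula: recompute the estimator n times, each time leaving one observation out, then scale the spread of those leave-one-out estimates by (n − 1)/n. The function name `jackknife_se` and the toy data are assumptions for the example.

```python
import math
from statistics import mean


def jackknife_se(data, estimator):
    """Jackknife standard error of estimator(data) using leave-one-out replicates."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Leave-one-out estimates: drop observation i, recompute the estimator.
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = mean(loo)
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in loo))


# Hypothetical toy sample; the estimator here is the sample mean.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
se = jackknife_se(sample, mean)
print(se)
```

For the sample mean, the jackknife standard error coincides with the textbook value s / sqrt(n), which makes it a convenient sanity check.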
## Bootstrap Methods
## Bootstrapping
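As a sketch of the nonparametric bootstrap, the code below resamples the data with replacement many times, recomputes the estimator on each resample, and takes the standard deviation of those replicates as the standard error. It is plain Python rather than the R ***bootstrap*** package; the function name `bootstrap_se`, the number of replicates, the seed, and the toy data are assumptions for illustration.

```python
import random
from statistics import mean, stdev


def bootstrap_se(data, estimator, n_boot=1000, seed=0):
    """Bootstrap standard error of estimator(data), plus the replicates."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate: draw n observations with replacement, then re-estimate.
    replicates = [estimator(rng.choices(data, k=n)) for _ in range(n_boot)]
    return stdev(replicates), replicates


# Hypothetical toy sample; the estimator here is the sample mean.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
se, replicates = bootstrap_se(sample, mean)
print(round(se, 3))
```

Because the resampling is random, the result varies with the seed and with `n_boot`; more replicates give a more stable estimate.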
## ***bootstrap*** Package
## Bagging: Bootstrap Aggregating
## Boosting
## Example: Apply ***rpart*** to Vehicle Data
## ***adabag***: An R Package for Classification with Boosting and Bagging
## Example: 10-fold CV adaboost.M1
## The Imbalanced Data Problem
## ***unbalanced*** Racing for Unbalanced Methods
## The Balancing Technique
## Ionosphere Dataset: ***ubIonosphere {unbalanced}***
## Compare the Performances using SVM
## ***ubRacing {unbalanced}*** Racing for Strategy Selection
## Racing for Strategy Selection
## Useful R Packages
## Advanced Optional Reading
### Simple Moving Average
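A minimal sketch of a simple moving average: each output value is the unweighted mean of `window` consecutive observations. The function name `simple_moving_average`, the window length, and the toy series are assumptions for the example; only full windows are returned, so the output is shorter than the input.

```python
def simple_moving_average(series, window):
    """Unweighted mean over each run of `window` consecutive values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical toy series smoothed with a window of 3.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = simple_moving_average(prices, 3)
print(smoothed)
```

A longer window gives a smoother curve but reacts more slowly to changes in the series.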
### Moving Average Acting as Resistance - Potential Sell Signal
### Smoothing in R
### Fitting Curves
### ***lowess {stats}*** locally-weighted polynomial regression
### Density Plots (Smoothed Histograms)
### Kernel Density Estimation
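A minimal sketch of a kernel density estimate, assuming a Gaussian kernel and a fixed, hand-picked bandwidth h: the estimated density at x is the average of Gaussian bumps centered on each observation, f(x) = (1 / (n h)) Σ K((x − x_i) / h). The function name `gaussian_kde`, the bandwidth, and the toy sample are assumptions for the example; no automatic bandwidth selection is attempted.

```python
import math


def gaussian_kde(sample, h):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    norm = n * h * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm

    return density


# Hypothetical toy sample and bandwidth.
f_hat = gaussian_kde([1.0, 2.0, 4.0], h=0.5)
print(f_hat(2.0))
```

A smaller bandwidth follows individual observations more closely, while a larger one produces a smoother, flatter curve.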
### Spline Approximation to the Top Profile of the Ruddy Duck
### ***smooth.spline {stats}***: Fit a Smoothing Spline
### Cubic Spline Interpolation