###### tags: `選修`

# Data Mining

Instructors: Prof. 吳帆, Asst. Prof. 林育秀

:::spoiler Exams && Assignments
![](https://i.imgur.com/yyA2d3J.png)
:::

---

:::spoiler Lecture 1
## Knowledge Discovery in Databases(KDD) Process
![](https://i.imgur.com/Uk2BfU1.png)
* Step:
    * 1.Data: the raw data
    * 2.Target Data: the subset of data selected for use
    * 3.Preprocessed Data: the data after preprocessing
    * 4.Transformed Data: the data after transformation (e.g: recoding 1 -> male, 2 -> female)
    * 5.Patterns/Models: patterns that explain the data (e.g: one vaccine shows a higher mortality rate than the other vaccines; investigate and explain why)
    * 6.Knowledge: the resulting knowledge

---

1. Clustering
   ![](https://i.imgur.com/OACyNV8.png)
2. Clustering: Market Segmentation
   ![](https://i.imgur.com/uloQjFD.png)
3. Association Rule Discovery
   ![](https://i.imgur.com/xc2b5O1.png)

* Regression: a simple type of classification
  ![](https://i.imgur.com/xXsWpYj.png)
* How It All Fits Together
  ![](https://i.imgur.com/5X0C1j0.png)
* Other Data Mining Tasks
  ![](https://i.imgur.com/aMQPuWW.png)

### 1.Classification
![](https://i.imgur.com/T6Wn1fS.png)
* An example: model construction (automatically deciding whether a person qualifies for tenure)
  ![](https://i.imgur.com/uwxNhll.png)
* An example: model usage
  ![](https://i.imgur.com/HTmXFsr.png)

### 2.Evaluation of Classification
* Accuracy
* Speed
* Robustness
* Scalability
* Interpretability
* Simplicity

### 3.Classify Classification
![](https://i.imgur.com/M8BCMPO.png)
* Regression: regression analysis shows how much the dependent variable changes when only one independent variable varies.
* Classification

### Decision Tree example
![](https://i.imgur.com/odqgdhG.png)
![](https://i.imgur.com/2OSFhyJ.png)
![](https://i.imgur.com/SkQrNRm.png)
![](https://i.imgur.com/1YfkjWc.png)
![](https://i.imgur.com/9HnEaLz.png)

### What is the "splitting attribute"?
![](https://i.imgur.com/aYKwyAR.png)

### How to determine the Best Split?
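
The notes do not say which criterion the lecture used to pick the best split. As a minimal sketch, assuming Gini impurity (one common choice, not confirmed by these notes) and categorical attributes, the splitting attribute can be chosen as the one whose partitions have the lowest weighted impurity. The toy rows and the column names `owns_home`, `married`, and `default` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, attribute, target):
    """Weighted Gini impurity of the partitions created by one categorical attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    return sum(len(g) / total * gini(g) for g in groups.values())


def best_split(rows, attributes, target):
    """Return the attribute whose split gives the lowest weighted impurity."""
    return min(attributes, key=lambda a: weighted_gini(rows, a, target))


# Hypothetical toy data, invented for illustration only.
rows = [
    {"owns_home": "yes", "married": "no", "default": "no"},
    {"owns_home": "yes", "married": "yes", "default": "no"},
    {"owns_home": "no", "married": "no", "default": "yes"},
    {"owns_home": "no", "married": "yes", "default": "yes"},
]

chosen = best_split(rows, ["owns_home", "married"], "default")
print(chosen)
```

Here `owns_home` separates the two classes perfectly (weighted impurity 0), while `married` leaves each partition evenly mixed (weighted impurity 0.5), so `owns_home` is selected. Entropy-based information gain could be swapped in by replacing `gini`.
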
### Model Evaluation - Metrics for Evaluation
![](https://i.imgur.com/KCtZoui.png)
* Precision: TP / (TP + FP), the share of predicted positives that are actually positive
* Recall: TP / (TP + FN), the share of actual positives that the model finds

### Class Imbalance Problem

### Model Evaluation - Metrics for Evaluation
* Accuracy VS Weighted Accuracy
:::

---

:::spoiler 2022/07/16 SAS
## SAS (Statistical Analysis System)
* SAS EG Interface:
    * ![](https://i.imgur.com/l91qaE8.png)
    * ![](https://i.imgur.com/W864jvQ.png)
* What's ETL: ETL stands for extract, transform, and load. Organizations have traditionally used this approach to consolidate data from multiple systems into a single database, data store, data warehouse, or data lake.

## Data Analysis Workflow
* ![](https://i.imgur.com/weOdF1Q.png)
* ![](https://i.imgur.com/qlZCR5Y.png)
* ![](https://i.imgur.com/PLsyH8c.png)
:::
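
The ETL description in the SAS notes above can be illustrated outside SAS. The following is a minimal sketch in plain Python, not the SAS EG workflow from the lecture: it extracts rows from a CSV export, transforms a coded column (reusing the 1 -> male, 2 -> female recoding from the KDD steps), and loads the result into an in-memory SQLite table. The CSV content, the `patients` table, and all column names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (invented for illustration).
RAW_CSV = """id,sex_code,age
1,1,34
2,2,29
3,1,41
"""

# Recoding table used in the transform step.
SEX_LABELS = {"1": "male", "2": "female"}


def extract(text):
    """Extract: parse the CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: cast types and replace the numeric sex code with a label."""
    return [
        (int(r["id"]), SEX_LABELS[r["sex_code"]], int(r["age"]))
        for r in records
    ]


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patients "
        "(id INTEGER PRIMARY KEY, sex TEXT, age INTEGER)"
    )
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Query the loaded table to confirm the recoding took effect.
result = conn.execute(
    "SELECT sex, COUNT(*) FROM patients GROUP BY sex ORDER BY sex"
).fetchall()
print(result)
```

Keeping the three stages as separate functions mirrors the extract, transform, load order and makes each stage easy to replace, for example pointing `load` at a real data warehouse connection.
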