Manufacturing Data Analysis
===

[TOC]

Introduction
---

What is anomaly detection (anomaly/outlier detection)?
---

:::warning
<style>
.blue { color: blue; }
.red { color: red; }
.orange { color: orange; }
.purple { color: purple; }
</style>

* Decide whether a data point is <span class="blue">normal</span> or <span class="red">anomalous</span> :tada::tada::tada:
:::

* Properties of <span class="red">anomalies</span>
    * <span class="red">Anomalies</span> differ from normal data
    * <span class="red">Anomalies</span> → <span class="purple">rare data</span> (==1%==)

    ![](https://i.imgur.com/HA4qtnz.png)
* Which scenarios produce <span class="red">anomalous data</span>?
    * ==Equipment and machines==, ==web browsing==, ==credit card records==

Anomaly detection methods based on statistics and traditional machine learning
---

### Anomaly detection in the ideal case

![](https://i.imgur.com/TY8a3yk.png)
>[Source: Pattern Recognition “Anomaly Detection Challenges”](https://www.researchgate.net/publication/321682378_Pattern_Recognition_Anomaly_Detection_Challenges)

### Anomaly detection based on statistics and machine learning

:::success
* ==Supervised== learning
    * Logistic regression
    * Support vector machine (SVM)
    * Random forest
    * XGBoost
:::

:::danger
Because the anomalous class makes up such a small share of the data, some adjustment is usually needed! :fire:
:::

:::spoiler
* Common ways to handle class imbalance
    * Up sampling
    * Down sampling
    * Weighted loss
:::

:::success
* ==Unsupervised== learning
    * Isolation Forest [Isolation forest-Novelty and Outlier Detection(Scikit-Learn)](https://scikit-learn.org/stable/modules/outlier_detection.html)
    * Local Outlier Factor (LOF)
    * One-class SVM
:::

### Further reading :mega:

:::info
* [Machine learning: anomaly detection](https://kennyliblog.nctu.me/2019/08/30/Machine-learning-Anomaly-detection/#%E9%AB%98%E6%96%AF%E5%88%86%E5%B8%83-Gaussian-distribution)
* [Anomaly detection: approaching it from statistical modeling](https://medium.com/@r41091113/%E7%95%B0%E5%B8%B8%E6%AA%A2%E6%B8%AC-%E5%BE%9E%E7%B5%B1%E8%A8%88%E5%BB%BA%E6%A8%A1%E7%9A%84%E8%A7%92%E5%BA%A6%E5%88%87%E5%85%A5%E5%90%A7-45e8ea9e2e25)
* [What are the common data mining algorithms for anomaly detection?](https://www.zhihu.com/question/280696035)
:::

Anomaly detection with an AutoEncoder
---

Combining GANs with anomaly detection: GANomaly
---

[HackMD tutorial](https://hackmd.io/c/tutorials-tw/%2Fs%2Fhow-to-embed-note-tw)

###### tags: `課堂筆記整理`
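
### Appendix: a sketch of class imbalance handling

The three class-imbalance remedies listed in these notes (up sampling, down sampling, weighted loss) can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration and not part of the original notes: the toy dataset with 99 normal rows and 1 anomalous row (mirroring the 1% figure), the binary labels `0`/`1`, and the helper names `class_weights`, `up_sample` and `down_sample`. The weight formula `n_samples / (n_classes * class_count)` is one common "balanced" heuristic, not the only possible choice.

```python
import random
from collections import Counter


def class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def up_sample(rows, labels, minority, seed=0):
    """Duplicate random minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority_count = len(rows) - len(minority_rows)
    n_extra = max(0, majority_count - len(minority_rows))
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return rows + extra, labels + [minority] * n_extra


def down_sample(rows, labels, minority, seed=0):
    """Keep all minority rows and an equally sized random subset of the rest."""
    rng = random.Random(seed)
    minority_pairs = [(r, y) for r, y in zip(rows, labels) if y == minority]
    majority_pairs = [(r, y) for r, y in zip(rows, labels) if y != minority]
    kept = rng.sample(majority_pairs, min(len(minority_pairs), len(majority_pairs)))
    pairs = minority_pairs + kept
    return [r for r, _ in pairs], [y for _, y in pairs]


# Hypothetical toy data: 99 normal rows (label 0) and 1 anomalous row (label 1).
rows = [[float(i)] for i in range(100)]
labels = [0] * 99 + [1]

weights = class_weights(labels)                 # per-class weights for a weighted loss
up_rows, up_labels = up_sample(rows, labels, minority=1)
down_rows, down_labels = down_sample(rows, labels, minority=1)

print("class weights:", weights)
print("after up sampling:", Counter(up_labels))
print("after down sampling:", Counter(down_labels))
```

In a weighted loss, each sample's loss term would be multiplied by the weight of its class, so the rare anomalous samples count for more during training. Up sampling and down sampling instead change the training set itself; up sampling keeps every normal row at the cost of duplicated anomalies, while down sampling discards most normal rows.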