sklearn Pipeline SimpleImputer Preprocessing
===

王辰禎, DCT, NTCU (Taiwan)

---

###### tags: `Pipeline`

**Data Science and Problem Solving, Week 05 HW (3/21)**

[Colab code](https://colab.research.google.com/drive/1uzBirqz8ZffiVMq9fE7QM8QvDGGcG5lZ?usp=sharing)

---

#### **If you use Python IDLE from python.org, install the scikit-learn package with pip from CMD.**
1. Open CMD (Command Prompt).
2. Enter `pip install scikit-learn`.

---

#### Main text: import the packages (e.g. `random`, `math`)
##### scikit-learn is organized into six main areas, so components are imported with `from sklearn.[module] import [component]` (the import name is lowercase `sklearn`).

---

#### Load the problem's data into a DataFrame

---

#### Build the pipeline, defining each step's name and the action it performs

`SimpleImputer` handles missing values: `strategy='median'` (median), `'mean'` (mean), or `'most_frequent'` (mode, the value that occurs most often).

`MinMaxScaler()`: min-max scaling, $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, which maps the minimum to 0 and the maximum to 1, so the data fall between 0 and 1.

*Common sklearn preprocessing scalers: `StandardScaler`, `MinMaxScaler`, `MaxAbsScaler`, and `RobustScaler`.*

---

#### Apply the pipeline to the numeric columns
##### First select the numeric columns and name them `numeric_features`
##### Apply the pipeline to the numeric part of `df` with `.fit_transform()` and assign the result back to the numeric part of `df`

---

#### Print the DataFrame
##### The many decimal places are hard to read, so the values are rounded for display; in actual data processing, do not round arbitrarily, or the data lose precision.
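The note does not include the code text itself (it lives in the linked Colab notebook), so the following is a minimal sketch of the same workflow, not the author's exact code. Assumptions: the DataFrame and its columns (`name`, `height`, `weight`) are hypothetical stand-ins for the assignment's data, and the step names `"imputer"` and `"scaler"` are illustrative choices. It uses standard scikit-learn and pandas calls only (`Pipeline`, `SimpleImputer`, `MinMaxScaler`, `DataFrame.select_dtypes`).

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample data (assumption: not the assignment's real data);
# np.nan marks the missing values to be imputed.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "height": [160.0, np.nan, 175.0, 180.0],
    "weight": [50.0, 60.0, np.nan, 80.0],
})

# Each step is a (name, transformer) pair; the steps run in order.
pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),  # fill NaN with the column median
    ("scaler", MinMaxScaler()),                     # rescale each column to [0, 1]
])

# Select only the numeric columns; the string column "name" is left untouched.
numeric_features = df.select_dtypes(include="number").columns

# fit_transform learns each column's median and min/max, then returns a NumPy array,
# which is assigned back to the numeric part of df.
df[numeric_features] = pipeline.fit_transform(df[numeric_features])

# Round only for display; df itself keeps full precision.
print(df.round(3))
```

With this sample data the imputed medians are 175 (`height`) and 60 (`weight`), so after scaling those cells become 0.75 and about 0.333. Rounding happens only inside the `print` call, which matches the advice above not to round the underlying data.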