[ML] Azure / AutoML
===
###### tags: `ML / Platform`
###### tags: `ML`, `Azure`

<br>

[TOC]

<br>

## Resource terminology
> Azure Machine Learning is the overarching framework for ML resources

### [Azure Machine Learning](https://docs.microsoft.com/zh-tw/azure/machine-learning/overview-what-is-machine-learning-studio#ml-studio-classic-vs-azure-machine-learning-studio)
- Azure Python SDK (on a local machine or in an online notebook)
- Azure R SDK (on a local machine or in an online notebook)
- Azure Machine Learning Studio (new web version)
- Azure Machine Learning Studio (classic) (old web version)
- Azure CLI (command-line terminal)

All of the above can be used to operate Azure resources.

<br>

## [Tutorials (Azure Machine Learning documentation)](https://docs.microsoft.com/zh-tw/azure/machine-learning/)
> [![](https://i.imgur.com/TjIGNF0.png)](https://i.imgur.com/TjIGNF0.png)
>
> [![](https://i.imgur.com/9Cx11VF.png)](https://i.imgur.com/9Cx11VF.png)
>
> - Get started: code-first
> - Get started: low-code or no-code

- ### [Create, review, and deploy automated machine learning models with Azure Machine Learning](https://docs.microsoft.com/zh-tw/azure/machine-learning/how-to-use-automated-ml-for-ml-models)
    - In Azure Machine Learning studio you can create, explore, and deploy automated machine learning models without writing a single line of code.
- ### [What is automated machine learning (AutoML)?](https://docs.microsoft.com/zh-tw/azure/machine-learning/concept-automated-ml)
    - When to use AutoML: classification, regression, and forecasting
        - Classification
        - Regression
        - Time-series forecasting
    - For a limited-code or no-code experience, try the Azure Machine Learning studio web experience at https://ml.azure.com
- ### [Tutorial: Forecast demand with automated machine learning](https://docs.microsoft.com/zh-tw/azure/machine-learning/tutorial-automated-ml-forecast)

<br>

## Portals
- https://ml.azure.com/
- https://portal.azure.com/

![](https://i.imgur.com/csAeW1A.png)

<br>
<hr>
<br>

## https://portal.azure.com/
> Before entering https://ml.azure.com/,
> you first need to create a "resource group" and a "workspace" inside that resource group.

[![](https://i.imgur.com/fzLxbNF.png)](https://i.imgur.com/fzLxbNF.png)
<br>

[![](https://i.imgur.com/UTigBwD.png)](https://i.imgur.com/UTigBwD.png)
<br>

[![](https://i.imgur.com/7MjhYEO.png)](https://i.imgur.com/7MjhYEO.png)
<br>

[![](https://i.imgur.com/VcjEiT9.png)](https://i.imgur.com/VcjEiT9.png)
<br>
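The resource group and workspace created in these portal steps are what the Python SDK listed above connects to. A minimal sketch, assuming the v1 `azureml-core` SDK (installed with `pip install azureml-sdk`), whose `Workspace.from_config()` looks for a `config.json` holding the subscription ID, resource group, and workspace name in the current directory or its parents. The portal's workspace page also offers this file as a download; the helper `write_workspace_config` below is hypothetical and the IDs you pass in are placeholders.

```python
import json
from pathlib import Path


def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a config.json identifying an Azure ML workspace (hypothetical helper).

    The three keys mirror the config.json that the v1 azureml-core SDK reads
    in Workspace.from_config(); the values are placeholders supplied by the caller.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    path = folder / "config.json"
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path
```

With the file in place, `from azureml.core import Workspace` followed by `Workspace.from_config()` should return the workspace (after an interactive Azure sign-in) without hard-coding the IDs in notebook code.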
[![](https://i.imgur.com/BwRArnq.png)](https://i.imgur.com/BwRArnq.png)
<br>

[![](https://i.imgur.com/5HA3F3m.png)](https://i.imgur.com/5HA3F3m.png)
<br>

[![](https://i.imgur.com/SjXGK9g.png)](https://i.imgur.com/SjXGK9g.png)
<br>

[![](https://i.imgur.com/eQPDyLA.png)](https://i.imgur.com/eQPDyLA.png)
<br>

Go to the workspace
[![](https://i.imgur.com/N60EAGO.png)](https://i.imgur.com/N60EAGO.png)
<br>

Switch workspaces
[![](https://i.imgur.com/WkCZ0rg.png)](https://i.imgur.com/WkCZ0rg.png)
<br>
<br>

## https://ml.azure.com/
[![](https://i.imgur.com/c0smuqw.png)](https://i.imgur.com/c0smuqw.png)
<br>

[![](https://i.imgur.com/RJRd8sB.png)](https://i.imgur.com/RJRd8sB.png)
<br>

[![](https://i.imgur.com/OUA1MPh.png)](https://i.imgur.com/OUA1MPh.png)
<br>

[![](https://i.imgur.com/uT3CfAI.png)](https://i.imgur.com/uT3CfAI.png)
<br>

[![](https://i.imgur.com/zdBPYGm.png)](https://i.imgur.com/zdBPYGm.png)
<br>

[![](https://i.imgur.com/40aFAW1.png)](https://i.imgur.com/40aFAW1.png)
<br>

Create a dataset:
- The name may contain spaces
  ![](https://i.imgur.com/ez6jy9D.png)
- The name may contain Chinese characters
  ![](https://i.imgur.com/p2KnEXI.png)

<br>

Select a local file and upload it
![](https://i.imgur.com/4xIjeNy.png)
<br>

Configure the file settings and preview
[![](https://i.imgur.com/oslbgwa.png)](https://i.imgur.com/oslbgwa.png)
<br>

Preview the column types
[![](https://i.imgur.com/BldF7eF.png)](https://i.imgur.com/BldF7eF.png)
<br>

Fix the column types
[![](https://i.imgur.com/vl3zdFe.png)](https://i.imgur.com/vl3zdFe.png)
- Example: the original data is 0, 1, 1, 0 and the system detects the column as integer.
  When the data type is switched to bool, parsing runs immediately and converts the values to false, true, true, false.
  If the conversion fails, an error appears as in the screenshot above.

<br>

[![](https://i.imgur.com/MID09b6.png)](https://i.imgur.com/MID09b6.png)
<br>

Data preview
[![](https://i.imgur.com/Bj4EUWf.png)](https://i.imgur.com/Bj4EUWf.png)
<br>

View data statistics and distributions
[![](https://i.imgur.com/WPf4yU8.png)](https://i.imgur.com/WPf4yU8.png)
<br>

[![](https://i.imgur.com/ZdIlomf.png)](https://i.imgur.com/ZdIlomf.png)
<br>

[![](https://i.imgur.com/CUFBptY.png)](https://i.imgur.com/CUFBptY.png)
<br>

[![](https://i.imgur.com/ArLDO3b.png)](https://i.imgur.com/ArLDO3b.png)
<br>

Configure compute cluster resources
[![](https://i.imgur.com/IAZGPB5.png)](https://i.imgur.com/IAZGPB5.png)
<br>

Create a new AutoML run
![](https://i.imgur.com/OMzw4MG.png)
- Experiment names cannot contain Chinese characters, although dataset names can
  ![](https://i.imgur.com/oYzm43J.png)
- Experiment names cannot contain spaces either, although dataset names can
  ![](https://i.imgur.com/agaBz3Z.png)

<br>

Select the task type: classification / regression / time-series forecasting
[![](https://i.imgur.com/sEL4r1u.png)](https://i.imgur.com/sEL4r1u.png)
- Time-series forecasting
    - The time column must not contain duplicate values
- AutoML additional configuration settings
  [![](https://i.imgur.com/lb60hNV.png)](https://i.imgur.com/lb60hNV.png)
- AutoML featurization (missing-value imputation)
  [![](https://i.imgur.com/22h2COp.png)](https://i.imgur.com/22h2COp.png)

<br>

AutoML run initializing / running
![](https://i.imgur.com/5TjvnaC.png)
<br>

AutoML experiment results
![](https://i.imgur.com/yeRSW1E.png)
![](https://i.imgur.com/FSn0BzW.png)
![](https://i.imgur.com/FibgugH.png)
![](https://i.imgur.com/Bsqew7B.png)
![](https://i.imgur.com/q7FS3OC.png)
![](https://i.imgur.com/VE9ADJq.png)
![](https://i.imgur.com/0W7Go3g.png)

| Column | Feature importance | Cumulative importance |
| -------------- | ----- | ----- |
| appointment ID | 0.396 | 0.396 |
| SMS_received | 0.226 | 0.622 |
| Age | 0.208 | 0.830 |
| Neighbourhood | 0.099 | 0.929 |
| Scholarship | 0.020 | 0.949 |
| ... | ... | ... |
| AppointmentDay | 0 | 1.000 |
| ScheduledDay | 0 | 1.000 |

![](https://i.imgur.com/GWad2rC.png)

Model explanation for SparseNormalizer, RandomForest
[![](https://i.imgur.com/npWPABr.png)](https://i.imgur.com/npWPABr.png)
<br>

### Compute management
Disk full
[![](https://i.imgur.com/wj4zQpU.png)](https://i.imgur.com/wj4zQpU.png)
<br>

### Deployment
[![](https://i.imgur.com/SvBsaEt.png)](https://i.imgur.com/SvBsaEt.png)

- ### Endpoint test code template

```python=
import pandas

# Load the test features that will be sent to the endpoint
df = pandas.read_csv('../dataset/regression/house_prices/input/test_x.csv')
df
```

```python=
import numpy

# Columns that AutoML imputed during training (missing-value imputation);
# a NaN in these columns must be sent as the string 'NaN' (see Bug3 below)
IMPUTED_COLUMNS = ['MasVnrArea']

def get_data(row_start, row_end=None):
    """Build the request payload {"data": [...]} for rows [row_start, row_end) of df."""
    if row_end is None:
        row_end = row_start + 1
    values = df.values  # compute once instead of once per cell
    record_list = []
    for r in range(row_start, row_end):
        record = {}
        for c, column in enumerate(df.columns):
            value = values[r, c]
            if isinstance(value, (float, numpy.floating)) and numpy.isnan(value):
                # Missing value: handle it explicitly, the training set never contained it
                if column in IMPUTED_COLUMNS:
                    value = 'NaN'  # imputed column (missing-value imputation)
                else:
                    value = 0      # column that was not imputed
            elif isinstance(value, (numpy.integer, numpy.floating)):
                value = value.item()  # numpy scalar -> plain Python number for JSON
            elif isinstance(value, (int, float)):
                pass
            else:
                value = str(value)
            record[column] = value
        record_list.append(record)
    return {"data": record_list}
```

```
# Wrap the endpoint-access code template provided by AzureML into predict(data),
# then change its last lines as below. The body is a JSON string that itself
# holds JSON, so decode twice; json.loads (import json) is safer than eval.
#print(result)
return json.loads(json.loads(result))['result']
```

```python=
start_index = 0
end_index = len(df)

# Score each row and print "<first column (Id)>,<prediction>"
for r in range(start_index, end_index):
    data = get_data(r)
    pred_y = predict(data)
    #print(r, ':', df.values[r,0], pred_y[0])
    print("%s,%s" % (str(df.values[r, 0]), str(pred_y[0])))
```

<br>

### Datasets
- Creating a dataset from a datastore: timestamp parsing
  [![](https://i.imgur.com/fjBt5rA.png)](https://i.imgur.com/fjBt5rA.png)
- [Partition timestamp](https://docs.microsoft.com/zh-tw/azure/machine-learning/how-to-monitor-datasets?tabs=python#create-target-dataset)
  ![](https://i.imgur.com/ZfZ0ISd.png)

<br>

### [Designer](https://docs.microsoft.com/zh-tw/azure/machine-learning/algorithm-module-reference/web-service-input-output)
![](https://i.imgur.com/kThEUMi.png)
![](https://i.imgur.com/DkSKejL.png =45%x) ![](https://i.imgur.com/PSHSRDZ.png =45%x)
![](https://i.imgur.com/VILVUUr.png)
![](https://i.imgur.com/38ExoRS.png)
![](https://i.imgur.com/IibT1XX.png)
![](https://i.imgur.com/yHH7Czc.png)
![](https://i.imgur.com/MHnctdg.png)
![](https://i.imgur.com/3WSFcbJ.png)
![](https://i.imgur.com/mkSVxeh.png)
![](https://i.imgur.com/vXcZ79Y.png)
![](https://i.imgur.com/VhfrHDa.png)
![](https://i.imgur.com/8wEdKV5.png)

<br>

### [Machine Learning algorithm cheat sheet](https://docs.microsoft.com/zh-tw/azure/machine-learning/algorithm-cheat-sheet)
![](https://i.imgur.com/DPN3Ve6.png)

<br>
<hr>
<br>

## Bug
### Bug3
- ### Error
  ![](https://i.imgur.com/VVSfbCV.png)
- ### Imputation during training
  ![](https://i.imgur.com/xXDAJks.png)
- ### Test-set data
  ![](https://i.imgur.com/ah26eJH.png)
  ```json
  "MasVnrArea": "NA"   <--- this crashes
  ```
- ### Conclusion
  Columns that were imputed are handled during training,
  but inference throws an exception 😐
  Training and inference do not run the same pipeline.

  Records with missing fields:

  | idx | Id | SalePrice |
  | ---- | ---- | -------- |
  | 660 | 2121 | 92072.74947261384 |
  | 1116 | 2577 | 158862.94116490975 |

- ### Solution
  Change the JSON data to:
  ```json
  "MasVnrArea": "NaN"
  ```
  ```python
  # If this is True for a cell, send 'NaN' instead of the raw value
  df['MasVnrArea'].dtype == float and numpy.isnan(df['MasVnrArea'][231])
  ```

### Bug4
- Model explainability analysis cannot be generated
  ![](https://i.imgur.com/biZPHqv.png)

<br>
<hr>
<br>

## Python
### Q & A
- [ModuleNotFoundError: No module named 'azureml'](https://docs.microsoft.com/en-us/answers/questions/211503/modulenotfounderror-no-module-named-39azureml39.html)
  ```
  pip install azureml-sdk
  ```
- [ModuleNotFoundError: No module named 'automl'](https://stackoverflow.com/questions/53588040/no-module-named-automl-when-unpickle-auto-trained-model)
  ```
  pip install --upgrade "azureml-sdk[notebooks,automl]"
  ```
- [AttributeError: module 'scipy.misc' has no attribute 'logsumexp'](https://stackoverflow.com/questions/56324165/python-error-attributeerror-module-scipy-misc-has-no-attribute-logsumexp)
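The last error appears because current SciPy releases no longer provide `logsumexp` in `scipy.misc`; it lives in `scipy.special`. If the failing import is in your own code, switching the import is enough; if it comes from inside an older library, upgrading that library or pinning an older SciPy are the commonly suggested workarounds. A minimal sketch of an import that works either way, assuming SciPy may or may not be installed: it prefers `scipy.special.logsumexp` and otherwise falls back to a hand-written, numerically stable log-sum-exp for a 1-D sequence.

```python
import math

try:
    # Current location of logsumexp in SciPy
    from scipy.special import logsumexp
except ImportError:
    # Fallback when SciPy is unavailable: stable log(sum(exp(x))) for a 1-D sequence
    def logsumexp(values):
        values = list(values)
        m = max(values)  # raises ValueError on an empty sequence
        if math.isinf(m):
            return m  # all -inf gives -inf; any +inf gives +inf
        # Subtract the max before exponentiating so exp() cannot overflow
        return m + math.log(sum(math.exp(v - m) for v in values))
```

Shifting by the maximum is what keeps inputs such as `[1000.0, 1000.0]` from overflowing in `exp()`; the result is still `1000 + log(2)`.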