Azure Cognitive Search

###### tags: `Azure`

# Azure Cognitive Search

## Create a Cognitive Search service

Sign in to the Azure portal, then either click the **Cognitive Search** icon ![](https://hackmd.io/_uploads/rJAs9kRHh.png) or search for the keyword "cognitive search" under Create a resource.

![](https://hackmd.io/_uploads/SysHLunSh.png)

The creation form asks for the following information:
+ Subscription (Azure subscription 1)
+ Resource group (wingeneai)
+ Service name (irsystem)
+ Location (Japan East)
+ Pricing tier (Free)

![](https://hackmd.io/_uploads/S1d0h6aBh.png)

The screen below shows a successful creation.

![](https://hackmd.io/_uploads/S1f8T66r3.png)

## Create a SQL Server

Sign in to the Azure portal, then either click the **SQL Server** icon ![](https://hackmd.io/_uploads/B1dc2kRr3.png) or search for the keyword "sql server" under Create a resource.

![](https://hackmd.io/_uploads/BkTHJJRr2.png)

The creation form asks for the following information:
+ Subscription (Azure subscription 1)
+ Resource group (wingeneai)
+ Server name (bryantdbserver)
+ Location (Japan East)

![](https://hackmd.io/_uploads/S154uk0Sn.png)

Authentication also needs to be configured. Of the three authentication methods, choose **"Use SQL authentication"** and enter a **"Server admin login"** and a **"Password"**. These are the credentials used to access the server later.

![](https://hackmd.io/_uploads/HJzm9yCSn.png)

Next, for the firewall rules in the networking settings, select **"Yes"**.

![](https://hackmd.io/_uploads/rkJjTk0Bh.png)

The screen below shows a successful creation.

![](https://hackmd.io/_uploads/B1z8ylArn.png)

## Create a database

Sign in to the Azure portal, then either click the **SQL database** icon ![](https://hackmd.io/_uploads/HJSIgeASn.png) or search for the keyword "sql database" under Create a resource.

![](https://hackmd.io/_uploads/rkwSZgABn.png)

The creation form asks for the following information:
+ Subscription (Azure subscription 1)
+ Resource group (wingeneai)
+ Database name (bryantdb)
+ Server (bryantdbserver)
+ Compute + storage (Basic)

![](https://hackmd.io/_uploads/SyTPMlAHh.png)

Next, in the firewall rules of the networking settings, select **"Yes"** for adding the current client IP address.

![](https://hackmd.io/_uploads/rJpSXlAHh.png)

The screen below shows a successful creation. The database is empty at this point and still needs data.

![](https://hackmd.io/_uploads/BkULVlASh.png)

## Insert data into the database (using Python and related software)

First we need to [configure the development environment for pyodbc Python development (Linux)](https://learn.microsoft.com/zh-tw/sql/connect/python/pyodbc/step-1-configure-development-environment-for-pyodbc-python-development#linux). **Open a terminal** and run the following commands to **install the Microsoft ODBC driver for SQL Server**:

```bash
# Stop early on Ubuntu releases that are not supported
if ! [[ "18.04 20.04 22.04" == *"$(lsb_release -rs)"* ]];
then
    echo "Ubuntu $(lsb_release -rs) is not currently supported.";
    exit;
fi

# Run interactively: `sudo su` opens a root shell, the later `exit` leaves it
sudo su
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -

curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list > /etc/apt/sources.list.d/mssql-release.list

exit
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
# optional: for bcp and sqlcmd
sudo ACCEPT_EULA=Y apt-get install -y mssql-tools18
echo 'export PATH="$PATH:/opt/mssql-tools18/bin"' >> ~/.bashrc
source ~/.bashrc
# optional: for unixODBC development headers
sudo apt-get install -y unixodbc-dev
```

Next, **install pyodbc** (an Anaconda environment is used here).

```bash
pip install pyodbc
```

### Connect to the database

Before connecting, make sure the client's IPv4 address has been added to the SQL Server firewall rules. Beyond that, the following information is needed to connect:
+ Server name (bryantdbserver)
+ Database name (bryantdb)
+ Server admin (bryant)
+ Password (your password)
+ ODBC driver version (ODBC Driver 18 for SQL Server)

```python
import pyodbc

server = "[server name].database.windows.net"
database = "[database name]"
username = "[server admin]"
password = "{[password]}"  # braces quote the value inside an ODBC connection string
driver = "{ODBC Driver 18 for SQL Server}"

# Assemble the ODBC connection string and open the connection
conn = pyodbc.connect(
    "DRIVER=" + driver
    + ";SERVER=tcp:" + server
    + ";PORT=1433;DATABASE=" + database
    + ";UID=" + username
    + ";PWD=" + password
)
```

### Create the table

Define the table's columns, types, and primary key.

```python
cursor = conn.cursor()
cursor.execute(
    """
    CREATE TABLE products (
        product_id int primary key,
        product_name nvarchar(50),
        price int
    )
    """
)
conn.commit()
```

### Insert the data

The data to be inserted into the database (products.csv) is shown below.

| product_id | product_name | price |
| -------- | -------- | -------- |
| 1 | Desktop Computer | 800 |
| 2 | Laptop | 1200 |
| 3 | Tablet | 200 |
| 4 | Monitor | 350 |
| 5 | Printer | 150 |

Read the file with the pandas package, then insert each row into the table.

```python
import pandas as pd

df = pd.read_csv("products.csv")

for index, row in df.iterrows():
    # Cast to built-in Python types before binding them as query parameters
    cursor.execute(
        "INSERT INTO products (product_id, product_name, price) values(?,?,?)",
        int(row.product_id),
        str(row.product_name),
        int(row.price),
    )

conn.commit()
cursor.close()
conn.close()
```

## Create a data source

In Azure Cognitive Search, a data source is used together with an indexer. It supplies the connection information for on-demand or scheduled refreshes of the target index, pulling data from a supported Azure data source. We need the following information and send the request with POST:
+ Search service name (irsystem)
+ API version (2020-06-30)
+ Admin key of the search service (your api-key)
+ Data source name (bryantproducts)
+ Data source type (azuresql)
+ Connection string of the database
    + For Azure SQL Database, choose the ```ADO.NET (SQL authentication)``` option.
    :::warning
    Note :warning: the server password must be entered in the connection string
    :::
+ Table name (products)

```bash
curl --location 'https://[search service name].search.windows.net/datasources?api-version=[API version]' \
--header 'Content-Type: application/json' \
--header 'api-key: [admin key of the search service]' \
--data '{
    "name": "[data source name]",
    "type": "[data source type]",
    "credentials": {
        "connectionString": "[connection string of the database]"
    },
    "container": {
        "name": "[table name]"
    }
}'
```

## Create an index

An index is the primary means of organizing and searching documents in Azure Cognitive Search, similar to how a table organizes records in a database. We need the following information and send the request with POST:
+ Search service name (irsystem)
+ API version (2020-06-30)
+ Admin key of the search service (your api-key)
+ Index name (bryantproducts-index)

```bash
curl --location 'https://[search service name].search.windows.net/indexes?api-version=[API version]' \
--header 'Content-Type: application/json' \
--header 'api-key: [admin key of the search service]' \
--data '{
    "name": "[index name]",
    "fields": [
        {"name": "product_id", "type": "Edm.String", "key": true},
        {"name": "product_name", "type": "Edm.String"},
        {"name": "price", "type": "Edm.Int32"}
    ],
    "corsOptions": {
        "allowedOrigins": ["*"]
    }
}'
```

## Create an indexer

An indexer automates indexing from supported Azure data sources. It uses a predefined ***data source*** and ***index*** to build an indexing pipeline that extracts and serializes the source data and passes it to the search service for ingestion. We need the following information and send the request with POST:
+ Search service name (irsystem)
+ API version (2020-06-30)
+ Admin key of the search service (your api-key)
+ Indexer name (bryantproducts-indexer)
+ Data source name (bryantproducts)
+ Index name (bryantproducts-index)

```bash
curl --location 'https://[search service name].search.windows.net/indexers?api-version=[API version]' \
--header 'Content-Type: application/json' \
--header 'api-key: [admin key of the search service]' \
--data '{
    "name": "[indexer name]",
    "dataSourceName": "[data source name]",
    "targetIndexName": "[index name]"
}'
```

## Start searching

First install the Python package for the Cognitive Search client.

```bash
pip install azure-search-documents
```

Connecting to the Cognitive Search service requires the following information:
+ Search service name (irsystem)
+ Admin key of the search service (your api-key)
+ Index name (bryantproducts-index)

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

service_name = "[search service name]"
admin_key = "[admin key of the search service]"
index_name = "[index name]"

# Create an SDK client
endpoint = "https://{}.search.windows.net/".format(service_name)
search_client = SearchClient(
    endpoint=endpoint,
    index_name=index_name,
    credential=AzureKeyCredential(admin_key),
)
```

Enter the query string. Here `search_text="*"` matches every document in the index.

```python
results = search_client.search(search_text="*", include_total_count=True)

print("Total Documents Matching Query:", results.get_count())
for result in results:
    print("{}: {}, {}".format(result["product_id"], result["product_name"], result["price"]))
```

The output looks like this:

```
Total Documents Matching Query: 5
1: Desktop Computer, 800
2: Laptop, 1200
3: Tablet, 200
4: Monitor, 350
5: Printer, 150
```
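
The three REST calls from the data source, index, and indexer sections could also be issued from Python instead of `curl`. The following is only a minimal sketch using the standard library. The helper names `build_request`, `build_pipeline_requests`, and `send_all` are hypothetical (they are not part of any Azure SDK), and the resource names are the example values used in this note.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # same API version as the curl examples


def build_request(service_name, admin_key, collection, body):
    """Build (but do not send) a POST request for one REST collection.

    `collection` is "datasources", "indexes" or "indexers".
    Hypothetical helper, written for this note only.
    """
    url = "https://{}.search.windows.net/{}?api-version={}".format(
        service_name, collection, API_VERSION
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


def build_pipeline_requests(service_name, admin_key, connection_string):
    """Return the data source, index and indexer requests, in creation order."""
    data_source = {
        "name": "bryantproducts",
        "type": "azuresql",
        "credentials": {"connectionString": connection_string},
        "container": {"name": "products"},
    }
    index = {
        "name": "bryantproducts-index",
        "fields": [
            {"name": "product_id", "type": "Edm.String", "key": True},
            {"name": "product_name", "type": "Edm.String"},
            {"name": "price", "type": "Edm.Int32"},
        ],
        "corsOptions": {"allowedOrigins": ["*"]},
    }
    # The indexer refers to the other two resources by name
    indexer = {
        "name": "bryantproducts-indexer",
        "dataSourceName": data_source["name"],
        "targetIndexName": index["name"],
    }
    return [
        build_request(service_name, admin_key, "datasources", data_source),
        build_request(service_name, admin_key, "indexes", index),
        build_request(service_name, admin_key, "indexers", indexer),
    ]


def send_all(requests):
    """Send each request in order; urlopen raises HTTPError on a 4xx/5xx reply."""
    for request in requests:
        with urllib.request.urlopen(request) as response:
            print(request.full_url, response.status)
```

Calling `send_all(build_pipeline_requests("[search service name]", "[admin key of the search service]", "[connection string of the database]"))` would then create the three resources in order. Building the requests separately from sending them keeps the payloads easy to inspect before anything reaches the service.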