Cloud Platform - GCP

# Cloud Platform - GCP

*6/10*

* Cloud: with cloud computing, users do not have to set up their own machines. Serverless takes the hardware question out of the picture. The company does not pay its own IT costs, and different departments can use the same resources at the same time.
* Service models
    * IaaS (Infrastructure as a Service): renting the land.
    * PaaS (Platform as a Service): renting the tools and equipment needed to build a house, e.g. Microsoft Azure.
    * SaaS (Software as a Service): renting a finished house, e.g. Slack.
* Cloud deployment types
    * Private cloud: available only to certain people.
    * Public cloud: provided by an external vendor.
* Setting up the Anaconda/Jupyter environment
  ![](https://hackmd.io/_uploads/HJgs6wWDh.png)
* Jupyter
    * `a` adds a cell above; `b` adds a cell below.
    * `!` runs a command in the terminal, e.g. `!pip list`, `!python --version`.
    * Run a cell: `Shift+Enter`, or click Run.
    * `Ctrl+Shift+-` splits a cell into an upper and a lower cell.
* BigQuery host location: Taiwan (`asia-east1`) or Tokyo.

### BigQuery hands-on

* GCP Projects > BigQuery Datasets > BigQuery Tables > BigQuery Views
  [Documentation](https://cloud.google.com/bigquery/docs/locations)
* Activating the key
  ![](https://hackmd.io/_uploads/rkbXdkGwh.png)
  ![](https://hackmd.io/_uploads/r1EN_JzPn.png)
  ![](https://hackmd.io/_uploads/Sy8rd1GPh.png)
  ![](https://hackmd.io/_uploads/r1DIuJfv2.png)
  ![](https://hackmd.io/_uploads/H1kdu1zD3.png)
  ![](https://hackmd.io/_uploads/BkBKukMPn.png)
* Create a dataset (dataset name: create_new_dataset)

```py=
from google.cloud import bigquery as bq
import datetime
import pandas as pd
import pyarrow
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\Users\\Samantha\\Documents\\dv105\\0610雲端平台GCP\\neat-throne-389400-6704b48a595e.json"
client = bq.Client()

# Create the dataset
dataset_id = 'neat-throne-389400.create_new_dataset'  # Dataset name; change as needed
dataset = bq.Dataset(dataset_id)
dataset.location = "asia-east1"  # Data location; defaults to US if not set
dataset.default_table_expiration_ms = 30 * 24 * 60 * 60 * 1000  # Default table expiration: 30 days
dataset.description = 'neat-throne-389400 & expiration in 30 days & location at asia-east1'  # Dataset description
dataset = client.create_dataset(dataset)  # Make an API request.

datasets = list(client.list_datasets())  # Make an API request.
for dataset in datasets:
    print(dataset.dataset_id)
```

* Create a table (table name: create_table)

```py=
# Set the table name
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\Users\\Samantha\\Documents\\dv105\\0610雲端平台GCP\\neat-throne-389400-6704b48a595e.json"
client = bq.Client()
table_id = "neat-throne-389400.create_new_dataset.create_table"

# Define the table schema
schema = [
    bq.SchemaField("name", "STRING"),
    bq.SchemaField("post", "STRING"),
    bq.SchemaField("timestamp", "TIMESTAMP"),
]
table = bq.Table(table_id, schema=schema)
table.expires = datetime.datetime.now() + datetime.timedelta(days=6)  # Table expiration time
table.description = "create a new table and write the description."  # Table description

# Table partitioning (optional)
# table.time_partitioning = bq.TimePartitioning(
#     type_=bq.TimePartitioningType.DAY,
#     field="timestamp",
#     expiration_ms=7776000000,
# )

# Create the table
table = client.create_table(table)  # Make an API request.
```

* Write data into the table create_table

```py=
# Load a DataFrame into BigQuery
df = pd.DataFrame({
    'name': ['Max'],
    'post': ['1'],
    'timestamp': [datetime.datetime.now()]
})
table = client.dataset('create_new_dataset').table('create_table')
job = client.load_table_from_dataframe(df, table)
job.result()  # Wait for the load job to finish
```

* Create a table (table name: create_nested_table)

```py=
table_id = "neat-throne-389400.create_new_dataset.create_nested_table"

# Define the table schema; 'account' is a repeated record (a list of structs)
schema = [
    bq.SchemaField('post', 'STRING', mode='NULLABLE'),
    bq.SchemaField('account', 'RECORD', mode='REPEATED', fields=[
        bq.SchemaField('name', "STRING", mode="NULLABLE"),
        bq.SchemaField('address', "STRING", mode="NULLABLE"),
        bq.SchemaField('number', "INTEGER", mode="NULLABLE")
    ])
]
table = bq.Table(table_id, schema=schema)
# Table expiration time
table.expires = datetime.datetime.now() + datetime.timedelta(days=6)
# Table description
table.description = 'create a new table and write the description.'
# Create the table
table = client.create_table(table)  # Make an API request.
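
# --- Sketch (not part of the original notes): row shape for the nested schema ---
# Assumption: `is_valid_nested_row` is a hypothetical helper for illustration,
# not a BigQuery API. A REPEATED RECORD column is written as a list of dicts,
# one dict per record, using only the field names declared in the schema.
def is_valid_nested_row(row):
    """Return True if `row` looks like {post: str, account: [{name, address, number}]}."""
    allowed = {'name', 'address', 'number'}
    accounts = row.get('account', [])
    return (
        isinstance(row.get('post'), (str, type(None)))
        and isinstance(accounts, list)
        and all(isinstance(a, dict) and set(a) <= allowed for a in accounts)
    )

print(is_valid_nested_row({'post': 'post01', 'account': [{'name': 'Max', 'number': 900000000}]}))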
```

* Write data into the table create_nested_table

```py=
import json

# Load JSON data into BigQuery
now_stamp = datetime.datetime.now()
print(now_stamp)
json_data = [{
    'post': 'post01',
    'account': [{
        'name': 'Max',
        'address': '忠孝東路走九遍',
        # The schema declares 'number' as INTEGER, so a leading zero cannot be kept;
        # a STRING column would suit phone numbers better.
        'number': '0900000000'
    }]
}]
table = client.dataset('create_new_dataset').table('create_nested_table')
job = client.load_table_from_json(json_data, table)
job.result()  # Wait for the load job to finish
```

### BigQuery web scraping hands-on

* Create the table for the scraped data (ifoodie_table)

```py=
table_id = "neat-throne-389400.create_new_dataset.ifoodie_table"
schema = [
    bq.SchemaField('restaurant_url', 'STRING', mode='NULLABLE'),
    bq.SchemaField('name', 'STRING', mode='NULLABLE'),
    bq.SchemaField('address', 'STRING', mode='NULLABLE'),
    bq.SchemaField('category', 'RECORD', mode='REPEATED', fields=[
        bq.SchemaField('tag', "STRING", mode="NULLABLE"),
        bq.SchemaField('tag_url', "STRING", mode="NULLABLE")
    ])
]
table = bq.Table(table_id, schema=schema)
# Table expiration time
table.expires = datetime.datetime.now() + datetime.timedelta(days=6)
# Table description
table.description = 'create a new table and write the description.'
# Create the table
table = client.create_table(table)  # Make an API request.
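
# --- Sketch (not part of the original notes): absolute URLs for the URL columns ---
# Assumption: hrefs scraped from the site are site-relative (e.g. '/restaurant/abc').
# `to_absolute` and `BASE_URL` are hypothetical names used only for illustration.
# urljoin avoids the doubled slash that plain string concatenation would produce.
from urllib.parse import urljoin

BASE_URL = 'https://ifoodie.tw/'

def to_absolute(href):
    """Resolve a scraped href against the site root."""
    return urljoin(BASE_URL, href)

print(to_absolute('/restaurant/abc'))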
```

* Scrape the ifoodie page and write the scraped data into the table ifoodie_table

```py=
from bs4 import BeautifulSoup
import requests
import json

# Load JSON data into BigQuery
now_stamp = datetime.datetime.now()
print(now_stamp)

url = "https://ifoodie.tw/explore/list/%E9%8D%8B%E9%A1%9E"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
paragraph = soup.find_all('div', {"class": 'restaurant-info'})

data = []
for para in paragraph:
    try:
        href = para.a['href']
        restaurant_url = f'https://ifoodie.tw/{href}'
        name = para.findChild("div").findChild("div").findChild("div").findChild("a").text
        address = para.findChild("div").findChild("div").findNextSibling("div").findNextSibling("div").findNextSibling("div").text
        tags = para.find_all('a', {'class': 'category'})
        tags_list = []
        for t in tags:
            tag = t.text
            turl = t['href']
            tag_url = f'https://ifoodie.tw/{turl}'
            tag_obj = {'tag': tag, 'tag_url': tag_url}
            tags_list.append(tag_obj)
        para_obj = {'restaurant_url': restaurant_url, 'name': name, 'address': address, 'category': tags_list}
        data.append(para_obj)
    except Exception as e:
        # Skip entries whose markup does not match the expected structure
        print("An error occurred:", e)
        continue

table = client.dataset('create_new_dataset').table('ifoodie_table')
job = client.load_table_from_json(data, table)
job.result()  # Wait for the load job to finish
```

* The result as shown in GCP
  ![](https://hackmd.io/_uploads/r1xgfeMwn.png)
* Enable the Maps JavaScript API service
  ![](https://hackmd.io/_uploads/Hykgp3fPh.png)
* Go to Credentials and confirm the key
  ![](https://hackmd.io/_uploads/B1dMp3zv2.png)
  ![](https://hackmd.io/_uploads/HyEEphGw3.png)
* The key is used when an HTML page calls the Google Maps API
  ![](https://hackmd.io/_uploads/HyvcahGwn.png)

### Converting addresses to latitude and longitude with the Geocoding API

* First create the BQ table ifoodie_address_info

```py=
from google.cloud import bigquery as bq
import datetime
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\Users\\Samantha\\Documents\\dv105\\0610雲端平台GCP\\neat-throne-389400-6704b48a595e.json"
client = bq.Client()
table_id = "neat-throne-389400.create_new_dataset.ifoodie_address_info"
schema = [
    bq.SchemaField('name', 'STRING', mode='NULLABLE'),
    bq.SchemaField('address', 'STRING', mode='NULLABLE'),
    bq.SchemaField('lat', 'FLOAT', mode='NULLABLE'),
    bq.SchemaField('lng', 'FLOAT', mode='NULLABLE'),
]
table = bq.Table(table_id, schema=schema)
# Table expiration time
table.expires = datetime.datetime.now() + datetime.timedelta(days=6)
# Table description
table.description = 'create a new table and write the description.'
# Create the table
table = client.create_table(table)  # Make an API request.
```

* Use the API to convert addresses to latitude and longitude, then write them to the BQ table

```py=
import json
import requests
from google.cloud import bigquery as bq
import datetime
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\Users\\Samantha\\Documents\\dv105\\0610雲端平台GCP\\neat-throne-389400-6704b48a595e.json"
client = bq.Client()

API_KEY = "YOUR_API_KEY"  # Placeholder: use your own Geocoding API key and keep it out of shared notes

with open('C:\\Users\\Samantha\\Documents\\dv105\\0525專題二-數據一條龍\\ifoodie.txt', 'r', encoding='utf-8') as f:
    data = json.load(f)

data_obj = []
for i in data:
    name = i['name']
    address = i['address']
    # Pass the address through params so that requests URL-encodes it
    response = requests.get(
        'https://maps.googleapis.com/maps/api/geocode/json',
        params={'address': address, 'key': API_KEY},
    )
    a = response.json()
    if not a['results']:
        # Skip addresses the API could not resolve
        continue
    location = a['results'][0]['geometry']['location']
    lati = location['lat']
    lont = location['lng']  # Longitude comes from 'lng', not 'lat'
    json_data = {'name': name, 'address': address, 'lat': lati, 'lng': lont}
    data_obj.append(json_data)

table = client.dataset('create_new_dataset').table('ifoodie_address_info')
job = client.load_table_from_json(data_obj, table)
job.result()  # Wait for the load job to finish
```

* The GCP BQ view after completion
  ![](https://hackmd.io/_uploads/S1PE6JXDn.png)
* Plotting with latitude and longitude
    1. On the BQ page, click Export > Explore with Looker Studio.
       ![](https://hackmd.io/_uploads/BkiZA0Xv3.png)
    2. Change the chart to a bubble map and add a field that combines latitude and longitude.
       ![](https://hackmd.io/_uploads/Sy2FJJ4Dn.png)
       ![](https://hackmd.io/_uploads/SJZkbkVD2.png)
       ![](https://hackmd.io/_uploads/SyyAlyEv2.png)
    3. Drag `location` onto the location dimension to finish.
       ![](https://hackmd.io/_uploads/HkOt-1Nw2.png)

### Vision AI

* Enable [Cloud Vision AI](https://cloud.google.com/vision?hl=zh-tw)
  ![](https://hackmd.io/_uploads/BkOGvyVw3.png)
* Install the Vision package

```
!pip install google-cloud-vision
```

* Get the label information of an image

```py=
from google.cloud import vision

def label_image(image_path):
    client = vision.ImageAnnotatorClient()
    with open(image_path, 'rb') as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    response = client.label_detection(image=image)
    labels = response.label_annotations
    print(labels)
    for label in labels:
        print(label.description)

image_path = 'C:\\Users\\Samantha\\Desktop\\PIC\\19.jpg'
label_image(image_path)
```

![](https://hackmd.io/_uploads/BJ9NleEP3.png)

* Adapted from the label example above: get the image_property information (color information) of an image

```py=
from google.cloud import vision

def analyze_image_properties(image_path):
    client = vision.ImageAnnotatorClient()
    with open(image_path, 'rb') as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    features = [vision.Feature(type_=vision.Feature.Type.IMAGE_PROPERTIES)]
    request = vision.AnnotateImageRequest(image=image, features=features)
    response = client.batch_annotate_images(requests=[request])
    image_properties = response.responses[0].image_properties_annotation
    dominant_colors = image_properties.dominant_colors.colors
    for color in dominant_colors:
        # One placeholder per channel, plus one for the score
        print("Color RGB: ({}, {}, {}), Score: {}".format(
            color.color.red, color.color.green, color.color.blue, color.score))

image_path = 'C:\\Users\\Samantha\\Desktop\\PIC\\123.jpg'
analyze_image_properties(image_path)
```

![](https://hackmd.io/_uploads/rJaQXg4D3.png)

* Text recognition

```py=
# Text recognition
from google.cloud import vision

def detect_text(image_path):
    client = vision.ImageAnnotatorClient()
    with open(image_path, 'rb') as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations
    for text in texts:
        print(text.description)

image_path = 'D:\\APPS\\Desktop\\GCP\\Picture\\123.jpg'
detect_text(image_path)
```

* Get all the information for a photo

```py=
from google.cloud import vision

def detect_labels_landmarks_faces(image_path):
    client = vision.ImageAnnotatorClient()
    with open(image_path, 'rb') as image_file:
        content = image_file.read()
    image = vision.Image(content=content)

    # Features to detect
    features = [
        vision.Feature(type_=vision.Feature.Type.LABEL_DETECTION),
        vision.Feature(type_=vision.Feature.Type.LANDMARK_DETECTION),
        vision.Feature(type_=vision.Feature.Type.FACE_DETECTION),
        vision.Feature(type_=vision.Feature.Type.IMAGE_PROPERTIES),
        vision.Feature(type_=vision.Feature.Type.SAFE_SEARCH_DETECTION),
    ]
    image_context = vision.ImageContext(
        language_hints=["en"]  # Optional language hints
    )

    # Build the image annotation request
    request = vision.AnnotateImageRequest(
        image=image,
        features=features,
        image_context=image_context
    )
    response = client.annotate_image(request)
    print(response)
    print('========')

    # Labels
    if response.label_annotations:
        print("Labels:")
        for label in response.label_annotations:
            print(label.description)

    # Landmarks
    if response.landmark_annotations:
        print("Landmarks:")
        for landmark in response.landmark_annotations:
            print('Landmark name:', landmark.description)
            print('Bounding box:')
            print('Left:', landmark.bounding_poly.vertices[0].x)
            print('Top:', landmark.bounding_poly.vertices[0].y)
            print('Right:', landmark.bounding_poly.vertices[2].x)
            print('Bottom:', landmark.bounding_poly.vertices[2].y)
            print('----------')

    # Face attributes
    if response.face_annotations:
        print('Face attributes:')
        for face in response.face_annotations:
            print('Face bounding box:')
            print('Left:', face.bounding_poly.vertices[0].x)
            print('Top:', face.bounding_poly.vertices[0].y)
            print('Right:', face.bounding_poly.vertices[2].x)
            print('Bottom:', face.bounding_poly.vertices[2].y)
            print('Other attributes:', face)

    # Image properties (the response field is singular: image_properties_annotation)
    if response.image_properties_annotation:
        props = response.image_properties_annotation
        print('Image color properties:')
        for color in props.dominant_colors.colors:
            print('Color:', color.color)
            print('Score:', color.score)
            print('Pixel fraction:', color.pixel_fraction)

    # SafeSearch attributes (the response field is safe_search_annotation)
    if response.safe_search_annotation:
        safe_search = response.safe_search_annotation
        print('SafeSearch attributes:')
        print('adult:', safe_search.adult)
        print('spoof:', safe_search.spoof)
        print('medical:', safe_search.medical)
        print('violence:', safe_search.violence)
        print('racy:', safe_search.racy)

    return response

image_path = 'D:\\APPS\\Desktop\\GCP\\Picture\\123.jpg'
detect_labels_landmarks_faces(image_path)
```

### DialogFlow

* Create a new Agent in DialogFlow, select the Default Welcome Intent, and under Training phrases enter the phrases the bot should accept, such as "你好".
  ![](https://hackmd.io/_uploads/B1Dz90mPn.png)
  ![](https://hackmd.io/_uploads/BJH69A7w3.png)
* In GCP, go to IAM > Service Accounts > Create Service Account. The service account name must match the name of the new Agent above, and the role must be DialogFlow > DialogFlow API Client. After creating the account, add a key and save it to your computer.
  ![](https://hackmd.io/_uploads/HkHUi07D3.png)
  ![](https://hackmd.io/_uploads/rkwCiC7v2.png)
  ![](https://hackmd.io/_uploads/HkNdnRQDn.png)
* Install the DialogFlow package

```
!pip install google-cloud-dialogflow
```

* The following code accesses DialogFlow through its API. Remember to adjust the key path (os.environ) and the project name.

```py=
import os
from google.cloud import dialogflow_v2
from google.cloud.dialogflow_v2 import types

# The variable name uses underscores, not spaces
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\Users\\Samantha\\Downloads\\neat-throne-389400-9879a39cdb8d.json"

# Dialogflow project ID and language code
project_id = "neat-throne-389400"
language_code = "en-US"

# A unique session ID
session_id = "1234"

# The text request to send
# text = "Hello, how are you?"
# text = "妳好"
# text = "classs"
text = "你好"

def detect_intent(project_id, session_id, text, language_code):
    session_client = dialogflow_v2.SessionsClient()
    session = session_client.session_path(project_id, session_id)
    text_input = dialogflow_v2.types.TextInput(text=text, language_code=language_code)
    query_input = dialogflow_v2.types.QueryInput(text=text_input)
    response = session_client.detect_intent(session=session, query_input=query_input)
    return response.query_result.fulfillment_text

# Call detect_intent to get Dialogflow's response
response = detect_intent(project_id, session_id, text, language_code)

# Print the response
print("Dialogflow Response: ", response)
```

![](https://hackmd.io/_uploads/S1DHilEDh.jpg)

###### tags: `雲端平台` `GCP` `爬蟲` `python` `jupyter`
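
Every client in these notes (BigQuery, Vision, DialogFlow) finds its key file through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The libraries look up that exact name, so a misspelled one has no effect. Below is a minimal sketch of a check that could run before any client is created. The helper name `check_credentials_env` and the temporary stand-in key file are assumptions made for illustration; neither belongs to any Google library.

```python
import json
import os
import tempfile

ENV_NAME = "GOOGLE_APPLICATION_CREDENTIALS"


def check_credentials_env(environ=None):
    """Return the key file path if ENV_NAME points to a readable JSON file.

    Hypothetical helper for illustration only. Raises RuntimeError with a
    readable message when the variable is missing or the file is unusable.
    """
    environ = os.environ if environ is None else environ
    path = environ.get(ENV_NAME)
    if not path:
        raise RuntimeError(f"{ENV_NAME} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"key file not found: {path}")
    with open(path, "r", encoding="utf-8") as f:
        try:
            json.load(f)
        except json.JSONDecodeError as exc:
            raise RuntimeError(f"key file is not valid JSON: {path}") from exc
    return path


if __name__ == "__main__":
    # Demo with a temporary stand-in file instead of a real service account key.
    with tempfile.TemporaryDirectory() as tmp:
        fake_key = os.path.join(tmp, "key.json")
        with open(fake_key, "w", encoding="utf-8") as f:
            json.dump({"type": "service_account"}, f)
        print(check_credentials_env({ENV_NAME: fake_key}))
        try:
            # A name written with spaces is a different variable entirely.
            check_credentials_env({"GOOGLE APPLICATION CREDENTIALS": fake_key})
        except RuntimeError as err:
            print("error:", err)
```

Calling `check_credentials_env()` with no argument inspects the real `os.environ`, so it can sit at the top of a notebook right after the `os.environ[...] = ...` line.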