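## GCP Vertex-AI Prediction (2024/12/06)

### Experiment Architecture
+ Architecture diagram
    ![image](https://hackmd.io/_uploads/HJO6fOgEyl.png)
+ Supplementary notes
    - Scenario (how the user flow and the manager flow relate)
        ![image](https://hackmd.io/_uploads/Hkr37abEyl.png =70%x)
        * The user triggers the Cloud Function (by uploading a photo to the cloud)
        * The Cloud Function passes the trigger payload (the photo the user uploaded) to the AI deployed by the Manager
        * The AI deployed by the Manager returns the result (the memory location of the photo recognition result) to the Cloud Function
        * The Cloud Function returns the result (the processed photo recognition result) to the user
    - Manager flow (deployed in the US)
        ![image](https://hackmd.io/_uploads/BJ1Vga-VJe.png)
        * This flow "deploys one centralized AI"
        * Through `Vertex-AI`, the Manager deploys the AI model and trains it in a Jupyter Notebook
        * The Notebook is a Python environment that Google sets up for the Manager in advance, which makes training the model convenient.
    - User flow (deployed in Taiwan)
        ![image](https://hackmd.io/_uploads/SyZeWaZNyx.png)
        * This flow "deploys distributed Cloud Functions"
        * step1. The user puts data into `Storage(In)`
        * step2. The `data added` event of `Storage(In)` triggers the Cloud Function
        * step3. The function converts the data to Base64 so that the binary image can be carried in the JSON request sent to the AI (Base64 makes the payload about one third larger, so it does not shrink the request)
        * step4. The function hands the data to the AI for prediction through the EndPoint (think of it as the interface for talking to the AI model)
        * step5. The function stores the prediction result in `Storage(Out)`

### Vertex-AI Building & Testing
+ step1. Enable the IAM permissions
    ![image](https://hackmd.io/_uploads/Hk3-4deEkx.png)
+ step2. Create and assemble the dataset
    * Run the following commands to create a bucket and copy in the training set provided by Google
        ![image](https://hackmd.io/_uploads/r1OKV_xNye.png)
        ```bash
        $ export PROJECT_ID=$DEVSHELL_PROJECT_ID
        $ export BUCKET=[choose a bucket name]
        $ gsutil mb -p $PROJECT_ID -c standard -l [region of your choice] gs://${BUCKET}
        $ gsutil -m cp -r gs://car_damage_lab_images/* gs://${BUCKET}
        ```
    * Run the following commands to download the CSV training file, point its paths at your bucket, and upload it
        ![image](https://hackmd.io/_uploads/r1OKV_xNye.png)
        ```bash
        $ gsutil cp gs://car_damage_lab_metadata/data.csv .
        $ sed -i -e "s/car_damage_lab_images/${BUCKET}/g" ./data.csv
        $ gsutil cp ./data.csv gs://${BUCKET}
        ```
    * Contents of the bucket
        ![image](https://hackmd.io/_uploads/B1j2HugNJl.png)
    * Create the training dataset in Vertex
        ![image](https://hackmd.io/_uploads/rJtbIOlNJl.png)
        ![image](https://hackmd.io/_uploads/SytfLdl4kx.png)
        ![image](https://hackmd.io/_uploads/S1kv8dgN1x.png)
+ step3. Start training
    ![image](https://hackmd.io/_uploads/SkpQv_eNke.png)
    - step1.
        ![image](https://hackmd.io/_uploads/r1Qrvde4kg.png =40%x)
        ![image](https://hackmd.io/_uploads/BJFSPOe4ye.png)
    - step2.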
        ![image](https://hackmd.io/_uploads/SykLw_gVye.png =40%x)
        ![image](https://hackmd.io/_uploads/B1mUPdxV1l.png)
    - step3.
        ![image](https://hackmd.io/_uploads/BJuLDdeE1e.png =40%x)
        ![image](https://hackmd.io/_uploads/Hyh8Dux4ye.png)
    - step4.
        ![image](https://hackmd.io/_uploads/B1-Pw_gN1l.png =40%x)
        ![image](https://hackmd.io/_uploads/HyKwvuxEJg.png)
    - Training finished (the whole run took about 1 hour 40 minutes)
+ step4. Test the model
    - ![image](https://hackmd.io/_uploads/rJOeSFe4kg.png =40%x)
      ![image](https://hackmd.io/_uploads/H1UWBFlNJe.png =50%x)
    - Click into the AI model you created
    - ![image](https://hackmd.io/_uploads/S1sMHKeE1e.png =60%x)
      ![image](https://hackmd.io/_uploads/HkImHFlNJg.png)
      ![image](https://hackmd.io/_uploads/Syomrte41g.png)

### Build EndPoint and Storages
+ Create the Storages (In/Out)
    ![image](https://hackmd.io/_uploads/rJGi3KxEJl.png)
    ```bash
    $ export PROJECT_ID=$DEVSHELL_PROJECT_ID
    $ export BUCKET=ai-input-storage
    $ gsutil mb -p $PROJECT_ID -c standard -l asia-east1 gs://${BUCKET}
    Creating gs://ai-input-storage/...
    $ export BUCKET=ai-output-storage
    $ gsutil mb -p $PROJECT_ID -c standard -l asia-east1 gs://${BUCKET}
    Creating gs://ai-output-storage/...
    ```
+ Run the following commands to create the AI EndPoint
    * Create
        ```bash
        $ gcloud ai endpoints create \
            --region=us-central1 \
            --display-name="[name of your choice]"
        ```
    * Deploy the AI model
        ```bash
        # Use the following commands to look up the IDs of the endpoint and the AI model
        $ gcloud ai endpoints list --region=[endpoint region] \
            --filter="display_name=[endpoint name]" \
            --format="table(name)"
        $ gcloud ai models list --region=[AI model region] \
            --filter="display_name=[AI model name]" \
            --format="table(name)"
        ```
        ```bash
        $ gcloud ai endpoints deploy-model [endpoint ID] \
            --region=[same region as the AI model] \
            --model=[AI model ID] \
            --display-name="[name of your choice]" \
            --machine-type="[machine type of your choice (e.g. n1-standard-2)]" \
            --min-replica-count=[minimum number of model replicas serving at once (I used 1)] \
            --max-replica-count=[maximum number of model replicas serving at once (I used 1)]
        ```
    * Verify
        ```bash
        $ gcloud ai endpoints describe [endpoint ID] --region=[endpoint region]
        $ gcloud ai models describe [AI model ID] --region=[AI model region]
        ```

### Build the cloud function
+ Basic Cloud Function configuration
    ![image](https://hackmd.io/_uploads/rkLiz9g4Jg.png =60%x)
    - The function should fire only when a user uploads a photo to Storage(In), so select the Storage(In) bucket name
        ![image](https://hackmd.io/_uploads/SJR2zcgEkg.png)
    - To cope with oversized files and with users uploading many photos at once, the Memory size has to be raised explicitly (during my tests the function ran out of Memory several times because of the photos)
        ![image](https://hackmd.io/_uploads/rJhYm9lEJg.png)
    - Error log message when Memory runs out
+ Function source code configuration
    ![image](https://hackmd.io/_uploads/ByvyE5eVJx.png =50%x)
    - Language and version
        ![image](https://hackmd.io/_uploads/rkUxNceNke.png)
    - main
        ```python=1
        import base64
        import json
        import traceback

        from google.cloud import aiplatform_v1 as ai
        from google.cloud import storage
        from google.protobuf.json_format import MessageToDict


        def image_to_base64(image_data):
            """Encode raw image bytes as a Base64 string."""
            try:
                base64_string = base64.b64encode(image_data).decode("utf-8")
                return base64_string
            except Exception as e:
                print(f"Error encoding image to Base64: {e}")
                return None


        def process_prediction(predictions):
            """Convert the predictions result into a JSON string."""
            try:
                # A dict can be serialized to JSON directly
                if isinstance(predictions, dict):
                    return json.dumps(predictions, indent=2)
                # For a list, keep dict elements as dicts and stringify everything else
                elif isinstance(predictions, list):
                    return json.dumps([dict(p) if isinstance(p, dict) else str(p) for p in predictions], indent=2)
                else:
                    # Unrecognized type: fall back to its string form
                    return json.dumps({"error": "Unsupported prediction format", "data": str(predictions)}, indent=2)
            except Exception as e:
                print(f"Error processing predictions: {e}")
                traceback.print_exc()
                return json.dumps({"error": "Failed to process predictions", "details": str(e)}, indent=2)


        def save_to_gcs(bucket_name, destination_blob_name, content):
            """Save text content to the given GCS bucket."""
            try:
                storage_client = storage.Client()
                bucket = storage_client.bucket(bucket_name)
                blob = bucket.blob(destination_blob_name)
                blob.upload_from_string(content)
                print(f"Saved result to GCS: {bucket_name}/{destination_blob_name}")
            except Exception as e:
                print(f"Error saving to GCS: {e}")


        def prediction_job(event, context):
            project_id = "marktest1"  # replace with your GCP project ID
            location = "us-central1"  # replace with your region
            api_endpoint = f"{location}-aiplatform.googleapis.com"
            endpoint_id = "7167559071308972032"  # replace with your endpoint ID
            model_id = "6680690924483248128"  # replace with your AI-model ID
            # Full model resource name (kept for reference; the predict call below does not use it)
            model_name = f"projects/{project_id}/locations/{location}/models/{model_id}"

            src_bucket_name = event["bucket"]  # source bucket name
            src_file_name = event["name"]  # file name inside the source bucket
            dest_bucket_name = "ai-output-storage"  # replace with the destination bucket name
            dest_file_name = f"predictions/{src_file_name}.txt"  # file name for the saved result

            # The client must talk to the regional API endpoint of the deployed model
            client_options = {"api_endpoint": api_endpoint}
            client = ai.PredictionServiceClient(client_options=client_options)

            try:
                # Download the image from GCS
                storage_client = storage.Client()
                bucket = storage_client.bucket(src_bucket_name)
                blob = bucket.blob(src_file_name)
                image_data = blob.download_as_bytes()

                # Convert the image to Base64
                encoded_image = image_to_base64(image_data)
                if not encoded_image:
                    raise ValueError("Failed to encode image to Base64")
                print("Finished Img Convert")

                # Build the Vertex AI request
                instance = {
                    "content": encoded_image  # Base64-encoded image
                }
                endpoint = f"projects/{project_id}/locations/{location}/endpoints/{endpoint_id}"

                # Predict
                response = client.predict(endpoint=endpoint, instances=[instance])

                # Convert the protobuf response into a plain dict
                predictions = MessageToDict(response._pb)
                print(predictions)
                print("Finished prediction")

                # Turn the predictions into a serializable format
                processed_predictions = process_prediction(predictions)
                print(processed_predictions, "\n")

                # Save the result to GCS
                save_to_gcs(dest_bucket_name, dest_file_name, processed_predictions)
                print("Finished saving predictions")

            except Exception as e:
                print(f"Error in prediction job: {e}")
                traceback.print_exc()
        ```
    - requirement
        ```requirement
        # Function dependencies, for example:
        # package>=version
        google-cloud-aiplatform==1.34.0
        google-cloud-storage==2.12.0
        protobuf>=3.20.0,<4.0.0
        ```

### Experiment Results
+ The user uploads a photo
    ![image](https://hackmd.io/_uploads/SkeWS5xEyx.png =50%x)
    ![image](https://hackmd.io/_uploads/BJrWr9lEyx.png =70%x)
+ Result in Storage(Out)
    ![image](https://hackmd.io/_uploads/Bk4BH9e4kg.png =50%x)
    - The corresponding prediction file was indeed created in the Storage
        ![image](https://hackmd.io/_uploads/SkZDH9g4Jg.png)
    - Inspect the prediction result
        ![image](https://hackmd.io/_uploads/SytuS9eNye.png)
        ![image](https://hackmd.io/_uploads/SkTdSqg4Jl.png)

### Conclusion
+ A simple walkthrough of setting up Vertex AI and the corresponding training flow
+ A code implementation of how a Cloud Function calls the AI to run a prediction
+ Watch the cost of Vertex; it is really high.

### Reference
+ [Vertex setup reference](https://blog.cloud-ace.tw/ai-machine-learning/vertex-ai-tutorial-and-intro/)
+ [Vertex SDK reference](https://medium.com/@shivsaxena56/automate-vertex-ai-forecast-batch-prediction-using-cloud-function-82a098c03213)
+ [Vertex HTTPs request format reference](https://cloud.google.com/dialogflow/es/docs/reference/rest/v2/DetectIntentResponse#QueryResult)
+ [Vertex Google technical documentation](https://github.com/rominirani/genai-apptemplates-googlecloud)
+ [Python Google SDK src code](https://cloud.google.com/python/docs/reference)
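
### Appendix: payload size and top-label sketch
The Cloud Function Base64-encodes each photo before calling the EndPoint and then stores the whole response as text. The two pure-Python steps around that call can be sketched as below. This is only an illustration under stated assumptions, not part of the deployed function: `base64_length`, `build_instance`, and `top_label` are hypothetical helper names, `MAX_REQUEST_BYTES` is a placeholder cap (check the quota that applies to your model), and the response shape with `displayNames` and `confidences` is assumed rather than taken from the experiment output.

```python
import base64

# Assumption: placeholder per-request cap in bytes; look up the real quota for your model.
MAX_REQUEST_BYTES = 1_500_000


def base64_length(raw_size: int) -> int:
    """Length of the padded Base64 text for raw_size bytes (4 characters per 3-byte group)."""
    return 4 * ((raw_size + 2) // 3)


def build_instance(image_data: bytes, max_request_bytes: int = MAX_REQUEST_BYTES) -> dict:
    """Build the {"content": <base64>} instance, rejecting images whose encoded form exceeds the cap."""
    encoded_size = base64_length(len(image_data))
    if encoded_size > max_request_bytes:
        raise ValueError(
            f"Encoded image is {encoded_size} bytes, over the {max_request_bytes}-byte cap"
        )
    return {"content": base64.b64encode(image_data).decode("utf-8")}


def top_label(response_dict: dict) -> tuple:
    """Return (label, confidence) for the highest-confidence class of the first prediction.

    Assumed shape: {"predictions": [{"displayNames": [...], "confidences": [...]}]}.
    """
    first = response_dict["predictions"][0]
    pairs = zip(first["displayNames"], first["confidences"])
    return max(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    instance = build_instance(b"abcd")
    print(instance["content"], len(instance["content"]))
    sample = {"predictions": [{"displayNames": ["label_a", "label_b"], "confidences": [0.2, 0.7]}]}
    print(top_label(sample))
```

Checking the encoded size before the request is one way to fail early on oversized photos instead of discovering the problem in the function logs; the function's Memory setting still has to cover both the raw bytes and the Base64 string.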