俊瑋
# AWS Textract Implementation (Part 1)

### Preliminary steps

* Add the plugin header (edit the values as you like)
```php
/**
* Plugin Name: Textract
* Author: Your_name
* Version: 1.0.0
*/
```
<br>

* Prevent the file from being accessed directly
```php
if (!defined('ABSPATH')) {
    exit;
}
```
> <font size="2">'ABSPATH' is the absolute path of the WordPress directory; if it is not defined, the file is not running inside WordPress</font>

<br>

* Import the classes we will use later
```php
use Aws\Textract\TextractClient;
use Aws\S3\S3Client;
```
<br>

* Load Composer's autoload file
```php
require_once dirname(__DIR__) . '/vendor/autoload.php';
```
> <font size="2">**\_\_DIR\_\_**: the folder that contains the current PHP file
> **dirname(\_\_DIR\_\_)**: the parent folder of the folder that contains the current PHP file
> **require**: loads another file; adding `once` means a file that was already loaded is not loaded again
> Note: in PHP, `.` concatenates strings, e.g. "S" . "SR" = "SSR"</font>

<br>

Let's start building! -> break the task into ==subtasks==
![Screenshot 2025-05-17 3.39.46 PM](https://hackmd.io/_uploads/B1510nH-lg.png)

***
### Function1: Build the upload form
> ```php
> function textract_upload_form() {
>     // Build a form that lets the user upload a file
> }
> ```

<br>

First, set the URL the form submits to
```php
$action_url = esc_url(admin_url('admin-post.php'));
```
> <font size="2">**esc_url():** a built-in WordPress function that filters unsafe characters out of a URL</font>
> <font size="2">**admin_url():** a built-in WordPress function that returns the URL of the admin area</font>
> <font size="2">**admin-post.php:** the WordPress file that handles requests submitted to the admin area over HTTP POST</font>

<br>

Next, return a form built with HTML
```html
return '<form action="' . $action_url . '" method="post" enctype="multipart/form-data" onsubmit="showLoading()"> <!-- form settings -->
            <input type="hidden" name="action" value="textract_upload"> <!-- action value (sent to the back end) -->
            <input type="file" name="textract_file" required> <!-- the file to upload -->
            <button type="submit">Upload and Analyze</button> <!-- upload button -->
        </form>
        <!-- message shown while loading -->
        <p id="loadingMessage" style="display:none; color: red;">Processing... Please wait.</p>
        <script>
            // Runs when the form is submitted and reveals loadingMessage
            function showLoading() {
                document.getElementById("loadingMessage").style.display = "block";
            }
        </script>';
```
:::spoiler (Supplement) The data PHP receives for the uploaded file
```php
array(
    'name' => 'example.pdf',         // original file name
    'type' => 'application/pdf',     // file type
    'tmp_name' => '/tmp/php8zU8gX',  // path of the temporary file
    'error' => 0,                    // upload error code (0 means no problem)
    'size' => 123456                 // file size (bytes)
)
```
:::

<br>

> [!Important] Complete code
>
> :::spoiler &nbsp;textract_upload_form
> ```php=
> function textract_upload_form() {
>
>     // Set the URL the form submits to
>     $action_url = esc_url(admin_url('admin-post.php'));
>
>     // Return the upload form (HTML)
>     return '<form action="' . $action_url . '" method="post" enctype="multipart/form-data" onsubmit="showLoading()">
>                 <input type="hidden" name="action" value="textract_upload">
>                 <input type="file" name="textract_file" required>
>                 <button type="submit">Upload and Analyze</button>
>             </form>
>             <p id="loadingMessage" style="display:none; color: red;">Processing... Please wait.</p>
>             <script>
>                 function showLoading() {
>                     document.getElementById("loadingMessage").style.display = "block";
>                 }
>             </script>';
>
> }
> ```
> :::

<br>
<br>

***
### Function2: Analyze the file
> ```php
> function textract_upload_and_analyze($file) {
>     // Take the file $file and analyze it with Amazon Textract
> }
> ```

<br>

First, store the region in the $region variable
```php
$region = 'us-west-2';
```
<br>

After confirming that a file was actually passed in, read the file
```php
if (!isset($file['tmp_name']) || empty($file['tmp_name'])) {
    return "No file uploaded."; // If empty, return an error message and leave the function
}
$file_data = file_get_contents($file['tmp_name']);
```
<br>

To use Textract, we need to create a client
```php
$textract = new TextractClient([
    'version' => 'latest',
    'region' => $region
    // If a Key and Secret are required, they go here too
]);
```
> <font size="2">Learner Lab provides the IAM permissions for Textract, so no Key or Secret needs to be entered here</font>

<br>

==**Start the analysis!**==
Here we call the Textract API and store the analysis result in $result
```php
$result = $textract->detectDocumentText([
    'Document' => [
        'Bytes' => $file_data,
    ],
]);
```
However, the result Textract returns looks like this
:::spoiler $result
```
Array
(
    [Blocks] => Array
    (
        [0] => Array
        (
            [BlockType] => "LINE",
            [Text] => "Hello World",
            [Confidence] => 99.0,
            [Geometry] => Array
            (
                [BoundingBox] => Array
                (
                    [Width] => 0.5,
                    [Height] => 0.1,
                    [Left] => 0.1,
                    [Top] => 0.2
                )
            )
        ),
        [1] => Array
        (
            [BlockType] => "WORD",
            [Text] => "Hello",
            [Confidence] => 98.5,
            [Geometry] => Array
            (
                [BoundingBox] => Array
                (
                    [Width] => 0.2,
                    [Height] => 0.1,
                    [Left] => 0.1,
                    [Top] => 0.2
                )
            )
        ),
        [2] => Array
        (
            [BlockType] => "WORD",
            [Text] => "World",
            [Confidence] => 97.5,
            [Geometry] => Array
            (
                [BoundingBox] => Array
                (
                    [Width] => 0.2,
                    [Height] => 0.1,
                    [Left] => 0.3,
                    [Top] => 0.2
                )
            )
        ),
        [3] => Array
        (
            [BlockType] => "TABLE",
            [Confidence] => 95.0,
            [Relationships] => Array
            (
                [0] => Array
                (
                    [Type] => "CHILD",
                    [Ids] => Array
                    (
                        "cell-id-1",
                        "cell-id-2",
                        "cell-id-3"
                    )
                )
            )
        )
    )
)
```
:::
&ensp;
So we need to pull the information we want out of this format.
Blocks is an array containing many blocks (the objects that were read); we only need to pick out the blocks whose type we are interested in
```php
$extractedText = ""; // Holds the extracted result
foreach ($result["Blocks"] as $block) {        // Loop over every object in Blocks
    if ($block["BlockType"] == "LINE") {       // Only objects of type LINE are needed
        $extractedText .= $block["Text"] . " "; // Append the new text to what we already have
    }
}
```
:::spoiler <font size="2">&nbsp; **Q:** Why read only the LINE objects and not the WORD objects?</font>
<font size="2">**A: A LINE object usually already contains the content of its WORD objects**. For example, if the image contains Hello World, the result includes one LINE (Hello World) and two WORDs (Hello, World)</font>
:::

<br>
<br>

Once the analysis is done, we can hand the result to WordPress
```php
update_option('textract_last_text', $extractedText);
```
> <font size="2"> **Option:** the mechanism WordPress provides for storing site settings and global values </font>
> <font size="2"> **update_option('A', B):** stores B in the database option named 'A'; if 'A' does not exist it is created automatically</font>

<br>

Finally, to keep unexpected errors from crashing the program, we add ==error handling==
```php
try {
    ...the code from above...
    return "Success! Text extracted."; // Analysis finished, return a success message
} catch (Exception $e) { // Error handling
    return "Textract error: " . $e->getMessage(); // On failure, return an error message
}
```
<br>

> [!Important] Complete code
>
> :::spoiler &nbsp;textract_upload_and_analyze
> ```php=
> function textract_upload_and_analyze($file) {
>
>     // Store the region in the $region variable
>     $region = 'us-west-2';
>
>     // Check whether a file was passed in
>     if (!isset($file['tmp_name']) || empty($file['tmp_name'])) {
>         return "No file uploaded.";
>     }
>
>     // Read the file
>     $file_data = file_get_contents($file['tmp_name']);
>
>     // Create the client
>     $textract = new TextractClient([
>         'version' => 'latest',
>         'region' => $region
>     ]);
>
>     try {
>
>         // Call the Textract API and store the analysis result in $result
>         $result = $textract->detectDocumentText([
>             'Document' => [
>                 'Bytes' => $file_data,
>             ],
>         ]);
>
>         // Extract the information we need (objects of type LINE)
>         $extractedText = "";
>         foreach ($result["Blocks"] as $block) {
>             if ($block["BlockType"] == "LINE") {
>                 $extractedText .= $block["Text"] . " ";
>             }
>         }
>
>         // Save the result to a WordPress option
>         update_option('textract_last_text', $extractedText);
>
>         return "Success! Text extracted.";
>
>     } catch (Exception $e) { // Error handling
>
>         return "Textract error: " . $e->getMessage();
>
>     }
> }
> ```
> :::

<br>
<br>

***
Congratulations, you have finished the most complicated part of this program (probably)
Next we hand the content the form received at the start to the function we just wrote for analysis
### Function3: Handle the upload
> ```php
> function textract_handle_upload() {
>     // Pass the file received by the form to Function2 for analysis
> }
> ```

<br>

Here we confirm the file is fine and then pass it to function2 for analysis
```php
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_FILES['textract_file'])) {
    textract_upload_and_analyze($_FILES['textract_file']); // Pass the file to the analysis function
}
wp_redirect($_SERVER["HTTP_REFERER"]); // Redirect back to the page
exit; // End the script
```
> <font size="2">**'POST'** is an HTTP request method, [more on request methods can be found here](https://developer.mozilla.org/zh-TW/docs/Web/HTTP/Reference/Methods)</font>
> <font size="2">**$_SERVER["HTTP_REFERER"]** is a server variable that holds the URL of the previous page</font>

<br>

To keep an oversized upload from making the program run too long, it is best to add a ==file size limit==
```php
$max_file_size = 5 * 1024 * 1024; // Set the file size limit (5MB)
if ($_FILES['textract_file']['size'] > $max_file_size) { // Check the file size
    update_option('textract_last_text', 'Error: File size exceeds the 5MB limit.');
    wp_redirect($_SERVER["HTTP_REFERER"]);
    exit;
}
```
:::spoiler <font size="2">&nbsp; **Q:** Why is the file size 5 * 1024 * 1024?</font>
<font size="2">**A: File sizes are usually measured in bytes**. In the common units, 1KB (kilobyte) = 1024 bytes and 1MB (megabyte) = 1024 KB = 1024 × 1024 bytes</font>
:::

<br>

> [!Important] Complete code
>
> :::spoiler &nbsp;textract_handle_upload
> ```php=
> function textract_handle_upload() {
>
>     // Check the file
>     if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_FILES['textract_file'])) {
>
>         $max_file_size = 5 * 1024 * 1024; // Set the file size limit
>
>         if ($_FILES['textract_file']['size'] > $max_file_size) { // Check the file size
>
>             update_option('textract_last_text', 'Error: File size exceeds the 5MB limit.');
>             wp_redirect($_SERVER["HTTP_REFERER"]);
>             exit;
>
>         }
>
>         textract_upload_and_analyze($_FILES['textract_file']); // Run the analysis
>     }
>
>     wp_redirect($_SERVER["HTTP_REFERER"]); // Redirect back to the page
>     exit; // End the script
>
> }
> ```
> :::

<br>

***
### Function4: Display the result
```php
function textract_display_shortcode() {
    // Fetch the analysis result and show it on the page
}
```
<br>

Remember that we saved the result in a WordPress option when we analyzed the file?
What we do here is read that result back and display it
```php
$extractedText = get_option('textract_last_text', 'No text extracted yet.');
```
> <font size="2">**get_option('A', B):** tries to read the option named 'A'; if it does not exist, B is returned</font>

<br>

Next, return a text box built with HTML that shows the analysis result
```html
return "<div class='aws-textract-output' style='border: 1px solid #ccc; padding: 10px;'> <!-- basic styling -->
            <strong>Extracted Text:</strong> <!-- label text -->
            <p>{$extractedText}</p> <!-- analysis result -->
        </div>";
```
> <font size="2">Note: the extracted text is placed into the HTML as it is. Passing it through WordPress's esc_html() before output would be the safer choice.</font>

<br>

> [!Important] Complete code
>
> :::spoiler &nbsp;textract_display_shortcode
> ```php=
> function textract_display_shortcode() {
>
>     $extractedText = get_option('textract_last_text', 'No text extracted yet.');
>
>     return "<div class='aws-textract-output' style='border: 1px solid #ccc; padding: 10px;'>
>                 <strong>Extracted Text:</strong>
>                 <p>{$extractedText}</p>
>             </div>";
>
> }
> ```
> :::

<br>

***
### Putting it together
The four functions built above fall into two kinds

**1. Functions that return HTML:**
We register ==Shortcodes== so that WordPress can call them and show their output on a page
```php
add_shortcode('textract_upload_form', 'textract_upload_form'); // function1
add_shortcode('textract_result', 'textract_display_shortcode'); // function4
```
<br>

**2. Functions that process the file:**
These functions only run when something calls them. Since function2 is called inside function3, we only need a way to call function3

<br>

With ==add_action==, textract_handle_upload (that is, function3) runs whenever admin-post.php receives a request whose action value is textract_upload (the hidden field in the form)
```php
add_action('admin_post_textract_upload', 'textract_handle_upload');
add_action('admin_post_nopriv_textract_upload', 'textract_handle_upload');
```
> <font size="2">**admin_post_{action name}:** the hook fired when WordPress receives the request from a logged-in user
> **admin_post_nopriv_{action name}:** the hook fired when WordPress receives the request from a visitor who is not logged in</font>

<br>

With the integration done, we have a simple working program~
### Complete code
:::spoiler &nbsp;Textract.php
```php=
<?php
/**
* Plugin Name: Textract
* Author: Your_name
* Version: 1.0.0
*/

// Prevent the file from being accessed directly
if (!defined('ABSPATH')) {
    exit;
}

// Load AWS SDK
require_once dirname(__DIR__) . '/vendor/autoload.php';

// Import the classes we will use
use Aws\Textract\TextractClient;
use Aws\S3\S3Client;

// function1: build the upload form
function textract_upload_form() {

    // Set the URL the form submits to
    $action_url = esc_url(admin_url('admin-post.php'));

    // Return the upload form (HTML)
    return '<form action="' . $action_url . '" method="post" enctype="multipart/form-data" onsubmit="showLoading()">
                <input type="hidden" name="action" value="textract_upload">
                <input type="file" name="textract_file" required>
                <button type="submit">Upload and Analyze</button>
            </form>
            <p id="loadingMessage" style="display:none; color: red;">Processing... Please wait.</p>
            <script>
                function showLoading() {
                    document.getElementById("loadingMessage").style.display = "block";
                }
            </script>';
}

// function2: analyze the file
function textract_upload_and_analyze($file) {

    // Store the region in the $region variable
    $region = 'us-west-2';

    // Check whether a file was passed in
    if (!isset($file['tmp_name']) || empty($file['tmp_name'])) {
        return "No file uploaded.";
    }

    // Read the file
    $file_data = file_get_contents($file['tmp_name']);

    // Create the client
    $textract = new TextractClient([
        'version' => 'latest',
        'region' => $region
    ]);

    try {

        // Call the Textract API and store the analysis result in $result
        $result = $textract->detectDocumentText([
            'Document' => [
                'Bytes' => $file_data,
            ],
        ]);

        // Extract the information we need (objects of type LINE)
        $extractedText = "";
        foreach ($result["Blocks"] as $block) {
            if ($block["BlockType"] == "LINE") {
                $extractedText .= $block["Text"] . " ";
            }
        }

        // Save the result to a WordPress option
        update_option('textract_last_text', $extractedText);

        return "Success! Text extracted.";

    } catch (Exception $e) { // Error handling

        return "Textract error: " . $e->getMessage();

    }
}

// function3: handle the upload
function textract_handle_upload() {

    // Check the file
    if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_FILES['textract_file'])) {

        $max_file_size = 5 * 1024 * 1024; // Set the file size limit

        if ($_FILES['textract_file']['size'] > $max_file_size) { // Check the file size

            update_option('textract_last_text', 'Error: File size exceeds the 5MB limit.');
            wp_redirect($_SERVER["HTTP_REFERER"]);
            exit;

        }

        textract_upload_and_analyze($_FILES['textract_file']); // Run the analysis
    }

    wp_redirect($_SERVER["HTTP_REFERER"]); // Redirect back to the page
    exit; // End the script
}

// function4: display the result
function textract_display_shortcode() {

    // Fetch the analysis result
    $extractedText = get_option('textract_last_text', 'No text extracted yet.');

    // Return the text box (HTML) that shows the result
    return "<div class='aws-textract-output' style='border: 1px solid #ccc; padding: 10px;'>
                <strong>Extracted Text:</strong>
                <p>{$extractedText}</p>
            </div>";
}

add_shortcode('textract_upload_form', 'textract_upload_form');
add_shortcode('textract_result', 'textract_display_shortcode');

add_action('admin_post_textract_upload', 'textract_handle_upload');
add_action('admin_post_nopriv_textract_upload', 'textract_handle_upload');

?>
```
:::

<br>

***

<br>

# AWS Textract Implementation (Part 2)

### Clearing the form contents
> Once the basic Textract feature works, you will notice that the result of the previous analysis stays in the text box
> ![Screenshot 2025-04-28 10.33.34 AM](https://hackmd.io/_uploads/rJZFtDhyle.png =80%x)

```php
function clear_textract_results() {
    // Clear the result of the previous analysis
}
```
Clearing the result is simple: just reset the stored value to the default message
```php
update_option('textract_last_text', 'No text extracted yet.');
```
<br>

> [!Important] Complete code
>
> :::spoiler &nbsp;clear_textract_results
> ```php=
> function clear_textract_results() {
>
>     update_option('textract_last_text', 'No text extracted yet.');
>
> }
> ```
> :::

<br>

So how do we call this function?
We use ==add_action== again, as we did above, so that the function runs every time a page is loaded
```php
add_action('wp_head', 'clear_textract_results');
```
> <font size="2">**wp_head:** the hook WordPress fires while the `<head>` section is being output</font>
> <font size="2">Note: this relies on the result shortcode being rendered before wp_head fires, which is what block themes do. A classic theme calls wp_head before the page content, so there the result may be reset before it is shown.</font>

<br>

### Uploading the file to S3
> Because some analysis methods can only read their input from S3 (the asynchronous API used for PDFs, for example), we store the uploaded file in S3 before analyzing it

![Screenshot 2025-05-17 3.40.33 PM](https://hackmd.io/_uploads/SJHD1aBZee.png)

```php
function upload_to_s3($file_data, $bucket, $key, $region) {
    // Upload the file to S3
}
```
First, create an ==S3 bucket==; see the appendix for the steps [(click here)](https://hackmd.io/j3hU2w1PSDOzrXGub2falQ?both#附錄:Amazon-S3-儲存貯體建立方法)

<br>

Back in the PHP file, we need to create a client in order to use S3
```php
$s3 = new S3Client([
    'version' => 'latest',
    'region' => $region,
    // If a Key and Secret are required, they go here too
]);
```
> <font size="2">Learner Lab provides the IAM permissions for S3, so no Key or Secret needs to be entered here</font>

<br>

Use ==putObject== to send the file to the chosen path
```php
$result = $s3->putObject([
    'Bucket' => $bucket,   // Name of the bucket in S3
    'Key' => $key,         // Path the file is stored under
    'Body' => $file_data,  // The uploaded file
]);
```
Then add error handling
```php
try {
    $result = $s3->putObject([
        'Bucket' => $bucket,
        'Key' => $key,
        'Body' => $file_data,
    ]);
} catch (Exception $e) { // Error handling
    return "Error uploading to S3: " . $e->getMessage();
}
```
Finally, if the upload succeeds, return the file's URL on S3 so that we can refer to the file later
```php
return $result['ObjectURL'];
```
> <font size="2">**ObjectURL:** the URL of the file on S3 (it can be opened publicly only if the bucket's permissions allow it)</font>

<br>

> [!Important] Complete code
>
> :::spoiler &nbsp;upload_to_s3
> ```php=
> function upload_to_s3($file_data, $bucket, $key, $region) {
>
>     // Create the client
>     $s3 = new S3Client([
>         'version' => 'latest',
>         'region' => $region,
>     ]);
>
>     try {
>
>         // Store the file at the chosen path in S3
>         $result = $s3->putObject([
>             'Bucket' => $bucket,
>             'Key' => $key,
>             'Body' => $file_data,
>         ]);
>
>     } catch (Exception $e) { // Error handling
>
>         return "Error uploading to S3: " . $e->getMessage();
>
>     }
>
>     // Return the object's URL
>     return $result['ObjectURL'];
> }
> ```
> :::

<br>

### Analyzing PDFs
> Improve function2, the file analysis function, so that it analyzes the file read from S3 and supports PDF

```php
function textract_upload_and_analyze($file) {
    // Store the file $file in S3
    // Read the file from S3 and analyze it with Amazon Textract
}
```
<br>

Add the variables ==bucket== (the storage space in S3), ==key== (the storage path), and ==fileType== (the file type)
```php
$bucket = "enter your S3 bucket name here";        // Store the bucket name
$key = "uploads/" . basename($file['name']);       // Store the file path
$fileType = mime_content_type($file['tmp_name']);  // Store the file type
```
> <font size="2">**basename:** a built-in PHP function that takes the file name out of a full file path
> **mime_content_type:** a built-in PHP function that detects a file's type; the types it can return are listed [here](https://metadata.teldap.tw/elearning/doc/MIME_Type.pdf) </font>

<br>

Call the function written earlier to upload the file to S3
```php
$s3_url = upload_to_s3($file_data, $bucket, $key, $region);
```
<br>

Now we can start the analysis
We first handle the case where the file is a ==PDF==
```php
if ($fileType == "application/pdf") {
    // Handling for PDF files
}
```
Call the Textract API to run the analysis
The API used here differs from the one in Part 1: this asynchronous API reads the document from S3 and can handle multi-page PDFs, which the synchronous call used in Part 1 cannot
```php
$result = $textract->startDocumentTextDetection([
    'DocumentLocation' => [
        'S3Object' => [
            'Bucket' => $bucket,
            'Name' => $key,
        ],
    ],
]);
```
<br>

Because this analysis takes longer, we have to ==check at intervals whether it has finished==
While the job status is =='IN_PROGRESS'== we keep checking it
```php
$jobStatus = ''; // Holds the analysis status of the file

do {
    sleep(1); // Check once per second

    // Call the API
    $statusResult = $textract->getDocumentTextDetection([
        'JobId' => $result['JobId'],
    ]);

    $jobStatus = $statusResult['JobStatus']; // Save the analysis status

} while ($jobStatus == 'IN_PROGRESS'); // Keep going while the status is 'IN_PROGRESS'
```
> <font size="2">**sleep(n):** pauses the program for n seconds</font>

<br>

When the status is no longer 'IN_PROGRESS' the loop above exits, and there are two possible outcomes:
* A status of =='SUCCEEDED'== means the analysis finished
We can then extract the analyzed data (in the same way as in Part 1)
```php
if ($jobStatus == 'SUCCEEDED') {

    $extractedText = ''; // Holds the extracted result

    // Extract the information we need (objects of type LINE)
    foreach ($statusResult["Blocks"] as $block) {
        if ($block["BlockType"] == "LINE") {
            $extractedText .= $block["Text"] . " ";
        }
    }

    // Save the result to a WordPress option
    update_option('textract_last_text', $extractedText);

    // Return a success message
    return "Success! Text extracted from PDF.";
}
```
* Any other status means the analysis failed
We replace the stored result with an error message
```php
else {
    // Replace the stored result with an error message
    update_option('textract_last_text', 'Failed with status ' . $jobStatus);

    // Return an error message
    return "Textract job failed with status: " . $jobStatus;
}
```
<br>

With PDF handling done, the next step is handling the other file types
```php
if ($fileType == "application/pdf") {
    ...the PDF handling written above...
} else {
    // Handling for other file types
}
```
<br>

The Textract API call here is the same as in Part 1; the difference is that the file is read from S3
```php
$result = $textract->detectDocumentText([
    'Document' => [
        // Read the file stored in S3 (different from Part 1)
        'S3Object' => [
            'Bucket' => $bucket,
            'Name' => $key,
        ],
    ],
]);
```
Extracting and returning the result also works the same way as in Part 1
```php
$extractedText = ""; // Holds the extracted result

// Extract the information we need (objects of type LINE)
foreach ($result["Blocks"] as $block) {
    if ($block["BlockType"] == "LINE") {
        $extractedText .= $block["Text"] . " ";
    }
}

// Save the result to a WordPress option
update_option('textract_last_text', $extractedText);

// Return a success message
return "Success! Text extracted.";
```
<br>

Finally, add error handling and we are done~
```php
try {
    ...everything written above...
} catch (Exception $e) { // Error handling
    return "Textract error: " . $e->getMessage();
}
```
<br>

> [!Important] Complete code
>
> :::spoiler &nbsp;textract_upload_and_analyze
> ```php=
> function textract_upload_and_analyze($file) {
>
>     $region = 'us-west-2'; // Region
>
>     // Check whether a file was passed in
>     if (!isset($file['tmp_name']) || empty($file['tmp_name'])) {
>         return "No file uploaded.";
>     }
>
>     $bucket = "enter your S3 bucket name here"; // Bucket name
>     $key = "uploads/" . basename($file['name']); // File path in S3
>     $fileType = mime_content_type($file['tmp_name']); // File type
>
>     // Read the file
>     $file_data = file_get_contents($file['tmp_name']);
>
>     // Upload the file to S3
>     $s3_url = upload_to_s3($file_data, $bucket, $key, $region);
>
>     // Create the Textract client
>     $textract = new TextractClient([
>         'version' => 'latest',
>         'region' => $region
>     ]);
>
>     try {
>
>         // Analysis for PDF files
>         if ($fileType == "application/pdf") {
>
>             // Call the API to start the analysis
>             $result = $textract->startDocumentTextDetection([
>                 'DocumentLocation' => [
>                     'S3Object' => [
>                         'Bucket' => $bucket,
>                         'Name' => $key,
>                     ],
>                 ],
>             ]);
>
>             // Check the analysis status
>             $jobStatus = '';
>             do {
>
>                 sleep(1); // Check once per second
>                 $statusResult = $textract->getDocumentTextDetection([
>                     'JobId' => $result['JobId'],
>                 ]);
>                 $jobStatus = $statusResult['JobStatus'];
>
>             } while ($jobStatus == 'IN_PROGRESS');
>
>             // Analysis finished; on success, extract the result and return
>             if ($jobStatus == 'SUCCEEDED') {
>
>                 $extractedText = ''; // Holds the extracted result
>
>                 // Extract the information we need (objects of type LINE)
>                 foreach ($statusResult["Blocks"] as $block) {
>                     if ($block["BlockType"] == "LINE") {
>                         $extractedText .= $block["Text"] . " ";
>                     }
>                 }
>
>                 // Save the result to a WordPress option
>                 update_option('textract_last_text', $extractedText);
>
>                 // Return a success message
>                 return "Success! Text extracted from PDF.";
>
>             } else { // If the analysis did not succeed, return an error message
>
>                 update_option('textract_last_text', 'Failed with status ' . $jobStatus);
>                 return "Textract job failed with status: " . $jobStatus;
>
>             }
>
>         } else { // Analysis for other file types
>
>             // Call the Textract API and store the analysis result in $result
>             $result = $textract->detectDocumentText([
>                 'Document' => [
>                     'S3Object' => [
>                         'Bucket' => $bucket,
>                         'Name' => $key,
>                     ],
>                 ],
>             ]);
>
>             $extractedText = ""; // Holds the extracted result
>
>             // Extract the information we need (objects of type LINE)
>             foreach ($result["Blocks"] as $block) {
>                 if ($block["BlockType"] == "LINE") {
>                     $extractedText .= $block["Text"] . " ";
>                 }
>             }
>
>             // Save the result to a WordPress option
>             update_option('textract_last_text', $extractedText);
>
>             // Return a success message
>             return "Success! Text extracted.";
>         }
>
>     } catch (Exception $e) { // Error handling
>
>         return "Textract error: " . $e->getMessage();
>
>     }
> }
> ```
> :::

<br>

***
### Complete code
:::spoiler &nbsp; Textract.php
```php=
<?php
/**
* Plugin Name: Textract
* Author: Your_name
* Version: 2.0.0
*/

// Prevent the file from being accessed directly
if (!defined('ABSPATH')) {
    exit;
}

// Load AWS SDK
require_once dirname(__DIR__) . '/vendor/autoload.php';

// Import the classes we will use
use Aws\Textract\TextractClient;
use Aws\S3\S3Client;

// Upload the file to S3
function upload_to_s3($file_data, $bucket, $key, $region) {

    // Create the client
    $s3 = new S3Client([
        'version' => 'latest',
        'region' => $region,
    ]);

    try {

        // Store the file at the chosen path in S3
        $result = $s3->putObject([
            'Bucket' => $bucket,
            'Key' => $key,
            'Body' => $file_data,
        ]);

    } catch (Exception $e) { // Error handling

        return "Error uploading to S3: " . $e->getMessage();

    }

    // Return the object's URL
    return $result['ObjectURL'];
}

// function1: build the upload form
function textract_upload_form() {

    // Set the URL the form submits to
    $action_url = esc_url(admin_url('admin-post.php'));

    // Return the upload form (HTML)
    return '<form action="' . $action_url . '" method="post" enctype="multipart/form-data" onsubmit="showLoading()">
                <input type="hidden" name="action" value="textract_upload">
                <input type="file" name="textract_file" required>
                <button type="submit">Upload and Analyze</button>
            </form>
            <p id="loadingMessage" style="display:none; color: red;">Processing... Please wait.</p>
            <script>
                function showLoading() {
                    document.getElementById("loadingMessage").style.display = "block";
                }
            </script>';
}

// function2: analyze the file (improved version)
function textract_upload_and_analyze($file) {

    $region = 'us-west-2'; // Region

    // Check whether a file was passed in
    if (!isset($file['tmp_name']) || empty($file['tmp_name'])) {
        return "No file uploaded.";
    }

    $bucket = "textract-bucket-name"; // Bucket name (replace with your own)
    $key = "uploads/" . basename($file['name']); // File path in S3
    $fileType = mime_content_type($file['tmp_name']); // File type

    // Read the file
    $file_data = file_get_contents($file['tmp_name']);

    // Upload the file to S3
    $s3_url = upload_to_s3($file_data, $bucket, $key, $region);

    // Create the Textract client
    $textract = new TextractClient([
        'version' => 'latest',
        'region' => $region
    ]);

    try {

        // Analysis for PDF files
        if ($fileType == "application/pdf") {

            // Call the API to start the analysis
            $result = $textract->startDocumentTextDetection([
                'DocumentLocation' => [
                    'S3Object' => [
                        'Bucket' => $bucket,
                        'Name' => $key,
                    ],
                ],
            ]);

            // Check the analysis status
            $jobStatus = '';
            do {

                sleep(1); // Check once per second
                $statusResult = $textract->getDocumentTextDetection([
                    'JobId' => $result['JobId'],
                ]);
                $jobStatus = $statusResult['JobStatus'];

            } while ($jobStatus == 'IN_PROGRESS');

            // Analysis finished; on success, extract the result and return
            if ($jobStatus == 'SUCCEEDED') {

                $extractedText = ''; // Holds the extracted result

                // Extract the information we need (objects of type LINE)
                foreach ($statusResult["Blocks"] as $block) {
                    if ($block["BlockType"] == "LINE") {
                        $extractedText .= $block["Text"] . " ";
                    }
                }

                // Save the result to a WordPress option
                update_option('textract_last_text', $extractedText);

                // Return a success message
                return "Success! Text extracted from PDF.";

            } else { // If the analysis did not succeed, return an error message

                update_option('textract_last_text', 'Failed with status ' . $jobStatus);
                return "Textract job failed with status: " . $jobStatus;

            }

        } else { // Analysis for other file types

            // Call the Textract API and store the analysis result in $result
            $result = $textract->detectDocumentText([
                'Document' => [
                    'S3Object' => [
                        'Bucket' => $bucket,
                        'Name' => $key,
                    ],
                ],
            ]);

            $extractedText = ""; // Holds the extracted result

            // Extract the information we need (objects of type LINE)
            foreach ($result["Blocks"] as $block) {
                if ($block["BlockType"] == "LINE") {
                    $extractedText .= $block["Text"] . " ";
                }
            }

            // Save the result to a WordPress option
            update_option('textract_last_text', $extractedText);

            // Return a success message
            return "Success! Text extracted.";
        }

    } catch (Exception $e) { // Error handling

        return "Textract error: " . $e->getMessage();

    }
}

// function3: handle the upload
function textract_handle_upload() {

    // Check the file
    if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_FILES['textract_file'])) {

        $max_file_size = 5 * 1024 * 1024; // Set the file size limit

        if ($_FILES['textract_file']['size'] > $max_file_size) { // Check the file size

            update_option('textract_last_text', 'Error: File size exceeds the 5MB limit.');
            wp_redirect($_SERVER["HTTP_REFERER"]);
            exit;

        }

        textract_upload_and_analyze($_FILES['textract_file']); // Run the analysis
    }

    wp_redirect($_SERVER["HTTP_REFERER"]); // Redirect back to the page
    exit; // End the script
}

// function4: display the result
function textract_display_shortcode() {

    // Fetch the analysis result
    $extractedText = get_option('textract_last_text', 'No text extracted yet.');

    // Return the text box (HTML) that shows the result
    return "<div class='aws-textract-output' style='border: 1px solid #ccc; padding: 10px;'>
                <strong>Extracted Text:</strong>
                <p>{$extractedText}</p>
            </div>";
}

// Clear the form contents
function clear_textract_results() {
    update_option('textract_last_text', 'No text extracted yet.');
}

add_shortcode('textract_upload_form', 'textract_upload_form');
add_shortcode('textract_result', 'textract_display_shortcode');

add_action('admin_post_textract_upload', 'textract_handle_upload');
add_action('admin_post_nopriv_textract_upload', 'textract_handle_upload');
add_action('wp_head', 'clear_textract_results');

?>
```
:::

<br>

***

<br>

# Appendix: Creating an Amazon S3 bucket

<br>

1. **Click Services in the top-left corner of the AWS home page and choose "Storage"; S3 is listed there**
![Screenshot 2025-03-29 4.05.29 PM](https://hackmd.io/_uploads/BkDWiXSpyl.png)

<br>

2. **On the S3 page, click "Create bucket"**
![Screenshot 2025-03-29 3.46.21 PM](https://hackmd.io/_uploads/SkxpjQS6Jg.png)

<br>

3. **Enter a bucket name**
![Screenshot 2025-04-28 11.01.06 AM](https://hackmd.io/_uploads/SkCokdnJlx.png)

<br>

4. **Leave all other settings at their defaults, then click "Create bucket"**
![Screenshot 2025-03-29 3.47.12 PM](https://hackmd.io/_uploads/SyLon7BpJg.png)

<br>

5. **Done~**
![Screenshot 2025-03-29 3.47.29 PM](https://hackmd.io/_uploads/SkQDF7Bp1l.png)

> &nbsp;
> Click the bucket's name to view the objects currently stored in it (a newly created bucket is empty)

![Screenshot 2025-03-29 3.47.58 PM](https://hackmd.io/_uploads/BymPF7HTJx.png)
