# AI(Python)-Copilot

## Chapter contents
1. Problem decomposition
2. Analyze the problem with top-down design (and write the code bottom-up)
3. Example

## Problem decomposition & Top-Down
- Start from a large problem that is not yet well defined and break it into several "concrete" and "well-defined" subtasks, so that each subtask delivers one local piece of the overall functionality
- This reduces complexity and keeps each function simple

## Example - Author identification task
Passage 1:
```text!
The old man sat by the riverbank, his fishing rod gently bobbing in the water. The sun was high, but there was a cool breeze that made the heat bearable. He had been coming here for years, alone, casting his line and waiting. Today, the fish were scarce, but it didn’t bother him. The river's quiet, the birds calling in the distance, and the feel of the earth beneath him — these were enough. He didn’t fish for food or for sport anymore. He fished because it gave him peace, a connection to the world that moved without him.
```
Passage 2:
```text!
The light filtered through the lace curtains, casting delicate patterns on the walls. Clara stood in the room, watching the shadows dance as her thoughts meandered like a river, drifting from one memory to another. She wondered, as she often did, what it meant to be truly alive. Was it in the motion of her body, the beat of her heart, or was it something deeper, something unseen, that lay beneath her consciousness? The world outside continued in its rhythmic hum, yet within, she felt the pull of something else — a longing for clarity, for understanding, that always seemed just out of reach.
```
Task: determine whether the passages come from the same author

Strategy:
1. Features
    - Feature: a measurement extracted from the text
    - Known signature: a combination of feature values
2. Compare the unknown signature against the known signatures, compute a score for each, and use the closest known signature to guess the book's author

### Three stages of the program
1. Input: provide the mystery book as an electronic file
2. Process: compute the mystery book's signature, compare it against the known authors' signatures, and find the closest one
3. Output: the guessed author

### Implementing process_data
make_signature: determines the signature of the unknown book
get_all_signatures: collects the signatures of the known authors
lowest_score: compares the unknown signature against the known ones and finds the closest

![Screenshot 2024-10-23 10.05.45 PM](https://hackmd.io/_uploads/Hy3pfFLeyx.png)

### Implementing make_signature
- Features related to sentence structure:
    1. Average number of words per sentence: average_sentence_length
    2. Complex sentences tend to be built by joining phrases: average_sentence_complexity
- Features related to word choice:
    1. The author's average word length: average_word_length
    2. Repeated use of particular words: different_to_total
    3. Preference for using many different words: exactly_once_to_total

![Screenshot 2024-10-23 10.20.12 PM](https://hackmd.io/_uploads/S1C7LKUgkx.png)

clean_word:
1. Handles case conversion
2. Punctuation and special symbols should not count as letters, so they need to be stripped
3. Needed to compute the number of different words divided by the total number of words

get_sentences:
1. Splits text into sentences (on period / question mark / exclamation mark)

get_phrases:
1. Splits a sentence into phrases (on comma / semicolon / colon)

split_string:
1. '.?!': splits into sentences
2. ',;:': splits into phrases

![Screenshot 2024-10-23 11.04.43 PM](https://hackmd.io/_uploads/Byo5e5Il1g.png)

### Copilot in practice: bottom-up
- word is a string. Return a converted version of word in which all letters are lowercase and punctuation has been stripped from both ends of the string. Inner punctuation is left unchanged.
```python!
import string

def clean_word(word):
    '''
    word is a string.
    Return a version of word in which all letters have been
    converted to lowercase, and punctuation characters have been
    stripped from both ends. Inner punctuation is left untouched.

    >>> clean_word('Pearl!')
    'pearl'
    >>> clean_word('card-board')
    'card-board'
    '''
    word = word.lower()
    word = word.strip(string.punctuation)
    return word
```

- text is a string. Return the average length of the words in text. Do not count empty words as words. Do not include surrounding punctuation.

![Screenshot 2024-10-23 11.09.12 PM](https://hackmd.io/_uploads/rJDobcUgJg.png)

The remaining functions follow the same pattern...

### Conclusion
1. Implementing a large program requires first breaking it into small tasks
2. Top-Down design is a systematic technique for breaking a problem into subtask functions
3. Machine learning learns from data and makes predictions
4. In supervised learning, the training data consists of existing items together with their category labels; the model learns from this data and then makes predictions for new items
5. To implement the functions of a Top-Down design, start from the bottom

[Book](https://www.manning.com/books/learn-ai-assisted-python-programming)

# Optimizing Active Record
1. load_async
- Allows ActiveRecord queries to be executed asynchronously. Multiple queries can run in parallel, which reduces the total time spent waiting on queries.
```ruby!
users = User.where(active: true)
posts = Post.where(published: true)
comments = Comment.where(published: true)

# Log output:
# User Load (1006.0ms) SELECT "users".* FROM "users" ...
# Comment Load (1003.2ms) SELECT "comments".* FROM "comments" ...
# Post Load (1041.2ms) SELECT "posts".* FROM "posts" ...
# Completed 200 OK in 3204ms (Views: 51.2ms ...)

# Run the queries asynchronously with load_async
users = User.where(active: true).load_async
posts = Post.where(published: true).load_async
comments = Comment.where(published: true).load_async

# Log output:
# User Load (1008.7ms) ...
# ASYNC Comment Load (62.1ms) (db time 1013.8ms) ...
# ASYNC Post Load (0.0ms) (db time 1010.5ms) ...
# Completed 200 OK in 1085ms (Views: 64.4ms ...)

# When you access the records, it waits for the background thread to finish and returns the result
# (length reads the loaded records; count would issue a separate COUNT query)
puts users.length
puts posts.length
```
Currently load_async does not seem to support eager loaded queries -> [PR](https://github.com/rails/rails/issues/47809)

[Link](https://pawelurbanek.com/rails-load-async)

2. Subquery

![Screenshot 2024-10-23 11.53.52 PM](https://hackmd.io/_uploads/rkg7n9Ux1g.png)

In the first case, without subqueries, we are going to the database twice: First to get the average salary, and then again to get the result set. With a subquery, we can avoid the extra roundtrip, getting the result directly with a single query.
```ruby!
# order and limit apply to the outer query, not to the AVG subquery
Driver.where(
  'trips_count > (:avg)', avg: Driver.select('ROUND(AVG(trips_count))')
).order(trips_count: :desc).limit(5)
```
The result is still a relation, so you can keep chaining Active Record methods on it.

[Link](https://pganalyze.com/blog/active-record-subqueries-rails)

3. Using Common Table Expressions(CTE)
- When you need to reference a derived table multiple times in a single query.
- When you need to organize long and complex queries.

![Screenshot 2024-10-24 12.04.32 AM](https://hackmd.io/_uploads/SkGiC5Ixkl.png)

```ruby!
Post
  .with(posts_with_comments: Post.where("comments_count > ?", 0))
  .from("posts_with_comments AS posts")
  .select("posts.id, posts.name")
```
```sql!
WITH posts_with_comments AS (
  SELECT * FROM posts WHERE comments_count > 0
)
SELECT posts.id, posts.name FROM posts_with_comments AS posts;
```
open_days_with_places(*campus)

[Link](https://blog.kiprosh.com/rails-7-1-construct-cte-using-with-query-method/)
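
The CTE query from the section above can also be tried without a Rails app. The following is a minimal sketch, assuming an in-memory SQLite database (Python's standard `sqlite3` module) and a hypothetical `posts` table filled with made-up rows. The helper name `posts_with_comments` and the added `ORDER BY` (only there to make the output order predictable) are illustrations, not part of the linked article.

```python
import sqlite3

# The CTE query from the notes, with an ORDER BY added for a stable result order.
CTE_QUERY = """
WITH posts_with_comments AS (
  SELECT * FROM posts WHERE comments_count > 0
)
SELECT posts.id, posts.name FROM posts_with_comments AS posts
ORDER BY posts.id;
"""


def posts_with_comments(rows):
    """Hypothetical helper: load sample rows, return (id, name) of posts that have comments."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: only the columns the query needs.
        conn.execute(
            "CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, comments_count INTEGER)"
        )
        conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
        return conn.execute(CTE_QUERY).fetchall()
    finally:
        conn.close()


# Made-up sample data: (id, name, comments_count)
sample_rows = [(1, "first", 0), (2, "second", 3), (3, "third", 1)]
print(posts_with_comments(sample_rows))  # [(2, 'second'), (3, 'third')]
```

Only the posts whose `comments_count` is greater than 0 pass through the CTE, which is the same filtering the `.with(...)` relation expresses on the Rails side.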