A Survey of AI Tools

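## AI Tool Recommendations

### Which programming languages can AI write?

Generative AI can produce code in many programming languages, though its ability depends on the training data and the model's design. For example, GPT-4 can generate code in common languages such as C, C++, Python, C#, JavaScript, HTML, and CSS.

### Scoring mechanisms

> This article draws on the [LMSYS Chatbot Arena website](https://chat.lmsys.org/?leaderboard), which is developed and maintained by LMSYS and UC Berkeley SkyLab.

#### Arena Score

The Arena Score comes from a crowdsourced platform with randomized matchups. Members of the public have cast more than one million votes, and those votes are converted into a score using Elo rating.

> The [Elo rating system](https://en.wikipedia.org/wiki/Elo_rating_system) is a ranking method for estimating players' relative skill levels. The size of the adjustment after each match depends on the rating gap between the two opponents: beating a higher-rated opponent earns more points, beating a lower-rated opponent earns fewer, and the reverse holds for losses. The system aims to reflect true skill as accurately as possible and is commonly used for rating board games such as chess.

- Overall
  ![image](https://hackmd.io/_uploads/r1xQSwgMdC.png)
- Coding
  ![image](https://hackmd.io/_uploads/Hy4dvlMdR.png)
- The paid models ChatGPT-4o, Claude 3.5 Sonnet, and Gemini Advanced have the best overall performance and coding ability.
- [Yi](https://github.com/01-ai/Yi/blob/main/README_CN.md) is the only open-source model that appears in the top ten for both overall performance and coding. (from mainland China)
- The open-source model [DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct#5-how-to-run-locally) ranks poorly overall (Rank 21) but does well in coding, where it ranks tenth. (from mainland China)

#### MMLU

[Scoring mechanism](https://arxiv.org/pdf/2009.03300): the benchmark covers 57 tasks from different domains, totaling 14,079 questions. The reported score is the weighted average of the per-task accuracy.

- The top ten AI models in the overall MMLU ranking are shown below:

> For comparison, GPT-3 X-Large (few-shot) scored 43.9 in 2021.

![Screenshot 2024-07-12 160924](https://hackmd.io/_uploads/Sy71rwAvA.png)

- The paid models ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro have the best MMLU scores.
- Among open-source models, [Qwen2-72B](https://qwenlm.github.io/blog/qwen2/), [Reka](https://www.reka.ai/news/reka-core-our-frontier-class-multimodal-language-model), [Llama3](https://github.com/meta-llama/llama3), and [Mistral-Large](https://mistral.ai/news/mistral-large/) made the top ten.

### Top three overall recommendations

Based on the Arena Score and MMLU results, we consider the following three paid-subscription generative AI products to be the strongest currently available. They can effectively help programmers write code.

1. [ChatGPT-4o](https://chatgpt.com/) (with a Team subscription, the official site states that uploaded data is not used as training data)
2. [Claude 3.5 Sonnet](https://claude.ai/new)
3. [Gemini Advanced](https://gemini.google.com/app)

### Open-source AI

#### Recommended open-source models

If you would rather not use a paid, closed-source third-party large language model (LLM), consider the following open-source generative AI models, which have relatively strong overall performance.

- Better Arena Score:
  1. [Yi](https://github.com/01-ai/Yi/blob/main/README_CN.md) (from mainland China)
  2. [DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct#5-how-to-run-locally) (from mainland China)
- Better MMLU score:
  1. [Meta Llama3](https://github.com/meta-llama/llama3)
  2. [Qwen2-72B](https://qwenlm.github.io/blog/qwen2/)
  3. [Mistral-Large](https://mistral.ai/news/mistral-large/)
  4. [Reka](https://www.reka.ai/news/reka-core-our-frontier-class-multimodal-language-model)

#### Advantages of open-source models

- Free
- Can be fine-tuned on your own data
- Can be downloaded and run locally, so there is no concern about data leaving your environment

#### Disadvantages of open-source models

- No open-source model currently outperforms the three major paid generative AI products
- Running an open-source model locally requires fairly powerful hardware

### Price comparison

> Open-source models are free, so this section covers only the paid models.

1. [ChatGPT-4o](https://chatgpt.com/#pricing)
   - Individual: USD $20/month
   - Team: USD $25 per user/month (minimum two users, billed annually)
2. [Claude 3.5 Sonnet](https://claude.ai/upgrade)
   - Individual: USD $20/month
   - Team: USD $30/month (minimum five users)
3. [Gemini Advanced](https://one.google.com/explore-plan/gemini-advanced?utm_source=gemini&utm_medium=web&utm_campaign=faq&_gl=1*9yub6c*_ga*MjYwODYwNDg0LjE3MjEwMDczNDk.*_ga_WC57KJ50ZZ*MTcyMTAxMzkzOS4yLjAuMTcyMTAxMzkzOS4wLjAuMA..&g1_landing_page=65)
   - Individual: NTD 650/month

### Hands-on test

We used the AI tools above to solve LeetCode problems. On medium-level data-structure problems (for example [2196. Create Binary Tree From Descriptions](https://leetcode.com/problems/create-binary-tree-from-descriptions/description/?envType=daily-question&envId=2024-07-15)), the AI solved them successfully in both Python and C. On problems that are also rated medium but put more weight on algorithms and efficiency (for example [1248. Count Number of Nice Subarrays](https://leetcode.com/problems/count-number-of-nice-subarrays/description/)), the AI may not give a correct answer right away. Even when it reaches a correct answer after several rounds of adjustment and debugging requests, the solution may still hit Time Limit Exceeded (TLE).

Below is ChatGPT's answer to LeetCode problem [2196. Create Binary Tree From Descriptions](https://leetcode.com/problems/create-binary-tree-from-descriptions/description/?envType=daily-question&envId=2024-07-15):

> To solve this problem, you need to build the binary tree from the given descriptions. Here's a step-by-step algorithm to achieve this:
>
> Create a dictionary of nodes: Use a dictionary to store nodes by their values. This will allow us to easily access any node by its value.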
>
> Track children nodes: Use a set to keep track of all child nodes. This will help us determine the root node later.
>
> Build the tree: Iterate through the descriptions and create or update the parent and child nodes accordingly. Also, update the child relationship based on the isLefti value.
>
> Identify the root node: The root node is the only node that is not a child of any other node. We can find this by subtracting the set of child nodes from the set of all nodes.
>
> Return the root node: Finally, return the root node, which represents the entire tree.
>
> Here is the algorithm in Python:

```python
class TreeNode(object):
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

class Solution(object):
    def createBinaryTree(self, descriptions):
        """
        :type descriptions: List[List[int]]
        :rtype: Optional[TreeNode]
        """
        from collections import defaultdict
        nodes = {}
        children = set()

        for parent, child, isLeft in descriptions:
            if parent not in nodes:
                nodes[parent] = TreeNode(parent)
            if child not in nodes:
                nodes[child] = TreeNode(child)

            if isLeft:
                nodes[parent].left = nodes[child]
            else:
                nodes[parent].right = nodes[child]

            children.add(child)

        root_val = (set(nodes.keys()) - children).pop()
        return nodes[root_val]

# Example usage:
descriptions = [[20, 15, 1], [20, 17, 0], [50, 20, 1], [50, 80, 0], [80, 19, 1]]
solution = Solution()
root = solution.createBinaryTree(descriptions)
```

As this shows, the AI can provide both a detailed walkthrough of the solution and correct code.

## Extensions

### [GitHub Copilot](https://copilot.microsoft.com/)

USD $10 per month or USD $100 per year. Of the products listed here, it is the most widely used extension.
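GitHub Copilot uses machine learning to analyze the code that developers have previously written and to learn their coding style and conventions. This helps developers keep a consistent code style and improves team collaboration. Developers can also adjust the code that GitHub Copilot generates to suit their own preferences.

### [Codeium](https://codeium.com/)

Free to use. It supports more IDEs than Copilot, including Jupyter, Colab, and a Chrome extension, none of which Copilot supports. After a function is generated, small hints appear above it offering to refactor the code, explain it, or generate JSDoc. Commonly used menu functions include auto-generating comments, checking for bugs, and simplifying code. It cannot be used on the company intranet, where it shows the following error message.

![Screenshot 2024-07-15 165610](https://hackmd.io/_uploads/S1ypiSm_A.png)

### [Codium](https://www.codium.ai/)

Free to use. The code it writes has higher time complexity than ChatGPT's. Its strength is that it can quickly generate unit tests and interpret an entire file, summarizing the function and purpose of each section of code. It also offers code optimization suggestions: select the code you want to test, then click Code Suggestions.

### [Double](https://docs.double.bot/introduction)

- The same kind of AI as GitHub Copilot, installable in VSCode as a plugin. The vendor claims it is more powerful than Copilot, but its actual effectiveness is unknown.

### [DeepCode](https://github.com/DeepCodeAI/vscode-extension?tab=readme-ov-file)

- DeepCode has been acquired by Snyk. The DeepCode API stopped receiving updates in 2021, and its functionality has since been replaced by Snyk Security.
- [Snyk Security](https://marketplace.visualstudio.com/items?itemName=snyk-security.snyk-vulnerability-scanner) is a VSCode plugin similar to GitHub Copilot that adds automatic security scanning, but its actual effectiveness is unknown.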
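
As a supplementary note on the Elo rating described in the Arena Score section, the update rule can be illustrated with a minimal sketch. It uses the textbook Elo formulas. The K-factor of 32 and the function names `expected_score` and `update_ratings` are assumptions chosen for illustration, and the leaderboard's actual computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the textbook Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A wins, 0.5 for a draw, and 0.0 if A loses.
    k is the K-factor (assumed value, not taken from the leaderboard).
    """
    expected_a = expected_score(rating_a, rating_b)
    expected_b = 1.0 - expected_a
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b


if __name__ == "__main__":
    # A lower-rated model beating a higher-rated one gains more points
    # than a higher-rated model beating a lower-rated one.
    underdog_after, _ = update_ratings(1000.0, 1400.0, 1.0)
    favorite_after, _ = update_ratings(1400.0, 1000.0, 1.0)
    print("underdog gain:", round(underdog_after - 1000.0, 2))
    print("favorite gain:", round(favorite_after - 1400.0, 2))
```

With these assumed parameters, an upset win across a 400-point gap is worth about 29.1 points, while the expected win is worth about 2.9, which matches the behavior described in the Elo note.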