Bleu, 任家輝
# 2025 AI Model API Comparison

The comparison covers:

- **API speed**: request latency, throughput (requests per second, RPS)
- **Functionality**: multimodal capability (image, audio, video), code generation, long-text handling
- **Response time**: average response time, max/min latency
- **Price**: billing model (per request or per token), per-model pricing details
- **Intelligence**: reasoning ability, creativity, breadth of knowledge

| **Model** | **API speed** (request latency, throughput RPS) | **Functionality** (multimodal, code generation, long text) | **Response time** (average, max/min latency) | **Price** (billing model, pricing) | **Intelligence** (reasoning, creativity, knowledge coverage) |
|---|---|---|---|---|---|
| **OpenAI o3-mini (High)** | Lower latency than its predecessor, with support for high concurrency. A "reasoning effort" setting adjusts thinking depth: in low mode the first token arrives in ~0.5 s with throughput of ~175 tokens/s ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=increases%20inference%20costs%20and%20latency,latency%20to%20the%20first%20token)); high mode adds extra chain-of-thought steps, pushing latency to ~15 s ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=increases%20inference%20costs%20and%20latency,latency%20to%20the%20first%20token)). | Text-only; no image input ([OpenAI o3-mini \| OpenAI](https://openai.com/index/openai-o3-mini/#:~:text=o3%E2%80%91mini%20does%20not%20support%20vision,opens%20in%20a%20new%20window)). Strong at code generation and mathematical reasoning, with developer features such as function calling and structured outputs ([OpenAI o3-mini \| OpenAI](https://openai.com/index/openai-o3-mini/#:~:text=OpenAI%20o3%E2%80%91mini%20is%20our%20first,opens%20in%20a%20new)). Context window of roughly 200K input + 100K output tokens ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=reasoning%20models%20DeepSeek,knowledge%20cutoff%20is%20October%202023)). | Default (medium) mode responds quickly, a clear speedup over O1; high mode thinks longer, with latency reaching ten-plus seconds ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=increases%20inference%20costs%20and%20latency,latency%20to%20the%20first%20token)). Overall it trades speed against accuracy flexibly. | Usage-based: ~$1.10/M input tokens, ~$4.40/M output tokens ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=,API%20calls%20users%20can%20make)). Roughly 93% cheaper than OpenAI O1 ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=,API%20calls%20users%20can%20make)). ChatGPT Plus and other paid users get high mode at no extra charge. | Excellent STEM reasoning; high mode slightly beats OpenAI O1 on math and coding benchmarks ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=%2A%20In%20OpenAI%E2%80%99s%20tests%2C%20o3,7%20percent%20and%2039%20percent)). On common-sense QA, though, even high mode (13.8%) trails O1 (47%) and GPT-4o (39%) ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=medium%20effort%2C%20and%20it%20outperformed,7%20percent%20and%2039%20percent)). Strong logical reasoning overall, with creativity and knowledge on par with the best small models. |
| **OpenAI O1** | OpenAI's first chain-of-thought model; requests involve extra reasoning steps, so initial latency is high — the first token often takes ~10 s ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=runs%20faster%20%28168,latency%20to%20the%20first%20token)). Once generating, throughput reaches about 200 tokens/s ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=runs%20faster%20%28168,latency%20to%20the%20first%20token)). API concurrency is rate-limited (higher quotas for top paid tiers). | Supports multimodal input with visual reasoning, accepting images as context ([OpenAI o3-mini \| OpenAI](https://openai.com/index/openai-o3-mini/#:~:text=o3%E2%80%91mini%20does%20not%20support%20vision,opens%20in%20a%20new%20window)). Also supports function calling, system messages, and other API features. Context length comparable to GPT-4o (~128K). Strong on code, math, and other complex tasks. | The "thinking" tokens make average responses slow: a full answer takes several to ten-plus seconds. There is no fast mode; complex questions trigger multi-step internal reasoning with noticeable latency. | Expensive API pricing: ~$15/M input, ~$60/M output tokens ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=surcharge%20for%20reasoning)). The highest pricing tier in OpenAI's lineup, aimed at high-accuracy workloads. | Outstanding advanced reasoning and broad knowledge. Excellent on academic QA benchmarks (e.g., 92% on MMLU ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=medium%20effort%2C%20and%20it%20outperformed,7%20percent%20and%2039%20percent))); clearly better than newer models on common-sense QA. Top-tier math reasoning and code comprehension ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=%2A%20In%20OpenAI%E2%80%99s%20tests%2C%20o3,7%20percent%20and%2039%20percent)). Creative and less prone to hallucination — one of the strongest all-round models of its time. |
| **DeepSeek R1** | A large Mixture-of-Experts model: 671B total parameters, of which only ~37B are active per inference ([o3-mini vs DeepSeek-R1 - Detailed Performance & Feature Comparison](https://docsbot.ai/models/compare/o3-mini/deepseek-r1#:~:text=DeepSeek)); parallelized inference lifts throughput somewhat. No official latency figures, but its chain-of-thought likewise adds thinking overhead. The web/mobile chat supports file uploads, though the model itself does not handle non-text input directly ([Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search \| VentureBeat](https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/#:~:text=Neither%20DeepSeek,and%20file%20uploads%20or%20attachments)). | Text-only LLM; no native image or other multimodal input ([Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search \| VentureBeat](https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/#:~:text=Neither%20DeepSeek,and%20file%20uploads%20or%20attachments)). Reasoning strengthened via two RL stages that mine reasoning patterns plus two SFT stages that preserve general task performance ([o3-mini vs DeepSeek-R1 - Detailed Performance & Feature Comparison](https://docsbot.ai/models/compare/o3-mini/deepseek-r1#:~:text=DeepSeek)). 128K-token input context + 32K output ([o3-mini vs DeepSeek-R1 - Detailed Performance & Feature Comparison](https://docsbot.ai/models/compare/o3-mini/deepseek-r1#:~:text=Input%20Context%20Window)) ([o3-mini vs DeepSeek-R1 - Detailed Performance & Feature Comparison](https://docsbot.ai/models/compare/o3-mini/deepseek-r1#:~:text=Maximum%20Output%20Tokens)). Good at code and at producing human-readable thinking traces. Open-source, runnable via its own API or Hugging Face ([o3-mini vs DeepSeek-R1 - Detailed Performance & Feature Comparison](https://docsbot.ai/models/compare/o3-mini/deepseek-r1#:~:text=API%20Providers)). | Chains of thought by default, so slightly slower than ordinary chat models. First-token latency is not officially stated, but its UI streams output while thinking. Complex questions may still take several seconds; the overall feel is close to OpenAI O1. | Extremely cheap: ~$0.55/M input, ~$2.19/M output tokens ([o3-mini vs DeepSeek-R1 - Detailed Performance & Feature Comparison](https://docsbot.ai/models/compare/o3-mini/deepseek-r1#:~:text=Output)) — roughly half the price of o3-mini ([o3-mini vs DeepSeek-R1 - Detailed Performance & Feature Comparison](https://docsbot.ai/models/compare/o3-mini/deepseek-r1#:~:text=per%20million%20tokens)). Open weights also allow self-hosting to cut costs further. | Reasoning considered comparable to OpenAI O1 ([o3-mini vs DeepSeek-R1 - Detailed Performance & Feature Comparison](https://docsbot.ai/models/compare/o3-mini/deepseek-r1#:~:text=discovering%20improved%20reasoning%20patterns%20and,math%2C%20code%2C%20and%20reasoning%20tasks)). Performance close to O1 on math, code, and complex reasoning ([o3-mini vs DeepSeek-R1 - Detailed Performance & Feature Comparison](https://docsbot.ai/models/compare/o3-mini/deepseek-r1#:~:text=discovering%20improved%20reasoning%20patterns%20and,math%2C%20code%2C%20and%20reasoning%20tasks)); arena preferences show it level with O1 and many top closed models ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=applications,least%20for%20common%2C%20everyday%20prompts)) ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=We%E2%80%99re%20thinking%3A%C2%A0Regardless%20of%20benchmark%20performance%2C,least%20for%20common%2C%20everyday%20prompts)). First-rate creativity and knowledge coverage, though its strictness about factual accuracy may leave it slightly behind knowledge-focused large models on common-sense QA. |
| **Gemini 2.0 Pro (Experimental)** | Google's most advanced model line; as an experimental release, API calls are strictly rate-limited ([Gemini Pro 2.0 Experimental (free) - API, Providers, Stats](https://openrouter.ai/google/gemini-2.0-pro-exp-02-05:free#:~:text=Gemini%20Pro%202,limited%20by)). Its deeper reasoning raises latency over the base version. Suited to bulk long-text analysis rather than low-latency use. | Multimodal AI that reads text and code, and **likely** supports images and other visual input (the Gemini line emphasizes multimodality). Integrates with search and a code-execution environment, so it can query the web and run code ([Google Gemini 2.0 Pro Experimental vs OpenAI o3-mini](https://www.analyticsvidhya.com/blog/2025/02/gemini-2-0-pro-experimental-vs-o3-mini/#:~:text=Google%E2%80%99s%20Gemini%202,date%20information)). Context window up to 2M tokens for massive inputs ([Gemini app adding 2.0 Pro and 2.0 Flash Thinking Experimental](https://9to5google.com/2025/02/05/gemini-2-0-pro-flash-thinking-experimental-app/#:~:text=For%20the%20Gemini%20API%2C%20there%E2%80%99s,get%201%20million%20like%20before)). Strongest on complex programming tasks and world-knowledge reasoning ([Gemini app adding 2.0 Pro and 2.0 Flash Thinking Experimental](https://9to5google.com/2025/02/05/gemini-2-0-pro-flash-thinking-experimental-app/#:~:text=Google%20says%202,%E2%80%9D)). | Built for complex tasks, it reasons more deeply, so average response time is longer than instant-chat models. Questions demanding rigorous reasoning may take ten-plus seconds or more to answer fully; simple queries return in acceptable time. | Still in limited preview, with no standalone API pricing published. Gemini Advanced subscribers ($19.99/month) get early access ([Gemini app adding 2.0 Pro and 2.0 Flash Thinking Experimental](https://9to5google.com/2025/02/05/gemini-2-0-pro-flash-thinking-experimental-app/#:~:text=Gemini%20Advanced%20subscribers%20%28%2419,AI%20Studio%20%2B%20Vertex%20AI)); developers can try it via AI Studio/Vertex AI at no extra charge for now, subject to quotas. | Widely regarded as Google's most intelligent model to date. Exceptional coding ability ([Gemini app adding 2.0 Pro and 2.0 Flash Thinking Experimental](https://9to5google.com/2025/02/05/gemini-2-0-pro-flash-thinking-experimental-app/#:~:text=Google%20says%202,%E2%80%9D)) and leading comprehension of complex long texts and knowledge reasoning — it often produces better solutions to problems earlier models could not solve. Sets new records for Google in reasoning rigor and knowledge coverage, with top-tier creative generation as well. |
| **Claude 3.7 Sonnet** | Anthropic's new-generation model with a "dual-mode" design: standard mode is very fast, with a claimed typical dialogue latency of ~200 ms ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=Claude%203,16s)) (likely first-token or short replies); measured average response is ~1.16 s ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=cites%20,16s)). Extended-thinking mode trades speed for stronger reasoning, thinking internally for 15 s or more ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=Image)). Developers can balance latency against accuracy dynamically. | Text-only large model; no image or other multimodal features yet. Supports contexts above the 100K-token class, with a configurable "thinking" token budget up to 128K for internal reasoning ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=For%20anyone%20looking%20to%20test,creates%20entirely%20new%20optimization%20opportunities)). Strong at code, math, and multilingual QA, with improved instruction following and content safety. One Claude serves both fast and deep use — no separate fast and smart models. | Near-instant in ordinary conversation ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=Claude%203,16s)). With deep reasoning enabled it may "ponder" first, taking up to ten-plus seconds to give a more thorough answer ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=Image)). Developers can cap the thinking-token budget to bound worst-case latency. | Priced the same as the previous generation: ~$3/M input, ~$15/M output tokens ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=Claude%203,No%20extra%20surcharge%20for%20reasoning)). Thinking tokens count toward output, but Anthropic adds no surcharge for them ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=Claude%203,No%20extra%20surcharge%20for%20reasoning)). Much cheaper than OpenAI O1 ($15/$60), slightly above o3-mini ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=surcharge%20for%20reasoning)). | Broad, well-balanced intelligence. Excellent complex reasoning, with standout results on graduate-level QA (78.2% on GPQA Diamond ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=Benchmarks))); 86.1% on the multi-domain MMLU ([Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1](https://www.vellum.ai/blog/claude-3-7-sonnet-vs-openai-o1-vs-deepseek-r1#:~:text=Image)). Strong at code generation, translation, and similar tasks. Good creativity and context understanding, and the controllable "thinking" markedly reduces hallucinations and errors — one of the most capable all-round models available. |
| **Gemini 2.0 Flash** | Google's production-oriented efficiency model, built for low latency and high throughput. Very fast inference: measured first-token latency of just ~0.46 s ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=increases%20inference%20costs%20and%20latency,latency%20to%20the%20first%20token)); generation at ~168.8 tokens/s ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=increases%20inference%20costs%20and%20latency,latency%20to%20the%20first%20token)), among the best in its class. Supports large-scale concurrent calls for real-time applications. | Strong multimodal support for text, images, and other inputs ([Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search \| VentureBeat](https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/#:~:text=Neither%20DeepSeek,and%20file%20uploads%20or%20attachments)) (Flash lets users upload images and files as context). Broad reasoning and tool use, connecting to Google Maps, YouTube, Search, and other services ([Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search \| VentureBeat](https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/#:~:text=But%20there%20was%20some%20news,services%20like%20DeepSeek%20and%20OpenAI)). 1M-token context window ([Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search \| VentureBeat](https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/#:~:text=programming%20interface%20%28API%29)) for very long document summarization and complex reasoning. | Extremely short average response times; most queries get an effectively instant answer ([Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search \| VentureBeat](https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/#:~:text=in%20December%2C%20is%20now%20production)). Latency optimization makes the wait nearly imperceptible, and the UI can stream its thinking steps live (Flash Thinking mode) for fluid interaction. | Very affordable per-million-token billing. The Gemini 2.0 Flash-Lite tier costs just ~$0.075/M input and $0.30/M output ([Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search \| VentureBeat](https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/#:~:text=As%20shown%20in%20the%20table,maintaining%20the%20same%20cost%20structure)); the full Flash tier is priced somewhat higher (estimated at about double, i.e., $0.15/$0.60) yet still far below comparably capable rivals. The pricing strategy gives it a clear cost-effectiveness edge. | Overall intelligence second only to the Pro model and far beyond the previous generation. Users in third-party chat arenas rate it highly, often above OpenAI O1 and DeepSeek R1 ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=applications,least%20for%20common%2C%20everyday%20prompts)) ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=We%E2%80%99re%20thinking%3A%C2%A0Regardless%20of%20benchmark%20performance%2C,least%20for%20common%2C%20everyday%20prompts)). Excellent on expert QA, code, and everyday conversation, with accurate, coherent output. Slightly behind the Pro version on the hardest tasks, but already one of the strongest and most versatile general-purpose models. |
| **OpenAI GPT-4o (Nov 2024)** | OpenAI's multimodal flagship — the "o" stands for "omni". Optimized inference and cost improvements give it slightly better speed and concurrency than early GPT-4. 128K-token input context ([Benchmarking Amazon Nova and GPT-4o models with FloTorch \| AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/benchmarking-amazon-nova-and-gpt-4o-models-with-flotorch/#:~:text=FloTorch%20used%20the%20GPT,The%20inference)) with a 16K output cap ([Benchmarking Amazon Nova and GPT-4o models with FloTorch \| AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/benchmarking-amazon-nova-and-gpt-4o-models-with-flotorch/#:~:text=FloTorch%20used%20the%20GPT,API%20calls%20using%20the%20same)). Its sheer size keeps per-request latency relatively high, but platforms such as Azure support batching and higher concurrency quotas ([Benchmarking Amazon Nova and GPT-4o models with FloTorch \| AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/benchmarking-amazon-nova-and-gpt-4o-models-with-flotorch/#:~:text=Both%20models%20were%20evaluated%20by,function%20code%20is%20as%20follows)). | Genuinely multimodal: accepts text, image, and audio input, and generates text (and audio output) ([GPT-4o - Wikipedia](https://en.wikipedia.org/wiki/GPT-4o#:~:text=GPT,3%20%5D%20Its%20application)) ([What Is GPT-4o? - IBM](https://www.ibm.com/think/topics/gpt-4o#:~:text=What%20Is%20GPT,audio%2C%20image%20and%20video%20input)). Handles real-time conversation, QA, code writing, and more ([GPT-4o explained: Everything you need to know - TechTarget](https://www.techtarget.com/whatis/feature/GPT-4o-explained-Everything-you-need-to-know#:~:text=OpenAI%27s%20GPT,Q%26A%2C%20text%20generation%20and%20more)). Industry-leading at understanding complex images and speech — OpenAI's first model to truly fuse vision and hearing. | As a high-accuracy large model, its response times are noticeably long: complex questions can take seconds to tens of seconds to answer in full. Even with optimization, GPT-4o is slower than small models, though streaming API output eases the wait. | Expensive: roughly $18/M input and $72/M output — 22.5× the price of Amazon Nova Pro ([Amazon Nova Pro vs GPT-4 - DocsBot AI](https://docsbot.ai/models/compare/amazon-nova-pro/gpt-4#:~:text=Amazon%20Nova%20Pro%20vs%20GPT,for%20input%20and%20output%20tokens)). These are commercial API prices; in ChatGPT, free users get limited access and Plus subscribers higher quotas ([GPT-4o - Wikipedia](https://en.wikipedia.org/wiki/GPT-4o#:~:text=GPT,3%20%5D%20Its%20application)). The high cost reflects its capability and multimodal features. | A pinnacle of multi-domain intelligence. About 88.7% on the MMLU knowledge test ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=medium%20effort%2C%20and%20it%20outperformed,7%20percent%20and%2039%20percent)) (on par with O1), and ~39% on common-sense QA (SimpleQA) — better than most models, second only to O1 ([o3-mini Puts Reasoning in High Gear, How to Train for Computer Use, Gemini 2.0 Thinks Faster, and more...](https://www.deeplearning.ai/the-batch/issue-287/#:~:text=medium%20effort%2C%20and%20it%20outperformed,7%20percent%20and%2039%20percent)). Excellent across complex reasoning, creative writing, code, and nearly every other task, and treated as the industry baseline; in image understanding and cross-modal reasoning it is far ahead of the field. |
| **Meta Llama 3.3 70B** | Meta's latest open model, an efficient, scalable 70B-parameter Transformer ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More \| DataCamp](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=Architecture%3A%20Efficient%20and%20scalable)). **Grouped-Query Attention (GQA)** significantly improves inference efficiency ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More \| DataCamp](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=What%E2%80%99s%20different%20about%20Llama%203,far%20less%20demanding%20on%20hardware)), keeping inference fast despite the model's scale. Designed to run locally, with deployment optimized for ordinary GPUs and workstations ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More \| DataCamp](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=Designed%20for%20accessible%20hardware)). Compared with the 405B-parameter Llama 3.1, version 3.3 delivers comparable performance at a fraction of the compute ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More \| DataCamp](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=What%E2%80%99s%20different%20about%20Llama%203,far%20less%20demanding%20on%20hardware)). | Text-only instruction-tuned model, shipped only as an Instruct variant ([Model Cards and Prompt formats - Llama 3.3](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3/#:~:text=Model%20Cards%20and%20Prompt%20formats,comprehensive%20technical%20information%20about)) — no base pretrained release. No image or audio input ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=Something%20missing%20from%20Mistral%20Large,increasingly%20looking%20to%20build%20with)). Trained on a 15-trillion-token corpus ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More \| DataCamp](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=Training%20and%20fine)) and refined with SFT and RLHF ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More \| DataCamp](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=Training%20a%20model%20like%20Llama,understanding%20of%20language%20and%20knowledge)) ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More \| DataCamp](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=%2A%20Supervised%20fine,feedback%20to%20refine%20its%20behavior)), improving safety and helpfulness over its predecessor. Context length not clearly published (believed to support long contexts; a community LOFT evaluation reports 83.1% ([Grok 3 Review: A Critical Look at xAI's 'Smartest AI' Claim. - Medium](https://medium.com/@bernardloki/grok-3-review-a-critical-look-at-xais-smartest-ai-claim-aea15ca38b66#:~:text=Medium%20medium,This%20suggests%20xAI))). Well suited to content creation, chat, code, and research QA ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=What%20Is%20Meta%27s%20Llama%203,1)). | Large, but designed for inference efficiency, with excellent response speed among models its size. Quantized local deployment lets developers run it fairly smoothly on a single machine (with reports of CPU-only generation after quantization). Compared with cluster-scale giants, it offers lower latency and faster output in short-to-medium interactions ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More \| DataCamp](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=What%E2%80%99s%20different%20about%20Llama%203,far%20less%20demanding%20on%20hardware)) ([What Is Meta's Llama 3.3 70B? How It Works, Use Cases & More \| DataCamp](https://www.datacamp.com/blog/llama-3-3-70b#:~:text=Designed%20for%20accessible%20hardware)). | Released under an open license, free for research and non-commercial use ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=However%2C%20it%E2%80%99s%20important%20to%20note,405%20billion%20parameters%2C%20of%20course)). Self-hosting incurs no API fees, and it is also available on AWS Bedrock and other clouds (inference far cheaper than closed models); commercial use is subject to Meta's license terms. Overall, its cost is far below closed models of similar capability — a strong open-source alternative. | Performance rivals models several times its size ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=cost%20for%20open%20models%2C%20backing,with%20a%20handful%20of%20benchmarks)). On code generation, logical reasoning, math, and common-sense QA, Llama 3.3 70B even surpasses Meta's own 405B Llama 3.1 ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=cost%20for%20open%20models%2C%20backing,with%20a%20handful%20of%20benchmarks)) ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=Large%202%20appears%20to%20outpace,123%20billion%2C%20to%20be%20precise)). Specifically trained against hallucination, it is more likely to answer "not sure" than to guess ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=be%20precise)). Its benchmarks approach closed frontier models — strong evidence that scaling laws are not dead ([llama3.3:70b - Ollama](https://ollama.com/library/llama3.3:70b#:~:text=llama3.3%3A70b%20,43GB%20%3B%20params%20%C2%B7%2096B)). Excellent creativity and multilingual ability; one of the strongest open instruction-tuned LLMs today. |
| **Mistral Large 2 (Nov 2024)** | Mistral AI's next-generation flagship LLM, around 123B parameters ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=cost%20for%20open%20models%2C%20backing,with%20a%20handful%20of%20benchmarks)) ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=Large%202%20appears%20to%20outpace,123%20billion%2C%20to%20be%20precise)). Despite its size, inference is optimized, and it is exposed for high-speed calls on Mistral's own platform, "La Plateforme" ([Large Enough \| Mistral AI](https://mistral.ai/news/mistral-large-2407#:~:text=Mistral%20Large%202%20is%20exposed,Mistral%20Large%202)). Very long context: a 128K-token input window ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=and%20text%20simultaneously%2C%20a%20feature,increasingly%20looking%20to%20build%20with)), roughly 300 pages of text. No image or other multimodal input — like Meta's concurrently released Llama 3.1, it targets text tasks ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=Something%20missing%20from%20Mistral%20Large,increasingly%20looking%20to%20build%20with)). | Text-only large model with no multimodal features ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=Something%20missing%20from%20Mistral%20Large,increasingly%20looking%20to%20build%20with)). Trained to match the latest OpenAI and Meta models, with code generation and mathematical reasoning on par with the very best ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=Mistral%20released%20a%20fresh%20new,code%20generation%2C%20mathematics%2C%20and%20reasoning)) ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=terms%20of%20code%20generation%2C%20mathematics%2C,and%20reasoning)). Specially trained to reduce hallucination, it more readily admits uncertainty when it does not know ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=be%20precise)). Stronger multilinguality than its predecessor; suited to enterprise long-document analysis and complex QA. | Inference demands substantial compute, but Mistral offers optimized cloud inference (e.g., Snowflake Cortex ([New Mistral Large 2 model available in Snowflake Cortex AI](https://docs.snowflake.com/en/release-notes/2024/other/2024-08-29-mistral-large2#:~:text=New%20Mistral%20Large%202%20model,inference%20in%20Snowflake%20Cortex%20AI))) to keep latency down. Most queries return within seconds; long-document summarization can be slower given the huge contexts. Against larger closed models, its performance-per-latency is very competitive. | Tiered commercial API pricing, recently reduced: ~$2.00/M input, ~$6.00/M output tokens ([Mistral Large 2 (Nov '24) - Intelligence, Performance & Price Analysis](https://artificialanalysis.ai/models/mistral-large-2#:~:text=Analysis%20artificialanalysis,00%2C)) — about $3/M blended at a typical 3:1 input/output ratio ([Mistral Large 2 (Nov '24) - Intelligence, Performance & Price Analysis](https://artificialanalysis.ai/models/mistral-large-2#:~:text=Mistral%20Large%202%20,00%2C)). Weights are free for research and non-commercial use ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=However%2C%20it%E2%80%99s%20important%20to%20note,405%20billion%20parameters%2C%20of%20course)); commercial deployment requires a paid license, yet remains cheap next to closed models in its class. | Mistral claims Large 2's overall capability matches the latest OpenAI and Meta models ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=Mistral%20released%20a%20fresh%20new,code%20generation%2C%20mathematics%2C%20and%20reasoning)) ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=terms%20of%20code%20generation%2C%20mathematics%2C,and%20reasoning)). In its official benchmarks it even beats the 405B-parameter Llama 3.1 on code and math ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=cost%20for%20open%20models%2C%20backing,with%20a%20handful%20of%20benchmarks)) ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=Large%202%20appears%20to%20outpace,123%20billion%2C%20to%20be%20precise)). Multiple evaluations place its reasoning, knowledge, and creativity in the top tier, and its much-reduced tendency to give inaccurate answers ([Millennium New Horizons](https://www.mnh.vc/blog/mistrals-large-2-is-its-answer-to-meta-and-openais-latest-models#:~:text=be%20precise)) makes it especially suitable for enterprise use. It lacks multimodality, but as an open flagship it is seen as a "game-changer" for business adoption ([Mistral Large 2: A Game-Changer for Business AI Adoption - Medium](https://medium.com/@cognidownunder/mistral-large-2-a-game-changer-for-business-ai-adoption-1502bbdf1948#:~:text=Mistral%20Large%202%3A%20A%20Game,can%20approach%20AI%20adoption)). |
| **xAI Grok (Beta)** | Musk's Grok model, currently in beta. Known for instant responses: in tests it began answering in an average of just 0.3 s ([Grok Beta - Intelligence, Performance & Price Analysis](https://artificialanalysis.ai/models/grok-beta#:~:text=Analysis%20artificialanalysis,30s)) — very low first-token latency. Output speed is middling, though, at only ~66.8 tokens/s ([Grok Beta - Intelligence, Performance & Price Analysis](https://artificialanalysis.ai/models/grok-beta#:~:text=Analysis%20artificialanalysis,30s)). Overall it suits fast short-form Q&A, but is less nimble than peers at generating long documents. | Optimized mainly for text conversation. Reportedly integrates real-time web access, drawing on fresh internet (especially X platform) information as an aid. No reported image-input support; the focus is chat and QA. Trained toward a lively, humorous style, often answering in a playful tone. Offers coding assistance and handles general knowledge QA. | Near-instant replies to simple questions. Complex or very long prompts can lag slightly due to reasoning depth or context limits, but the beta prioritizes speed overall. Very agile in short exchanges; at ~66 tokens/s, long-form generation takes noticeably longer to complete. | Currently available only to some X users (X Premium+ subscribers). No official API pricing or general-access plan yet. Free during the beta, with usage caps. Commercial pricing is expected to undercut comparable OpenAI models to win users. | xAI claims Grok 3 leads on academic benchmarks and user preference, with a Chatbot Arena Elo of 1402 ([Grok 3 Beta — The Age of Reasoning Agents](https://x.ai/blog/grok-3#:~:text=Grok%203%20has%20leading%20performance,1402%20in%20the%20Chatbot)). In xAI's demos it beat Gemini 2.0 Pro, DeepSeek V3, GPT-4o, and other well-known models on certain tests ([Grok 3 Technical Review: Everything You Need to Know - Helicone](https://www.helicone.ai/blog/grok-3-benchmark-comparison#:~:text=Grok%203%20Technical%20Review%3A%20Everything,5%20Sonnet)). Those results are disputed, with the community questioning the evaluations' fairness ([Did xAI Cheat? The Truth About Grok-3's Benchmarks! - YouTube](https://www.youtube.com/watch?v=tuUS5a8qnms#:~:text=Did%20xAI%20Cheat%3F%20The%20Truth,video%2C%20we%20break%20down)). Independent reviews put Grok at 83.1% on long-text retrieval (LOFT) and 78.9% on MMLU-Pro — first-rate, but no clear breakthrough past the current best ([Grok 3 Review: A Critical Look at xAI's 'Smartest AI' Claim. - Medium](https://medium.com/@bernardloki/grok-3-review-a-critical-look-at-xais-smartest-ai-claim-aea15ca38b66#:~:text=Grok%203%20Review%3A%20A%20Critical,This%20suggests%20xAI)). Overall, Grok performs strongly, with high reasoning ability and good knowledge coverage; its creativity awaits broader public validation. |
| **Amazon Nova Pro** | AWS's high-performance multimodal model, delivering industry-leading speed on the Bedrock platform ([Amazon Nova: Meet our new foundation models in Amazon Bedrock](https://www.aboutamazon.com/news/aws/amazon-nova-artificial-intelligence-bedrock-aws#:~:text=a%20wide%20range%20of%20tasks,intelligence%20classes%20in%20Amazon%20Bedrock)). The Nova Micro variant exceeds 200 tokens/s ([Amazon Nova - Generative Foundation Model - AWS](https://aws.amazon.com/ai/generative-ai/nova/#:~:text=problem,applications%20that%20require%20fast%20responses)); the larger Nova Pro has somewhat lower throughput but still sustains 100+ tokens/s. Excellent latency optimization makes it the fastest responder in its class ([Amazon Nova: Meet our new foundation models in Amazon Bedrock](https://www.aboutamazon.com/news/aws/amazon-nova-artificial-intelligence-bedrock-aws#:~:text=a%20wide%20range%20of%20tasks,intelligence%20classes%20in%20Amazon%20Bedrock)). Elastically scalable, supporting high-RPS enterprise applications. | A capable multimodal foundation model supporting text, image, and video input with text output ([Amazon Nova - Generative Foundation Model - AWS](https://aws.amazon.com/ai/generative-ai/nova/#:~:text=Amazon%20Nova%20Micro%2C%20Amazon%20Nova,speed%2C%20and%20cost%20operation%20points)). It understands video content, parses charts and documents, and performs complex QA and code generation ([Amazon Nova - Generative Foundation Model - AWS](https://aws.amazon.com/ai/generative-ai/nova/#:~:text=Amazon%20Nova%20Pro%20is%20a,The%20capabilities%20of)). Strong at agentic tasks, executing multi-step workflows ([Amazon Nova - Generative Foundation Model - AWS](https://aws.amazon.com/ai/generative-ai/nova/#:~:text=Amazon%20Nova%20Pro%2C%20coupled%20with,art%20accuracy%20on%20text)). Context window up to 300K tokens ([Benchmarking Amazon Nova and GPT-4o models with FloTorch \| AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/benchmarking-amazon-nova-and-gpt-4o-models-with-flotorch/#:~:text=FloTorch%20used%20the%20GPT,The%20inference)), with a maximum output of about 5,000 tokens ([Benchmarking Amazon Nova and GPT-4o models with FloTorch \| AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/benchmarking-amazon-nova-and-gpt-4o-models-with-flotorch/)). |
Learning Blog](https://aws.amazon.com/blogs/machine-learning/benchmarking-amazon-nova-and-gpt-4o-models-with-flotorch/#:~:text=FloTorch%20used%20the%20GPT,API%20calls%20using%20the%20same))。在準確性、速度和成本上實現均衡,適用範圍極廣 ([Amazon Nova - Generative Foundation Model - AWS](https://aws.amazon.com/ai/generative-ai/nova/#:~:text=Amazon%20Nova%20Pro%20is%20a,The%20capabilities%20of))。 | 針對低延遲進行了深度優化。在實際使用中,無論是問答還是覆雜分析,Nova Pro都能以接近實時的速度給出結果。其響應延遲和吞吐均在同級別模型中領先,足以滿足嚴苛的實時交互需求。Nova系列模型在AWS自研Inferentia硬件上運行,進一步確保了穩定快速的響應。 | 計費采用Bedrock按量計費:即時調用時輸入 ~$0.80/百萬,輸出 ~$3.20/百萬 tokens ([[new multi-model] Amazon Nova released just now. - Reddit](https://www.reddit.com/r/singularity/comments/1h5ugjs/new_multimodel_amazon_nova_released_just_now/#:~:text=Amazon%20Nova%20Pro.%20,%241.60));批量異步調用價格減半($0.40/$1.60) ([[new multi-model] Amazon Nova released just now. - Reddit](https://www.reddit.com/r/singularity/comments/1h5ugjs/new_multimodel_amazon_nova_released_just_now/#:~:text=Amazon%20Nova%20Pro.%20,%241.60))。相比同等智能水平的封閉模型(Nova Pro成本僅為它們的25%左右) ([Amazon Nova: Meet our new foundation models in Amazon Bedrock](https://www.aboutamazon.com/news/aws/amazon-nova-artificial-intelligence-bedrock-aws#:~:text=a%20wide%20range%20of%20tasks,intelligence%20classes%20in%20Amazon%20Bedrock))。對於需要處理海量多模態數據的應用,這種價位極具吸引力。 | 具備接近頂尖閉源模型的能力。亞馬遜表示Nova Pro在許多任務上可與Anthropic Claude 3.5等媲美 ([Amazon unveils Nova Pro, its LLM that is on par with Claude 3.5 Sonnet : r/singularity](https://www.reddit.com/r/singularity/comments/1h5ug30/amazon_unveils_nova_pro_its_llm_that_is_on_par/#:~:text=edit%3A%20they%20are%20training%20Nova,performance))。它在視頻摘要、覆雜問答、數學推理、代碼生成等方面表現出色 ([Amazon Nova - Generative Foundation Model - AWS](https://aws.amazon.com/ai/generative-ai/nova/#:~:text=Amazon%20Nova%20Pro%20is%20a,The%20capabilities%20of))。同時在多語種理解和遵循指令上也達到一流水平 ([Amazon Nova - Generative Foundation Model - 
AWS](https://aws.amazon.com/ai/generative-ai/nova/#:~:text=Amazon%20Nova%20Pro%2C%20coupled%20with,art%20accuracy%20on%20text))。雖然還有更高端的Nova Premier在訓練中,但Nova Pro已經以優秀的準確度和推理能力覆蓋了絕大多數常見場景,成為高性價比的通用AI模型選擇 ([Amazon Nova: Meet our new foundation models in Amazon Bedrock](https://www.aboutamazon.com/news/aws/amazon-nova-artificial-intelligence-bedrock-aws#:~:text=a%20wide%20range%20of%20tasks,intelligence%20classes%20in%20Amazon%20Bedrock))。
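The blended prices and long-form generation times quoted above follow from simple arithmetic over the listed per-token rates and throughput figures. A minimal sketch of those calculations (the helper names and the 2,000-token answer length are illustrative assumptions, not vendor figures):

```python
# Back-of-envelope helpers for the comparison above.
# Assumptions: prices are USD per million tokens, the blended price uses
# the typical 3:1 input:output token ratio cited in the table, and the
# throughput / first-token latency numbers are the ones quoted per model.

def blended_price(input_usd_per_m: float, output_usd_per_m: float,
                  input_ratio: int = 3, output_ratio: int = 1) -> float:
    """Weighted-average price per million tokens at a given input:output mix."""
    total = input_ratio + output_ratio
    return (input_ratio * input_usd_per_m + output_ratio * output_usd_per_m) / total

def generation_seconds(tokens: int, tokens_per_s: float, ttft_s: float = 0.0) -> float:
    """Rough wall-clock time to stream `tokens` output tokens."""
    return ttft_s + tokens / tokens_per_s

# Mistral Large 2: $2 in / $6 out at 3:1 -> $3.00 per million blended
print(round(blended_price(2.00, 6.00), 2))            # 3.0
# Amazon Nova Pro on-demand: $0.80 in / $3.20 out -> $1.40 blended
print(round(blended_price(0.80, 3.20), 2))            # 1.4
# Grok Beta: a 2,000-token answer at 66.8 tok/s with 0.3 s first-token latency
print(round(generation_seconds(2000, 66.8, 0.3), 1))  # 30.2
```

The same helpers make the trade-off concrete: Grok's 0.3 s time-to-first-token dominates short replies, while throughput dominates once answers run to thousands of tokens.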
