# re:Invent 2024 Part 1: AI/ML New Feature Announcements
Each entry lists the feature name, announcement date, and a one-paragraph introduction.
---
## Accelerated Computing
:::warning
**Feature Name**: [Amazon EC2 Trn2 instances](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-ec2-trn2-instances-available/) ✔️
:::
Announced December 3, 2024
Introduction: AWS has announced the general availability of Amazon Elastic Compute Cloud (Amazon EC2) Trn2 instances and preview of Trn2 UltraServers, powered by AWS Trainium2 chips. Available via EC2 Capacity Blocks, Trn2 instances and UltraServers are the most powerful EC2 compute solutions for deep learning and generative AI training and inference.
:::warning
**Feature Name**: [Amazon EC2 P5en instances](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-ec2-p5en-instances-generative-ai-hpc-generally-available/) ✔️
:::
Announced December 2, 2024
Introduction: AWS announces the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P5en instances, powered by the latest NVIDIA H200 Tensor Core GPUs. These instances deliver the highest performance in Amazon EC2 for deep learning and high performance computing (HPC) applications.
:::warning
**Feature Name**: Amazon SageMaker HyperPod flexible training plans ✔️
:::
Announced December 4, 2024
Introduction: Amazon SageMaker HyperPod has introduced flexible training plans, a new capability that enables users to train generative AI models within their specified timelines and budgets. This feature allows users to achieve predictable model training timelines and run training workloads within budget requirements, while maintaining access to SageMaker HyperPod's core features such as resiliency, performance-optimized distributed training, and enhanced observability and monitoring. Amazon SageMaker HyperPod training plans are now available in US East (N. Virginia), US East (Ohio), US West (Oregon) AWS Regions and support ml.p4d.48xlarge, ml.p5.48xlarge, ml.p5e.48xlarge, ml.p5en.48xlarge, and ml.trn2.48xlarge instances. Trn2 and P5en instances are only in US East (Ohio) Region.
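From the SDK, the reservation flow might look like the following: search for capacity offerings that fit your timeline, then reserve one as a named plan. This is a minimal sketch based on the SearchTrainingPlanOfferings/CreateTrainingPlan APIs described in the blog below; the parameter names and values shown are assumptions, not a verified recipe.
```python
import boto3

sagemaker = boto3.client("sagemaker", region_name="us-east-2")

# Find offerings that can deliver the requested capacity (illustrative parameters).
offerings = sagemaker.search_training_plan_offerings(
    InstanceType="ml.p5.48xlarge",
    InstanceCount=8,
    TargetResources=["hyperpod-cluster"],
)

# Reserve the first matching offering as a named training plan (assumed field names).
plan = sagemaker.create_training_plan(
    TrainingPlanName="llama-finetune-plan",
    TrainingPlanOfferingId=offerings["TrainingPlanOfferings"][0]["TrainingPlanOfferingId"],
)
print(plan["TrainingPlanArn"])
```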
Reference:
- AWS blog: [Meet your training timelines and budgets with new Amazon SageMaker HyperPod flexible training plans](https://aws.amazon.com/blogs/aws/meet-your-training-timelines-and-budgets-with-new-amazon-sagemaker-hyperpod-flexible-training-plans/)
:::success
**Feature Name**: [Amazon SageMaker HyperPod recipes](https://aws.amazon.com/blogs/aws/accelerate-foundation-model-training-and-fine-tuning-with-new-amazon-sagemaker-hyperpod-recipes/)
:::
Announced Dec 4, 2024 ⭐⭐⭐⭐
Introduction: Amazon SageMaker HyperPod recipes help you get started training and fine-tuning publicly available foundation models (FMs) in minutes with state-of-the-art performance. SageMaker HyperPod helps customers scale generative AI model development across hundreds or thousands of AI accelerators with built-in resiliency and performance optimizations, decreasing model training time by up to 40%. However, as FM sizes continue to grow to hundreds of billions of parameters, the process of customizing these models can take weeks of extensive experimenting and debugging. In addition, performing training optimizations to unlock better price performance is often unfeasible for customers, as they often require deep machine learning expertise that could cause further delays in time to market. The supported models include Llama 3.1 405B, Llama 3.2 90B, Mixtral 8x22B, and Mistral 7B.
Reference:
- AWS blog: [Accelerate foundation model training and fine-tuning with new Amazon SageMaker HyperPod recipes](https://aws.amazon.com/blogs/aws/accelerate-foundation-model-training-and-fine-tuning-with-new-amazon-sagemaker-hyperpod-recipes/)
- GitHub: [Amazon SageMaker HyperPod recipes](https://github.com/aws/sagemaker-hyperpod-recipes)
:::success
**Feature Name**: [Task governance for Amazon SageMaker HyperPod](https://aws.amazon.com/blogs/aws/maximize-accelerator-utilization-for-model-development-with-new-amazon-sagemaker-hyperpod-task-governance/)
:::
Announced December 4, 2024
Introduction: Amazon SageMaker HyperPod now provides centralized governance across all generative AI development tasks, such as training and inference. The feature gives users full visibility and control over compute resource allocation, ensuring the most critical tasks are prioritized and maximizing compute resource utilization, which can reduce model development costs by up to 40%.
Reference:
- AWS blog: [Maximize accelerator utilization for model development with new Amazon SageMaker HyperPod task governance](https://aws.amazon.com/blogs/aws/maximize-accelerator-utilization-for-model-development-with-new-amazon-sagemaker-hyperpod-task-governance/)
## Amazon Bedrock
### Models
:::info
**Feature Name**: [Amazon Nova foundation models in Amazon Bedrock](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-nova-foundation-models-bedrock/) ✔️
:::
Announced December 3, 2024 ⭐⭐⭐⭐⭐
Summary: Performance surpasses the Titan family, with **market-competitive pricing**, and video is supported as a data format. Micro, Lite, and Pro currently support fine-tuning, and the base models perform notably well in Simplified Chinese; fine-tuning support for the remaining models, including Canvas and Reel, is coming. For a market price-performance comparison, see [artificialanalysis.ai](https://artificialanalysis.ai/).
Introduction: Amazon Web Services has announced Amazon Nova, a new generation of state-of-the-art (SOTA) foundation models (FMs) that deliver frontier intelligence and industry leading price performance. The announcement introduces five models: Amazon Nova Micro (a text-only model for low-latency responses), Amazon Nova Lite (a low-cost multimodal model for image, video, and text), Amazon Nova Pro (a highly capable multimodal model), Amazon Nova Canvas (for image generation), and Amazon Nova Reel (for video generation). These models are available through Amazon Bedrock and offer various capabilities optimized for RAG and agentic applications.
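A minimal sketch of invoking one of the new models through the Bedrock Converse API; the Nova Pro model ID below follows the amazon.nova-* naming and is an assumption:
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-pro-v1:0",  # assumed Nova Pro model ID
    messages=[{"role": "user", "content": [{"text": "Summarize the Amazon Nova model family."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.3},
)
print(response["output"]["message"]["content"][0]["text"])
```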
Reference:
- AWS blog: [Introducing Amazon Nova: Frontier intelligence and industry leading price performance](https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/)
- AWS Science: [Amazon Nova Reel examples](https://www.amazon.science/blog/amazon-nova-reel-examples)
- GitHub AWS samples: [Amazon Nova model cookbook](https://github.com/aws-samples/amazon-nova-samples)
:::warning
**Feature Name**: [Amazon Bedrock Marketplace](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-marketplace-100-models-bedrock/) 100+ Models ✔️
:::
Announced December 4, 2024 ⭐⭐⭐⭐⭐
Summary: Invoke over 100 models through the Bedrock API, such as the SLMs from [arcee.ai](https://www.arcee.ai/slm).
Introduction: Amazon Bedrock Marketplace provides generative AI developers access to over 100 publicly available and proprietary foundation models (FMs), in addition to Amazon Bedrock's industry-leading, serverless models. Customers deploy these models onto SageMaker endpoints where they can select their desired number of instances and instance types. Amazon Bedrock Marketplace models can be accessed through Bedrock's unified APIs, and models which are compatible with Bedrock's Converse APIs can be used with Amazon Bedrock's tools such as Agents, Knowledge Bases, and Guardrails.
:::spoiler North America market trends
**Model providers on Bedrock Marketplace:**
- AI21 Labs
- Amazon
- Anthropic
- Arcee AI (5 models)
- Camb.ai (1 model)
- Cohere
- EvolutionaryScale, PBC (1 model)
- Gretel (1 model)
- HuggingFace (83 models)
- IBM Data and AI (6 models)
- John Snow Labs (3 models)
- Karakuri, Inc. (1 model)
- LG CNS (1 model)
- Liquidai (3 models)
- Meta
- Mistralai
- NCSoft (2 models)
- NVIDIA (1 model)
- Preferred Networks, Inc. (1 model)
- Stability AI (1 model)
**Popular models:**
- Writer: Writer Palmyra-Med-70B-32K, Writer Palmyra-Fin-70B-32K
- Widn: Widn Tower Sugarloaf, Widn Tower Anthill, Widn Llama3-Tower Vesuvius
- Solar: Solar Pro, Solar Pro - Quant, Solar Mini Chat, Solar Mini Chat - Quant, Solar Mini Chat ja, Solar Mini Chat ja - Quant
- Stockmark: Stockmark-LLM-13b
- Stable Diffusion: Stable Diffusion 3.5 Large
- PLaMo: PLaMo API
- NVIDIA: NVIDIA Nemotron-4 15B NIM Microservice
- Liquid: Liquid LFM 40B (L40S), Liquid LFM 40B (H100), Liquid LFM 40B (A100)
- KARAKURI: KARAKURI LM 8x7b instruct
- IBM: IBM Granite 8B Code Instruct - 128K, IBM Granite 3B Code Instruct - 128K, IBM Granite 34B Code Instruct - 8K, IBM Granite 20B Code Instruct - 8K
**About Arcee.ai**
Arcee.ai is a US company focused on small language model (SLM) development; its flagship product is the SuperNova model.
Product highlights:
- 70-billion-parameter enterprise-grade language model
- Deployable on a company's own infrastructure
- Offers full customization options
- Launched first on AWS Marketplace
:::
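Because Marketplace models are hosted on SageMaker endpoints but surfaced through Bedrock's unified APIs, invocation might look like the sketch below, which passes the registered endpoint ARN where a model ID normally goes; the ARN is a placeholder:
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN of a Bedrock Marketplace endpoint deployed from the console.
endpoint_arn = "arn:aws:sagemaker:us-east-1:111122223333:endpoint/my-marketplace-model"

# Converse-compatible Marketplace models can be called like any other Bedrock model.
response = client.converse(
    modelId=endpoint_arn,
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```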
### Inference API
:::danger
**Feature Name**: [Latency-optimized inference for foundation models in Amazon Bedrock](https://aws.amazon.com/about-aws/whats-new/2024/12/latency-optimized-inference-foundation-models-amazon-bedrock/) (Public Preview) 🚨
:::
Announced December 2, 2024 ⭐⭐⭐
Summary: Supports Anthropic's Claude 3.5 Haiku and Meta's Llama 3.1 405B and 70B models. Note that inference is currently served only from the Ohio Region via cross-region inference (CRIS). Amazon Bedrock has launched latency-optimized inference for foundation models, currently in public preview.
Introduction: Latency-optimized inference for foundation models in Amazon Bedrock is now available in public preview, delivering faster response times and improved responsiveness for AI applications. The new inference options support Anthropic's Claude 3.5 Haiku model and Meta's Llama 3.1 405B and 70B models, offering reduced latency compared to standard models without compromising accuracy. As verified by Anthropic, Claude 3.5 Haiku runs faster on AWS than anywhere else, and Llama 3.1 405B and 70B runs faster on AWS than any other major cloud provider.
:::spoiler Key takeaways
Supported models:
- Anthropic's Claude 3.5 Haiku
- Meta's Llama 3.1 405B and 70B models
Key advantages:
- Lower latency than the standard models
- No loss of accuracy
- Claude 3.5 Haiku runs faster on AWS than on any other platform
- Llama 3.1 runs faster on AWS than on any other major cloud provider
Technical details:
- Uses AWS Trainium2 AI chips
- Software optimizations within Amazon Bedrock
- No additional setup required
- Immediately improves response times for existing applications
Availability:
- Served via cross-region inference in the US East (Ohio) Region
:::
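A minimal sketch of opting a Converse call into the optimized-latency tier via the performanceConfig request field; the cross-region inference profile ID is an assumption:
```python
import boto3

# Latency-optimized inference is served from US East (Ohio) via cross-region inference.
client = boto3.client("bedrock-runtime", region_name="us-east-2")

response = client.converse(
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",  # assumed cross-region inference profile ID
    messages=[{"role": "user", "content": [{"text": "Give me a one-line status summary."}]}],
    performanceConfig={"latency": "optimized"},  # default is "standard"
)
print(response["output"]["message"]["content"][0]["text"])
```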
:::danger
**Feature Name**: [Prompt Caching (gated preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-preview-prompt-caching/) 🚨
:::
Announced December 4, 2024 ⭐⭐⭐⭐
Introduction: Amazon Bedrock has announced prompt caching, a new capability that can reduce costs by up to 90% and latency by up to 85% for supported models by caching frequently used prompts across multiple API calls. This feature allows caching of repetitive inputs and avoids reprocessing context, such as long system prompts and common examples that help guide the model's response. When cache is used, fewer computing resources are needed to generate output, resulting in faster processing and cost savings from using fewer resources.
:::spoiler Concepts and key points

Source: [Paper](https://arxiv.org/pdf/2311.04934)
- **This is a gated preview and is not currently suitable for production workloads**
- Supported models: Nova Micro, Lite, and Pro, plus the Claude 3.5 series
:::
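A minimal sketch of how a cache checkpoint might be marked in a Converse request during the preview, assuming the cachePoint content block; the model ID and prompt are placeholders:
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

long_system_prompt = "You are a support agent. " + "<long policy documents go here>"  # placeholder

response = client.converse(
    modelId="us.amazon.nova-pro-v1:0",  # assumed supported model
    system=[
        {"text": long_system_prompt},
        {"cachePoint": {"type": "default"}},  # content before this block is cached across calls
    ],
    messages=[{"role": "user", "content": [{"text": "Where is my order?"}]}],
)
# On supported models, cache statistics are reported back in the usage block.
print(response["usage"])
```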
Reference:
- AWS blog: [Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview)](https://aws.amazon.com/blogs/aws/reduce-costs-and-latency-with-amazon-bedrock-intelligent-prompt-routing-and-prompt-caching-preview/)
:::info
**Feature Name**: [Amazon Bedrock Intelligent Prompt Routing (preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-intelligent-prompt-routing-preview/) ✅
:::
Announced December 4, 2024
Summary: Automatically saves customers money.
Introduction: Amazon Bedrock Intelligent Prompt Routing is a new feature that routes prompts to different foundational models within a model family, optimizing for both response quality and cost. The system uses advanced prompt matching and model understanding techniques to predict each model's performance for specific requests and dynamically routes requests to the most suitable model that can provide the desired response at the lowest cost. During preview, customers can choose from two prompt routers that route requests either between Claude Sonnet 3.5 and Claude Haiku, or between Llama 3.1 8B and Llama 3.1 70B.
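Routing is addressed like a model: a minimal sketch passes the prompt router's ARN as the model ID on a Converse call; the ARN below is a placeholder:
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN of one of the two preview prompt routers.
router_arn = "arn:aws:bedrock:us-east-1:111122223333:default-prompt-router/anthropic.claude:1"

response = client.converse(
    modelId=router_arn,
    messages=[{"role": "user", "content": [{"text": "What is the capital of France?"}]}],
)
# Which underlying model the router selected is surfaced in the response trace on routed calls.
print(response["output"]["message"]["content"][0]["text"])
```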
Reference:
- AWS blog: [Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview)](https://aws.amazon.com/blogs/aws/reduce-costs-and-latency-with-amazon-bedrock-intelligent-prompt-routing-and-prompt-caching-preview/)
### Guardrails and Responsible AI
:::info
**Feature Name**: [Amazon Bedrock Guardrails Multimodal Toxicity Detection (preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-guardrails-multimodal-toxicity-detection-image-content-preview/) ✅
:::
Announced December 4, 2024
Summary: Screens images along four dimensions (hate, insults, sexual content, and violence), though current observation is that Claude 3.5 Sonnet's own built-in moderation is stricter.
Introduction: Amazon Bedrock Guardrails now supports multimodal toxicity detection for image content in public preview, enabling organizations to apply content filters to images. This new capability removes the heavy lifting required by customers to build their own safeguards for image data or spend cycles with manual evaluation. The solution offers comprehensive detection and filtration of undesirable and potentially harmful image content while retaining safe and relevant visuals, allowing customers to use content filters for both text and image data with configurable thresholds across categories such as hate, insults, sexual, and violence.
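A minimal sketch of screening an image with the standalone ApplyGuardrail API, assuming the preview's image content block; the guardrail ID and file are placeholders:
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("user_upload.png", "rb") as f:  # placeholder file
    image_bytes = f.read()

response = client.apply_guardrail(
    guardrailIdentifier="gr-1234567890",  # placeholder guardrail ID
    guardrailVersion="1",
    source="INPUT",
    content=[
        {"text": {"text": "User-submitted image for review"}},
        {"image": {"format": "png", "source": {"bytes": image_bytes}}},
    ],
)
print(response["action"])  # GUARDRAIL_INTERVENED or NONE
```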
Reference:
- AWS blog: [Amazon Bedrock Guardrails now supports multimodal toxicity detection with image support (preview)](https://aws.amazon.com/blogs/aws/amazon-bedrock-guardrails-now-supports-multimodal-toxicity-detection-with-image-support/)
:::danger
**Feature Name**: [Amazon Bedrock Guardrails Automated Reasoning checks (Preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-guardrails-automated-reasoning-checks-preview/) 🚨
:::
Announced December 3, 2024 ⭐⭐⭐
Summary: Hold off and watch how this develops. The feature requires customers to scope the rules and boundaries for their business scenario, which takes human effort to clarify and consolidate the business problem. It suits scenarios with explicit regulations and rules, such as airline tickets being non-changeable after ticketing, insurance claim coverage, and loan application criteria.
Introduction: With the launch of the Automated Reasoning checks safeguard in Amazon Bedrock Guardrails, AWS becomes the first and only major cloud provider to integrate automated reasoning in their generative AI offerings. Automated Reasoning checks help detect hallucinations and provide a verifiable proof that a large language model (LLM) response is accurate. Automated Reasoning tools are not guessing or predicting accuracy. Instead, they rely on sound mathematical techniques to definitively verify compliance with expert-created Automated Reasoning Policies, consequently improving transparency.
:::spoiler Key points

AWS has launched the new Automated Reasoning checks (preview) feature, integrated into Amazon Bedrock Guardrails, to verify the accuracy of large language model (LLM) responses and prevent factual errors caused by hallucination.
Key characteristics:
- Uses mathematical and logical algorithms to verify model-generated information
- The only major cloud provider to use mathematical verification to guard against LLM hallucinations
- Especially suited to scenarios that demand factual accuracy and explainability, such as HR policies and company product information
Workflow:
- Create Automated Reasoning policies (upload documents; the system analyzes them and builds a policy)
- Configure a guardrail and enable Automated Reasoning checks
- Use the Test playground to verify the results
Verification result categories:
- Valid: no factual errors
- Invalid: contains factual errors
- Mixed results: contains factual inconsistencies
Current status:
- Preview available in the AWS US West (Oregon) Region
- **Access must be requested through your AWS Account Team**
- **A sign-up form will appear in the Amazon Bedrock console in the coming weeks**
:::
Reference:
- AWS blog: [Prevent factual errors from LLM hallucinations with mathematically sound Automated Reasoning checks (preview)](https://aws.amazon.com/blogs/aws/prevent-factual-errors-from-llm-hallucinations-with-mathematically-sound-automated-reasoning-checks-preview/)
- AWS Science: [Automated reasoning](https://www.amazon.science/research-areas/automated-reasoning)
### Fine-tuning
:::info
**Feature Name**: [Amazon Bedrock Model Distillation](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-model-distillation-preview/)
:::
Announced December 3, 2024
Summary: For common questions and tasks, use a large model's outputs to train a smaller model, cutting costs while improving efficiency. The key step is generating synthetic data at scale.
Introduction: Amazon Bedrock Model Distillation, now available in preview, enables customers to utilize smaller, faster, and more cost-effective models while maintaining use-case specific accuracy comparable to the most capable models in Amazon Bedrock. This new capability automates the process of generating synthetic data from the teacher model, training and evaluating the student model, and hosting the final distilled model for inference, eliminating the traditional iterative process of manual fine-tuning that requires writing prompts and responses, refining datasets, and adjusting training parameters.
:::spoiler 🚨 Note the data format

```json
{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    {
      "text": "A chat between a curious User and an artificial intelligence Bot. The Bot gives helpful, detailed, and polite answers to the User's questions."
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "why is the sky blue"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": "The sky is blue because molecules in the air scatter blue light from the Sun more than other colors."
        }
      ]
    }
  ]
}
```
:::
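Programmatically, a distillation run appears to be a model customization job. A minimal sketch, assuming the DISTILLATION customization type and config shape; all names, model IDs, and S3 URIs are placeholders:
```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")

job = bedrock.create_model_customization_job(
    jobName="distill-llama-demo",                                  # placeholder names
    customModelName="distilled-llama-8b",
    roleArn="arn:aws:iam::111122223333:role/BedrockDistillRole",
    customizationType="DISTILLATION",
    baseModelIdentifier="meta.llama3-1-8b-instruct-v1:0",          # student model (assumed ID)
    customizationConfig={
        "distillationConfig": {
            "teacherModelConfig": {
                "teacherModelIdentifier": "meta.llama3-1-70b-instruct-v1:0",  # teacher (assumed ID)
                "maxResponseLengthForInference": 1000,
            }
        }
    },
    trainingDataConfig={"s3Uri": "s3://my-bucket/prompts.jsonl"},  # prompts in the format above
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
)
print(job["jobArn"])
```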
Reference:
- AWS blog: [Build faster, more cost-efficient, highly accurate models with Amazon Bedrock Model Distillation (preview)](https://aws.amazon.com/blogs/aws/build-faster-more-cost-efficient-highly-accurate-models-with-amazon-bedrock-model-distillation-preview/)
- Official document: [Prerequisites for Amazon Bedrock Model Distillation](https://docs.aws.amazon.com/bedrock/latest/userguide/prequisites-model-distillation.html)
### LLMOps
:::info
**Feature Name**: [Amazon Bedrock Prompt Management is now generally available](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-bedrock-prompt-management-available/) ✅
:::
Announced November 7, 2024
:::info
**Feature Name**: [Introducing Prompt Optimization in Preview in Amazon Bedrock](https://aws.amazon.com/about-aws/whats-new/2024/11/prompt-optimization-preview-amazon-bedrock/) ✅
:::
Announced Nov 21, 2024
Introduction: Amazon Bedrock has announced the preview launch of Prompt Optimization, a new feature that rewrites prompts for higher quality responses from foundation models. This feature allows developers to optimize their prompts for improved performance across multiple models including Claude Sonnet 3.5, Claude Sonnet, Claude Opus, Claude Haiku, Llama 3 70B, Llama 3.1 70B, Mistral Large 2 and Titan Text Premier models, with the ability to compare performance against original prompts without any deployment needed.
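A minimal sketch of the runtime call, assuming the OptimizePrompt API on the Agents runtime client; the target model ID is an assumption:
```python
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

# Ask the service to rewrite a draft prompt for a specific target model.
result = client.optimize_prompt(
    input={"textPrompt": {"text": "tell me about s3 buckets"}},
    targetModelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
)

# The optimized prompt is returned as an event stream.
for event in result["optimizedPrompt"]:
    if "optimizedPromptEvent" in event:
        print(event["optimizedPromptEvent"])
```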
:::info
**Feature Name**: [Amazon Bedrock Flows (previously known as Prompt Flows)](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-bedrock-flows-new-capabilities/) ✅
:::
Announced Nov 22, 2024
Introduction: Amazon Bedrock Flows has reached general availability and introduces two new key capabilities. This service enables users to connect foundation models, Prompts, Agents, Knowledge Base and other AWS services together using an intuitive visual builder to accelerate the creation and execution of generative AI workflows. The new capabilities include real-time visibility into workflow execution and safeguards with Amazon Bedrock Guardrails.
### Agent

Source:
- AWS Labs GitHub: [Amazon Bedrock Agents Samples](https://github.com/awslabs/amazon-bedrock-agent-samples)
:::spoiler Agent API 說明
See the [Official document](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_Agent.html#bedrock-Type-agent_Agent-idleSessionTTLInSeconds) (note that the official site's content is updated over time)
```json
{
  "agentCollaboration": "string", # SUPERVISOR | SUPERVISOR_ROUTER | DISABLED
  "agentName": "string",
  "agentResourceRoleArn": "string",
  "clientToken": "string", # [33, 256]
  "customerEncryptionKeyArn": "string",
  "customOrchestration": {
    "executor": { ... } # lambda
  },
  "description": "string",
  "foundationModel": "string",
  "guardrailConfiguration": {
    "guardrailIdentifier": "string",
    "guardrailVersion": "string"
  },
  "idleSessionTTLInSeconds": number, # [60, 3600]
  "instruction": "string",
  "memoryConfiguration": {
    "enabledMemoryTypes": [ "string" ],
    "storageDays": number
  },
  "orchestrationType": "string", # DEFAULT | CUSTOM_ORCHESTRATION
  "promptOverrideConfiguration": {
    "overrideLambda": "string",
    "promptConfigurations": [
      {
        "basePromptTemplate": "string",
        "foundationModel": "string",
        "inferenceConfiguration": {
          "maximumLength": number,
          "stopSequences": [ "string" ],
          "temperature": number,
          "topK": number,
          "topP": number
        },
        "parserMode": "string",
        "promptCreationMode": "string",
        "promptState": "string",
        "promptType": "string" # "ROUTING_CLASSIFIER"
      }
    ]
  },
  "tags": {
    "string" : "string"
  }
}
```
Reference:
- [clientToken](https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html)
- [MemoryConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_MemoryConfiguration.html)
:::
:::spoiler Agent execution design notes
Sequential calling
```
# More prompt content goes here...
Please perform the following tasks sequentially. Be sure you do not
perform any of them in parallel. If a task will require information produced from a prior task,
be sure to include the details as input to the task.
```
Parallel calling
```
# More prompt content goes here...
Please perform as many of the following tasks in parallel where possible.
When a dependency between tasks is clear, execute those tasks in sequential order.
If a task will require information produced from a prior task,
be sure to include the details as input to the task.
```
:::
:::info
**Feature Name**: [Amazon Bedrock Agents Conversational Builder](https://aws.amazon.com/about-aws/whats-new/2024/10/amazon-bedrock-agents-conversational-builder/) ✅
:::
Announced October 16, 2024 ⭐⭐⭐
Introduction: AWS announces the general availability of Conversational Builder for Amazon Bedrock Agents, which provides a chat interface for building Bedrock Agents. This new feature offers an alternative experience to traditional manual configuration methods, allowing users to describe what they want their agent to do through natural language instructions. The Conversational Builder automatically generates the necessary configurations, reducing the time needed for agent creation and prototyping process.
Reference:
- Official document: [Create and configure agent using conversational builder](https://docs.aws.amazon.com/bedrock/latest/userguide/agents-create-cb.html)
:::info
**Feature Name**: [Amazon Bedrock Multi-Agent Collaboration (preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-multi-agent-collaboration/) ✅
:::
Announced December 3, 2024 ⭐⭐⭐⭐⭐
Introduction: Amazon Bedrock has introduced multi-agent collaboration support, enabling organizations to build and manage multiple AI agents that work together to solve complex workflows. This new feature allows developers to create specialized agents for specific business needs, such as financial data collection, research, and decision-making, empowering organizations to optimize performance across various industries including finance, customer service, and healthcare through seamless agent collaboration that delivers highly accurate and scalable results.
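A minimal sketch of wiring up a supervisor, using the agentCollaboration field from the Agent API spoiler earlier in this section and the AssociateAgentCollaborator operation; all IDs and ARNs are placeholders:
```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Promote an existing agent to supervisor (IDs and ARNs are placeholders).
bedrock_agent.update_agent(
    agentId="SUPERVISOR_AGENT_ID",
    agentName="support-supervisor",
    agentResourceRoleArn="arn:aws:iam::111122223333:role/BedrockAgentRole",
    foundationModel="anthropic.claude-3-5-sonnet-20241022-v2:0",
    agentCollaboration="SUPERVISOR",
)

# Attach a specialist agent as a collaborator via its alias ARN.
bedrock_agent.associate_agent_collaborator(
    agentId="SUPERVISOR_AGENT_ID",
    agentVersion="DRAFT",
    agentDescriptor={"aliasArn": "arn:aws:bedrock:us-east-1:111122223333:agent-alias/RESEARCHER/ALIAS"},
    collaboratorName="financial-research-agent",
    collaborationInstruction="Route financial data collection and research questions to this agent.",
)
```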
:::success
**Feature Name**: [InlineAgents for Agents for Amazon Bedrock](https://aws.amazon.com/about-aws/whats-new/2024/11/inlineagents-agents-amazon-bedrock/)
:::
Announced November 25, 2024
Summary: Lets you configure an agent's capabilities dynamically, including the foundation model, instructions, action groups, guardrails, and knowledge bases; well suited to quickly validating and testing agent functionality. Currently available through the SDK only. Any model on the platform that supports function calling can be used, though the agent prompts have so far been tuned only for Nova and Claude.
Introduction: Agents for Amazon Bedrock has launched InlineAgents, a new feature that enables developers to define and configure Bedrock Agents dynamically at runtime. This enhancement provides developers with greater flexibility and control over agent capabilities, allowing them to specify foundation models, instructions, action groups, guardrails, and knowledge bases on-the-fly without relying on pre-configured control plane settings.
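A minimal sketch of defining and invoking an agent in a single runtime call, assuming the InvokeInlineAgent API; the model ID is an assumption (the comparison table below contrasts this style with the traditional flow):
```python
import uuid

import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Define and invoke an agent in one runtime call; no pre-created agent resource is needed.
response = client.invoke_inline_agent(
    sessionId=str(uuid.uuid4()),
    foundationModel="us.amazon.nova-pro-v1:0",  # any function-calling model (assumed ID)
    instruction="You are a travel assistant. Answer concisely.",
    inputText="Find me three things to do in Taipei.",
)

# The answer streams back in completion chunks.
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"), end="")
```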
:::spoiler Two ways to build and invoke agents
| Traditional | Inline |
|-------------|--------|
| Resources are created ahead of time: create the agent, then run it | Tools are invoked dynamically at runtime: the agent is defined and executed in the same call |
| Tighter control over the agent's scope | Greater flexibility in the agent's scope |
| | Resources are not persisted in the AWS account |
| Agent resources carry explicitly defined role permissions | Security is based on the calling application's execution-role permissions |
| Better governance and auditability | Auditability is based on invocation logs only |
| Agents are visible in the AWS console | Governance is enforced on the tools the agent uses (knowledge bases, Lambda functions, guardrails, etc.) |
| | Well suited to e.g. ISVs publishing and selling their solutions (SaaS) through a cloud platform (PaaS) |
:::
Reference:
- Amazon Bedrock Recipes: [Create Dynamic Tooling Inline Agents](https://aws-samples.github.io/amazon-bedrock-samples/agents-and-function-calling/bedrock-agents/features-examples/15-invoke-inline-agents/inline-agent-api-usage/)
- AWS samples: [Building Dynamic AI Assistants with Amazon Bedrock Inline Agents](https://github.com/aws-samples/amazon-bedrock-samples/blob/main/agents-and-function-calling/bedrock-agents/features-examples/15-invoke-inline-agents/inline-agent-api-usage.ipynb)
:::success
**Feature Name**: [Amazon Bedrock Agents Custom Orchestration](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-bedrock-agents-custom-orchestration/)
:::
Announced November 27, 2024
Summary: Lets you control the orchestration flow with your own Lambda.
Introduction: Amazon Bedrock Agents now supports custom orchestration, allowing developers to control how agents handle multistep tasks, make decisions, and execute complex workflows. This capability enables developers to define custom orchestration logic for their agents using AWS Lambda, providing them flexibility to tailor agent's behavior to fit specific use cases.
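A minimal sketch of registering a custom orchestrator at agent-creation time, matching the orchestrationType and customOrchestration fields in the Agent API spoiler earlier; the ARNs are placeholders:
```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

agent = bedrock_agent.create_agent(
    agentName="custom-orchestrated-agent",
    agentResourceRoleArn="arn:aws:iam::111122223333:role/BedrockAgentRole",  # placeholder
    foundationModel="anthropic.claude-3-5-sonnet-20241022-v2:0",
    instruction="Handle multistep order lookups.",
    orchestrationType="CUSTOM_ORCHESTRATION",
    customOrchestration={
        # Lambda that receives the orchestration state and decides the next step.
        "executor": {"lambda": "arn:aws:lambda:us-east-1:111122223333:function:my-orchestrator"}
    },
)
print(agent["agent"]["agentId"])
```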
:::spoiler Orchestration types
- ReAct

- [Plan and Solve](https://arxiv.org/pdf/2305.04091)
- Tree of Thought
- Standard Operating Procedures (SOP)
(More...)
:::
:::info
**Feature Name**: [Amazon Bedrock Agents streaming responses](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-knowledge-bases-streaming-retrieveandgeneratestream-api/) ✅
:::
### Model Evaluation
:::info
**Feature Name**: [Amazon Bedrock Model Evaluation - LLM-as-a-judge (Preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-model-evaluation-llm-as-a-judge-preview/) ✅
:::
Announced Dec 1, 2024 ⭐⭐⭐
Summary: Replaces human evaluation with a large language model.
Introduction: Amazon Bedrock Model Evaluation has introduced a new evaluation capability called LLM-as-a-judge in Preview, which allows users to evaluate, compare, and select the best foundation models for their use cases. This new feature enables users to choose an LLM as their judge to ensure the right combination of evaluator models and models being evaluated, with access to several judge LLMs on Amazon Bedrock. Users can select curated quality metrics such as correctness, completeness, professional style and tone, as well as responsible AI metrics including harmfulness and answer refusal, while also having the ability to bring their own prompt dataset for customized evaluation.
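A minimal sketch of starting an LLM-as-a-judge job, assuming the CreateEvaluationJob shape with an evaluatorModelConfig; the field names, metric names, and ARNs here are assumptions and should be checked against the current API reference:
```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_evaluation_job(
    jobName="llm-judge-demo",                                    # placeholder names
    roleArn="arn:aws:iam::111122223333:role/BedrockEvalRole",
    applicationType="ModelEvaluation",
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [{
                "taskType": "General",
                "dataset": {"name": "my-prompts",
                            "datasetLocation": {"s3Uri": "s3://my-bucket/prompts.jsonl"}},
                "metricNames": ["Builtin.Correctness", "Builtin.Completeness"],
            }],
            # The judge model that scores the candidate's responses.
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"}
                ]
            },
        }
    },
    # The model under evaluation.
    inferenceConfig={"models": [{"bedrockModel": {"modelIdentifier": "amazon.nova-lite-v1:0"}}]},
    outputDataConfig={"s3Uri": "s3://my-bucket/eval-results/"},
)
print(job["jobArn"])
```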
Reference:
- AWS blog: [New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock](https://aws.amazon.com/blogs/aws/new-rag-evaluation-and-llm-as-a-judge-capabilities-in-amazon-bedrock/)
- Cohere paper: [Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models](https://arxiv.org/pdf/2404.18796)
### Knowledge Base

:::info
**Feature Name**: [Amazon Bedrock Knowledge Bases RAG evaluation](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-knowledge-bases-rag-evaluation-preview/) ✅
:::
Announced December 1, 2024
Summary: Uses batch inference to evaluate the quality of a RAG system.
Introduction: Amazon Bedrock Knowledge Bases now supports RAG evaluation in Preview, allowing customers to evaluate their retrieval-augmented generation (RAG) applications. Users can evaluate either information retrieval or the retrieval plus content generation using LLM-as-a-Judge technology, with a choice of several judge models. The evaluation includes metrics such as context relevance and coverage for retrieval, and quality metrics like correctness, completeness, faithfulness, as well as responsible AI metrics including harmfulness, answer refusal, and stereotyping for retrieve plus generation evaluation.
:::spoiler Evaluation metrics
Evaluation uses LLM-as-a-judge.
**Evaluating retrieval results (Retrieve API)**
Context Relevance
How it is computed:
- An LLM judges whether each retrieved context is relevant to the original question (yes/no)
- Mean Average Precision (mAP) is computed
- Scores range from 0 to 1; higher scores indicate better performance
Context Coverage/Recall
How it is computed:
- The ground truth is split into sentences
- An LLM judges whether each ground-truth sentence is supported by the retrieved context (yes/no)
- Formula: number of ground-truth sentences supported by the context / total number of ground-truth sentences
- Scores range from 0 to 1; higher scores indicate better coverage
**Evaluating retrieval plus generation (RetrieveAndGenerate API)**
For retrieve-and-generate evaluation, eight metrics assess the knowledge base's ability to produce useful and appropriate responses:
Correctness
- Measures how accurately the response answers the question
- Higher scores mean responses are, on average, more correct
Completeness
- Measures whether the response covers all aspects of the question
- Higher scores mean responses are, on average, more complete
Helpfulness
- Measures whether the response is useful overall
- Higher scores mean responses are, on average, more helpful
Logical coherence
- Measures whether the response is free of logical gaps, inconsistencies, or contradictions
- Higher scores mean responses are, on average, more coherent
Faithfulness
- Measures whether the response avoids hallucinating beyond the retrieved text passages
- Higher scores mean responses are, on average, more faithful to the source
Harmfulness
- Measures whether the response contains hateful, insulting, or violent language
- Higher scores mean responses are, on average, more harmful (undesirable)
Stereotyping
- Measures whether the response makes generalized statements about individuals or groups
- Higher scores mean responses contain, on average, more stereotyping (undesirable)
- Both positive and negative stereotypes raise the score
Refusal
- Measures whether the response evades the question
- Higher scores mean responses are, on average, more evasive (undesirable)

:::
Reference:
- AWS blog: [New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock](https://aws.amazon.com/blogs/aws/new-rag-evaluation-and-llm-as-a-judge-capabilities-in-amazon-bedrock/)
- Official document: [Review metrics for knowledge base evaluations that use LLMs (console)](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-eval-llm-results.html)
:::info
**Feature Name**: [Amazon Bedrock Knowledge Bases auto-generated query filters](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-knowledge-bases-auto-generated-query-filters-improved-retrieval/)
:::
Announced Dec 1, 2024 ⭐⭐⭐
Summary: Based on the user's question, the values for metadata keys configured in the API are extracted "automatically". In the API this is named implicitFilterConfiguration.
Introduction: Amazon Bedrock Knowledge Bases offers fully-managed, end-to-end Retrieval-Augmented Generation (RAG) workflows to create highly accurate, low latency, secure, and custom GenAI applications by incorporating contextual information from your data sources. Today, they are launching automatically-generated query filters which improves retrieval accuracy by ensuring the documents retrieved are relevant to the query. This feature extends the existing capability of manual metadata filtering, by allowing customers to narrow down search results without the need to manually construct complex filter expressions.
:::spoiler API code sample
```python
import boto3

bedrock_agent_client = boto3.client("bedrock-agent-runtime")

# Placeholder generation template; $search_results$ is the knowledge base template variable.
default_prompt = "Answer the question using only the search results: $search_results$"

def retrieve_and_generate_autofilter(query, kb_id, model_arn, max_results, prompt_template=default_prompt):
    response = bedrock_agent_client.retrieve_and_generate(
        input={
            'text': query
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': model_arn,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': max_results,  # fetch the top N documents that most closely match the query
                        'implicitFilterConfiguration': {  # the implicit filter is configured from here on
                            'metadataAttributes': [
                                {
                                    'description': 'retrieve docs from clean sources',
                                    'key': 'x-amz-bedrock-kb-data-source-id',
                                    'type': 'STRING'
                                }
                            ],
                            'modelArn': model_arn
                        },
                    },
                },
                'generationConfiguration': {
                    'promptTemplate': {
                        'textPromptTemplate': prompt_template
                    }
                }
            }
        }
    )
    return response

kb_id = "KB_ID"
query = """Introduce the features related to knowledge bases. Please look up data from the GNAZASM8HN data source."""
model_arn = "anthropic.claude-3-5-sonnet-20241022-v2:0"
results = retrieve_and_generate_autofilter(query=query, kb_id=kb_id, model_arn=model_arn, max_results=5)
```
:::
Reference:
- Official document: [Implicit metadata filtering](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html?icmpid=docs_bedrock_help_panel_knowledge_base)
:::info
**Feature Name**: [Amazon Bedrock Rerank API](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-rerank-api-accuracy-rag-applications/) ✅
:::
Announced December 1, 2024 ⭐⭐⭐
Summary: Comes with a standalone API. Currently supports the Amazon Rerank 1.0 and Cohere Rerank 3.5 models.
Introduction: Amazon Bedrock has introduced support for reranker models through the Rerank API to enhance the relevance of responses in Retrieval-Augmented Generation (RAG) applications. The feature helps prioritize the most relevant content for foundation models by ranking retrieved documents based on their relevance to user queries. This addresses challenges in semantic search where complex or ambiguous queries might retrieve mixed results, such as a customer service chatbot retrieving both return policies and shipping guidelines for a return-related query. The Rerank API supports Amazon Rerank 1.0 and Cohere Rerank 3.5 models.
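A minimal sketch of the standalone Rerank API with inline documents; the reranker model ARN is an assumption:
```python
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

docs = [
    "Our return policy allows returns within 30 days.",
    "Standard shipping takes 5-7 business days.",
]

response = client.rerank(
    queries=[{"type": "TEXT", "textQuery": {"text": "How do I return an item?"}}],
    sources=[
        {"type": "INLINE",
         "inlineDocumentSource": {"type": "TEXT", "textDocument": {"text": d}}}
        for d in docs
    ],
    rerankingConfiguration={
        "type": "BEDROCK_RERANKING_MODEL",
        "bedrockRerankingConfiguration": {
            "modelConfiguration": {
                "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0"  # assumed ARN
            },
            "numberOfResults": 2,
        },
    },
)
# Results come back ordered by relevance, referencing the original document index.
for result in response["results"]:
    print(result["index"], result["relevanceScore"])
```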
Reference:
- Official document: [Improve the relevance of query responses with a reranker model in Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/rerank.html)
:::success
**Feature Name**: [Amazon Bedrock Knowledge Bases custom connectors and streaming data ingestion](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-simplified-conversational-ai-bot-creation/)
:::
Announced December 1, 2024
Introduction: Amazon Bedrock Knowledge Bases now supports custom connector and ingestion of streaming data, allowing developers to add, update, or delete data in their knowledge base through direct API calls. Amazon Bedrock Knowledge Bases offers fully-managed, end-to-end Retrieval-Augmented Generation (RAG) workflows to create highly accurate, low latency, secure, and custom GenAI applications by incorporating contextual information from your company's data sources. With this new capability, customers can easily ingest specific documents from custom data sources or Amazon S3 without requiring a full sync, and ingest streaming data without the need for intermediary storage.
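A minimal sketch of pushing a single document directly into a knowledge base, assuming the IngestKnowledgeBaseDocuments API with inline custom content; the IDs are placeholders:
```python
import boto3

client = boto3.client("bedrock-agent", region_name="us-west-2")

# Push a single document straight into the knowledge base; no full sync required.
client.ingest_knowledge_base_documents(
    knowledgeBaseId="KB_ID",           # placeholder IDs
    dataSourceId="CUSTOM_DS_ID",
    documents=[{
        "content": {
            "dataSourceType": "CUSTOM",
            "custom": {
                "customDocumentIdentifier": {"id": "ticket-4711"},
                "sourceType": "IN_LINE",
                "inlineContent": {
                    "type": "TEXT",
                    "textContent": {"data": "Customer reported the login issue was resolved by a password reset."},
                },
            },
        }
    }],
)
```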
:::success
**Feature Name**: [Binary vector embeddings support in Amazon Bedrock Knowledge Bases](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-bedrock-knowledge-bases-binary-vector-embeddings-rag-applications/)
:::
Announced November 22, 2024
Introduction: Amazon Bedrock Knowledge Bases has introduced support for binary vector embeddings in building Retrieval Augmented Generation (RAG) applications, compatible with Titan Text Embeddings V2 model and Cohere Embed models. This new capability enables organizations to create highly accurate, low-latency, and secure RAG applications by representing document embeddings as binary vectors (0 or 1), offering significant advantages in storage efficiency, computational speed, and scalability. The feature is currently supported with Amazon OpenSearch Serverless as vector store and is available in all Amazon Bedrock Knowledge Bases regions where Amazon OpenSearch Serverless and the compatible embedding models are accessible.
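A minimal sketch of requesting binary embeddings from Titan Text Embeddings V2, assuming the embeddingTypes request field; the response key is also an assumption:
```python
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-west-2")

body = json.dumps({
    "inputText": "Binary vectors trade a little accuracy for much smaller storage.",
    "dimensions": 1024,
    "embeddingTypes": ["binary"],  # request 0/1 embeddings instead of float32
})
response = client.invoke_model(modelId="amazon.titan-embed-text-v2:0", body=body)
payload = json.loads(response["body"].read())

# Binary embeddings are assumed to come back under embeddingsByType.
print(payload["embeddingsByType"]["binary"][:16])
```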
Reference:
- [Introducing Binary Embeddings for Titan Text Embeddings model in Amazon Bedrock](https://aws.amazon.com/about-aws/whats-new/2024/11/binary-embeddings-titan-text-embeddings-model-amazon-bedrock/)
- [Supported models and regions for Amazon Bedrock knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-supported.html)
:::info
**Feature Name**: [Amazon Bedrock Knowledge Bases now provides option to stop ingestion jobs](https://aws.amazon.com/about-aws/whats-new/2024/10/amazon-bedrock-knowledge-bases-stop-ingestion-jobs/) ✅
:::
Announced Oct 1, 2024
:::info
**Feature Name**: [Amazon Bedrock Knowledge Bases streaming responses](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-knowledge-bases-streaming-retrieveandgeneratestream-api/) ✅
:::
Announced December 1, 2024 ⭐⭐⭐
Introduction: Amazon Bedrock Knowledge Bases now offers support for RetrieveAndGenerateStream API, enabling customers to receive responses as they are being generated by the Large Language Model (LLM), rather than waiting for the complete response.
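A minimal sketch of consuming the new streaming API; the knowledge base ID and model ARN are placeholders:
```python
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

response = client.retrieve_and_generate_stream(
    input={"text": "What are the key features of knowledge bases?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID",  # placeholder
            "modelArn": "anthropic.claude-3-5-sonnet-20241022-v2:0",
        },
    },
)

# Print tokens as the LLM produces them instead of waiting for the full answer.
for event in response["stream"]:
    if "output" in event:
        print(event["output"]["text"], end="")
```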
:::success
**Feature Name**: [Structured Data Retrieval support in Amazon Bedrock Knowledge Bases](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-knowledge-bases-structured-data-retrieval/)
:::
Announced December 4, 2024 ⭐⭐⭐⭐⭐
Summary: Generates SQL, with support for SageMaker Lakehouse, Redshift, and S3 Tables (Iceberg).
Introduction: Amazon Bedrock Knowledge Bases now supports natural language querying to retrieve structured data from data sources, offering an end-to-end managed workflow for customers to build custom generative AI applications that can access and incorporate contextual information from various structured and unstructured data sources. Using advanced natural language processing, Bedrock Knowledge Bases can transform natural language queries into SQL queries, allowing users to retrieve data directly from the source without the need to move or preprocess the data.
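A minimal sketch of the natural-language-to-SQL step, assuming a GenerateQuery runtime operation; the request/response shape here is an assumption and should be checked against the API reference:
```python
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

# Translate a natural-language question into SQL against a structured knowledge base.
response = client.generate_query(
    queryGenerationInput={"type": "TEXT", "text": "Top 5 products by revenue last quarter"},
    transformationConfiguration={
        "mode": "TEXT_TO_SQL",
        "textToSqlConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseArn": "arn:aws:bedrock:us-west-2:111122223333:knowledge-base/KB_ID"  # placeholder
            },
        },
    },
)
for q in response["queries"]:
    print(q["sql"])  # assumed response field
```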
:::info
**Feature Name**: [GraphRAG support in Amazon Bedrock Knowledge Bases (preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-knowledge-bases-graphrag-preview/) ✅
:::
Announced December 4, 2024 ⭐⭐⭐⭐⭐
Introduction: Amazon Bedrock Knowledge Bases has announced support for GraphRAG, a new capability that enhances Generative AI applications by providing more comprehensive, relevant and explainable responses using RAG techniques combined with graph data. This fully-managed capability integrates with Amazon Neptune Analytics to offer end-to-end Retrieval-Augmented Generation (RAG) workflows, enabling the creation of highly accurate, low latency, and custom Generative AI applications by incorporating contextual information from company's data sources. Previously, customers faced challenges in performing comprehensive searches across different content sources. GraphRAG addresses this by identifying key entities across documents and leveraging relationships within the data, which enables improved responses to end users.
Reference:
- AWS blog: [New Amazon Bedrock capabilities enhance data processing and retrieval](https://aws.amazon.com/blogs/aws/new-amazon-bedrock-capabilities-enhance-data-processing-and-retrieval/)
:::danger
**Feature Name**: [Amazon Bedrock Knowledge Bases Multimodal Data Processing](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-knowledge-bases-processes-multimodal-data/) ✅
:::
Announced December 4, 2024
Summary: Still to be clarified whether this uses a multimodal LLM or a multimodal embedding model; it currently appears to filter first and then run semantic search. 🚨 The most capable model currently available is Claude 3.5 Sonnet v1. Errors that cannot be resolved on your own can occur, as shown below:

Introduction: Amazon Bedrock Knowledge Bases now enables developers to build generative AI applications that can analyze and leverage insights from both textual and visual data, such as images, charts, diagrams, and tables. Bedrock Knowledge Bases offers end-to-end managed Retrieval-Augmented Generation (RAG) workflow that enables customers to create highly accurate, low-latency, secure, and custom generative AI applications by incorporating contextual information from their own data sources. With this launch, Bedrock Knowledge Bases extracts content from both text and visual data, generates semantic embeddings using the selected embedding model, and stores them in the chosen vector store.
:::spoiler Supported file types by parser
| File type | Default parser | Bedrock Data Automation | Foundation model parser |
|-----------|----------------|-------------------------|-------------------------|
| Plain text (.txt) | ✓ | ✓ | ✓ |
| Markdown (.md) | ✓ | ✓ | ✓ |
| HTML (.html) | ✓ | ✓ | ✓ |
| Word (.doc/.docx) | ✓ | ✓ | ✓ |
| CSV (.csv) | ✓ | ✓ | ✓ |
| Excel (.xls/.xlsx) | ✓ | ✓ | ✓ |
| PDF (.pdf) | ✓ | ✓ | ✓ |
| Image (.jpeg/.png) | | ✓ | ✓ |
:::
Reference:
- AWS blog: [New Amazon Bedrock capabilities enhance data processing and retrieval](https://aws.amazon.com/blogs/aws/new-amazon-bedrock-capabilities-enhance-data-processing-and-retrieval/)
- Official document: [Parsing options for your data source](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-advanced-parsing.html)
:::danger
**Feature Name**: Amazon Bedrock Data Automation (BDA) (preview) 🚨
:::
Announced Dec 4, 2024
Summary: Supports four file types: DOCUMENT / VIDEO / AUDIO / IMAGE. You can batch-process individual files or build a BDA template per project. Still to be clarified whether files with interleaved text and images use a hybrid approach; it currently appears they do. 🚨 **Chinese is not supported and triggers language blocking.**
Introduction: Amazon Bedrock Data Automation (BDA) is a new feature of Amazon Bedrock announced in preview that enables developers to automate the generation of valuable insights from unstructured multimodal content such as documents, images, video, and audio to build GenAI-based applications. These insights include video summaries of key moments, detection of inappropriate image content, automated analysis of complex documents, and more. Developers can also customize BDA's output to generate specific insights in consistent formats required by their systems and applications.
:::spoiler Key points about BDA Projects
Before OCR, imported files are saved as PNG images; **an output S3 location must be specified**.
Document mode:

Image mode:

Video mode:

Audio mode:

Reference:
- AWS blog: [New Amazon Bedrock capabilities enhance data processing and retrieval](https://aws.amazon.com/blogs/aws/new-amazon-bedrock-capabilities-enhance-data-processing-and-retrieval/)
- Official document: [Data automation](https://docs.aws.amazon.com/bedrock/latest/userguide/bda.html)
:::success
**Feature Name**: [Amazon Bedrock Cost Allocation Tags for Inference Profiles](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-bedrock-cost-allocation-tags-inference-profiles/)
:::
Announced November 1, 2024 ⭐⭐⭐⭐⭐
Introduction: Amazon Bedrock has introduced support for cost allocation tags on inference profiles, enabling customers to allocate and track their on-demand foundation model usage. This new feature allows customers to categorize their GenAI inference costs by department, team, or application using AWS cost allocation tags by creating an application inference profile and tagging it. As part of Amazon Bedrock's fully managed service offering high-performing foundation models via a single API, this capability enhances cost management while maintaining the platform's core features of security, privacy, and responsible AI capabilities.
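A minimal sketch of creating and tagging an application inference profile, assuming the CreateInferenceProfile API; the profile name, model ARN, and tag are placeholders:
```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Create an application inference profile that wraps a foundation model,
# then tag it so usage shows up under that tag in cost reports.
profile = bedrock.create_inference_profile(
    inferenceProfileName="claims-dept-claude",  # placeholder name
    modelSource={
        "copyFrom": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
    },
    tags=[{"key": "department", "value": "claims"}],
)
print(profile["inferenceProfileArn"])
# Invoke models via this profile ARN so costs accrue to the tagged profile.
```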
## Amazon Connect
:::warning
**Feature Name**: [Salesforce Contact Center with Amazon Connect (Preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/salesforce-contact-center-amazon-connect-preview/) ✔️
:::
Announced Dec 1, 2024
Introduction: Today, AWS announces the Preview of Salesforce Contact Center with Amazon Connect, a groundbreaking offering that integrates native digital and voice capabilities into Salesforce Service Cloud, delivering a unified and streamlined experience for agents. Salesforce users can now unify and route voice, chat, email, and case management across Amazon Connect and Service Cloud capabilities, streamlining operational efficiency and enhancing customer service interactions.
:::success
**Feature Name**: [Amazon Connect sensitive data collection in chats](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-collect-sensitive-customer-data-chats/)
:::
Announced December 1, 2024
Introduction: Amazon Connect has introduced a new feature that enables easier collection of sensitive customer data and delivery of seamless transactional experiences within chats, enhancing the overall customer experience. This new capability allows businesses to support inline chat interactions such as processing payments, updating customer information (e.g., address changes), or collecting customer data (e.g., account details) without requiring customers to switch channels or navigate to another page on the website.
:::success
**Feature Name**: [Amazon Connect self-assign tasks feature](https://aws.amazon.com/about-aws/whats-new/2024/11/aws-billing-cost-management-data-exports-focus-1-0/)
:::
Announced November 25, 2024
Introduction: Amazon Connect has introduced a new feature that enables agents to create and assign tasks to themselves through a simple checkbox in the agent workspace or contact control panel (CCP). This functionality allows agents to schedule follow-up actions, such as customer updates, by scheduling tasks for preferred times and selecting the self-assignment option. The feature is designed to enhance agent productivity and ensure swift resolution of customer issues through Amazon Connect Tasks' capabilities for prioritizing, assigning, and tracking contact center agent tasks.
:::success
**Feature Name**: [Amazon Connect Email](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-connect-email-generally-available/)
:::
Announced Nov 22, 2024
Introduction: Amazon Connect Email provides built-in capabilities that make it easy for you to prioritize, assign, and automate the resolution of customer service emails, improving customer satisfaction and agent productivity. With Amazon Connect Email, you can receive and respond to emails sent by customers to business addresses or submitted via web forms on your website or mobile app. You can configure auto-responses, prioritize emails, create or update cases, and route emails to the best available agent when agent assistance is required. Additionally, these capabilities work seamlessly with Amazon Connect outbound campaigns enabling you to deliver proactive and personalized email communications.
:::warning
**Feature Name**: [Amazon Connect Language Support Expansion](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-connect-additional-languages-forecasting-capacity-planning-scheduling/) ✔️
:::
Announced November 20, 2024
Introduction: Amazon Connect has expanded its language support by adding nine additional languages for forecasting, capacity planning, and scheduling functionalities. The newly supported languages include Canadian French and Chinese (both Simplified and Traditional), among others.
:::warning
**Feature Name**: [Amazon Connect WhatsApp Business messaging](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-whatsapp-business-messaging/) ✔️
:::
Announced December 1, 2024
Introduction: Amazon Connect now supports WhatsApp Business messaging, enabling you to deliver personalized experiences to your customers who use WhatsApp, one of the world's most popular messaging platforms, increasing customer satisfaction and reducing costs. Rich messaging features such as inline images and videos, list messages, and quick replies allow your customers to browse product recommendations, check order status, or schedule appointments.
:::success
**Feature Name**: [AI guardrails for Amazon Q in Connect](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-ai-guardrails-q-connect/)
:::
Announced December 1, 2024
Introduction: Amazon Q in Connect, a generative AI powered assistant for customer service, has introduced native configuration capabilities for AI guardrails to implement safeguards based on specific use cases and responsible AI policies. This new feature allows contact center administrators to configure company-specific guardrails that can filter harmful and inappropriate responses, redact sensitive personal information, and limit incorrect information in responses that might result from large language model (LLM) hallucination.
:::success
**Feature Name**: [Amazon Connect intraday forecast dashboards](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-launches-intraday-forecast-dashboards/)
:::
Announced Dec 1, 2024
Introduction: Amazon Connect has introduced new intraday forecast dashboards that enable users to compare intraday forecasts against previously published forecasts, review projected daily performance, and receive predictions for effective staffing through Amazon Connect Contact Lens dashboards. The system provides updates every 15 minutes with predictions for rest-of-day contact volumes, average queue answer time, average handle time, and effective staffing, allowing contact center managers to track agent utilization at the queue level and address potential staffing issues before they impact wait times.
:::success
**Feature Name**: [Amazon Connect Contact Lens built-in dashboards for conversational AI bot performance analysis](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-contact-lens-built-in-dashboards-analyze-conversational-ai-bot-performance/)
:::
Announced December 1, 2024
Introduction: Amazon Connect Contact Lens has introduced new built-in dashboards designed to monitor and analyze the performance of conversational AI bots. This feature enables users to analyze and enhance their self-service and automated experiences by providing insights into customer communications, common contact reasons, and interaction outcomes. The dashboards specifically focus on Amazon Lex and Q in Connect bot analytics, allowing users to make quick updates to improve bot accuracy directly from the bot management page.
:::success
**Feature Name**: [Amazon Connect AI assistant for customer segments and trigger-based campaigns](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-ai-assistant-customer-segments-trigger-based-campaigns/)
:::
Announced Dec 1, 2024
Introduction: Amazon Connect now offers new capabilities to proactively engage customers in a personalized manner. These features help non-technical business users create customer segments using prompts and drive trigger-based campaigns to deliver timely, relevant communications to the right audiences. The new segment AI assistant in Amazon Connect Customer Profiles allows building audiences using natural language queries and receiving recommendations based on trends in the customer data, while trigger-based campaigns enable proactive outbound communications based on real-time customer events.
:::success
**Feature Name**: [Amazon Connect simplified conversational AI bot creation](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-simplified-conversational-ai-bot-creation/)
:::
Announced December 1, 2024 ⭐⭐⭐
Introduction: Amazon Connect has introduced a new feature that simplifies the creation, editing, and continuous improvement of conversational AI bots for interactive voice response (IVR) and chatbot self-service experiences. Users can now configure and design their bots (powered by Amazon Lex) directly from the Connect web UI, enabling them to deliver dynamic, conversational AI experiences that can understand customer intent, ask follow-up questions, and automate issue resolution.
:::success
**Feature Name**: [Amazon Connect Audio Recording for IVR and Automated Interactions](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-record-audio-ivr-automated-interactions/)
:::
Announced Dec 1, 2024
Introduction: Amazon Connect has introduced a new capability that enables audio recording during customer interactions with self-service interactive voice response (IVR) and other automated interactions. This feature allows users to listen to recordings and review logs on the Contact details page, which includes information such as bot transcription or touch-tone menu selection. The recording settings can be configured through the "Set recording and analytics behavior" block in Amazon Connect's drag-and-drop workflow designer, allowing users to easily specify which portions of the experience to record, including the ability to pause and resume recordings during sensitive exchanges like when customers share credit card or social security numbers.
:::success
**Feature Name**: [Amazon Connect Contact Lens - Automatic Contact Categorization with Generative AI](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-contact-lens-categorizes-contacts-generative-ai/)
:::
Announced Dec 1, 2024
Introduction: Amazon Connect Contact Lens has introduced a new capability that automatically categorizes contacts using generative AI, enabling easy identification of top drivers, customer experience, and agent behavior for contacts. Users can provide categorization criteria in natural language (e.g., asking if a customer attempted to make a payment on their balance), and Contact Lens will automatically label contacts that match the criteria while providing relevant conversation points. The feature also includes the ability to receive alerts, generate tasks on categorized contacts, and search for contacts using automated labels, helping supervisors easily categorize contacts for various scenarios such as identifying customer interest in specific products, assessing customer satisfaction, and monitoring agent professional behavior.
:::success
**Feature Name**: [Amazon Connect Granular Disconnect Reasons for Chats](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-connect-granular-disconnect-reasons-chats/)
:::
Announced Nov 22, 2024
Introduction: Amazon Connect has enhanced its contact record system by introducing granular disconnect reasons for chats, a feature that enables businesses to improve and personalize customer experiences based on how a chat interaction ends. This new capability allows for specific actions to be taken based on the disconnect reason - for example, routing a chat to the next best agent if an agent disconnects due to network issues, or proactively sending an SMS to re-engage customers who disconnect due to idleness.
:::success
**Feature Name**: [Amazon Connect External Voice Transfers](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-external-voice-transfers/)
:::
Announced Dec 1, 2024
Introduction: Amazon Connect has introduced external voice transfer capability that enables integration with other voice systems to directly transfer voice calls and metadata without using the public telephone network. This new feature allows customers to use Amazon Connect telephony and Interactive Voice Response (IVR) with their existing voice systems to improve customer experience and reduce costs. The service provides IVR with conversational voice bots in over 30 languages, featuring natural language processing, automated speech recognition, and text-to-speech capabilities to help personalize customer service, enable complex self-service tasks, and reduce agent handling time. This allows enterprises to modernize their IVR experience while maintaining existing contact center systems, and those migrating to Amazon Connect can begin with Connect telephony and IVR for immediate modernization before agent migration.
Reference:
- [Amazon Connect Contact Lens now supports external voice](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-contact-lens-external-voice/)
:::success
**Feature Name**: [Amazon Connect Contact Lens with generative AI for agent performance evaluations](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-contact-lens-agent-performance-evaluations-generative-ai/)
:::
Announced Dec 1, 2024
Introduction: Amazon Connect Contact Lens now provides you with the ability to use generative AI to automatically fill and submit agent performance evaluations. Managers can now specify their evaluation criteria in natural language, and use generative AI for automating evaluations of any or all of agents' customer interactions, and get aggregated agent performance insights across cohorts of agents over time.
:::success
**Feature Name**: [Amazon Connect Contact Lens Calibrations for Agent Performance Evaluations](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-connect-contact-lens-calibrations-agent-evaluations/)
:::
Announced November 25, 2024
Introduction: Amazon Connect Contact Lens has introduced a new calibration feature designed to enhance consistency and accuracy in agent performance evaluations. This feature allows multiple managers to evaluate the same contact using identical evaluation forms, enabling organizations to review differences in evaluations across managers, align on evaluation best practices, and identify opportunities to improve evaluation forms. The feature also includes the ability to compare manager's answers with approved evaluations to measure and improve manager accuracy in evaluating agent performance.
## Amazon Q Business
:::success
**Feature Name**: [Amazon Q Business Integration with Amazon QuickSight (Preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-q-business-insights-databases-data-warehouses-preview/)
:::
Announced December 3, 2024
Introduction: Today, AWS announces the public preview of the integration between Amazon Q Business and Amazon QuickSight, delivering a transformative capability that unifies answers from structured data sources (databases, warehouses) and unstructured data (documents, wikis, emails) in a single application. Amazon Q Business is a generative AI–powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems, while Amazon QuickSight is a business intelligence (BI) tool that helps visualize and understand structured data through interactive dashboards, reports, and analytics.
:::success
**Feature Name**: [Amazon Q Business Plugin Actions](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-q-business-50-actions-business-applications-platforms/)
:::
Announced December 3, 2024
Introduction: Amazon Q Business has expanded its capabilities by introducing a ready-to-use library of over 50 actions through plugins that integrate with popular business applications and platforms, including PagerDuty, Salesforce, Jira, Smartsheet, and ServiceNow. This enhancement enables users to perform various tasks such as creating and updating tickets, managing incidents, and accessing project information directly within the Amazon Q Business interface, without having to switch between different applications, thereby improving user experience and operational efficiency.
:::success
**Feature Name**: [Ability to reuse recently uploaded files in Amazon Q Business conversations](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-q-business-reuse-uploaded-files-conversation/)
:::
Announced November 22, 2024
Introduction: Amazon Q Business has introduced a new feature that allows users to reuse recently uploaded files in new conversations without the need to upload them again. This enhancement includes a drag-and-drop functionality for file uploads and maintains a list of recently used documents that is private to each individual user. Users can manage their cached list by deleting conversations containing the uploaded files.
:::info
**Feature Name**: Amazon Q Business browser extension
:::
Announced November 22, 2024
Introduction: Amazon Web Services has announced the general availability of Amazon Q Business browser extensions for Google Chrome, Mozilla Firefox, and Microsoft Edge. This new extension enables users to access context-aware, generative AI assistance directly within their browsers, allowing them to summarize web pages, ask questions about web content or uploaded files, and leverage large language model knowledge for their daily tasks.
:::success
**Feature Name**: [Amazon Q in Connect for self-service interactions](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-connect-generative-ai-powered-self-service-amazon-q-connect/)
:::
Announced Dec 1, 2024
Introduction: Amazon Q in Connect, a generative AI–powered assistant for customer service, now supports end-customer self-service interactions across Interactive Voice Response (IVR) and digital channels. With this launch, businesses can augment their existing self-service experiences with generative AI capabilities to create more personalized and dynamic experiences that improve customer satisfaction and first contact resolution.
:::success
**Feature Name**: Tabular search in Amazon Q Business
:::
Announced Nov 22, 2024
Introduction: Amazon Q Business is a generative AI–powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. A large portion of that information is found in text narratives stored in various document formats such as PDFs, Word files, and HTML pages. Some information is also stored in tables (e.g. price or product specification tables) embedded in those same document types, CSVs, or spreadsheets. Although Amazon Q Business can provide accurate answers from narrative text, getting answers from these tables requires special handling of more structured information.
:::success
**Feature Name**: [Amazon Q Business Analytics Dashboard and CloudWatch Logs Integration](https://aws.amazon.com/about-aws/whats-new/2024/10/amazon-q-business-analytics-conversation-insights/)
:::
Announced October 22, 2024
Introduction: Amazon Q Business has introduced a new analytics dashboard and integration with Amazon CloudWatch Logs, providing comprehensive insights into the usage of Amazon Q Business application environments and Amazon Q Apps. The analytics dashboard in the Amazon Q Business console delivers insights through interactive charts and visualizations, allowing administrators to monitor key metrics such as usage trends, user conversations, query trends, and user feedback, while the CloudWatch Logs integration enables access to chat conversation and feedback data through Amazon CloudWatch Logs, Amazon S3, and Amazon Data Firehose.
:::success
**Feature Name**: [Amazon Q Business visual analysis feature](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-q-business-extract-insights-visual-elements-documents/)
:::
Announced Dec 1, 2024
Introduction: Amazon Q Business is a fully managed, generative AI–powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business now offers capabilities to answer questions and extract insights from visual elements embedded within documents.
:::success
**Feature Name**: [Amazon Q Business - Simplified Setup and New Web App Experience](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-q-business-simplified-setup-web-app-experience/)
:::
Announced November 4, 2024
Introduction: Amazon Q Business is a fully managed, generative AI–powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. The new update introduces a simplified onboarding process for administrators to quickly deliver a secure AI assistant, along with a web app experience that enables end users to immediately start using generative AI for their work.
:::success
**Feature Name**: [The Amazon Q index](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-q-index-software-vendors-ai-experiences/)
:::
Announced December 1, 2024 ⭐⭐⭐
Introduction: Independent software vendors (ISVs) like Asana, Miro, PagerDuty, Zoom, and more are integrating the Amazon Q index into their applications to enrich their generative AI experiences with enterprise knowledge and user context spanning multiple applications. End customers remain in control of which applications can access their data, and the index retains user-level permissions.
:::success
**Feature Name**: [Amazon Q Apps Data Collection (Preview)](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-q-apps-data-collection-preview/)
:::
Announced November 27, 2024
Introduction: Amazon Q Apps, the generative AI-powered app creation capability of Amazon Q Business, now offers a new data collection feature in public preview. This enhancement enables users to collate data across multiple users within their organization, further enhancing the collaborative quality of Amazon Q Apps for various business needs.
## Amazon SageMaker
:::success
**Feature Name**: [Next generation of Amazon SageMaker](https://aws.amazon.com/about-aws/whats-new/2024/12/next-generation-amazon-sagemaker/)
:::
Announced December 3, 2024 ⭐⭐⭐⭐⭐
Introduction: AWS announces the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. This launch brings together widely adopted AWS machine learning and analytics capabilities and provides an integrated experience for analytics and AI with unified access to data and built-in governance. Teams can collaborate and build faster from a single development environment using familiar AWS tools for model development, generative AI application development, data processing, and SQL analytics (powered by Redshift), accelerated by Amazon Q Developer, the most capable generative AI assistant for software development.
:::success
**Feature Name**: [Amazon SageMaker Unified Studio](https://aws.amazon.com/about-aws/whats-new/2024/12/preview-amazon-sagemaker-unified-studio/)
:::
Announced December 3, 2024 ⭐⭐⭐⭐⭐
Introduction: AWS announced the preview launch of Amazon SageMaker Unified Studio, an integrated data and AI development environment that enables collaboration and helps teams build data products faster. SageMaker Unified Studio brings together familiar tools from AWS analytics and AI/ML services for data processing, SQL analytics, machine learning model development, and generative AI application development. The platform includes Amazon SageMaker Lakehouse, which provides open source compatibility and access to data stored across Amazon S3 data lakes, Amazon Redshift data warehouses, and third-party and federated data sources, with enhanced governance features built in to help meet enterprise security requirements.
:::success
**Feature Name**: [Amazon Bedrock IDE (Preview)](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-bedrock-ide-preview-sagemaker-unified-studio/)
:::
Announced December 3, 2024 ⭐⭐⭐
Introduction: Today AWS announces the preview launch of Amazon Bedrock IDE, a governed collaborative environment integrated within Amazon SageMaker Unified Studio (preview) that enables developers to swiftly build and tailor generative AI applications. It provides an intuitive interface for developers across various skill levels to access Amazon Bedrock's high-performing foundation models (FMs) and advanced customization capabilities in order to collaboratively build custom generative AI applications.
Reference:
- [Amazon Bedrock IDE in Amazon SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/amazon-bedrock-ide.html)
:::success
**Feature Name**: [Amazon SageMaker Data and AI Governance](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-sagemaker-data-ai-governance/)
:::
Announced December 3, 2024
Introduction: AWS announces Amazon SageMaker Data and AI Governance, a new capability that simplifies discovery, governance, and collaboration for data and AI across your lakehouse, AI models, and applications. Built on Amazon DataZone, SageMaker Data and AI Governance allows engineers, data scientists, and analysts to securely discover and access approved data and models using semantic search with generative AI–created metadata. This new offering helps organizations consistently define and enforce access policies using a single permission model with fine-grained access controls.
:::success
**Feature Name**: [Data Lineage in Amazon DataZone and next generation of Amazon SageMaker](https://aws.amazon.com/about-aws/whats-new/2024/12/data-lineage-amazon-datazone-next-generation-sagemaker/)
:::
Announced December 3, 2024
Introduction: AWS announces the general availability of Data Lineage in Amazon DataZone and the next generation of Amazon SageMaker, a capability that automatically captures lineage from AWS Glue and Amazon Redshift to visualize lineage events from source to consumption. Being OpenLineage compatible, this feature allows data producers to augment the automated lineage with lineage events captured from OpenLineage-enabled systems or through the API, giving data consumers a comprehensive view of data movement.
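:::spoiler Sketch: posting an OpenLineage event (hypothetical)
A minimal sketch of the API path mentioned above, assuming the DataZone `PostLineageEvent` API accepts an OpenLineage-style run event payload; the domain ID, job names, and producer URL are hypothetical placeholders, and the exact event schema is whatever OpenLineage version your producer emits.
```python
import json
from datetime import datetime, timezone

import boto3

datazone = boto3.client("datazone")

# A bare-bones OpenLineage run event; real producers attach input/output
# datasets so the lineage graph can connect sources to consumers.
run_event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": "01939b7e-aaaa-bbbb-cccc-111122223333"},  # hypothetical
    "job": {"namespace": "my-pipeline", "name": "nightly-etl"},  # hypothetical
    "producer": "https://example.com/my-openlineage-producer",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json",
}

# Send the event to the DataZone domain so it appears alongside the
# automatically captured lineage.
datazone.post_lineage_event(
    domainIdentifier="dzd_1234567890",  # hypothetical domain ID
    event=json.dumps(run_event),
)
```
:::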
:::success
**Feature Name**: [Zero-ETL Integrations for SageMaker Lakehouse and Redshift](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-sagemaker-lakehouse-redshift-zero-etl-integrations-eight-applications/)
:::
Announced December 3, 2024
Introduction: Amazon SageMaker Lakehouse and Amazon Redshift have introduced zero-ETL integration support for eight applications, including Salesforce, SAP, ServiceNow, and Zendesk, enabling automated extraction and loading of data from these applications. This fully managed AWS integration eliminates the need to build ETL data pipelines, saving the weeks of engineering effort typically required to design, build, and test them, and lets users quickly set up data ingestion through a no-code interface to maintain up-to-date data replicas in their data lake and data warehouse for analysis and machine learning.
:::success
**Feature Name**: [Amazon SageMaker Model Registry Custom ML Model Lifecycle Stages](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-sagemaker-model-registry-defining-machine-learning-lifecycle-stages/)
:::
Announced Nov 12, 2024
Introduction: Amazon SageMaker Model Registry has introduced support for custom machine learning (ML) model lifecycle stages, enhancing model governance capabilities. This new feature enables data scientists and ML engineers to define and control their models' progression across various stages, from development to production. Users can now define custom stages such as development, testing, and production, and track stage approval status including Pending Approval, Approved, and Rejected. This functionality helps organizations standardize their model governance practices, maintain better oversight of model progression, and ensure only approved models reach production environments.
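:::spoiler Sketch: setting a custom lifecycle stage (hypothetical)
A minimal sketch of moving a model package into a custom stage, assuming the `ModelLifeCycle` field introduced with this launch on the boto3 SageMaker client; the model package ARN and stage values are hypothetical.
```python
import boto3

sm_client = boto3.client("sagemaker")

# Move a registered model package into a custom "Staging" stage and record
# the stage approval status (field names assumed from this launch).
sm_client.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:us-east-1:111122223333:model-package/my-group/1",  # hypothetical
    ModelLifeCycle={
        "Stage": "Staging",                        # custom, user-defined stage
        "StageStatus": "Approved",                 # PendingApproval | Approved | Rejected
        "StageDescription": "Passed integration tests",
    },
)
```
:::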
:::success
**Feature Name**: [SageMaker Model Registry Model Lineage Support](https://aws.amazon.com/about-aws/whats-new/2024/11/sagemaker-model-registry-model-lineage-governance/)
:::
Announced Nov 13, 2024 ⭐⭐⭐
Introduction: Amazon SageMaker Model Registry has introduced support for tracking machine learning (ML) model lineage, which automatically captures and maintains information about ML workflow steps, from data preparation and training to model registration and deployment. This new capability allows data scientists and ML engineers to easily track and view model lineage details such as datasets, training jobs, and deployment endpoints in Model Registry, creating an audit trail for traceability and reproducibility to enhance model governance.
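:::spoiler Sketch: querying lineage programmatically (hypothetical)
The lineage that Model Registry now surfaces can also be explored in code. A minimal sketch using the SageMaker Python SDK's existing lineage query helpers (not necessarily the exact path the Model Registry console uses); the model package ARN is hypothetical.
```python
from sagemaker.session import Session
from sagemaker.lineage.query import LineageQuery, LineageQueryDirectionEnum

session = Session()

# Walk upstream from a registered model package to find the datasets and
# training jobs that produced it.
result = LineageQuery(session).query(
    start_arns=["arn:aws:sagemaker:us-east-1:111122223333:model-package/my-group/1"],  # hypothetical
    direction=LineageQueryDirectionEnum.ASCENDANTS,
)
for edge in result.edges:
    print(edge.source_arn, "->", edge.destination_arn)
```
:::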
:::info
**Feature Name**: [Amazon SageMaker Multi-Adapter Model Inference](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-sagemaker-multi-adapter-model-inference/)
:::
Announced November 25, 2024 ⭐⭐⭐
Summary: Multi-adapter inference is only supported with inference components.
Introduction: Amazon SageMaker has introduced new multi-adapter inference capabilities that enable customers to deploy hundreds of fine-tuned LoRA (Low-Rank Adaptation) model adapters behind a single endpoint. This feature dynamically loads the appropriate adapter in milliseconds based on the request, enabling efficient hosting of many specialized LoRA adapters built on a common base model. The solution delivers high throughput and cost savings compared to deploying separate models.
:::spoiler Key points
How adapters are organized across inference components:

GitHub: [AIM313 Workshop Code for Reinvent 2024](https://github.com/aws-samples/sagemaker-genai-hosting-examples/tree/main/genai-recipes/Multi-LoRA-Adapters/Reinvent-AIM313-Workshop-Adapter-Inference)

Example 1:
- Create a single base inference component that contains:
  - the base Llama 2 model
  - multiple adapters (Spanish, Russian, French) in the same component
- Adapters are loaded as artifacts of that one inference component
- Adapters are switched via an API call parameter, not separate inference components

Example 2:
- Create separate inference components:
  - one base-model component
  - individual adapter components that reference the base component
- Each adapter gets its own dedicated inference component
- Adapters are organized in a parent-child relationship, where adapter components depend on the base-model component
- Adapters are switched by invoking different inference component names

The key difference:
- Example 1: adapters are packaged inside a single inference component
- Example 2: each adapter is an independent inference component referencing one base-model component

The second approach offers more flexibility for adapter management, since each adapter can be versioned and managed independently, at the cost of more component-management overhead (a creation sketch for this pattern follows after this section). The difference shows up in the API calls:

Example 1:
```python
# endpoint_name, component_to_invoke, and prompt are assumed to be defined
import json
import boto3

smr_client = boto3.client("sagemaker-runtime")

# Switch adapters within the same component via a request parameter
response = smr_client.invoke_endpoint(
    EndpointName=endpoint_name,
    InferenceComponentName=component_to_invoke,
    ContentType="application/json",
    Body=json.dumps({
        "inputs": prompt,
        "adapters": ["es"]  # specify the adapter in the request
    })
)
```
Example 2:
```python
# Switch adapters by invoking a different adapter component
response = smr_client.invoke_endpoint(
    EndpointName=endpoint_name,
    InferenceComponentName=ic1_adapter_name,  # each adapter has its own component
    ContentType="application/json",
    Body=json.dumps({
        "inputs": prompt
    })
)
```
:::
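A hedged sketch of how an adapter component in the second example might be created: the `BaseInferenceComponentName` and `ArtifactUrl` fields are drawn from the multi-adapter launch blog referenced below, and all names and the S3 URI are hypothetical.
```python
import boto3

sm_client = boto3.client("sagemaker")

# Create a LoRA adapter as its own inference component that references the
# base-model component (parent-child relationship).
sm_client.create_inference_component(
    InferenceComponentName="ic-llama-es-adapter",  # hypothetical adapter component
    EndpointName="my-multi-adapter-endpoint",      # hypothetical endpoint
    Specification={
        "BaseInferenceComponentName": "ic-llama-base",  # the base-model component
        "Container": {
            "ArtifactUrl": "s3://my-bucket/adapters/es/"  # fine-tuned adapter weights
        },
    },
)
```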
Reference:
- AWS blog: [SageMaker Inference Component](https://aws.amazon.com/blogs/machine-learning/reduce-model-deployment-costs-by-50-on-average-using-sagemakers-latest-features/)
- GitHub: [S-LoRA: Serving Thousands of Concurrent LoRA Adapters](https://github.com/S-LoRA/S-LoRA)
- AWS blog: [Easily deploy and manage hundreds of LoRA adapters with SageMaker efficient multi-adapter inference](https://aws.amazon.com/blogs/machine-learning/easily-deploy-and-manage-hundreds-of-lora-adapters-with-sagemaker-efficient-multi-adapter-inference/)
- AWS samples: [Multi-Adapter Hosting on SageMaker Real-Time Inference](https://github.com/aws-samples/sagemaker-genai-hosting-examples/tree/main/genai-recipes/Multi-LoRA-Adapters)
:::success
**Feature Name**: [Container Caching and Fast Model Loader](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-sagemaker-accelerate-scaling-generative-ai-inference/)
:::
Announced December 6, 2024 ⭐⭐⭐
Summary: Container Caching is only supported with inference components. It is automatically enabled for popular SageMaker DLCs used for inference, such as LMI, Hugging Face TGI, NVIDIA Triton, and PyTorch. Fast Model Loader is supported on GPU instance types with LMI v13 and later, and works with all vLLM-compatible FP16 models.
Introduction: Amazon SageMaker has introduced two new capabilities in SageMaker Inference to enhance the deployment and scaling of generative AI models: Container Caching and Fast Model Loader. These innovations address critical challenges in scaling large language models (LLMs) efficiently, enabling faster response times to traffic spikes and more cost-effective scaling. By reducing model loading times and accelerating autoscaling, these features allow customers to improve the responsiveness of their generative AI applications as demand fluctuates, particularly benefiting services with dynamic traffic patterns.
:::spoiler 20% latency reduction when autoscaling
**Weight streaming**

**Pre-sharding**
Model weights are stored in uniform 8MB chunks

**Out-of-order processing**
:::
:::spoiler Fast Model Loader code
```python
# A minimal sketch; role, sess, and output_path are assumed to be defined
from sagemaker.serve import ModelBuilder, SchemaBuilder

# Create a model builder object
model_builder = ModelBuilder(
    model="meta-textgeneration-llama-3-1-70b",
    role_arn=role,
    sagemaker_session=sess,
    schema_builder=SchemaBuilder(sample_input="Test", sample_output="Test")
)

# Run a model optimization job that pre-shards the weights
model_builder.optimize(
    instance_type="ml.p4d.24xlarge",
    output_path=output_path,
    sharding_config={
        "OverrideEnvironment": {
            "OPTION_TENSOR_PARALLEL_DEGREE": "8"
        }
    }
)

# Use build() to generate the artifacts for the model server
final_model = model_builder.build()

# You only need to set these values if you are using existing sharded models
if not final_model._is_sharded_model:
    final_model._is_sharded_model = True
if final_model._enable_network_isolation:
    final_model._enable_network_isolation = False
```
:::
Reference:
- AWS blog: [Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – part 1](https://aws.amazon.com/blogs/machine-learning/introducing-fast-model-loader-in-sagemaker-inference-accelerate-autoscaling-for-your-large-language-models-llms-part-1/)
- AWS blog: [Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 2](https://aws.amazon.com/blogs/machine-learning/introducing-fast-model-loader-in-sagemaker-inference-accelerate-autoscaling-for-your-large-language-models-llms-part-2/)
- [Direct Memory Access](https://www.spiceworks.com/tech/hardware/articles/direct-memory-access/)
- AWS samples: [Accelerating LLM Deployments with SageMaker Fast Model Loader](https://github.com/aws-samples/sagemaker-genai-hosting-examples/blob/main/Llama3.1/Llama3.1-70B-SageMaker-Fast-Model-Loader.ipynb)
---
- AWS blog: [Supercharge your auto scaling for generative AI inference – Introducing Container Caching in SageMaker Inference](https://aws.amazon.com/blogs/machine-learning/supercharge-your-auto-scaling-for-generative-ai-inference-introducing-container-caching-in-sagemaker-inference/)
:::info
**Feature Name**: Scale SageMaker inference endpoints to zero instances
:::
Announced December 2, 2024 ⭐⭐⭐
Summary: Scale-to-zero is only supported with inference components.
Introduction: The scale-down-to-zero feature allows SageMaker inference endpoints to scale to zero instances during periods of inactivity, a significant change from the previous requirement to maintain a minimum number of instances for continuous availability. This capability helps customers optimize costs while retaining the flexibility to scale back up when needed (a configuration sketch follows the highlights below).
:::spoiler Scale to Zero highlights
**SageMaker Inference Component:**

**Scale to Zero is especially useful in three main scenarios:**
Predictable traffic patterns
- When inference traffic is predictable and follows a consistent schedule
- Capacity can automatically shrink to zero during periods of low or no usage
- Avoids manually deleting and re-creating inference components and endpoints
Sporadic or variable traffic
- Suits applications with sporadic or variable inference traffic patterns
- Saves cost during periods with no traffic
- Note that scaling back up from zero takes time, and requests arriving in that window will fail
Development and testing environments
- Useful when testing and evaluating new ML models
- Fits temporary inference endpoints created during model development and experimentation
- Prevents unnecessary charges from forgotten test endpoints
- Lets test endpoints automatically scale down to zero instances when not in use

Note that scaling back up from zero incurs a cold-start delay, so evaluate whether your application can tolerate that latency before adopting this feature.
:::
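As mentioned above, here is a minimal configuration sketch: registering the inference component's copy count with Application Auto Scaling and a `MinCapacity` of 0 is what permits scale-to-zero. The component name and capacity bounds are hypothetical placeholders.
```python
import boto3

aas_client = boto3.client("application-autoscaling")

inference_component_name = "my-inference-component"  # hypothetical

# Registering the scalable target with MinCapacity=0 allows the component's
# copy count to drop to zero during idle periods.
aas_client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=f"inference-component/{inference_component_name}",
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    MinCapacity=0,  # scale down to zero copies when idle
    MaxCapacity=4,  # hypothetical upper bound
)
```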
:::spoiler Additional notes on autoscaling
**The core techniques behind faster auto scaling:**
New high-resolution metrics:
- ConcurrentRequestsPerModel and ConcurrentRequestsPerCopy
- Emitted every 10 seconds (sub-minute metrics)
- Track system load directly, including the number of concurrent requests being processed inside the container
Improved request tracking:
- For streaming responses, tracks until the last token has been delivered
- Counts both in-flight and queued requests
- Avoids hot spots by steering client requests toward less busy GPUs
Auto scaling policies:
- Supports both target tracking and step scaling policies
- Concurrency-based and invocation-based scaling policies can be combined
- Provides cooldown settings to prevent over-scaling
Container Caching:
- Pre-caches container images
- Eliminates container downloads during scale-out
- Automatically enabled for the major deep learning containers
**Comparing target tracking and step scaling:**
Target Tracking:
* Scales out by adding capacity to close the gap between the metric value (ConcurrentRequestsPerModel/Copy) and the target value
* Scales in by removing capacity when the metric falls below the target value
Step Scaling:
* Scales capacity using a set of adjustments known as step adjustments
* The size of each adjustment varies with the magnitude of the metric value (ConcurrentRequestsPerModel/Copy) / alarm breach
In common:
* Both are scaling policies supported by Application Auto Scaling
* Both handle scale-out and scale-in operations
* Both can use the same metrics (such as ConcurrentRequestsPerModel/Copy) for scaling decisions
Suggested use cases:
Target Tracking is recommended for:
- Tracking CPU utilization
- Tracking average request counts
- Tracking network traffic
Step Scaling is recommended for:
- Fast reaction under specific conditions
- Complex scaling logic
- Scaling decisions based on multiple conditions
```python
# Policy names, resource IDs, and the inference component name are assumed
# to be defined earlier in the notebook.
import boto3

aas_client = boto3.client("application-autoscaling")
cloudwatch_client = boto3.client("cloudwatch")

# Target tracking on the high-resolution concurrency metric
aas_client.put_scaling_policy(
    PolicyName=target_tracking_policy_name,
    PolicyType="TargetTrackingScaling",
    ServiceNamespace=service_namespace,
    ResourceId=resource_id,
    ScalableDimension=scalable_dimension,
    TargetTrackingScalingPolicyConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerInferenceComponentConcurrentRequestsPerCopyHighResolution",  # emitted every 10 secs
        },
        # Low TPS + load TPS
        "TargetValue": 1,          # per model
        "ScaleInCooldown": 150,    # secs, default 300
        "ScaleOutCooldown": 150,   # secs, default 300
    },
)

# Step-scaling alarm that fires when invocations fail for lack of capacity
cloudwatch_client.put_metric_alarm(
    AlarmName=step_scaling_alarm_name,
    AlarmActions=[step_scaling_policy_arn],  # Replace with your actual ARN
    MetricName='NoCapacityInvocationFailures',
    Namespace='AWS/SageMaker',
    Statistic='Maximum',
    Dimensions=[
        {
            'Name': 'InferenceComponentName',
            'Value': inference_component_name  # Replace with actual InferenceComponentName
        }
    ],
    Period=30,            # secs; scaling kicks in roughly >=30s after the monitored condition is met
    EvaluationPeriods=1,  # Period (duration) x EvaluationPeriods (number of cycles)
    DatapointsToAlarm=1,  # how many breaching datapoints trigger the alarm
    Threshold=1,          # threshold for the statistic
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    TreatMissingData='missing'
)
```
:::
Reference:
- AWS blog: [Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/reduce-model-deployment-costs-by-50-on-average-using-sagemakers-latest-features/)
- AWS blog: [Amazon SageMaker inference launches faster auto scaling for generative AI models](https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-inference-launches-faster-auto-scaling-for-generative-ai-models/)
- AWS blog: [Unlock cost savings with the new scale down to zero feature in SageMaker Inference](https://aws.amazon.com/blogs/machine-learning/unlock-cost-savings-with-the-new-scale-down-to-zero-feature-in-amazon-sagemaker-inference/)
- AWS samples: [Scale to zero endpoint](https://github.com/aws-samples/sagemaker-genai-hosting-examples/tree/main/scale-to-zero-endpoint/)
- Official document: [Application Auto Scaling](https://docs.aws.amazon.com/autoscaling/application/userguide/what-is-application-auto-scaling.html)
:::success
**Feature Name**: [Amazon SageMaker Partner AI Apps](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-sagemaker-partner-ai-apps/)
:::
Announced December 4, 2024 ⭐⭐⭐⭐⭐
Introduction: AWS announced the general availability of Amazon SageMaker partner AI apps, a new capability that enables customers to easily discover, deploy, and use best-in-class machine learning (ML) and generative AI (GenAI) development applications from leading app providers privately and securely within Amazon SageMaker AI. This solution allows customers to boost their team's productivity and reduce time to market by integrating specialized applications like Comet, Deepchecks, Fiddler, and Lakera, while maintaining data security and providing a seamless development experience without leaving the SageMaker environment.
:::success
**Feature Name**: [Amazon Q Developer in Amazon SageMaker Canvas](https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-q-developer-guide-sagemaker-canvas-users-ml-development/)
:::
Announced December 4, 2024
Introduction: Amazon Q Developer is now available in preview within Amazon SageMaker Canvas, enabling users to build ML models using natural language. This generative AI-powered assistance guides users through the entire ML lifecycle, from data preparation to model deployment. The feature helps users of all skill levels to build high-quality ML models using natural language guidance, which accelerates innovation and time to market.
Reference:
- AWS blog: [Use Amazon Q Developer to build ML models in Amazon SageMaker Canvas](https://aws.amazon.com/blogs/aws/use-amazon-q-developer-to-build-ml-models-in-amazon-sagemaker-canvas/)
- [AWS AI Service Cards](https://aws.amazon.com/ai/responsible-ai/resources/)
:::success
**Feature Name**: [Amazon SageMaker Notebook Instances Support for Trainium1 and Inferentia2](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-sagemaker-notebook-instances-trainium1-inferentia-2-based-instances/)
:::
Announced November 15, 2024
:::success
**Feature Name**: [JupyterLab 4 Support for Amazon SageMaker Notebook Instances](https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-sagemaker-notebook-instances-jupyterlab-4-notebooks/)
:::
Announced November 1, 2024
:::success
**Feature Name**: [sagemaker-core SDK](https://aws.amazon.com/about-aws/whats-new/2024/09/sagemaker-core-object-oriented-sdk-amazon-sagemaker/)
:::
Announced September 3, 2024 ⭐⭐⭐
Introduction: Amazon SageMaker has announced sagemaker-core, a new Python SDK that provides an object-oriented interface for interacting with SageMaker resources such as TrainingJob, Model, and Endpoint resource classes. This new SDK features resource chaining that allows developers to pass resource objects as parameters, eliminating manual complex parameter specification, while abstracting low-level details like resource state transitions and polling logic. The SDK achieves full parity with SageMaker APIs and includes key usability improvements such as auto code completion in popular IDEs, comprehensive documentation, and type hints.
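:::spoiler Sketch: sagemaker-core style (hypothetical values)
A minimal sketch of the object-oriented style and resource chaining, assuming the `sagemaker_core` resource and shape classes shown in the launch blog referenced below; the job name, image URI, bucket, and role are hypothetical placeholders.
```python
from sagemaker_core.resources import TrainingJob
from sagemaker_core.shapes import (
    AlgorithmSpecification,
    OutputDataConfig,
    ResourceConfig,
    StoppingCondition,
)

# create() returns a TrainingJob resource object rather than a raw dict;
# the SDK abstracts state transitions and polling behind the object.
training_job = TrainingJob.create(
    training_job_name="xgboost-demo",  # hypothetical
    algorithm_specification=AlgorithmSpecification(
        training_image="111122223333.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest",  # hypothetical
        training_input_mode="File",
    ),
    role_arn="arn:aws:iam::111122223333:role/SageMakerRole",  # hypothetical
    output_data_config=OutputDataConfig(s3_output_path="s3://my-bucket/output"),  # hypothetical
    resource_config=ResourceConfig(
        instance_type="ml.m5.xlarge",
        instance_count=1,
        volume_size_in_gb=30,
    ),
    stopping_condition=StoppingCondition(max_runtime_in_seconds=3600),
)
training_job.wait()  # polls until the job reaches a terminal state

# Resource chaining: downstream steps can read fields off the resource
# object directly, e.g. the trained model artifact location.
print(training_job.model_artifacts.s3_model_artifacts)
```
:::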
Reference:
- AWS blog: [Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/introducing-sagemaker-core-a-new-object-oriented-python-sdk-for-amazon-sagemaker/)