# Memory Management: Redis vs. ChatHistoryAgentThread
In the early stages of development, Redis was used to store conversation history. The purpose was to maintain context, enabling the AI assistant to provide consistent responses across multi-turn conversations. However, as system requirements grew, we discovered structural limitations of Redis that made it inadequate for increasingly complex task reasoning and toolchain integration.
This report covers the following key topics:
* **Problem Analysis** – A deep dive into Redis’s structural limitations, including Plugin Tools context loss and lack of personalized context.
* **Solution** – A detailed introduction to the design philosophy of ChatHistoryAgentThread, including structured content storage and thread-based memory management.
* **Implementation** – A complete ThreadManager implementation example showing dynamic user thread management and message flow integration.
* **Technical Advantages** – Explanation of improvements in tool call tracking, personalization, memory optimization, toolchain management, parameter collection, and developer-friendliness.
Through this architecture shift, the system not only “remembers what it did,” but also provides each user with a personalized, efficient, and intelligent conversational experience.
---
## Problem Analysis: Limitations of Redis
Redis stores only the **plain text messages** exchanged between user and AI assistant. While sufficient for basic chat, it falls short in frameworks like Semantic Kernel that rely on Plugin function calling. The limitations include:
* **Cannot store AI reasoning steps and intermediate decisions (Plugin Tools)**
* **Cannot provide personalized context for different users**
### 1. Plugin Tools Context Loss
Redis cannot store:
* `FunctionCallContent` – tool call details
* `FunctionResultContent` – tool execution results
* `AssistantMessageContent` – AI reasoning steps
* Cross-turn parameter collection state
Example: user requests sales trend of “Rice Crackers” but omits the time range:
```python
# Redis only stores plain messages:
user_message: "Query sales trend of Rice Crackers"
ai_response: "Please provide the time range"
user_message: "May"
```
* With Redis, “May” arrives as an independent message, so the model must re-interpret the context from scratch.
* The system cannot track which tool required this field, whether the other parameters have already been collected, or whether the query duplicates an earlier one.
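To make the difference concrete, here is a minimal sketch using plain dictionaries (hypothetical shapes, not the actual Semantic Kernel classes) contrasting what Redis keeps with what a structured history would need to keep to resolve “May” as parameter completion:

```python
# What Redis stores: plain text only, with no link back to the pending tool call.
redis_history = [
    {"role": "user", "text": "Query sales trend of Rice Crackers"},
    {"role": "assistant", "text": "Please provide the time range"},
    {"role": "user", "text": "May"},
]

# What a structured history records: the failed tool call and its missing arguments.
structured_history = [
    {"role": "user", "text": "Query sales trend of Rice Crackers"},
    {
        "role": "assistant",
        "function_call": {
            "name": "get_product_sales_data",
            "arguments": {"product_name": "Rice Crackers",
                          "start_day": None, "end_day": None},
        },
        "function_result": "Error: Missing required parameters",
    },
    {"role": "assistant", "text": "Please provide the time range"},
]

def pending_parameters(history):
    """Return the missing arguments of the most recent recorded tool call."""
    for entry in reversed(history):
        call = entry.get("function_call")
        if call:
            return {k for k, v in call["arguments"].items() if v is None}
    return set()

print(pending_parameters(structured_history))  # e.g. {'start_day', 'end_day'}
print(pending_parameters(redis_history))       # set() – nothing to complete
```

With the structured history, “May” can be matched against the pending `start_day`/`end_day` slots; with the plain-text history there is nothing to match against.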
### 2. Lack of Personalized Context
The Redis-based approach shares the same generic context across all users. It cannot adapt to user identity, role, or permissions to deliver a personalized experience.
---
## Solution: Introducing ChatHistoryAgentThread
To solve these issues, we adopt **ChatHistoryAgentThread** from Semantic Kernel. This combines **ChatHistory()** (structured content storage) and **Thread()** (memory management).
### ChatHistory() – Structured Content Storage
Each conversation includes more than just plain text:
* User input (`UserMessageContent`)
* AI responses (`AssistantMessageContent`)
* Tool calls (`FunctionCallContent`)
* Tool parameters (Function Arguments)
* Tool results (`FunctionResultContent`)
All of these are **automatically stored in the conversation history**, so the model can reference them directly instead of re-inferring context.
### Thread() – Memory Management
While ChatHistory handles structured memory, Thread manages:
* **Conversation lifecycle**: start, pause, resume, end
* **Intelligent summarization**: the built-in `reduce()` method triggers summarization once history grows past a threshold
* **Multi-agent collaboration support**: unified interface for multiple agents
* **State tracking**: monitor status (active, waiting, completed)
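The lifecycle and state-tracking responsibilities above can be illustrated with a small sketch (a plain enum and class for illustration only, not Semantic Kernel's internal types):

```python
from enum import Enum

class ThreadState(Enum):
    ACTIVE = "active"
    WAITING = "waiting"      # e.g. waiting for a missing tool parameter
    COMPLETED = "completed"

class TrackedThread:
    """Toy thread with the lifecycle transitions described above."""
    def __init__(self):
        self.state = ThreadState.ACTIVE

    def pause(self):
        self.state = ThreadState.WAITING

    def resume(self):
        self.state = ThreadState.ACTIVE

    def end(self):
        self.state = ThreadState.COMPLETED

thread = TrackedThread()
thread.pause()                 # conversation waits on user input
print(thread.state.value)      # waiting
thread.resume()
print(thread.state.value)      # active
```

The point is that the thread, not the raw message store, owns this state, which is what lets an agent pause mid-task and pick up exactly where it left off.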
**In short: ChatHistory manages “what to remember,” Thread manages “how to remember.”**
**ChatHistoryAgentThread combines both into a stable memory system for agents.**
---
## Implementation
### Core Components
```python
from semantic_kernel.contents import ChatHistorySummarizationReducer
from semantic_kernel.agents.chat_completion.chat_completion_agent import ChatHistoryAgentThread
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
```
### Thread Manager Implementation
We implement a `ThreadManager` to dynamically generate and manage threads per user, with summarization:
```python
class ThreadManager:
    def __init__(self, service):
        self.service = service
        self.user_threads = {}  # Key: user_id, Value: ChatHistoryAgentThread
        self.system_prompt = ""
        self.user_prompt_defined = set()

    # Define a personalized system prompt for a user's thread (once per user)
    def define_user_thread(self, user_name: str, user_role: str, user_id: str):
        if user_id in self.user_prompt_defined:
            return
        self.system_prompt = f"""
        User name: {user_name}, role: {user_role}.
        Provide assistance or restrict features based on role.
        - Respond politely, and use the user's name where appropriate.
        """
        thread = self.get_thread(user_id)  # create the thread if it doesn't exist yet
        thread._chat_history.add_system_message(self.system_prompt)
        self.user_prompt_defined.add(user_id)

    # Retrieve an existing user thread or create a new one
    def get_thread(self, user_id: str):
        if user_id not in self.user_threads:
            summarizer_chat_history = ChatHistorySummarizationReducer(
                service=self.service,
                target_count=15,    # keep the 15 most recent messages verbatim
                threshold_count=5,  # tolerate 5 extra messages before reducing
                summarization_instructions="""
                Instructions for summarizing chat history
                """,
            )
            thread = ChatHistoryAgentThread()
            thread._chat_history = summarizer_chat_history
            self.user_threads[user_id] = thread
        return self.user_threads[user_id]
```
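The key pattern in `get_thread` is per-user caching: the same user always gets back the same thread object, so context accumulates across turns. A dependency-free sketch of just that pattern (with a list standing in for `ChatHistoryAgentThread`):

```python
class SimpleThreadManager:
    """Simplified per-user thread cache, mirroring get_thread() above."""
    def __init__(self):
        self.user_threads = {}

    def get_thread(self, user_id):
        if user_id not in self.user_threads:
            self.user_threads[user_id] = []  # stand-in for ChatHistoryAgentThread
        return self.user_threads[user_id]

mgr = SimpleThreadManager()
a1 = mgr.get_thread("user-a")
a2 = mgr.get_thread("user-a")
b = mgr.get_thread("user-b")
print(a1 is a2)  # True – the same thread is reused across turns
print(a1 is b)   # False – each user has an isolated thread
```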
### Message Flow Integration
```python
async def handle_user_message(event):
    user_id = event.source.user_id
    user_message = event.message.text
    thread = thread_manager.get_thread(user_id)

    # Authentication & RBAC check
    if not AuthPlugin.line_id_in_db(user_id):
        verification_response = await auth_agent.get_response(
            messages=user_message + f" LINE ID: {user_id}",
            thread=thread,
        )
        verification_response = verification_response.items[0].text
        line_text_response(event, verification_response)
        return

    # Define personalized Thread
    user_info = AuthPlugin.get_user_info_by_line_id(user_id)
    user_name = user_info.get("user_name", "Unknown")
    user_role = user_info.get("role", "Unknown")
    thread_manager.define_user_thread(user_name, user_role, user_id)

    # Process message with full context
    response = await agent.get_response(messages=user_message, thread=thread)
    line_text_response(event, response.items[0].text)

    # Summarize history once it grows past the configured threshold
    await thread.reduce()
```
---
## Technical Advantages: The Model “Remembers What It Did”
Key benefits of ChatHistoryAgentThread:
* Tool call tracking: trace entire execution flow
* Personalization: role-based permissions & customized replies
* Memory optimization: smart summarization & token saving
* Toolchain management: prevent duplicate queries
* Parameter collection: maintain state across turns
* Developer-friendly: easier debugging & traceability
### 1. Tool Call Tracking
Failed tool calls (e.g., missing parameters) are recorded:
```json
{
"turn_1": {
"user": "Query sales trend of Rice Crackers",
"function_call": {
"name": "get_product_sales_data",
"arguments": {
"product_name": "Rice Crackers",
"start_day": null,
"end_day": null
}
},
"function_result": "Error: Missing required parameters",
"assistant": "Please provide the time range"
},
"turn_2": {
"user": "May",
"context_resolution": "Identified as parameter completion for previous query",
"function_call": {
"name": "get_product_sales_data",
"arguments": {
"product_name": "Rice Crackers",
"start_day": "2024-05-01",
"end_day": "2024-05-31"
}
}
}
}
```
If the user says “May,” the model links it as a parameter completion, not a new query.
---
### 2. Personalization: Role Permissions & Customized Responses
Using `define_user_thread()`, the system personalizes conversations:
* Role recognition: restrict or allow features by role (manager, employee, customer)
* Personalized replies: use user’s real name for natural interaction
* Permission control: enforce role-based feature restrictions
Redis cannot achieve this because it stores only generic text, without system-level user-specific context.
---
### 3. Memory Optimization: Smart Summarization & Token Savings
Summarization strategy:
* **Short-term memory**: keep recent messages in detail
* **Long-term memory compression**: older messages summarized to save tokens
* **Traceable logic**: every tool call and output recorded for later reference
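The short-term/long-term split can be sketched as follows. The real `ChatHistorySummarizationReducer` calls an LLM to write the summary; here the summary is a simple placeholder so the triggering logic (mirroring the `target_count`/`threshold_count` parameters used earlier) stays self-contained:

```python
def reduce_history(messages, target_count=15, threshold_count=5):
    """Summarize older messages once history exceeds target + threshold."""
    if len(messages) <= target_count + threshold_count:
        return messages  # below the threshold: keep everything verbatim
    older, recent = messages[:-target_count], messages[-target_count:]
    summary = f"[summary of {len(older)} earlier messages]"  # LLM call in practice
    return [summary] + recent

history = [f"msg-{i}" for i in range(25)]
reduced = reduce_history(history)
print(len(reduced))  # 16: one summary line + the 15 most recent messages
print(reduced[0])    # [summary of 10 earlier messages]
```

Only the older portion is compressed, so recent turns stay fully detailed while the total token cost stays bounded.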
---
### 4. Toolchain Management: Prevent Duplicate Queries
If the model needs product info first, then sales data, full history ensures:
* Check whether intermediate results are already available
* Avoid redundant queries
* Dynamically decide next step (fallback or continue)
```python
# Example: Product Analysis Workflow
# Step 1: Get product info
# Step 2: Get sales data
# Step 3: Generate trend analysis

# Example 2: Product Suggestions
# Reuse intermediate results to avoid repeating tool calls
```
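A hypothetical sketch of the duplicate-prevention check (plain dicts, not Semantic Kernel types): before re-issuing a tool call, scan the structured history for an identical call that already has a result.

```python
def find_cached_result(history, name, arguments):
    """Return the stored result of an identical earlier tool call, if any."""
    for entry in history:
        if (entry.get("function_call") == {"name": name, "arguments": arguments}
                and "function_result" in entry):
            return entry["function_result"]
    return None

# Step 1 of the workflow already ran and its result is in the history:
history = [{
    "function_call": {"name": "get_product_info",
                      "arguments": {"product_name": "Rice Crackers"}},
    "function_result": {"category": "snacks", "unit_price": 30},
}]

cached = find_cached_result(history, "get_product_info",
                            {"product_name": "Rice Crackers"})
print(cached is not None)  # True – step 2 can reuse the step-1 result
```

With Redis, the tool call and its result were never stored, so this check is impossible and the query would run again.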
---
### 5. Parameter Collection: Cross-Turn State Maintenance
Users may provide parameters over multiple turns (e.g., product name, time, store). Full history allows:
* Building complete queries
* Merging parameters automatically
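The merging step can be sketched as folding each turn's partial arguments into the pending call until nothing is missing (illustrative helper, not a Semantic Kernel API):

```python
def merge_parameters(pending, new_args):
    """Fold newly supplied arguments into the pending tool-call arguments."""
    merged = dict(pending)
    merged.update({k: v for k, v in new_args.items() if v is not None})
    return merged

# Turn 1 left two slots empty; turn 2 ("May") fills them in:
pending = {"product_name": "Rice Crackers", "start_day": None, "end_day": None}
pending = merge_parameters(pending, {"start_day": "2024-05-01",
                                     "end_day": "2024-05-31"})

missing = [k for k, v in pending.items() if v is None]
print(missing)  # [] – the query is now complete and the tool can be called
```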
---
### 6. Developer-Friendly: Debugging & Traceability
Retaining `FunctionCallContent` and `FunctionResultContent` benefits developers:
* Quickly identify incorrect parameters or logic issues
* Visualize conversation flows and behavior records
---
## Comparison: Redis vs ChatHistoryAgentThread
| Metric | Redis | ChatHistoryAgentThread |
| -------------------- | -------------------- | ----------------------- |
| Context retention | Plain text only | Full structured context |
| Plugin Tool tracking | No | Yes |
| Personalization | No | Yes |
| Parameter collection | Re-parsed every turn | Maintained state |
| Token usage | High (re-parsed) | Low (context reuse) |
| Error rate | High | Low |
| Debugging | Difficult | Full traceability |