# Threads API

The Threads API provides a stateful conversation experience with persistent message history and advanced RAG-powered run management.

## Overview

The Threads API enables persistent, context-aware conversations with powerful knowledge retrieval:

1. **Create Threads** - Start new conversations with optional initial messages
2. **Manage Messages** - Add user/assistant messages with rich metadata
3. **Execute Runs** - Generate AI responses with full RAG pipeline integration
4. **Track Progress** - Monitor run status, usage, and performance
5. **Retrieve History** - Access complete conversation and run history

### Key Features

- **🧠 RAG Integration**: Automatic knowledge retrieval from your vector database
- **🔄 Provider Agnostic**: Works with both OpenAI and Anthropic models
- **📡 Real-time Streaming**: Server-sent events for live response delivery
- **📊 Rich Metadata**: Track models, providers, settings, and usage
- **🎯 Advanced Retrieval**: Multiple strategies (semantic, hybrid, source-prioritized)
- **💬 OpenAI Compatible**: Drop-in replacement for the OpenAI Threads API

### Typical Chat Application Flow

#### **Recommended: Single-Call Thread Creation**

```
1. User: "How do I create a Stellar account?"

2. POST /threads
   Body: { "messages": [{ "role": "user", "content": "How do I create a Stellar account?" }] }
   → Creates thread + adds message in one call

3. POST /threads/{thread_id}/runs
   Body: { "model": "gpt-4o-mini", "temperature": 0.7 }
   → Triggers RAG-powered response generation
   → Returns streaming SSE response
   → Automatically creates assistant message with metadata
```

#### **Multi-turn Conversation**

```
4. User: "What about multi-signature accounts?"

5. POST /threads/{thread_id}/messages
   Body: { "role": "user", "content": "What about multi-signature accounts?" }

6. POST /threads/{thread_id}/runs
   Body: { "model": "claude-3-5-sonnet-20241022" }
   → AI uses full conversation history + retrieves relevant context
   → Generates contextual response building on previous discussion
```

### Benefits for Developers

- **🚀 Simplified Integration**: No need to manage conversation state
- **⚡ Performance**: Optimized RAG pipeline with intelligent caching
- **🔧 Flexible Configuration**: Fine-tune retrieval, models, and generation
- **📈 Observability**: Detailed run tracking and usage analytics
- **🛡️ Reliability**: Built-in error handling and fallback strategies

## Authentication

All requests require an API key in the `x-api-key` header:

```bash
x-api-key: YOUR_API_KEY
```

## Thread Management

### Create a Thread

Create a new conversation thread, optionally with initial messages.

**Endpoint:** `POST /threads`

#### Request Body

```json
{
  "title": "Stellar Smart Contracts Help",
  "metadata": {
    "user_id": "user_123",
    "session": "web_chat",
    "source": "website"
  },
  "messages": [
    {
      "role": "user",
      "content": "I need help with Stellar smart contracts"
    }
  ]
}
```

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `title` | string | No | Thread title (max 60 characters). If not provided, a title is auto-generated after the first assistant response |
| `metadata` | object | No | Custom metadata for the thread |
| `messages` | array | No | Initial messages to add to the thread |
| `messages[].role` | string | Yes | Message role: `"user"` or `"assistant"` |
| `messages[].content` | string | Yes | Message content |

#### Response

```json
{
  "id": "thread_abc123def456",
  "object": "thread",
  "created_at": 1699014083,
  "title": "Stellar Smart Contracts Help",
  "metadata": {
    "user_id": "user_123",
    "session": "web_chat",
    "source": "website"
  }
}
```

### Retrieve a Thread

Get details about a specific thread.
**Endpoint:** `GET /threads/{thread_id}`

#### Response

```json
{
  "id": "thread_abc123def456",
  "object": "thread",
  "created_at": 1699014083,
  "title": "Stellar Smart Contracts Help",
  "metadata": {
    "user_id": "user_123",
    "session": "web_chat",
    "source": "website"
  }
}
```

### Retrieve All Threads

Get all threads, optionally with pagination.

**Endpoint:** `GET /threads`

#### Query Parameters

| Parameter | Type | Required | Description | Default |
|-----------|------|----------|-------------|---------|
| `limit` | number | No | Maximum number of threads to return | `10` |
| `offset` | number | No | Number of threads to skip | `0` |

#### Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "thread_abc123def456",
      "object": "thread",
      "created_at": 1699014083,
      "title": "Stellar Smart Contracts Help",
      "metadata": {
        "user_id": "user_123",
        "session": "web_chat",
        "source": "website"
      }
    },
    ...
  ],
  "first_id": "thread_abc123def456",
  "last_id": "thread_xyz789abc123",
  "total_count": 100
}
```

### Update Thread Title

Update the title of an existing thread. Useful for manually setting or correcting auto-generated titles.

**Endpoint:** `PATCH /threads/{thread_id}`

#### Request Body

```json
{
  "title": "Updated Thread Title"
}
```

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `title` | string | Yes | New thread title (1-60 characters) |

#### Response

```json
{
  "id": "thread_abc123def456",
  "object": "thread",
  "created_at": 1699014083,
  "title": "Updated Thread Title",
  "metadata": {
    "user_id": "user_123",
    "session": "web_chat",
    "source": "website"
  }
}
```

### Delete a Thread

Delete a thread and all its associated messages and runs.

**Endpoint:** `DELETE /threads/{thread_id}`

#### Response

```json
{
  "deleted": true
}
```

## Automatic Title Generation

Threads automatically generate descriptive titles after the first assistant response is completed.
This feature:

- **Triggers automatically** when a thread has no title and receives its first assistant response
- **Uses lightweight AI** (GPT-4o-mini) to generate concise, descriptive titles (max 60 characters)
- **Analyzes conversation context** from both the user's question and the assistant's response
- **Provides intelligent fallbacks** if generation fails (uses the first sentence or a truncated user message)
- **Fails gracefully** without affecting the main conversation flow

#### Title Generation Behavior

| Scenario | Behavior |
|----------|----------|
| Thread created with title | No auto-generation; uses provided title |
| Thread created without title | Auto-generates after first assistant response |
| Very short messages | Skips generation; uses fallback |
| Error responses | Skips generation to avoid unhelpful titles |
| Generation failure | Uses intelligent fallback based on user message |

#### Manual Override

You can always override auto-generated titles using the `PATCH /threads/{thread_id}` endpoint.

## Message Management

### Add a Message

Add a new message to an existing thread.

**Endpoint:** `POST /threads/{thread_id}/messages`

#### Request Body

```json
{
  "role": "user",
  "content": "How do I create a Stellar account?",
  "title": "Stellar Account Creation"
}
```

#### Parameters

| Parameter | Type | Required | Description | Default |
|-----------|------|----------|-------------|---------|
| `role` | string | No | Message role: `"user"` or `"assistant"` | `"user"` |
| `content` | string | Yes | Message content (minimum 1 character) | - |
| `title` | string | No | Optional title for the thread (max 60 characters) | - |

#### Response

```json
{
  "id": "msg_xyz789abc123",
  "object": "thread.message",
  "created_at": 1699014083,
  "thread_id": "thread_abc123def456",
  "role": "user",
  "content": "How do I create a Stellar account?",
  "title": "Stellar Account Creation"
}
```

### List Messages

Retrieve all messages in a thread, ordered chronologically.
**Endpoint:** `GET /threads/{thread_id}/messages`

#### Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "msg_xyz789abc123",
      "object": "thread.message",
      "created_at": 1699014083,
      "thread_id": "thread_abc123def456",
      "role": "user",
      "content": "How do I create a Stellar account?",
      "metadata": {
        "timestamp": "2024-01-15T10:30:00Z"
      }
    },
    {
      "id": "msg_def456ghi789",
      "object": "thread.message",
      "created_at": 1699014143,
      "thread_id": "thread_abc123def456",
      "role": "assistant",
      "content": "To create a Stellar account, you'll need to generate a keypair...",
      "title": "Stellar Account Creation",
      "metadata": {
        "runId": "run_abc123def456",
        "model": "gpt-4o-mini",
        "provider": "openai",
        "temperature": 0.7,
        "retrievalStrategy": "SEMANTIC",
        "created_at": 1699014143
      }
    }
  ],
  "first_id": "msg_xyz789abc123",
  "last_id": "msg_def456ghi789",
  "has_more": false
}
```

### Add Message Feedback

Provide feedback on an assistant message to improve model performance.

**Endpoint:** `POST /threads/{thread_id}/messages/{message_id}/feedback`

#### Request Body

```json
{
  "rating": "thumbs_up",
  "comment": "Very helpful explanation with clear examples!"
}
```

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `rating` | string | Yes | `"thumbs_up"` or `"thumbs_down"` |
| `comment` | string | No | Optional feedback comment |

#### Response

```json
{
  "success": true
}
```

## Run Management

### Create a Run

Execute the RAG-powered AI system to generate a response for the thread. This automatically creates an assistant message and streams the response in real-time.
**Endpoint:** `POST /threads/{thread_id}/runs`

#### Request Body

```json
{
  "provider": "openai",
  "model": "gpt-4o-mini",
  "temperature": 0.7,
  "max_tokens": 1500,
  "topK": 5,
  "retrievalStrategy": "HYBRID",
  "structuredOutput": false,
  "includeCitations": true,
  "filter": {
    "source_type": "documentation"
  },
  "promptTemplate": "system-api-base",
  "userTemplate": "user-api-base",
  "promptVariables": {
    "expertise_level": "intermediate"
  }
}
```

#### Parameters

| Parameter | Type | Required | Description | Default |
|-----------|------|----------|-------------|---------|
| **Provider Configuration** | | | | |
| `provider` | string | No | `"openai"` or `"anthropic"` | `"openai"` |
| `model` | string | No | Model name (e.g., `"gpt-4o-mini"`, `"claude-3-5-sonnet-20241022"`) | `"gpt-4o-mini"` |
| `temperature` | number | No | Sampling temperature (0.0-2.0) | `0.5` |
| `max_tokens` | number | No | Maximum tokens to generate | `1500` |
| **Retrieval Configuration** | | | | |
| `topK` | number | No | Number of context chunks to retrieve (1-20) | `5` |
| `retrievalStrategy` | string | No | `"SEMANTIC"`, `"KEYWORD"`, `"HYBRID"`, or `"SOURCE_PRIORITIZED"` | `"SEMANTIC"` |
| `filter` | object | No | Filter criteria for knowledge retrieval | `{}` |
| `sourcePriorities` | array | No | Source prioritization for the `SOURCE_PRIORITIZED` strategy | `[]` |
| `includeAllSources` | boolean | No | Include all available sources | `false` |
| **Output Configuration** | | | | |
| `structuredOutput` | boolean | No | Enable structured JSON output | `false` |
| `includeCitations` | boolean | No | Include source citations in response | `false` |
| `response_format` | object | No | OpenAI-compatible response format | `{"type": "text"}` |
| **Prompt Configuration** | | | | |
| `promptTemplate` | string | No | System prompt template to use | `"system-api-base"` |
| `userTemplate` | string | No | User message template to use | `"user-api-base"` |
| `promptVariables` | object | No | Variables to inject into prompt templates | `{}` |

#### Retrieval Strategies

| Strategy | Description | Use Case |
|----------|-------------|----------|
| `SEMANTIC` | Vector similarity search using embeddings | Best for conceptual queries |
| `KEYWORD` | Traditional keyword/term matching | Best for exact term searches |
| `HYBRID` | Combined semantic and keyword search | **Recommended** - balanced approach |
| `SOURCE_PRIORITIZED` | Prioritize specific source types | When you need specific source types |

#### Source Prioritization

When using the `SOURCE_PRIORITIZED` strategy, specify which sources to prioritize:

```json
{
  "retrievalStrategy": "SOURCE_PRIORITIZED",
  "sourcePriorities": [
    {
      "sourceType": "documentation",
      "weight": 0.7,
      "topK": 5
    },
    {
      "sourceType": "github-issues",
      "weight": 0.3,
      "topK": 3
    }
  ],
  "includeAllSources": false
}
```

**Note**: If `sourcePriorities` is not provided with `SOURCE_PRIORITIZED`, the system automatically falls back to the `SEMANTIC` strategy.

#### Response Headers

```
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
X-Run-ID: run_def456ghi789
X-Message-ID: msg_jkl012mno345
```

#### Streaming Response

The response uses Server-Sent Events (SSE) format. Each event contains JSON data:

```
data: {"type": "context", "context": [{"id": "doc_123", "score": 0.95, "text": "Stellar accounts are..."}]}

data: {"type": "content", "content": "To create a Stellar account"}

data: {"type": "content", "content": ", you'll need to follow these steps:\n\n1. "}

data: {"type": "content", "content": "Generate a keypair using the Stellar SDK..."}

data: {"type": "done", "messageId": "msg_jkl012mno345", "runId": "run_def456ghi789", "usage": {"prompt_tokens": 150, "completion_tokens": 75, "total_tokens": 225}}

data: [DONE]
```

#### Event Types

| Type | Description | When Sent |
|------|-------------|-----------|
| `context` | Retrieved knowledge base context | Once, when context is ready |
| `content` | Incremental text content from AI | Multiple times during generation |
| `done` | Final completion with metadata | Once, when generation completes |

#### Event Data Structure

**Context Event:**

```json
{
  "type": "context",
  "context": [
    {
      "id": "doc_123",
      "score": 0.95,
      "text": "Stellar accounts are identified by...",
      "metadata": {
        "source": "stellar-docs",
        "url": "https://developers.stellar.org/docs/accounts"
      }
    }
  ]
}
```

**Content Event:**

```json
{
  "type": "content",
  "content": "To create a Stellar account"
}
```

**Done Event:**

```json
{
  "type": "done",
  "messageId": "msg_jkl012mno345",
  "runId": "run_def456ghi789",
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 75,
    "total_tokens": 225
  }
}
```

### Retrieve a Run

Get details about a specific run.

**Endpoint:** `GET /threads/{thread_id}/runs/{run_id}`

#### Response

```json
{
  "id": "run_def456ghi789",
  "object": "thread.run",
  "created_at": 1699000000,
  "thread_id": "thread_abc123def456",
  "status": "completed",
  "started_at": 1699000001,
  "completed_at": 1699000015,
  "model": "claude-3-opus-20240229",
  "metadata": {
    "retrieval_strategy": "source_prioritized"
  },
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 75,
    "total_tokens": 225
  }
}
```

### List Runs

Get all runs for a thread.
**Endpoint:** `GET /threads/{thread_id}/runs`

#### Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "run_def456ghi789",
      "object": "thread.run",
      "created_at": 1699000000,
      "thread_id": "thread_abc123def456",
      "status": "completed",
      "model": "claude-3-opus-20240229"
    }
  ],
  "first_id": "run_def456ghi789",
  "last_id": "run_def456ghi789",
  "has_more": false
}
```

### Cancel a Run

Cancel a running operation.

**Endpoint:** `POST /threads/{thread_id}/runs/{run_id}/cancel`

#### Response

```json
{
  "id": "run_def456ghi789",
  "object": "thread.run",
  "created_at": 1699000000,
  "thread_id": "thread_abc123def456",
  "status": "cancelled",
  "cancelled_at": 1699000010
}
```

## Run Status Values

| Status | Description |
|--------|-------------|
| `queued` | Run is waiting to be processed |
| `in_progress` | Run is currently executing |
| `requires_action` | Run requires user action (future use) |
| `cancelling` | Run is being cancelled |
| `cancelled` | Run was cancelled |
| `failed` | Run failed with an error |
| `completed` | Run completed successfully |
| `expired` | Run expired before completion |

## Examples

### Complete Conversation Flow

#### 1. Create Thread with Initial Message

```bash
# Option A: Let the system auto-generate a title after the first response
curl -X POST https://api.stellabot.app/v1/threads \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {"user_id": "user_123"},
    "messages": [
      {
        "role": "user",
        "content": "I want to learn about Stellar smart contracts"
      }
    ]
  }'

# Option B: Provide a custom title upfront
curl -X POST https://api.stellabot.app/v1/threads \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Learning Stellar Smart Contracts",
    "metadata": {"user_id": "user_123"},
    "messages": [
      {
        "role": "user",
        "content": "I want to learn about Stellar smart contracts"
      }
    ]
  }'
```

#### 2. Create Run to Generate Response

```bash
curl -X POST https://api.stellabot.app/v1/threads/thread_abc123/runs \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-opus-20240229",
    "temperature": 0.7,
    "retrieval_strategy": "hybrid",
    "include_citations": true
  }'
```

#### 3. Add Follow-up Message

```bash
curl -X POST https://api.stellabot.app/v1/threads/thread_abc123/messages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "role": "user",
    "content": "Can you show me a code example?"
  }'
```

#### 4. Create Another Run

```bash
curl -X POST https://api.stellabot.app/v1/threads/thread_abc123/runs \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "response_format": {"type": "json_object"},
    "retrieval_strategy": "source_prioritized"
  }'
```

#### 5. Update Thread Title (Optional)

```bash
# Update the auto-generated or existing title
curl -X PATCH https://api.stellabot.app/v1/threads/thread_abc123 \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Stellar Smart Contracts: Complete Guide"
  }'
```

### Advanced Run Configuration

```bash
curl -X POST https://api.stellabot.app/v1/threads/thread_abc123/runs \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "model": "claude-3-sonnet-20240229",
    "temperature": 0.3,
    "maxTokens": 2000,
    "retrievalStrategy": "HYBRID",
    "includeCitations": true,
    "filter": {
      "source_type": "documentation",
      "category": "smart_contracts"
    },
    "promptTemplate": "code-examples",
    "promptVariables": {
      "language": "javascript",
      "complexity": "intermediate"
    }
  }'
```

### Monitor Run Progress

```bash
# Check run status
curl -X GET https://api.stellabot.app/v1/threads/thread_abc123/runs/run_def456 \
  -H "x-api-key: YOUR_API_KEY"

# Cancel if needed
curl -X POST https://api.stellabot.app/v1/threads/thread_abc123/runs/run_def456/cancel \
  -H "x-api-key: YOUR_API_KEY"
```

## Error Responses

### 400 Bad Request

```json
{
  "error": "Thread has no messages"
}
```

### 404 Not Found

```json
{
  "error": "Thread not found"
}
```

### 404 Run Not Found

```json
{
  "error": "Run not found or cannot be cancelled"
}
```

## Best Practices

### Thread Management

1. **Use meaningful metadata** - Store user IDs, session info, etc.
2. **Let titles auto-generate** - The system creates descriptive titles after the first response
3. **Override when needed** - Use PATCH to update auto-generated titles if necessary
4. **Keep titles concise** - Maximum 60 characters for optimal display
5. **Clean up old threads** - Delete threads when conversations end
6. **Monitor thread size** - Very long threads may impact performance

### Message Management

1. **Add context in system messages** - Provide relevant background
2. **Use metadata for tracking** - Store timestamps, sources, etc.
3. **Collect feedback** - Use thumbs up/down for model improvement

### Run Management

1. **Monitor run status** - Check for failures and handle appropriately
2. **Use appropriate models** - Balance cost, speed, and quality
3. **Cancel long-running operations** - Prevent resource waste
4. **Track token usage** - Monitor costs and optimize prompts

### Performance Optimization

1. **Choose optimal retrieval strategies** - `HYBRID` gives the best results for most queries
2. **Use filters effectively** - Narrow search scope when possible
3. **Set reasonable token limits** - Prevent excessive generation
4. **Stream responses** - Better user experience for long responses

## Model Support

### OpenAI Models

- `gpt-4o` - Latest GPT-4 Omni model
- `gpt-4o-mini` - Faster, cost-effective GPT-4 Omni
- `gpt-4-turbo` - GPT-4 Turbo with 128K context
- `gpt-4` - Standard GPT-4 model

### Anthropic Models

- `claude-3-5-sonnet-20241022` - Latest Claude 3.5 Sonnet (8K output)
- `claude-3-5-haiku-20241022` - Latest Claude 3.5 Haiku (fast)
- `claude-3-opus-20240229` - Most capable Claude 3 model
- `claude-3-sonnet-20240229` - Balanced performance and speed
- `claude-3-haiku-20240307` - Fastest Claude 3 model

### Automatic Provider Detection

The API automatically detects the provider based on the model name:

- Models containing `claude` → Anthropic provider
- Models containing `gpt` → OpenAI provider
- No need to specify the provider explicitly

## OpenAI Compatibility

The Threads API is designed to be compatible with OpenAI's Assistants API pattern while adding powerful RAG capabilities.

### Key Similarities

- Thread and message management
- Run-based execution model
- Streaming responses
- Status tracking

### Key Differences

- **RAG Integration** - Automatic knowledge retrieval
- **Enhanced Parameters** - Support for retrieval strategies
- **Flexible Naming** - Both snake_case and camelCase supported
- **Source Citations** - Optional citation of retrieved sources
- **Advanced Filtering** - Filter knowledge base searches

## Rate Limits

Rate limits are applied per API key:

- **Thread operations**: 1,000/hour
- **Message operations**: 5,000/hour
- **Run operations**: 500/hour
- **Enterprise**: Custom limits available
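## Appendix: Consuming the Run Stream

As a closing illustration, here is a minimal client-side sketch for consuming the SSE stream documented under Run Management. This is not an official SDK: the function names (`parse_sse_line`, `collect_answer`) are hypothetical helpers, and the code assumes only the event shapes shown in the Streaming Response section (`context`, `content`, `done`, and the terminal `[DONE]` marker).

```python
import json
from typing import Optional


def parse_sse_line(line: str) -> Optional[dict]:
    """Decode one SSE line from a run stream.

    Returns the event dict for `data:` lines, a synthetic
    {"type": "stream_end"} for the terminal [DONE] marker,
    and None for blank or non-data lines.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return {"type": "stream_end"}
    return json.loads(payload)


def collect_answer(lines) -> tuple[str, dict]:
    """Accumulate `content` events into the full answer text and
    capture usage metadata from the final `done` event."""
    text_parts: list[str] = []
    usage: dict = {}
    for raw in lines:
        event = parse_sse_line(raw)
        if event is None:
            continue  # blank keep-alive line or comment
        if event["type"] == "content":
            text_parts.append(event["content"])
        elif event["type"] == "done":
            usage = event.get("usage", {})
        elif event["type"] == "stream_end":
            break
    return "".join(text_parts), usage
```

With the `requests` library, `response.iter_lines(decode_unicode=True)` from a `POST /threads/{thread_id}/runs` call can be passed straight into `collect_answer`; `context` events can be handled the same way if you want to render citations as they arrive.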