Office365 Tool Structure for Data Integration in Microsoft 365 OneDrive

# Office365 Tool Structure for Data Integration in Microsoft 365 OneDrive **Created at** 2025-07-11 **Created by** Savan Desai --- ## Overview The Microsoft 365 Excel Integration Tool is designed to append structured data to a pre-configured Excel file stored in OneDrive. The tool uses Microsoft Graph API to interact with OneDrive and Excel. It is built to run asynchronously using Celery and relies on environment variables for configuration, making it secure and easy to deploy. Key features: - **Environment Configuration**: All credentials and settings are stored in environment variables. - **Asynchronous Processing**: Uses Celery to handle data appending in the background. - **Data Validation**: Validates input data format before processing. - **Retry Mechanism**: Automatically retries on temporary failures (e.g., file locked). - **Header Preservation**: Merges new data with existing Excel headers. ## Flowchart ```mermaid flowchart TD A([User Provides Data]) --> B{Data Valid?} B -->|Yes| C[Connect to Microsoft 365] B -->|No| D[Show Input Error] C --> E[Locate Excel File] E --> F[Prepare Data Update] F --> G[Update Excel File] G --> H{Update Successful?} H -->|Yes| I[Show Success Message] H -->|No| J[Show Update Error] I --> K([End]) J --> K D --> K ``` Detailed files descriptions. --- ## 1. `kalimera_microsoft365_tool_task.py` ### Purpose This file defines the Celery task that handles the entire process of appending data to the Excel file in OneDrive. It is responsible for: - Loading configuration from environment variables. - Acquiring an access token for Microsoft Graph API. - Locating the Excel file by name in OneDrive. - Retrieving the existing data and merging it with new data. - Updating the Excel file with the combined data, including retry logic for locked files. ### Code Description ```python import json import logging import time from celery import shared_task import requests import msal import os from openpyxl.utils import get_column_letter @shared_task(queue='kalimera_microsoft365_tool_task') def store_in_office(self, data: str): """ Celery task to append data to Excel using environment configuration Environment Requirements: CLIENT_ID: Azure AD application ID CLIENT_SECRET: Azure AD client secret TENANT_ID: Azure tenant ID EXCEL_FILE_NAME: Target Excel filename in OneDrive """ try: # Load configuration from environment variables config = { 'client_id': os.getenv('CLIENT_ID'), 'client_secret': os.getenv('CLIENT_SECRET'), 'tenant_id': os.getenv('TENANT_ID'), 'file_name': os.getenv('EXCEL_FILE_NAME') } # Check if any required environment variable is missing if None in config.values(): missing = [k for k, v in config.items() if v is None] raise ValueError(f"Missing environment variables: {', '.join(missing)}") # Get access token using the configuration access_token = get_access_token(config) # Find the Excel file by name and get its ID file_id = find_file_by_name(access_token, config['file_name']) if not file_id: raise FileNotFoundError(f"Excel file '{config['file_name']}' not found") # Update the Excel file with the provided data return update_excel(access_token, file_id, data) except Exception as e: logging.error(f"Task failed: {str(e)}") raise self.retry(exc=e, countdown=10) # Helper function to get an access token def get_access_token(config: dict) -> str: # Create a confidential client application app = msal.ConfidentialClientApplication( config['client_id'], client_credential=config['client_secret'], authority=f"https://login.microsoftonline.com/{config['tenant_id']}" ) # Acquire token for Microsoft Graph result = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"]) if "access_token" not in result: raise Exception("Failed to acquire access token") return result["access_token"] # Helper function to find a file by name in OneDrive def find_file_by_name(access_token: str, file_name: str) -> str: headers = {"Authorization": f"Bearer {access_token}"} # Search for the file by name search_url = f"https://graph.microsoft.com/v1.0/me/drive/root/search(q='{file_name}')" resp = requests.get(search_url, headers=headers) resp.raise_for_status() items = resp.json().get('value', []) # Return the ID of the first matching file return items[0]['id'] if items else None # Function to update Excel with new data def update_excel(access_token: str, file_id: str, data: str, max_attempts: int = 3): headers = { "Authorization": f"Bearer {access_token}", "Content-Type": "application/json" } # Parse the data (if it's a string, convert to object; if it's a single object, convert to list) data = json.loads(data) if isinstance(data, str) else data data = [data] if isinstance(data, dict) else data # Validate that data is a list of dictionaries if not isinstance(data, list) or not all(isinstance(item, dict) for item in data): raise ValueError("Data must be a list of dictionaries") # Get the first worksheet in the Excel file ws_url = f"https://graph.microsoft.com/v1.0/me/drive/items/{file_id}/workbook/worksheets" ws_resp = requests.get(ws_url, headers=headers) ws_resp.raise_for_status() worksheet_id = ws_resp.json()['value'][0]['id'] # Get the used range of the worksheet (existing data) range_url = f"{ws_url}/{worksheet_id}/usedRange(valuesOnly=true)" range_resp = requests.get(range_url, headers=headers) range_resp.raise_for_status() used_values = range_resp.json().get("values", []) # Extract existing header and rows existing_header = used_values[0] if used_values else [] existing_rows = used_values[1:] if len(used_values) > 1 else [] # Get all keys from the new data new_keys = {k for row in data for k in row} # Merge existing header and new keys, then sort unified_keys = sorted(set(existing_header) | new_keys) # Normalize existing rows to the unified header (fill missing with empty string) old_rows = [[dict(zip(existing_header, r)).get(col, '') for col in unified_keys] for r in existing_rows] # Normalize new rows to the unified header new_rows = [[r.get(col, '') for col in unified_keys] for r in data] # Combine: header, existing rows, new rows all_rows = [unified_keys] + old_rows + new_rows # Calculate the target range (e.g., A1:D20) total_rows = len(all_rows) total_cols = len(unified_keys) end_col = get_column_letter(total_cols) # Convert column number to letter (e.g., 3 -> 'C') target_range = f"A1:{end_col}{total_rows}" # Attempt to update the Excel file with retry on lock failure (423) for attempt in range(1, max_attempts + 1): try: # Construct the URL for the range to update patch_url = f"https://graph.microsoft.com/v1.0/me/drive/items/{file_id}/workbook/worksheets/{worksheet_id}/range(address='{target_range}')" payload = {"values": all_rows} response = requests.patch(patch_url, headers=headers, json=payload) response.raise_for_status() return {"status": "success", "updated_rows": len(data)} except requests.HTTPError as e: if e.response.status_code == 423: # Locked # Wait and retry time.sleep(attempt * 5) else: # For other errors, break and then will raise exception after loop break # If all attempts fail, raise an exception raise Exception("All update attempts failed") ``` --- ## 2. `store_data.py` ### Purpose This file defines the tool that is invoked by the user (e.g., from an AI system). It validates the input and queues the Celery task for background processing. ### Code Description ```python import os import json from typing import Any from core.tools.tool.builtin_tool import BuiltinTool class Microsoft365StoreDataTool(BuiltinTool): """ Public interface for Excel data storage Parameters: data: Required - JSON string or serializable object Environment Requirements: EXCEL_FILE_NAME: Pre-configured Excel filename """ def _invoke(self, user_id: str, tool_parameters: dict[str, Any]): # Extract the data parameter data = tool_parameters.get('data') if not data: return self.create_text_message("Missing required parameter: data") # Get the Excel file name from environment for the message file_name = os.getenv('EXCEL_FILE_NAME', 'Excel file') # Import the Celery task from tasks.microsoft365_tool_task import store_in_office # If data is not a string (i.e., it's a dict or list), convert to JSON string # Otherwise, use as is (assuming it's already a JSON string) data_to_send = json.dumps(data) if not isinstance(data, str) else data # Queue the task store_in_office.delay(data_to_send) # Return a success message to the user return self.create_text_message( f"Data is being appended to {file_name}. This may take a few moments." ) ``` --- ## 3. `store_data.yaml` ### Purpose This YAML file registers the tool in the system, providing metadata, labels, and parameter definitions for the user interface. ### YAML Description ```yaml identity: name: store_data author: Savan Desai label: en_US: Store in Excel zh_Hans: 存储到 Excel pt_BR: Armazenar no Excel description: human: en_US: Append data to configured Excel file zh_Hans: 追加数据到配置的 Excel 文件 pt_BR: Adicionar dados ao arquivo Excel configurado llm: A tool for appending data to a pre-configured Excel file parameters: - name: data type: string required: true label: en_US: Data to store (JSON) zh_Hans: 要存储的数据 (JSON) pt_BR: Dados para armazenar (JSON) # Note: The description in the YAML is concise. The tool expects JSON data. ``` --- ## 4. `microsoft365.py` ### Purpose This file validates the provider credentials (environment variables) during tool setup. It ensures that: - Required environment variables are set. - The access token can be obtained. - The configured Excel file exists in OneDrive. ### Code Description ```python import os import msal import requests import logging from typing import Any from core.tools.errors import ToolProviderCredentialValidationError from core.tools.provider.builtin_tool_provider import BuiltinToolProviderController class Microsoft365Provider(BuiltinToolProviderController): def _validate_credentials(self, credentials: dict[str, Any]) -> None: """ Validates credentials by: 1. Checking required environment variables are set. 2. Acquiring an access token. 3. Verifying the Excel file exists in OneDrive. """ try: # List of required environment variables required = ['CLIENT_ID', 'CLIENT_SECRET', 'TENANT_ID', 'EXCEL_FILE_NAME'] # Check which variables are missing missing = [v for v in required if not os.getenv(v)] if missing: raise ValueError(f"Missing env variables: {', '.join(missing)}") # Get access token access_token = self._get_access_token() # Check if the file exists if not self._file_exists(access_token, os.getenv('EXCEL_FILE_NAME')): raise FileNotFoundError("Excel file not found") except Exception as e: # Wrap any exception in a credential validation error raise ToolProviderCredentialValidationError(str(e)) def _get_access_token(self) -> str: # Create MSAL application app = msal.ConfidentialClientApplication( client_id=os.getenv('CLIENT_ID'), client_credential=os.getenv('CLIENT_SECRET'), authority=f"https://login.microsoftonline.com/{os.getenv('TENANT_ID')}" ) # Acquire token for Microsoft Graph result = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"]) if "access_token" not in result: raise Exception("Failed to get token") return result["access_token"] def _file_exists(self, token: str, filename: str) -> bool: headers = {"Authorization": f"Bearer {token}"} # Search for the file by name search_url = f"https://graph.microsoft.com/v1.0/me/drive/root/search(q='{filename}')" resp = requests.get(search_url, headers=headers) # Check the response and look for the exact filename return any(i['name'] == filename for i in resp.json().get('value', [])) ``` --- ## 5. `microsoft365.yaml` ### Purpose This YAML file registers the Microsoft 365 provider, including its metadata and tags. ### YAML Description ```yaml identity: author: Savan Desai name: microsoft365 label: en_US: Microsoft 365 zh_Hans: Microsoft 365 pt_BR: Microsoft 365 description: en_US: Microsoft 365 Integration zh_Hans: Microsoft 365 集成 pt_BR: Integração Microsoft 365 icon: office_icon.svg tags: - office - storage ``` --- ## Environment Variables (`.env`) The tool requires the following environment variables: ```env CLIENT_ID=your-client-id CLIENT_SECRET=your-client-secret TENANT_ID=your-tenant-id EXCEL_FILE_NAME=data.xlsx ``` ## Running the Tool 1. **Start Celery Worker**: ```bash celery -A tasks worker --loglevel=info ``` 2. **Trigger the Tool**: Invoke the `store_data` tool from your application, passing the data to store. Example of triggering the Celery task from the CLI (for testing): ```bash python kalimera_microsoft365_tool_task.py ``` ## Conclusion This tool provides a robust and secure way to append data to an Excel file in OneDrive. It leverages environment variables for configuration, Celery for asynchronous processing, and Microsoft Graph API for interactions with Microsoft 365. The detailed documentation above should help in understanding, maintaining, and extending the tool.