# Office365 Tool Structure for Data Integration in Microsoft 365 OneDrive
**Created at** 2025-07-11
**Created by** Savan Desai
---
## Overview
The Microsoft 365 Excel Integration Tool is designed to append structured data to a pre-configured Excel file stored in OneDrive. The tool uses Microsoft Graph API to interact with OneDrive and Excel. It is built to run asynchronously using Celery and relies on environment variables for configuration, making it secure and easy to deploy.
Key features:
- **Environment Configuration**: All credentials and settings are stored in environment variables.
- **Asynchronous Processing**: Uses Celery to handle data appending in the background.
- **Data Validation**: Validates input data format before processing.
- **Retry Mechanism**: Automatically retries on temporary failures (e.g., file locked).
- **Header Preservation**: Merges new data with existing Excel headers.
## Flowchart
```mermaid
flowchart TD
A([User Provides Data]) --> B{Data Valid?}
B -->|Yes| C[Connect to Microsoft 365]
B -->|No| D[Show Input Error]
C --> E[Locate Excel File]
E --> F[Prepare Data Update]
F --> G[Update Excel File]
G --> H{Update Successful?}
H -->|Yes| I[Show Success Message]
H -->|No| J[Show Update Error]
I --> K([End])
J --> K
D --> K
```
Detailed files descriptions.
---
## 1. `kalimera_microsoft365_tool_task.py`
### Purpose
This file defines the Celery task that handles the entire process of appending data to the Excel file in OneDrive. It is responsible for:
- Loading configuration from environment variables.
- Acquiring an access token for Microsoft Graph API.
- Locating the Excel file by name in OneDrive.
- Retrieving the existing data and merging it with new data.
- Updating the Excel file with the combined data, including retry logic for locked files.
### Code Description
```python
import json
import logging
import time
from celery import shared_task
import requests
import msal
import os
from openpyxl.utils import get_column_letter
@shared_task(queue='kalimera_microsoft365_tool_task')
def store_in_office(self, data: str):
"""
Celery task to append data to Excel using environment configuration
Environment Requirements:
CLIENT_ID: Azure AD application ID
CLIENT_SECRET: Azure AD client secret
TENANT_ID: Azure tenant ID
EXCEL_FILE_NAME: Target Excel filename in OneDrive
"""
try:
# Load configuration from environment variables
config = {
'client_id': os.getenv('CLIENT_ID'),
'client_secret': os.getenv('CLIENT_SECRET'),
'tenant_id': os.getenv('TENANT_ID'),
'file_name': os.getenv('EXCEL_FILE_NAME')
}
# Check if any required environment variable is missing
if None in config.values():
missing = [k for k, v in config.items() if v is None]
raise ValueError(f"Missing environment variables: {', '.join(missing)}")
# Get access token using the configuration
access_token = get_access_token(config)
# Find the Excel file by name and get its ID
file_id = find_file_by_name(access_token, config['file_name'])
if not file_id:
raise FileNotFoundError(f"Excel file '{config['file_name']}' not found")
# Update the Excel file with the provided data
return update_excel(access_token, file_id, data)
except Exception as e:
logging.error(f"Task failed: {str(e)}")
raise self.retry(exc=e, countdown=10)
# Helper function to get an access token
def get_access_token(config: dict) -> str:
# Create a confidential client application
app = msal.ConfidentialClientApplication(
config['client_id'],
client_credential=config['client_secret'],
authority=f"https://login.microsoftonline.com/{config['tenant_id']}"
)
# Acquire token for Microsoft Graph
result = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
if "access_token" not in result:
raise Exception("Failed to acquire access token")
return result["access_token"]
# Helper function to find a file by name in OneDrive
def find_file_by_name(access_token: str, file_name: str) -> str:
headers = {"Authorization": f"Bearer {access_token}"}
# Search for the file by name
search_url = f"https://graph.microsoft.com/v1.0/me/drive/root/search(q='{file_name}')"
resp = requests.get(search_url, headers=headers)
resp.raise_for_status()
items = resp.json().get('value', [])
# Return the ID of the first matching file
return items[0]['id'] if items else None
# Function to update Excel with new data
def update_excel(access_token: str, file_id: str, data: str, max_attempts: int = 3):
headers = {
"Authorization": f"Bearer {access_token}",
"Content-Type": "application/json"
}
# Parse the data (if it's a string, convert to object; if it's a single object, convert to list)
data = json.loads(data) if isinstance(data, str) else data
data = [data] if isinstance(data, dict) else data
# Validate that data is a list of dictionaries
if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
raise ValueError("Data must be a list of dictionaries")
# Get the first worksheet in the Excel file
ws_url = f"https://graph.microsoft.com/v1.0/me/drive/items/{file_id}/workbook/worksheets"
ws_resp = requests.get(ws_url, headers=headers)
ws_resp.raise_for_status()
worksheet_id = ws_resp.json()['value'][0]['id']
# Get the used range of the worksheet (existing data)
range_url = f"{ws_url}/{worksheet_id}/usedRange(valuesOnly=true)"
range_resp = requests.get(range_url, headers=headers)
range_resp.raise_for_status()
used_values = range_resp.json().get("values", [])
# Extract existing header and rows
existing_header = used_values[0] if used_values else []
existing_rows = used_values[1:] if len(used_values) > 1 else []
# Get all keys from the new data
new_keys = {k for row in data for k in row}
# Merge existing header and new keys, then sort
unified_keys = sorted(set(existing_header) | new_keys)
# Normalize existing rows to the unified header (fill missing with empty string)
old_rows = [[dict(zip(existing_header, r)).get(col, '') for col in unified_keys] for r in existing_rows]
# Normalize new rows to the unified header
new_rows = [[r.get(col, '') for col in unified_keys] for r in data]
# Combine: header, existing rows, new rows
all_rows = [unified_keys] + old_rows + new_rows
# Calculate the target range (e.g., A1:D20)
total_rows = len(all_rows)
total_cols = len(unified_keys)
end_col = get_column_letter(total_cols) # Convert column number to letter (e.g., 3 -> 'C')
target_range = f"A1:{end_col}{total_rows}"
# Attempt to update the Excel file with retry on lock failure (423)
for attempt in range(1, max_attempts + 1):
try:
# Construct the URL for the range to update
patch_url = f"https://graph.microsoft.com/v1.0/me/drive/items/{file_id}/workbook/worksheets/{worksheet_id}/range(address='{target_range}')"
payload = {"values": all_rows}
response = requests.patch(patch_url, headers=headers, json=payload)
response.raise_for_status()
return {"status": "success", "updated_rows": len(data)}
except requests.HTTPError as e:
if e.response.status_code == 423: # Locked
# Wait and retry
time.sleep(attempt * 5)
else:
# For other errors, break and then will raise exception after loop
break
# If all attempts fail, raise an exception
raise Exception("All update attempts failed")
```
---
## 2. `store_data.py`
### Purpose
This file defines the tool that is invoked by the user (e.g., from an AI system). It validates the input and queues the Celery task for background processing.
### Code Description
```python
import os
import json
from typing import Any
from core.tools.tool.builtin_tool import BuiltinTool
class Microsoft365StoreDataTool(BuiltinTool):
"""
Public interface for Excel data storage
Parameters:
data: Required - JSON string or serializable object
Environment Requirements:
EXCEL_FILE_NAME: Pre-configured Excel filename
"""
def _invoke(self, user_id: str, tool_parameters: dict[str, Any]):
# Extract the data parameter
data = tool_parameters.get('data')
if not data:
return self.create_text_message("Missing required parameter: data")
# Get the Excel file name from environment for the message
file_name = os.getenv('EXCEL_FILE_NAME', 'Excel file')
# Import the Celery task
from tasks.microsoft365_tool_task import store_in_office
# If data is not a string (i.e., it's a dict or list), convert to JSON string
# Otherwise, use as is (assuming it's already a JSON string)
data_to_send = json.dumps(data) if not isinstance(data, str) else data
# Queue the task
store_in_office.delay(data_to_send)
# Return a success message to the user
return self.create_text_message(
f"Data is being appended to {file_name}. This may take a few moments."
)
```
---
## 3. `store_data.yaml`
### Purpose
This YAML file registers the tool in the system, providing metadata, labels, and parameter definitions for the user interface.
### YAML Description
```yaml
identity:
name: store_data
author: Savan Desai
label:
en_US: Store in Excel
zh_Hans: 存储到 Excel
pt_BR: Armazenar no Excel
description:
human:
en_US: Append data to configured Excel file
zh_Hans: 追加数据到配置的 Excel 文件
pt_BR: Adicionar dados ao arquivo Excel configurado
llm: A tool for appending data to a pre-configured Excel file
parameters:
- name: data
type: string
required: true
label:
en_US: Data to store (JSON)
zh_Hans: 要存储的数据 (JSON)
pt_BR: Dados para armazenar (JSON)
# Note: The description in the YAML is concise. The tool expects JSON data.
```
---
## 4. `microsoft365.py`
### Purpose
This file validates the provider credentials (environment variables) during tool setup. It ensures that:
- Required environment variables are set.
- The access token can be obtained.
- The configured Excel file exists in OneDrive.
### Code Description
```python
import os
import msal
import requests
import logging
from typing import Any
from core.tools.errors import ToolProviderCredentialValidationError
from core.tools.provider.builtin_tool_provider import BuiltinToolProviderController
class Microsoft365Provider(BuiltinToolProviderController):
def _validate_credentials(self, credentials: dict[str, Any]) -> None:
"""
Validates credentials by:
1. Checking required environment variables are set.
2. Acquiring an access token.
3. Verifying the Excel file exists in OneDrive.
"""
try:
# List of required environment variables
required = ['CLIENT_ID', 'CLIENT_SECRET', 'TENANT_ID', 'EXCEL_FILE_NAME']
# Check which variables are missing
missing = [v for v in required if not os.getenv(v)]
if missing:
raise ValueError(f"Missing env variables: {', '.join(missing)}")
# Get access token
access_token = self._get_access_token()
# Check if the file exists
if not self._file_exists(access_token, os.getenv('EXCEL_FILE_NAME')):
raise FileNotFoundError("Excel file not found")
except Exception as e:
# Wrap any exception in a credential validation error
raise ToolProviderCredentialValidationError(str(e))
def _get_access_token(self) -> str:
# Create MSAL application
app = msal.ConfidentialClientApplication(
client_id=os.getenv('CLIENT_ID'),
client_credential=os.getenv('CLIENT_SECRET'),
authority=f"https://login.microsoftonline.com/{os.getenv('TENANT_ID')}"
)
# Acquire token for Microsoft Graph
result = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
if "access_token" not in result:
raise Exception("Failed to get token")
return result["access_token"]
def _file_exists(self, token: str, filename: str) -> bool:
headers = {"Authorization": f"Bearer {token}"}
# Search for the file by name
search_url = f"https://graph.microsoft.com/v1.0/me/drive/root/search(q='{filename}')"
resp = requests.get(search_url, headers=headers)
# Check the response and look for the exact filename
return any(i['name'] == filename for i in resp.json().get('value', []))
```
---
## 5. `microsoft365.yaml`
### Purpose
This YAML file registers the Microsoft 365 provider, including its metadata and tags.
### YAML Description
```yaml
identity:
author: Savan Desai
name: microsoft365
label:
en_US: Microsoft 365
zh_Hans: Microsoft 365
pt_BR: Microsoft 365
description:
en_US: Microsoft 365 Integration
zh_Hans: Microsoft 365 集成
pt_BR: Integração Microsoft 365
icon: office_icon.svg
tags:
- office
- storage
```
---
## Environment Variables (`.env`)
The tool requires the following environment variables:
```env
CLIENT_ID=your-client-id
CLIENT_SECRET=your-client-secret
TENANT_ID=your-tenant-id
EXCEL_FILE_NAME=data.xlsx
```
## Running the Tool
1. **Start Celery Worker**:
```bash
celery -A tasks worker --loglevel=info
```
2. **Trigger the Tool**:
Invoke the `store_data` tool from your application, passing the data to store.
Example of triggering the Celery task from the CLI (for testing):
```bash
python kalimera_microsoft365_tool_task.py
```
## Conclusion
This tool provides a robust and secure way to append data to an Excel file in OneDrive. It leverages environment variables for configuration, Celery for asynchronous processing, and Microsoft Graph API for interactions with Microsoft 365. The detailed documentation above should help in understanding, maintaining, and extending the tool.