# Time2Lang: Bridging Time-Series Foundation Models and Large Language Models for Health Sensing Beyond Prompting
## Authors
Arvind Pillai, Dimitris Spathis, Subigya Nepal, Amanda C. Collins, Daniel M. Mackin, Michael V. Heinz, Tess Z. Griffin, Nicholas C. Jacobson, Andrew Campbell
Affiliations: Dartmouth College, University of Cambridge, Stanford University, Google Research
## 🧠 Understanding the Research

**Time2Lang** is a novel framework designed to connect **time-series data** from sensors (like wearables or smartphones) with the capabilities of **Large Language Models (LLMs)**, such as LLaMA or GPT. The research addresses a key technical challenge: how can you efficiently feed long, structured numerical data (e.g., heart rate, activity, temperature) into a language model that’s designed for text?
Traditional methods serialize this data into text prompts like `"Heart rate: 72, 73, 75..."`, which is wasteful: the LLM splits every number into one or more tokens, burning context length and memory while losing the temporal structure that matters. **Time2Lang** solves this by introducing a “bridge” between **Time-Series Foundation Models (TSFMs)** and LLMs, enabling rich, contextual health insights without prompt engineering.
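To see why serialization scales poorly, here is a minimal sketch that counts the tokens a single day of 15-minute heart-rate readings consumes as text. It uses the openly available `gpt2` tokenizer as a stand-in (the paper's LLM is LLaMA, whose tokenizer would give different but similarly large counts), and the readings are toy values:
```python
# Minimal sketch: token cost of serializing sensor readings as text.
# `gpt2` is a stand-in tokenizer; readings are synthetic toy values.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# One day of heart-rate samples at 15-minute intervals = 96 values.
readings = [70 + (i % 12) for i in range(96)]
prompt = "Heart rate: " + ", ".join(str(r) for r in readings)

n_tokens = len(tokenizer(prompt)["input_ids"])
print(f"{len(readings)} readings -> {n_tokens} prompt tokens")
# A TSFM instead compresses the same window into one fixed-size vector.
```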
---
## 🔍 Motivation
As sensor-based health monitoring becomes more common (especially in mental health and chronic disease), researchers want to use LLMs to generate human-readable feedback, predictions, and recommendations. But these models were not designed for raw sensor streams. The naïve solution — converting numbers into tokens — fails to scale and strips time-awareness from the model. There’s a growing need to **embed time-series understanding into the powerful language generation capacity of LLMs** — and that’s exactly what Time2Lang enables.
---
## 📦 What is a Time-Series Foundation Model?
A **Time-Series Foundation Model (TSFM)** is a large, pretrained model designed to understand time-series data — just like LLMs are trained on language. It learns from vast amounts of sequential data (e.g., heart rate, sleep cycles, glucose readings) to extract patterns like trends, rhythms, anomalies, and seasonality.
When you input raw data like:
```plaintext
[72, 73, 74, 70, 68, 71, ...] (e.g., heart rate over 24 hours)
```
TSFMs output a fixed-size embedding vector that summarizes the input time-series.
The dimensionality (e.g., 512, 768, or 1024) depends on:
- The architecture (e.g., Transformer, ConvNet, MLP)
- The model size (small, base, large)
- The specific implementation (Chronos, TimesFM, etc.)
📌 Examples:
| Model | Typical Output Dimension |
| ----------------------- | ------------------------ |
| Chronos (paper default) | 512–1024 |
| TimesFM (Google) | 512–2048 |
| HuggingFace TS models | 128–1024 |
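To make the interface concrete, here is a minimal PyTorch sketch of “raw series in, fixed-size vector out.” The tiny encoder below is a hypothetical stand-in for a real pretrained TSFM such as Chronos; its layers, sizes, and lack of pretraining are illustrative only:
```python
# Toy stand-in for a TSFM: raw sensor window in, one fixed-size embedding out.
# A real model like Chronos is large and pretrained; this only shows the shape contract.
import torch
import torch.nn as nn

class ToyTSFM(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)                      # scalar sample -> d_model
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        # series: (batch, seq_len) of raw values, e.g. heart rate over a window
        x = self.input_proj(series.unsqueeze(-1))                    # (batch, seq_len, d_model)
        h = self.encoder(x)                                          # contextualized timesteps
        return h.mean(dim=1)                                         # (batch, d_model) summary

heart_rate = torch.tensor([[72., 73., 74., 70., 68., 71.]])          # toy 6-sample window
embedding = ToyTSFM()(heart_rate)
print(embedding.shape)                                               # torch.Size([1, 512])
```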
But here’s the key:
> LLMs can’t use this vector directly — they’ve been trained on text.
So, Time2Lang introduces an **adapter** that maps the TSFM embedding into the **LLM’s latent space** (Z-space). This mapping lets the LLM “understand” the vector as if it came from its own world — unlocking the ability to generate text based on sensor-derived insight.
Think of it as:
* TSFM: **"Speaks sensor data"**
* LLM: **"Speaks language"**
* Time2Lang: **"Interpreter that lets them talk"**
This adapter is trained (often with synthetic data) to align the two models so they can work together seamlessly — even though they were trained on completely different data types.
---
## 🛠️ Methods
The pipeline consists of:
1. **Input**: 10 weeks of ambient sensor data (from wearables and smartphones), tracking \~100 features at 15-minute intervals.
2. **TSFM (Chronos)**: A time-series foundation model processes this data and outputs a latent representation.
3. **Adapter (Time2Lang)**: A lightweight neural module maps the TSFM embedding into the LLM latent space (a minimal sketch follows this list).
4. **LLM (LLaMA)**: Receives the adapted input and generates output for tasks like depression or flourishing prediction.
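Here is a hypothetical sketch of such an adapter: it projects one TSFM embedding into a few “soft tokens” in the LLM's hidden space. The dimensions (512 for the TSFM, 4096 for the LLM) and the soft-token design are placeholder assumptions, not the paper's exact configuration:
```python
# Hypothetical adapter sketch: one TSFM embedding -> a few "soft tokens"
# in the LLM's hidden space. Dimensions and token count are placeholders.
import torch
import torch.nn as nn

class TimeSeriesAdapter(nn.Module):
    def __init__(self, tsfm_dim: int = 512, llm_dim: int = 4096, n_soft_tokens: int = 4):
        super().__init__()
        self.n_soft_tokens = n_soft_tokens
        self.llm_dim = llm_dim
        self.proj = nn.Sequential(
            nn.Linear(tsfm_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim * n_soft_tokens),
        )

    def forward(self, tsfm_embedding: torch.Tensor) -> torch.Tensor:
        # tsfm_embedding: (batch, tsfm_dim) summary of a sensor window
        soft = self.proj(tsfm_embedding)                     # (batch, llm_dim * n_soft_tokens)
        return soft.view(-1, self.n_soft_tokens, self.llm_dim)

adapter = TimeSeriesAdapter()
soft_tokens = adapter(torch.randn(1, 512))                   # pretend TSFM output
print(soft_tokens.shape)                                     # torch.Size([1, 4, 4096])
# These vectors take the place of text tokens: they can be concatenated with
# the LLM's token embeddings (e.g., via `inputs_embeds` in HuggingFace models).
```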
To bootstrap this mapping, the researchers first pretrain on **200,000 synthetic time-series** labeled with known properties (e.g., periodicity), ensuring the learned mapping preserves meaningful temporal structure before moving on to real-world health tasks.
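As a rough illustration of that pretraining setup (the generator below, its 15-minute resolution, and its period ranges are assumptions, not the paper's recipe), one could synthesize periodic signals whose period is known by construction and train the bridge to recover it:
```python
# Hypothetical synthetic-data generator: periodic signals with known periods,
# usable as labels for pretraining the bridge. Not the paper's exact recipe.
import numpy as np

def make_synthetic_series(n_samples=672, rng=None):
    """One week at 15-minute resolution (672 points) plus its period label."""
    rng = rng or np.random.default_rng()
    period = int(rng.integers(8, 96))                # period in samples (2 h to 24 h)
    t = np.arange(n_samples)
    amplitude = rng.uniform(0.5, 2.0)
    noise = rng.normal(0.0, 0.2, size=n_samples)
    series = amplitude * np.sin(2 * np.pi * t / period) + noise
    return series.astype(np.float32), period

rng = np.random.default_rng(0)
dataset = [make_synthetic_series(rng=rng) for _ in range(5)]  # scale to 200k in practice
for series, period in dataset[:2]:
    print(series.shape, "period =", period)
```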
---
## 📊 Experiments and Results
The team evaluates their system on mental health prediction tasks — specifically **depression** and **flourishing** classification — using **AUROC** and **AUPRC** metrics.
* **AUROC** (Area Under the ROC Curve) measures how well the model ranks positive cases above negative ones.
* **AUPRC** (Area Under the Precision-Recall Curve) is more informative on imbalanced data, which is the norm in mental health where positive cases are rare; both metrics can be computed as in the sketch below.
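A toy example of computing both metrics with scikit-learn (the labels and scores are made up, not the paper's data):
```python
# Toy computation of AUROC and AUPRC with scikit-learn; labels and scores are made up.
from sklearn.metrics import average_precision_score, roc_auc_score

y_true  = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]                   # imbalanced: 2 positives in 10
y_score = [0.1, 0.7, 0.15, 0.3, 0.05, 0.4, 0.8, 0.35, 0.6, 0.2]

print("AUROC:", roc_auc_score(y_true, y_score))
print("AUPRC:", average_precision_score(y_true, y_score))
# With rare positives, AUPRC penalizes false alarms among the top-ranked
# predictions much more sharply than AUROC does, which is why both are reported.
```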
Their method, **Time2Lang**, outperforms all baselines:
* **Depression prediction**: 0.73 AUPRC (vs. 0.60 for TSFM and 0.52 for LLaMA alone)
* **Flourishing**: also shows strong gains
This shows that Time2Lang is not only theoretically elegant, but practically valuable.
---
## 🌍 Why This Matters
As wearables and mobile health apps generate more real-time data, we need tools that can turn these raw streams into actionable feedback. Time2Lang provides a **modular, scalable way to plug time-series signals into general-purpose AI**, without retraining massive models or writing clunky prompts.
This architecture is ideal for startups and researchers building **personal health copilots**, **context-aware AI agents**, or **digital phenotyping platforms** — especially in behavioral health, chronic care, or personalized medicine.
---
## 🙏 Acknowledgment
This research is a collaboration between Dartmouth College, University of Cambridge, Stanford University, and Google Research. By uniting TSFMs and LLMs, they demonstrate a critical step forward in multimodal AI for health — where numbers become narratives, and data becomes dialogue.
---