# **Contrastive Pretraining for Stress Detection with Multimodal Wearable Sensor Data and Surveys**
**Authors:** Zeyu Yang, Han Yu, Akane Sano
**Affiliations:** Rice University, Department of Electrical and Computer Engineering; Plaid
---

## **Understanding the Research**
This study explores a novel approach to detecting psychological stress by combining **wearable time series data** (e.g., heart rate, step count) with **tabular survey data** (e.g., personality traits, sleep quality). The research leverages a **contrastive self-supervised learning** framework to align the two modalities without requiring labeled stress data during pretraining. The method was evaluated on two real-world datasets—**LifeSnaps** and **PMData**—to show how multimodal fusion and pretraining can improve stress detection, especially in data-scarce scenarios.
---
## **Motivation**
Detecting stress in everyday life is a critical yet challenging task due to:
* The high cost and burden of collecting labeled stress data
* The tendency of prior models to use physiological or contextual features in isolation rather than together
* The lack of robust methods for multimodal representation learning in wearables
This research aims to address these challenges by combining multiple data types and utilizing unlabeled data through contrastive learning.
---
## **Core Idea**
The authors hypothesize that a **contrastive pretraining strategy**—which learns to align time series and tabular data representations—can provide a shared embedding space for stress detection. This improves performance even when labeled stress data is limited, by enabling more meaningful feature extraction from wearable and contextual signals.
---
## **Methods**
### 🔢 What Is Time Series vs. Tabular Data?
* **Time Series Data**: Sensor streams sampled at regular intervals (hourly in these datasets), e.g., steps, heart rate, calories. Captures dynamic behavior over the day.
* **Tabular Data**: Low-frequency or static information such as age, personality scores, location labels, or sleep quality. Reflects context and individual traits. (A minimal data-layout sketch follows this list.)
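To make the two modalities concrete, here is a minimal sketch of how a single day of paired data might be laid out. The feature names, shapes, and values are illustrative assumptions, not the datasets' exact schema.

```python
import numpy as np

# One day of hourly sensor readings: 24 rows x 5 channels
# (e.g., steps, heart rate, calories, distance, temperature).
time_series = np.zeros((24, 5), dtype=np.float32)

# The same day's tabular context, e.g., [age, BMI, an IPIP trait score, sleep quality].
# Values here are placeholders.
tabular = np.array([29.0, 23.4, 3.8, 4.0], dtype=np.float32)

# For contrastive pretraining, the "positive pair" is simply the two views of the same day.
paired_sample = {"time_series": time_series, "tabular": tabular}
```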
### 📊 Dataset Overview
#### **LifeSnaps (N=71)**
* 5,850 days of data; 228 days labeled
* **Time series**: hourly steps, heart rate (bpm), calories, distance, temperature
* **Tabular**: home/office status, IPIP personality scores, BMI, age
* **Label**: 3-class subjective stress (below average / average / above average)
#### **PMData (N=16)**
* 2,406 days of data; 2,079 labeled
* **Time series**: similar wearable metrics
* **Tabular**: sleep quality, phone usage, injury events, goal achievement
* **Label**: binary (stressed / not stressed)
🔍 **Key Differences**:
* LifeSnaps has sparse, subjective labeling and personality-rich context—suited for **contrastive pretraining**.
* PMData is denser and more clinically aligned—ideal for **fine-tuning and evaluation**. (A sketch of assembling day-level paired samples follows this list.)
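Both datasets pair each day's sensor streams with that day's tabular context. The sketch below shows one way such day-level pairs could be assembled; the column names (`date`, `hour`, and the sensor channels) and the NaN handling are illustrative assumptions, not the authors' exact preprocessing.

```python
import numpy as np
import pandas as pd

def build_daily_pairs(hourly_df: pd.DataFrame, daily_df: pd.DataFrame, channels: list[str]):
    """Group hourly sensor rows into (24, C) arrays and join each day's tabular row.

    Assumes hourly_df has "date" and "hour" (0-23) columns plus the sensor channels,
    and daily_df has a "date" column plus numeric tabular features.
    """
    pairs = []
    for date, day_rows in hourly_df.groupby("date"):
        # Reindex to a full 24-hour grid; missing hours become NaN, then zero-filled.
        grid = day_rows.set_index("hour").reindex(range(24))
        ts = np.nan_to_num(grid[channels].to_numpy(dtype=np.float32))
        tab_rows = daily_df.loc[daily_df["date"] == date]
        if tab_rows.empty:
            continue  # no survey/context entry for this day
        tab = tab_rows.iloc[0].drop("date").to_numpy(dtype=np.float32)
        pairs.append((ts, tab))
    return pairs
```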
### 🧠 What Is Contrastive Loss?
Contrastive loss is a self-supervised objective that:
* Pulls **paired samples** (e.g., same-day time series and tabular data) closer in the embedding space
* Pushes **unpaired samples** (e.g., different days or users) further apart
This allows the model to learn **meaningful cross-modal representations** without stress labels. These representations are then used for downstream stress classification tasks.
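As an illustration, the snippet below sketches a symmetric, InfoNCE-style contrastive objective over a batch of same-day (time series, tabular) embeddings. The temperature value and normalization choices are assumptions; the paper's exact loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(ts_emb: torch.Tensor, tab_emb: torch.Tensor, temperature: float = 0.1):
    """ts_emb, tab_emb: (batch, dim) embeddings of the same days, row-aligned."""
    ts_emb = F.normalize(ts_emb, dim=-1)
    tab_emb = F.normalize(tab_emb, dim=-1)
    logits = ts_emb @ tab_emb.t() / temperature      # pairwise cosine similarities
    targets = torch.arange(ts_emb.size(0), device=ts_emb.device)
    # Same-day pairs lie on the diagonal (positives); all other pairs act as negatives.
    loss_ts = F.cross_entropy(logits, targets)        # time series -> tabular direction
    loss_tab = F.cross_entropy(logits.t(), targets)   # tabular -> time series direction
    return 0.5 * (loss_ts + loss_tab)
```

Minimizing this loss pulls each day's two embeddings together while pushing apart embeddings from other days in the batch.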
### 🧪 Model Pipeline
* Dual encoders: one for time series, one for tabular data
* Pretraining: contrastive loss applied to match paired modalities
* Stress prediction: a classifier trained on the learned embeddings, via either a linear probe or full fine-tuning (a minimal sketch follows this list)
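A hedged sketch of such a dual-encoder pipeline is below; the encoder architectures, embedding size, and probing head are illustrative assumptions rather than the authors' exact design.

```python
import torch.nn as nn

class TimeSeriesEncoder(nn.Module):
    """Encodes a (batch, 24, n_channels) day of hourly signals into an embedding."""
    def __init__(self, n_channels: int = 5, dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(n_channels, dim, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        _, h = self.gru(x)               # h: (1, batch, dim) final hidden state
        return self.proj(h.squeeze(0))

class TabularEncoder(nn.Module):
    """Encodes a (batch, n_features) vector of survey/context features."""
    def __init__(self, n_features: int = 10, dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(n_features, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.mlp(x)

# Pretraining: apply the contrastive loss above to the two encoders' outputs.
# Downstream: either freeze the encoders and train a linear probe on the
# concatenated embeddings, or fine-tune the encoders end to end.
stress_head = nn.Linear(2 * 128, 3)  # e.g., 3-class stress for LifeSnaps, 2 for PMData
```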
---
## **Results**
1. **Stress Detection Performance (AUC)**
* **LifeSnaps**: 76.28% using the proposed contrastive multimodal model
* **PMData**: 77.85%, outperforming XGBoost, Random Forest, BYOL, and SimCLR baselines
2. **Feature Importance**
* **Tabular**: "step goal achieved" and BMI were top predictors in LifeSnaps
* **Time Series**: Steps and heart rate (bpm) were more informative than calories or distance
### **Conclusions and Limitations**
* **Conclusion**: Contrastive pretraining with multimodal data improves stress detection across datasets, especially in low-label regimes. The learned representations are generalizable and robust.
* **Limitations**: LifeSnaps includes subjective stress labels and sparse annotations, which may limit generalization. Cross-user adaptation and personalization were not directly evaluated. Real-world deployment may require further calibration across devices and users.
---
## **Why This Matters**
* Demonstrates a scalable path to **passive stress monitoring** using widely available wearables
* Supports **digital mental health** interventions by integrating behavioral and contextual cues
* Offers a blueprint for applying **self-supervised learning** in real-world, multimodal healthcare settings
* Bridges physiological signals with psychographic information for more holistic health modeling
---
## **Acknowledgment**
This research was supported by the **National Institutes of Health (NIH #R01DA059925)** and the **National Science Foundation (NSF #2047296)**.
The work was conducted at **Rice University**, in collaboration with **Plaid**.
Poster presented at [Insert conference if known].