# **Contrastive Pretraining for Stress Detection with Multimodal Wearable Sensor Data and Surveys**

**Authors:** Zeyu Yang, Han Yu, Akane Sano
**Affiliations:** Rice University, Department of Electrical and Computer Engineering; Plaid

---

![Contrastive Pretraining for Stress Detection](https://hackmd.io/_uploads/HkKUzbGLeg.jpg)

## **Understanding the Research**

This study explores a novel approach to detecting psychological stress by combining **wearable time series data** (e.g., heart rate, step count) with **tabular survey data** (e.g., personality traits, sleep quality). The research leverages a **contrastive self-supervised learning** framework to align the two modalities without requiring labeled stress data during pretraining. The method was evaluated on two real-world datasets, **LifeSnaps** and **PMData**, to show how multimodal fusion and pretraining can improve stress detection, especially in data-scarce scenarios.

---

## **Motivation**

Detecting stress in everyday life is a critical yet challenging task due to:

* The high cost and burden of collecting labeled stress data
* The isolated use of either physiological or contextual features in past models
* The lack of robust methods for multimodal representation learning in wearables

This research aims to address these challenges by combining multiple data types and utilizing unlabeled data through contrastive learning.

---

## **Core Idea**

The authors hypothesize that a **contrastive pretraining strategy**, which learns to align time series and tabular data representations, can provide a shared embedding space for stress detection. This improves performance even when labeled stress data is limited, by enabling more meaningful feature extraction from wearable and contextual signals.

---

## **Methods**

### 🔢 What Is Time Series vs. Tabular Data?

* **Time Series Data**: Sensor streams sampled at regular intervals (hourly here), e.g., steps, heart rate, calories. Captures dynamic behavior.
* **Tabular Data**: Low-frequency or static information such as age, personality scores, location labels, or sleep quality. Reflects context and individual traits.

### 📊 Dataset Overview

#### **LifeSnaps (N=71)**

* 5,850 days of data; 228 days labeled
* **Time series**: hourly steps, bpm, calories, distance, temperature
* **Tabular**: home/office status, IPIP personality scores, BMI, age
* **Label**: 3-class subjective stress (below / average / above average)

#### **PMData (N=16)**

* 2,406 days of data; 2,079 labeled
* **Time series**: similar wearable metrics
* **Tabular**: sleep quality, phone usage, injury events, goal achievement
* **Label**: binary (stressed / not stressed)

🔍 **Key Differences**:

* LifeSnaps has sparse, subjective labeling and personality-rich context, making it suited for **contrastive pretraining**.
* PMData is denser and more clinically aligned, making it ideal for **fine-tuning and evaluation**.

### 🧠 What Is Contrastive Loss?

Contrastive loss is a self-supervised objective that:

* Pulls **paired samples** (e.g., same-day time series and tabular data) closer in the embedding space
* Pushes **unpaired samples** (e.g., different days or users) further apart

This allows the model to learn **meaningful cross-modal representations** without stress labels. These representations are then used for downstream stress classification tasks.
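To make this concrete, here is a minimal PyTorch-style sketch of cross-modal contrastive pretraining. It is not the authors' implementation: the GRU encoder for the hourly streams, the MLP encoder for the tabular features, the embedding sizes, and the symmetric InfoNCE objective with in-batch negatives are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeSeriesEncoder(nn.Module):
    """Encodes hourly wearable streams (steps, bpm, calories, ...) into an embedding."""
    def __init__(self, n_channels: int, embed_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_channels, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, embed_dim)

    def forward(self, x):                 # x: (batch, hours, n_channels)
        _, h = self.gru(x)                # h: (1, batch, hidden)
        return self.proj(h.squeeze(0))    # (batch, embed_dim)

class TabularEncoder(nn.Module):
    """Encodes static/contextual features (personality scores, BMI, sleep quality, ...)."""
    def __init__(self, n_features: int, embed_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x):                 # x: (batch, n_features)
        return self.mlp(x)

def contrastive_loss(ts_emb, tab_emb, temperature: float = 0.1):
    """Symmetric InfoNCE: same-day (time series, tabular) pairs are positives;
    every other pairing in the batch serves as a negative."""
    ts = F.normalize(ts_emb, dim=-1)
    tab = F.normalize(tab_emb, dim=-1)
    logits = ts @ tab.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(ts.size(0), device=ts.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy pretraining step with stand-in data: 24 hourly samples, 5 channels, 10 tabular features
ts_enc, tab_enc = TimeSeriesEncoder(n_channels=5), TabularEncoder(n_features=10)
opt = torch.optim.Adam(list(ts_enc.parameters()) + list(tab_enc.parameters()), lr=1e-3)
x_ts, x_tab = torch.randn(32, 24, 5), torch.randn(32, 10)

opt.zero_grad()
loss = contrastive_loss(ts_enc(x_ts), tab_enc(x_tab))
loss.backward()
opt.step()
print(f"contrastive loss: {loss.item():.4f}")
```

Each row of the similarity matrix scores one day's time series embedding against every tabular embedding in the batch; the diagonal entries (same-day pairs) are treated as the positives, so no stress labels are needed.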
### 🧪 Model Pipeline

* Dual encoders: one for time series, one for tabular data
* Pretraining: contrastive loss applied to match paired modalities
* Stress prediction: a classifier trained on the learned embeddings, either as a linear probe or by fine-tuning (a minimal linear-probe sketch appears at the end of this post)

---

## **Results**

1. **Stress Detection Performance (AUC)**

   * **LifeSnaps**: 76.28% using the proposed contrastive multimodal model
   * **PMData**: 77.85%, outperforming XGBoost, Random Forest, BYOL, and SimCLR baselines

2. **Feature Importance**

   * **Tabular**: "step goal achieved" and BMI were top predictors in LifeSnaps
   * **Time Series**: Steps and bpm were more informative than calories or distance

### **Conclusions and Limitations**

* **Conclusion**: Contrastive pretraining with multimodal data improves stress detection across datasets, especially in low-label regimes. The learned representations are generalizable and robust.
* **Limitations**: LifeSnaps includes subjective stress labels and sparse annotations, which may limit generalization. Cross-user adaptation and personalization were not directly evaluated. Real-world deployment may require further calibration across devices and users.

---

## **Why This Matters**

* Demonstrates a scalable path to **passive stress monitoring** using widely available wearables
* Supports **digital mental health** interventions by integrating behavioral and contextual cues
* Offers a blueprint for applying **self-supervised learning** in real-world, multimodal healthcare settings
* Bridges physiological signals with psychographic information for more holistic health modeling

---

## **Acknowledgment**

This research was supported by the **National Institutes of Health (NIH #R01DA059925)** and the **National Science Foundation (NSF #2047296)**. The work was conducted at **Rice University**, in collaboration with **Plaid**.

Poster presented at \[Insert conference if known].
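As a companion to the model pipeline above, the sketch below shows the downstream step in the same hypothetical PyTorch setup: the pretrained time series encoder is frozen and a linear probe is trained on its embeddings for stress classification. The function name, 3-class target, and tensor shapes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe_step(encoder: nn.Module, probe: nn.Linear,
                      opt: torch.optim.Optimizer,
                      x_ts: torch.Tensor, y: torch.Tensor) -> float:
    """One optimization step of a linear probe on top of a frozen, pretrained encoder."""
    encoder.eval()
    with torch.no_grad():                  # embeddings come from the frozen encoder
        emb = encoder(x_ts)
    loss = F.cross_entropy(probe(emb), y)  # supervised stress-classification loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Hypothetical wiring, reusing ts_enc from the pretraining sketch earlier in this post:
# probe = nn.Linear(128, 3)                # 3 classes, LifeSnaps-style stress labels
# opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
# loss = linear_probe_step(ts_enc, probe, opt,
#                          torch.randn(32, 24, 5), torch.randint(0, 3, (32,)))
```

For the fine-tuning variant, the encoder's parameters would be added to the optimizer and left trainable instead of being frozen.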