# **CS1460 Final Project**: **Reddit Depression Detection**
## **Project Overview**
This project reimplements the paper *[Detecting Symptoms of Depression on Reddit](https://dl.acm.org/doi/abs/10.1145/3578503.3583621)*, which detects symptoms of depression in Reddit posts. The goal is to classify each post as belonging to a symptom category (e.g., `anger`, `anxiety`) or to the control group, using two feature representations:
- **LDA**: topic modeling that represents each post as a distribution over topics.
- **RoBERTa**: a pre-trained transformer model that produces contextual embeddings of posts.

The implementation lives in a single Jupyter notebook, `Reddit Depression.ipynb`. Intermediate outputs (e.g., the LDA model and the embeddings) are saved to disk, mostly as `.pkl` files, so that later runs can reload them instead of recomputing them.
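
The save-and-reuse pattern can be sketched as a small load-or-compute helper. This is a minimal illustration using only the standard library, not the notebook's actual code: the helper name `load_or_compute` and the stand-in `expensive_step` are hypothetical, and the demo writes to a temporary directory rather than to the project folder.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(path, compute):
    """Return the object cached at `path`; on a miss, compute it and save it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


# Demo: a hypothetical stand-in for an expensive notebook step.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "combined_df.pkl"
    calls = []

    def expensive_step():
        calls.append(1)  # record that the slow path actually ran
        return {"post": ["example text"], "symptom": ["anxiety"]}

    first = load_or_compute(cache_file, expensive_step)   # computes and saves
    second = load_or_compute(cache_file, expensive_step)  # loads from disk
```

For pandas DataFrames, `DataFrame.to_pickle` and `pandas.read_pickle` serve the same purpose. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.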
## **Links**
- [Video walkthrough of the project](https://drive.google.com/file/d/1N8wUhM_pTwObcaX1DTS7vm2luf5ki3U9/view?usp=drive_link)
- [Notebook](https://drive.google.com/file/d/11WhOMF_GSG2LZW7QTyUPKphX4zg5x-DH/view?usp=sharing)
## **Structure of the Notebook**
- **Data Preprocessing**:
  - Tokenization, mapping subreddits to symptoms, and stop-word removal.
  - Output: `combined_df.pkl`.
- **LDA Model Training**:
  - Trains LDA to represent posts as topic distributions.
  - Output: `lda_model.model`, `df_with_topics.pkl`.
- **RoBERTa Embedding Generation**:
  - Generates contextual embeddings for each post using RoBERTa.
  - Output: `df_with_roberta.pkl`.
- **Evaluation**:
  - Uses Random Forest and 5-fold cross-validation to evaluate both LDA and RoBERTa features.
  - Output: `final_results.pkl`.
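
The evaluation step can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the notebook's code: the features are synthetic stand-ins for LDA topic distributions or RoBERTa embeddings, the binary symptom-versus-control labeling is assumed, and the stratified, shuffled folds, the number of trees, and the random seeds are arbitrary choices. It scores a Random Forest with 5-fold cross-validation and reports ROC AUC per fold.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for per-post features (topic distributions or embeddings).
n_posts, n_features = 200, 16
X = rng.normal(size=(n_posts, n_features))

# Assumed binary setup: 1 = symptom post, 0 = control post.
y = np.repeat([0, 1], n_posts // 2)
X[y == 1, :4] += 1.0  # shift a few features so the classes are separable

clf = RandomForestClassifier(n_estimators=50, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One ROC AUC score per fold; the mean summarizes a symptom/method pair.
auc_per_fold = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
mean_auc = auc_per_fold.mean()
print(f"ROC AUC per fold: {auc_per_fold.round(3)}, mean: {mean_auc:.3f}")
```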
---
## **Outputs**
| Output File | Description |
|----------------------|----------------------------------------------|
| `combined_df.pkl` | Preprocessed data with symptoms mapped |
| `lda_model.model` | Trained LDA model |
| `df_with_topics.pkl` | Posts represented as LDA topic distributions |
| `df_with_roberta.pkl` | Posts represented as RoBERTa embeddings |
| `final_results.pkl` | ROC AUC scores for each symptom and method |
---
## **Results**
- **Performance** (ROC AUC per symptom and method):

| Symptom           | LDA      | RoBERTa  |
|-------------------|----------|----------|
| anger             | 0.819849 | 0.915685 |
| anhedonia         | 0.946179 | 0.956880 |
| anxiety           | 0.884495 | 0.956304 |
| disordered eating | 0.915986 | 0.957470 |
| loneliness        | 0.808022 | 0.908328 |
| sad mood          | 0.788084 | 0.911753 |
| self-loathing     | 0.864496 | 0.938799 |
| sleep problem     | 0.933253 | 0.976046 |
| somatic complaint | 0.875075 | 0.934806 |
| worthlessness     | 0.697055 | 0.902052 |
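- **Observations**:
  - RoBERTa embeddings outperform LDA topic features on all ten symptoms (mean ROC AUC of about 0.936 versus 0.853). The gap is largest for worthlessness (0.902 vs. 0.697) and smallest for anhedonia (0.957 vs. 0.946).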
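
The comparison between the two methods can be checked directly from the scores in the results table. The snippet below is a small sketch that copies those values into a dictionary (the variable names are illustrative) and computes the per-symptom gap and the mean score for each method.

```python
# ROC AUC values copied from the results table: symptom -> (LDA, RoBERTa)
results = {
    "anger": (0.819849, 0.915685),
    "anhedonia": (0.946179, 0.956880),
    "anxiety": (0.884495, 0.956304),
    "disordered eating": (0.915986, 0.957470),
    "loneliness": (0.808022, 0.908328),
    "sad mood": (0.788084, 0.911753),
    "self-loathing": (0.864496, 0.938799),
    "sleep problem": (0.933253, 0.976046),
    "somatic complaint": (0.875075, 0.934806),
    "worthlessness": (0.697055, 0.902052),
}

# Gap in favor of RoBERTa for each symptom.
gaps = {symptom: roberta - lda for symptom, (lda, roberta) in results.items()}

roberta_wins = sum(gap > 0 for gap in gaps.values())
largest_gap = max(gaps, key=gaps.get)
smallest_gap = min(gaps, key=gaps.get)

mean_lda = sum(lda for lda, _ in results.values()) / len(results)
mean_roberta = sum(roberta for _, roberta in results.values()) / len(results)

print(f"RoBERTa wins on {roberta_wins} of {len(results)} symptoms")
print(f"Largest gap: {largest_gap} ({gaps[largest_gap]:.3f})")
print(f"Smallest gap: {smallest_gap} ({gaps[smallest_gap]:.3f})")
print(f"Mean ROC AUC: LDA {mean_lda:.3f}, RoBERTa {mean_roberta:.3f}")
```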