# **CS1460 Final Project**: **Reddit Depression Detection**

## **Project Overview**

This project reimplements the paper *[Detecting Symptoms of Depression on Reddit](https://dl.acm.org/doi/abs/10.1145/3578503.3583621)*, which aims to detect symptoms of depression in Reddit posts. The goal is to classify posts into symptom-related categories (e.g., `anger`, `anxiety`) or control using:

- **LDA**: Topic modeling to represent posts as distributions over topics.
- **RoBERTa**: Pre-trained transformer model for contextual embeddings of posts.

The implementation uses a single Jupyter Notebook, `Reddit Depression.ipynb`, where intermediate outputs (e.g., LDA models, embeddings) are saved as `.pkl` files for reuse to speed up processing.

## **Links**

Here is a [video walkthrough of the project](https://drive.google.com/file/d/1N8wUhM_pTwObcaX1DTS7vm2luf5ki3U9/view?usp=drive_link).

Here is a link to my [notebook](https://drive.google.com/file/d/11WhOMF_GSG2LZW7QTyUPKphX4zg5x-DH/view?usp=sharing).

## **Structure of the Notebook**

- **Data Preprocessing**:
  - Tokenization, mapping subreddits to symptoms, and stop-word removal.
  - Output: `combined_df.pkl`.
- **LDA Model Training**:
  - Trains LDA to represent posts as topic distributions.
  - Output: `lda_model.model`, `df_with_topics.pkl`.
- **RoBERTa Embedding Generation**:
  - Generates contextual embeddings for each post using RoBERTa.
  - Output: `df_with_roberta.pkl`.
- **Evaluation**:
  - Uses Random Forest and 5-fold cross-validation to evaluate both LDA and RoBERTa features.
  - Output: `final_results.pkl`.
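
The reuse of intermediate `.pkl` files described above can be sketched as a small "load or compute" helper. This is a minimal illustration, not the notebook's actual code: the names `load_or_compute` and `expensive_preprocessing` are hypothetical, and the toy token lists stand in for the real preprocessed DataFrame. The idea is that each slow stage writes its result once, and later runs read the pickle instead of recomputing.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(path, compute):
    """Return the object pickled at `path`; compute and save it first if missing.

    Hypothetical helper for illustration; the notebook may cache differently.
    """
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


# Counter used only to show that the expensive step runs a single time.
calls = {"n": 0}


def expensive_preprocessing():
    # Stand-in for a slow stage such as tokenization or embedding generation.
    calls["n"] += 1
    return [["feel", "tired"], ["cannot", "sleep"]]


# A temporary directory keeps the example from touching real output files.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "combined_df.pkl"
    first = load_or_compute(cache_file, expensive_preprocessing)   # computes and saves
    second = load_or_compute(cache_file, expensive_preprocessing)  # loads from disk
```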

---

## **Outputs**

| Output File           | Description                                  |
|-----------------------|----------------------------------------------|
| `combined_df.pkl`     | Preprocessed data with symptoms mapped       |
| `lda_model.model`     | Trained LDA model                            |
| `df_with_topics.pkl`  | Posts represented as LDA topic distributions |
| `df_with_roberta.pkl` | Posts represented as RoBERTa embeddings      |
| `final_results.pkl`   | ROC AUC scores for each symptom and method   |

---

## **Results**

- **Performance**:

| Symptom           | LDA (ROC AUC) | RoBERTa (ROC AUC) |
|-------------------|---------------|-------------------|
| anger             | 0.819849      | 0.915685          |
| anhedonia         | 0.946179      | 0.956880          |
| anxiety           | 0.884495      | 0.956304          |
| disordered eating | 0.915986      | 0.957470          |
| loneliness        | 0.808022      | 0.908328          |
| sad mood          | 0.788084      | 0.911753          |
| self-loathing     | 0.864496      | 0.938799          |
| sleep problem     | 0.933253      | 0.976046          |
| somatic complaint | 0.875075      | 0.934806          |
| worthlessness     | 0.697055      | 0.902052          |

- **Observations**:
  - RoBERTa embeddings outperform LDA on all ten symptoms. The gap is largest for worthlessness (0.697 vs. 0.902) and smallest for anhedonia (0.946 vs. 0.957).
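
To make the comparison in the results table concrete, the per-symptom difference between the two feature sets can be computed directly from the reported scores. The sketch below hard-codes the ROC AUC values from the table rather than reading `final_results.pkl`, since that file's internal layout is not documented here; the variable names are illustrative only.

```python
# ROC AUC scores copied from the results table, as (LDA, RoBERTa) pairs.
scores = {
    "anger": (0.819849, 0.915685),
    "anhedonia": (0.946179, 0.956880),
    "anxiety": (0.884495, 0.956304),
    "disordered eating": (0.915986, 0.957470),
    "loneliness": (0.808022, 0.908328),
    "sad mood": (0.788084, 0.911753),
    "self-loathing": (0.864496, 0.938799),
    "sleep problem": (0.933253, 0.976046),
    "somatic complaint": (0.875075, 0.934806),
    "worthlessness": (0.697055, 0.902052),
}

# Improvement of RoBERTa over LDA for each symptom.
gaps = {symptom: roberta - lda for symptom, (lda, roberta) in scores.items()}

largest = max(gaps, key=gaps.get)
smallest = min(gaps, key=gaps.get)

# Unweighted mean ROC AUC across symptoms for each method.
mean_lda = sum(lda for lda, _ in scores.values()) / len(scores)
mean_roberta = sum(roberta for _, roberta in scores.values()) / len(scores)

# Print symptoms from the largest improvement to the smallest.
for symptom, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{symptom:<18} {gap:+.3f}")
print(f"mean LDA = {mean_lda:.3f}, mean RoBERTa = {mean_roberta:.3f}")
```

Sorting by the gap shows which symptoms benefit most from contextual embeddings; the means are simple unweighted averages over symptoms and do not account for differences in class size.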