This project reimplements the paper "Detecting Symptoms of Depression on Reddit", which aims to detect symptoms of depression in Reddit posts. The goal is to classify posts into symptom-related categories (e.g., `anger`, `anxiety`) or control using two post representations: LDA topic distributions and RoBERTa embeddings.
The implementation uses a single Jupyter Notebook, `Reddit Depression.ipynb`, where intermediate outputs (e.g., LDA models, embeddings) are saved as `.pkl` files for reuse to speed up processing.
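The save-and-reuse pattern described above can be sketched as a small helper. This is a minimal illustration, not the notebook's actual code: the name `load_or_compute` and its arguments are hypothetical, and it assumes each intermediate result can be pickled.

```python
import pickle
from pathlib import Path


def load_or_compute(path, compute):
    """Return the object pickled at `path`; if the file is missing,
    call `compute()`, pickle its result to `path`, and return it."""
    path = Path(path)
    if path.exists():
        # Cached result from an earlier run: skip the expensive step.
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

A call such as `load_or_compute("combined_df.pkl", build_combined_df)` (where `build_combined_df` is a placeholder for the preprocessing step) would then run preprocessing only on the first execution. Deleting the file forces a recompute.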
Here is a video walkthrough of the project.
Here is a link to my notebook.
Output File | Description |
---|---|
`combined_df.pkl` | Preprocessed data with symptoms mapped |
`lda_model.model` | Trained LDA model |
`df_with_topics.pkl` | Posts represented as LDA topic distributions |
`df_with_roberta.pkl` | Posts represented as RoBERTa embeddings |
`final_results.pkl` | ROC AUC scores for each symptom and method |
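Since the final output stores ROC AUC scores, it may help to recall what the metric measures: the probability that a randomly chosen positive post receives a higher score than a randomly chosen control post. The sketch below computes it directly from that definition on made-up toy data. The function name `roc_auc` and the data are illustrative assumptions. How the notebook computes the metric is not stated here; scikit-learn's `sklearn.metrics.roc_auc_score` returns the same quantity.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (not project data): 1 = symptom post, 0 = control post.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 8 of the 9 pairs are ordered correctly
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of posts, so it is suitable only for illustration.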
Performance:
Symptom | LDA | RoBERTa |
---|---|---|
anger | 0.819849 | 0.915685 |
anhedonia | 0.946179 | 0.956880 |
anxiety | 0.884495 | 0.956304 |
disordered eating | 0.915986 | 0.957470 |
loneliness | 0.808022 | 0.908328 |
sad mood | 0.788084 | 0.911753 |
self-loathing | 0.864496 | 0.938799 |
sleep problem | 0.933253 | 0.976046 |
somatic complaint | 0.875075 | 0.934806 |
worthlessness | 0.697055 | 0.902052 |
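To make the comparison in the table easier to read, the following sketch computes the per-symptom difference between the two columns and the mean of each column. The numbers are copied from the table above; the variable names are illustrative.

```python
# ROC AUC values copied from the table: symptom -> (LDA, RoBERTa)
auc = {
    "anger": (0.819849, 0.915685),
    "anhedonia": (0.946179, 0.956880),
    "anxiety": (0.884495, 0.956304),
    "disordered eating": (0.915986, 0.957470),
    "loneliness": (0.808022, 0.908328),
    "sad mood": (0.788084, 0.911753),
    "self-loathing": (0.864496, 0.938799),
    "sleep problem": (0.933253, 0.976046),
    "somatic complaint": (0.875075, 0.934806),
    "worthlessness": (0.697055, 0.902052),
}

# Gain of RoBERTa over LDA for each symptom.
gains = {symptom: roberta - lda for symptom, (lda, roberta) in auc.items()}
largest_gain = max(gains, key=gains.get)
smallest_gain = min(gains, key=gains.get)

mean_lda = sum(lda for lda, _ in auc.values()) / len(auc)
mean_roberta = sum(roberta for _, roberta in auc.values()) / len(auc)

for symptom, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{symptom:18s} +{gain:.3f}")
print(f"mean LDA: {mean_lda:.3f}, mean RoBERTa: {mean_roberta:.3f}")
```

On these figures RoBERTa scores higher than LDA for every symptom. The gap is widest for `worthlessness` and narrowest for `anhedonia`.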
Observations: