# SQL MIMIC Homework #6 DATA 1050 Fall 2020 **Due 23:59 EST, Wednesday, November 4, 2020** ## Setup - Use this GitHub Assignment Invitation Link: https://classroom.github.com/a/NpJgLMdL - Push to master as much as you want before the homework deadline. - Submit your repo to Gradescope. - If your final push will be late, remember to use the [Late Day Form](https://forms.gle/NP78Znw8gFTuRKbc7) *twice*, once before the deadline, and once again after your final push. ## Getting Started Please read the first steps section of the [SQL with MIMIC guide](https://hackmd.io/@data1050-staff-fa20/SkZx9d6mw) for information on setting up postgres. Once you are connected to the course database, you will be able to explore the dataset and test out your queries. Additional Resources * Going through and understanding the remainder of the [SQL with MIMIC guide](https://hackmd.io/@data1050-staff-fa20/SkZx9d6mw) guide will be very helpful for this assignment! * [MIMIC Documentation](https://mimic.physionet.org/about/mimic/) * [MIMIC Schema](https://mit-lcp.github.io/mimic-schema-spy/relationships.html) In classical statistics, having a representative sample is critical to whether or not inferences about a given population are correct. In Data Science, one is often presented with a dataset and asked to make useful inferences. In order to do this one it is important to understand the how the dataset was generated, which in turn leads to identification of the underlying population. As you work on the problems below, think about what population(s) the MIMIC dataset represents or can effectively proxy. ## Important Note on Patient Age > Patients who are older than 89 years old at any time in the database have had their date of birth shifted to obscure their age and comply with HIPAA. The shift process was as follows: the patient’s age at their first admission was determined. The date of birth was then set to exactly 300 years before their first admission. -[MIMIC documentation](https://mimic.physionet.org/mimictables/patients/) This transformation will bias statistics that rely on date of birth. ## Grading In this assignment, we will be grading your queries not only on the correctness of their output, but also your demonstration of understanding. **Please briefly explain each query in a comment within the `.sql` files so it's easier for us to give partial credit!** ## Part A (55 points) **hw6.1** DRY (9 points) Explain some advantages of storing data in multiple related tables over using only one table? Please include at least one specific example. > Save your answer in `report.md` **hw6.2:** Schema Understanding (6 points) Which tables in the MIMIC database are relevant if we're looking at general patient information, their admissions into the hospital, and information about their stays in the ICU (Intensive Care Unit)? > Save your answer in `report.md` **hw6.3:** (8 points) Write a SQL query that returns the subject_id and time of admission for all patients, sorted by recency of admission (most recent to least recent). > Save your solution in `hw6_3.sql` **hw6.4:** (8 points) Write a SQL query that returns the number of admissions with a recorded death time. > Save your solution in `hw6_4.sql` **hw6.5:** (8 points) Write a SQL query that returns the subject_id and age at death for each patient, sorted by age at death (youngest to oldest). > Save your solution in `hw6_5.sql` **hw6.6:** (8 points) Write a SQL query that returns the number of male vs. female patients that stayed in an ICU in the following format | gender | count | | :-------- | :---- | | M | | | F | | > Save your solution in `hw6_6.sql` **hw6.7:** (8 points) Write a SQL query that returns the three most common diagnoses, along with the number of admissions for each of these diagnoses. > Save your solution in `hw6_7.sql` ## Part B (45 points) **hw6.8:** (15 points) Write a SQL query that returns the subject_id and age of all the patients on their first admission, ordered from youngest to oldest. > Save your solution in `hw6_8.sql` **hw6.9:** (15 points) Write a SQL query that returns the number of patients that died while staying in a ICU. > Save your solution in `hw6_9.sql` **hw6.10:** (15 points) Histograms are a very useful data summarization technique. Write a SQL query that returns a summary table for ICU stays of the following format | los_group | males | females | | :-------- | :---- | :------ | | short | | | | medium | | | | long | | | where los stands for length of stay in the ICU, with stays less than 2 days classified as short, between 2 and 7 days classified as medium, and more than 7 days classified as long. The males and females columns are the gender breakdowns of the patients belonging to each group. *Hint:* The `CASE` keyword might be helpful here. > Save your solution in `hw6_10.sql`