# HIN 776: Final Longitudinal Project Part 2

> Welcome back! Now that we've been through some of the fundamentals of Python data analytics and visualization with pandas, matplotlib, and seaborn, we can dive deeper into the libraries and find out more.

## Review from last week

First, there are two *cheat sheets* that will prove useful this week. Check them out before proceeding.

* [Matplotlib Cheatsheet](https://matplotlib.org/cheatsheets/)
* [Pandas Cheatsheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

Then, complete the following actions (review last week's tutorial if you need assistance):

1. ==Load the necessary libraries to get started==
2. ==Load the data into a pandas dataframe==

As soon as you load the data from the file, it reverts to its original shape (before we made changes like fixing the missing values). You will also have to complete the following steps:

1. ==Find a way to retrieve the data that you saved at the end of last week and load it in as a dataframe==
2. ==Drop the column named `Unnamed: 0` and make sure the change persists with `inplace=True`==
3. ==Make a pie chart of the different values of the column `cp`, with appropriate labels and the percentage of the area covered by each slice==

Now, let's jump back in with...

## Step 5: Normalization

We're going to start interpreting the data based on normal values. This is clearest with `trestbps` (resting blood pressure). From the data dictionary, we see that having a `trestbps > 130-140` is typically a cause for concern. Let's see if we can highlight that data specifically.

==Make a new column called `normalized trestbps` and add it to the dataframe. The column must contain normalized values of the `trestbps` column==

> **Note**: Normalization cannot be applied to categorical data - only to quantitative data

==`plot()` a scatterplot that compares `trestbps` values over 130 with `age` and indicates the `target` value as `lightgreen` for *yes* and as `red` for *no*==

This will give you a picture of the spread of `trestbps` normalized for age.

### Usualness

One of the benefits of this is that we can now judge usual vs. unusual based on data.

==Make a new column in the dataframe that is labeled as `usuality` or `usualness`, where the value will be either `usual` or `unusual` based on the following conditions:==

* `trestbps` >= `130`
* `target` = `yes`
* `thalach` < `220` - `age` (check out [this](https://www.medicinenet.com/highest_heart_rate_you_can_have_without_dying/article.htm))

If all of the above conditions are true for a record, then the `usuality` will be `usual`; if any of the conditions is false, it will be `unusual`.

> Note: Check the data dictionary and read in detail all the different attributes and what their values signify

## Putting it all together

Now that we have usuality as a data point, we can create a visualization that compares `target` and `usuality`/`usualness` (depending on what you named it).

==Make a pandas `crosstab` with the `target` and the `usuality`/`usualness` attributes==, then ==`plot()` a bar chart of the results of the crosstab from the previous step (including appropriate labels)==

Then, save your whole notebook (both week 1 and 2) as a `.pdf` and submit it to Brightspace.
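
For reference, the normalization, usualness rule, and crosstab steps above can be sketched on a few made-up rows. This is a minimal sketch, not the expected solution. It assumes min-max scaling as the normalization method (the tutorial does not name one), assumes `target` is stored as the strings `yes`/`no` (check the data dictionary for the actual encoding), and uses invented values rather than the course dataset.

```python
import pandas as pd

# Made-up rows for illustration only; these are NOT from the course dataset.
df = pd.DataFrame({
    "age": [63, 45, 58, 70],
    "trestbps": [145, 120, 132, 160],
    "thalach": [150, 170, 165, 155],
    "target": ["yes", "yes", "no", "yes"],  # assumed string encoding
})

# Min-max normalization (one common choice): rescale trestbps to the 0-1 range.
bp = df["trestbps"]
df["normalized trestbps"] = (bp - bp.min()) / (bp.max() - bp.min())

# A record is "usual" only when all three conditions hold at once.
is_usual = (
    (df["trestbps"] >= 130)
    & (df["target"] == "yes")
    & (df["thalach"] < 220 - df["age"])
)
df["usuality"] = is_usual.map({True: "usual", False: "unusual"})

# Count how often each target / usuality combination occurs.
counts = pd.crosstab(df["target"], df["usuality"])
print(counts)
```

With matplotlib installed, calling `counts.plot(kind="bar")` on the result would draw the grouped bar chart; axis labels and a title still need to be added separately.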