# Classroom Lab #1

---

### 1. For each variable determine if it is categorical or quantitative.

| Variable | Categorical/Quantitative |
| -------- | -------- |
| Gender | Categorical |
| Age | Quantitative |
| Race | Categorical |
| Height (cm) | Quantitative |
| Weight (kg) | Quantitative |
| Waist (cm) | Quantitative |

---

### 2. Hypothesize why some of the data is missing.

There are a few possible reasons why data is missing from the given data set. Although we cannot be certain of any of them, we do know that the omitted values are almost always *Height (cm)* or *Waist (cm)*.

The most probable explanation for the missing waist and height measurements in younger and elderly individuals is that these measurements can be less relevant or reliable at those life stages, particularly during periods of rapid growth and development. It can also be extremely challenging to take these measurements on very young or elderly individuals.

In addition, some participants might be uncomfortable with, or unable to comply with, the measurement process. If they are self-conscious about their height or waist size, they may have declined to be measured when the data was collected, so the value was omitted by personal preference.

---

### 3. For each of the categorical variables, create both a bar graph and a pie chart. Which one looks better? Why? For each categorical variable, include the image of the chart you think looked better.

#### Gender

*Pie Chart:*

![image](https://hackmd.io/_uploads/rJCB04r5a.png)![image](https://hackmd.io/_uploads/B1dS0NB9a.png)

*Bar Chart:*

![image](https://hackmd.io/_uploads/S1lHCEHqa.png)

For Gender, the pie chart displays the split between Male and Female participants more clearly than the bar chart does.

#### Race

![image](https://hackmd.io/_uploads/rJfVRNHqp.png)![image](https://hackmd.io/_uploads/BykiehCYp.png)

---

### 4. Make a histogram of the age of the sample.

#### a. Include a copy of your histogram that StatCrunch produced.
This study has oversampled a certain age group. This means they intentionally sought out more people in one age group than they did in the rest of the study. Which age group seems oversampled? How do you know? How would you describe the shape of the histogram?

![image](https://hackmd.io/_uploads/H11SlWlqT.png)

It is quite apparent that individuals between the ages of 0-20 were significantly oversampled. All other ages had roughly 400 participants per group, whereas the 0-20 range had between 700 and 1100 per group, with the count rising the younger the individual. From the histogram, the data appears to be right-skewed.

#### b. Go to the “Options” button in the upper left side and choose “Edit.” Under “Bins:” you’ll see the word “Width” and a blank box. Change the width to 10 and press “Compute!” Then change the bin width to 1 and press compute. Include both histograms in your write up.

![image](https://hackmd.io/_uploads/Bkwl6Ig9T.png)

---

![image](https://hackmd.io/_uploads/HktfT8g5p.png)

#### c. If you click on one of the bars in one of the histogram (it doesn’t matter which one you choose), it will turn pink. Choose one bar and explain which specific bin you chose. Now look at the data on the main page. Some of it is pink. In one sentence, explain which subjects are highlighted pink. Include a picture of your histogram. To clear the pink from the data and the histogram, note at the bottom left on the main page you’ll see a count of the number of rows highlighted. If you press “Clear,” you will clear the pink from all parts and windows.

![image](https://hackmd.io/_uploads/Hkc_aIl9T.png)

In the graph above, the selected bar is the bin for ages 0-10, so the subjects highlighted in pink on the main page are those whose ages fall in that bin.

#### d. What age group has the most people in it when you look at the bin width of 1? Why is that? There is one correct answer for why and you have all the information to figure it out.
![image](https://hackmd.io/_uploads/ByD-C8xcT.png)

The age group with the highest frequency in the survey is 80-year-olds. The most likely reason is that older individuals are usually retired and have more free time to participate in something like this. In addition, since NHANES is focused on health data, there could be an increased emphasis on older adults, who are more likely to have various health conditions and nutritional needs. Because of this, the survey might intentionally include more individuals in this age group to gather data on health conditions prevalent among people around 80 years old. It is also possible that the data set records everyone aged 80 or older as 80, in which case that one bin would collect all of the oldest participants.

---

### 5. Now create a histogram of height of the participants.

#### a. Experiment with the bin widths until you find the one that gives you the “best” shape. Explain why you think your choice of bin widths is the best. Include your best histogram in your write up

![image](https://hackmd.io/_uploads/BkhhyDl9T.png)

I found that a bin width of six gave the best balance: it represents the data accurately while keeping the histogram from becoming overwhelming with too many bars.

#### b. How would you describe the shape? Does it appear to be skewed left or right? Does it appear to be unimodal or bimodal?

The data in this histogram is slightly left-skewed and unimodal.

#### c. Why does this graph have this shape? There is a correct answer and you have all the information to figure it out!

It has this shape because the most commonly sampled height among participants of all genders and ages was between 150 and 180 cm. Given the typical distribution of height in the US, this is indeed the most common height range, which explains where the graph peaks.

#### d. Go back to your histogram and choose “Options” on the upper left corner and Edit. Under “Group by” select Gender.
Now you get two histograms; you can switch between the two by clicking on the arrows at the bottom. What are they? How are they the same and how are they different? Include both graphs in your write up.

**Female:**

![image](https://hackmd.io/_uploads/BkMSZvxc6.png)

---

**Male:**

![image](https://hackmd.io/_uploads/B1YIbPg56.png)

These histograms are now split by gender as recorded during data collection. They are similar in that both distributions are unimodal, but they differ in average height, with males having the higher average.

---
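The bin-width and "Group by" steps from questions 4 and 5 can also be tried outside StatCrunch. The following is a minimal Python sketch, not the method used in this lab: the function names `bin_counts` and `grouped_bin_counts`, the choice to start bins at 0, and the six sample rows are all assumptions made up for illustration and are not taken from the lab data set.

```python
from collections import Counter, defaultdict
import math


def bin_counts(values, width, start=0):
    """Count values per bin [lower, lower + width), skipping missing (None) values.

    Returns a dict mapping each bin's lower edge to its count.
    Assumption: bins start at `start` (0 by default); StatCrunch may choose differently.
    """
    counts = Counter()
    for v in values:
        if v is None:  # missing measurement, as discussed in question 2
            continue
        k = math.floor((v - start) / width)  # index of the bin containing v
        counts[start + k * width] += 1
    return dict(sorted(counts.items()))


def grouped_bin_counts(rows, group_key, value_key, width):
    """Build one histogram per group, similar in spirit to "Group by" in question 5d."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {g: bin_counts(vals, width) for g, vals in groups.items()}


# Hypothetical sample rows for illustration only; NOT the lab data.
sample = [
    {"gender": "Female", "age": 3, "height_cm": None},
    {"gender": "Male", "age": 7, "height_cm": 121.0},
    {"gender": "Female", "age": 15, "height_cm": 160.5},
    {"gender": "Male", "age": 34, "height_cm": 178.2},
    {"gender": "Female", "age": 80, "height_cm": 155.0},
    {"gender": "Male", "age": 80, "height_cm": 170.3},
]

ages = [r["age"] for r in sample]
age_bins_10 = bin_counts(ages, 10)  # wide bins, as in the width-10 histogram
age_bins_1 = bin_counts(ages, 1)    # one bin per year of age
height_by_gender = grouped_bin_counts(sample, "gender", "height_cm", 6)

print(age_bins_10)
print(age_bins_1)
print(height_by_gender)
```

With a width of 1 every distinct age gets its own bar, which is why a spike at a single age stands out, while a width of 10 merges neighbouring ages and smooths that detail away.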