---
tags: math615
robots: noindex, nofollow
---

# Choosing Appropriate Analysis

:busts_in_silhouette: For each situation below, work with a partner to identify

1. the dependent and independent variables
2. an appropriate analysis
3. an appropriate visualization

Use the table in PMA6 as a guide.

::: info
**PMA6 #6.2** An investigator is attempting to determine the health effects on families of living in crowded urban apartments. Several characteristics of the apartment have been measured, including square feet of living area per person, cleanliness, and age of the apartment. Several illness characteristics for the families have also been measured, such as number of infectious diseases and number of bed days per month for each child, and overall health rating for the mother. (XXXX)
:::

::: info
**PMA6 #6.3** A coach has made numerous measurements on successful basketball players, such as height, weight, and strength. He also knows which position each player is successful at. He would like to obtain a function from these data that would predict which position a new player would be best at. (Jake Wallin, Miriam Espinoza, Raquel)

Dependent variable: Basketball Position
Independent variable: Height, weight, strength stats
Appropriate Analysis: Discriminant Function, Logistic Regression, Poisson Regression
Appropriate visualization: bar graph
:::

::: info
**PMA6 #6.4** A college admissions committee wishes to predict which prospective students will successfully graduate. To do so, the committee intends to obtain the college grade point averages for a sample of college seniors and compare these with their high school grade point averages and Scholastic Aptitude Test scores. (Matt & Abbey)

**Dependent variable**: Successful graduation
**Independent variables**: College GPA, high school GPA, and SAT scores.
**Appropriate analysis**: Discriminant function, logistic regression
**Appropriate visualization**: Scatterplot, Histogram
:::

::: info
**PMA6 #6.5** Data on men and women who have died have been obtained from health maintenance organization records. These data include age at death, height and weight, and several physiological and lifestyle measurements such as blood pressure, smoking status, dietary intake, and usual amount of exercise. The immediate and underlying causes of death are also available. From these data we would like to find out which variables predict death due to various underlying causes. (This procedure is known as risk factor analysis.) Suggest possible analyses. (XXXX)
:::

::: info
**PMA6 #6.6** Large amounts of data are available from the United Nations and other international organizations such as the World Bank on each country and sovereign state of the world, including health, education, and commercial data. Using this data we wish to relate health data such as infant mortality (the proportion of children dying before the age of one year) to other data such as gross national product per capita, percentage of people older than 15 who can read and write (literacy), average daily caloric intake per capita, and number of persons per practicing physician. (Sean and Gunner)

1. Independent variables - Gross national product per capita, percentage of people older than 15 who can read/write, average daily caloric intake, practicing physicians per capita
   Data type: continuous
   Dependent variable - Infant mortality ("health data")
   Data type: continuous
2. Appropriate analysis - Poisson regression
3. Appropriate visualization - scatterplot
:::

::: info
**PMA6 #6.8** For the data described in Problem 6.6 we wish to relate health data such as infant mortality (the proportion of children dying before the age of one year) and life expectancy (the expected age at death of a person born today if the death rates remain unchanged) to other data such as gross national product per capita, percentage of people older than 15 who can read and write (literacy), average daily caloric intake per capita, average energy consumption per year per capita, and number of persons per practicing physician. Suggest possible analyses. What other variables would you include in your analysis? (Evan/Kenji)
:::

Explanatory variables: GNP per capita, literacy rate, annual energy consumption, number of persons per physician.
Response variables: infant mortality rate, life expectancy
Suggested analysis: PCA (principal component analysis)
Suggested visualization: scatterplot of the first two principal components

::: info
**PMA6 #6.9** A member of the admissions committee notices that there are several women with high grade point averages but low SAT scores. He wonders if this pattern holds for both men and women in general, only for women in general, or only in a few cases. (Sarah and Meghan)

1. Independent variable: Gender
   Dependent variable: Ratio between SAT score and GPA.
2. Appropriate analysis: analysis of variance
3. Appropriate visualization: Since we have a quantitative and a categorical variable, we could use side-by-side violin plots.
:::

::: info
**PMA6 #6.11** A psychologist would like to predict whether or not a respondent in the depression study described in Chapter 3 is depressed. To do this, she would like to use the information contained in the following variables: MARITAL, INCOME, and AGE. Suggest analyses. (Eden and Ryan)
:::

1. The independent variables are marital status (binary), income (continuous), and age (continuous).
The binary dependent variable is whether or not the respondent is depressed.
2. The analysis we will use is logistic regression. Because we are classifying respondents into a binary response (depressed / not depressed), logistic regression fits. Additionally, we could use Poisson regression or a log-linear model.
3.