# Stat 120 Concept Quiz Study Guide

This is an outline of material for the concept quizzes in Stat 120 for Spring term 2024. To study, I recommend carefully going through your class notes, homework problems, daily prep, and daily activities. Be sure that you review actively, intermixing reading, thinking, solving problems, and asking questions.

I have included bullet points of important topics to study, but *I do not guarantee that it is an exhaustive list*. There are also example quiz questions included, but this does not mean the quiz questions will be identical in structure.

## Quiz coverage

Each quiz will include all of the topics that we have discussed up to that point in the course, but remember that you get multiple attempts on each, so only attempt those that you feel prepared for. Below is a table outlining the **new** topics that will be included on each quiz (but the old topics will have new problems as well if you want to attempt them).

| Quiz | New concepts included |
| :--------: | -------- |
| 1 | D.1, D.2, D.3, D.4, <br> E.1, E.2, E.3, E.4, E.5, <br> R.1, R.2, R.3, R.4, R.5 |
| 2 | HT.1, HT.2, HT.3, HT.4, HT.5, <br> CI.1, CI.2, CI.3, CI.4, CI.5, CI.6, CI.9 |
| 3 | HT.6, HT.7, HT.8, HT.9, HT.10, <br> CI.7, CI.8 |

## Formulas for quiz 3

On quiz three (and the following attempt sessions) you will have the following formulas provided:

| Parameter | Distribution | SE (CI) | SE (Test) |
| -------- | -------- | -------- | -------------- |
| Proportion | Normal | $\sqrt{\dfrac{\widehat{p}(1-\widehat{p})}{n}}$ | $\sqrt{\dfrac{p_{0}(1-p_{0})}{n}}$ |
| Difference in proportions | Normal | $\sqrt{\dfrac{\widehat{p}_1(1-\widehat{p}_1)}{n_1} + \dfrac{\widehat{p}_2(1-\widehat{p}_2)}{n_2}}$ | $H_0: p_1 - p_2 = 0$ <br> $\widehat{p} = \dfrac{n_1\widehat{p}_1 + n_2\widehat{p}_2}{n_1 + n_2}$ <br> $\sqrt{\dfrac{\widehat{p}(1-\widehat{p})}{n_1} + \dfrac{\widehat{p}(1-\widehat{p})}{n_2}}$ |
| Mean | $t, \ df=n-1$ | $\dfrac{s}{\sqrt{n}}$ | $\dfrac{s}{\sqrt{n}}$ |
| Difference in means | $t, \ df=\min(n_1,n_2)-1$ | $\sqrt{\dfrac{s_{1}^{2}}{n_{1}}+\dfrac{s_{2}^{2}}{n_{2}}}$ | $\sqrt{\dfrac{s_{1}^{2}}{n_{1}}+\dfrac{s_{2}^{2}}{n_{2}}}$ |
| Paired difference in means | $t, \ df=n-1$ | $\dfrac{s_d}{\sqrt{n_d}}$ | $\dfrac{s_d}{\sqrt{n_d}}$ |

## Useful links

* [Specifications for successful work](https://moodle.carleton.edu/mod/folder/view.php?id=927642) - gives a brief overview of what I mean by major/minor errors. Specific specifications for each problem will be given on the quiz.
* [Study guide solutions](https://hackmd.io/@aloy/rJd9Ur41A) - be sure to give a good-faith effort to each problem before looking at the solution.

## Data Collection

### D.1 (core)

Given a research question or data description, I can identify

- the case,
- the variable(s) and their types (categorical or quantitative), and
- the response and explanatory variable(s).

#### Example question

The table below shows the first 8 observations from a sample of 200 individuals, who reported their age, race, income, and job satisfaction score (on a scale from 0 to 100).

<img src="https://hackmd.io/_uploads/By-a8nK0a.png" width="60%"/>

1. What are the cases?
2. What are the variables recorded in this study? List them and identify each as either categorical or quantitative.

### D.2 (core)

I can explain (both in general and in the context of a specific example)

- the difference between an observational study and an experiment, and
- the types of conclusions that can be drawn based on the study type.

#### Example question

Below is a brief overview of an experiment published in *Archives of General Psychiatry*.

Over a 4-month period, among 30 people with bipolar disorder, patients who were given a high dose (10 g/day) of omega-3 fats from fish oil improved more than those given a placebo.

1. Why didn't the experimenters just give everyone the omega-3 fats to see if they improved?
2. The experimenters randomly assigned patients to the two groups. Why was this important?
3. Can the experimenters generalize their results to all bipolar patients?
4. Can the experimenters claim that the omega-3 fats caused the improvement?

### D.3 (core)

Given the description of a study, I can

- describe how the sample of observational units was collected,
- identify any potential biases in the sampling method, and
- identify the scope of inference for any conclusions made.

#### Example question

In a large city school system with 20 elementary schools, the school board is considering the adoption of a new policy that would require elementary students to pass a test in order to be promoted to the next grade. The school board wants to find out whether parents agree with this plan and is considering using one of the following sampling schemes:

1. Put a big ad in the newspaper asking people to log their opinions on the district website.
2. Randomly select one of the elementary schools and contact every parent by phone.
3. Send a survey home with every student and ask parents to fill it out and return it the next day.
4. Randomly select 50 parents from each elementary school. Send them a survey and follow up with a phone call if they do not return the survey within a week.

Which sampling scheme would you recommend to the school board? Justify your answer.

### D.4 (core)

Given the description of a study, I can correctly identify the population, sample, parameter of interest (in words and statistical notation), and the corresponding statistic (in words, statistical notation, and numerical value).

#### Example question

The journal *Circulation* reported that among 1900 people who had heart attacks, those who drank an average of 19 cups of tea a week were 44% more likely than non-tea drinkers to survive at least 3 years after the heart attack.

1. What is the population of interest?
1. What is the sample?
1. What is the parameter of interest?
1. What is the statistic?
## Exploratory Data Analysis

### E.1

I can correctly interpret graphical displays and summary statistics of one categorical variable.

- Be able to construct and read bar charts.
- Be able to construct and read frequency/relative frequency tables.
- Be able to calculate and interpret a sample proportion.

#### Example question

In a survey conducted by the Gallup organization September 6-9, 2012, 1,017 adults were asked "In general, how much trust and confidence do you have in the mass media - such as newspapers, TV, and radio - when it comes to reporting the news fully, accurately, and fairly?" 81 said that they had a "great deal" of confidence, 325 said they had a "fair amount" of confidence, 397 said they had "not very much" confidence, and 214 said they had "no confidence at all".

1. Display the results in a frequency table.
1. Sketch a bar chart of the data.
1. What proportion of respondents have a fair amount of confidence in the media?

### E.2 (core)

I can correctly interpret graphical displays and summary statistics of a single quantitative variable.

- Know how histograms and boxplots are constructed.
- Be able to read histograms and boxplots.
- Understand the measures of center and when each is preferred.
- Understand the measures of spread and when each is preferred.
- Be able to describe the shape of a distribution (modes/peaks, symmetry/skewness, unusual features).

#### Example question

The histogram below displays the distance (in miles) from a random sample of 500 New York taxi trips. The data come from the New York City Taxi and Limousine Commission's database of yellow-taxi trips.

<img src="https://hackmd.io/_uploads/Sy7NoRKCp.png" width="60%"/>

1. Describe the distribution of trip distances, commenting on modes, symmetry, and unusual observations.
2. Is it better to report the mean and standard deviation or the median and IQR for this data set? Explain.
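
One way to build intuition for the measures of center and spread listed under E.2 is to compare them on a small data set with one extreme value. The sketch below uses a made-up vector `x` (hypothetical values, not the taxi data) and only base R functions.

```r
# Hypothetical data: most values are small, one value is very large
x <- c(1, 2, 2, 3, 3, 4, 5, 40)

mean(x)    # 7.5, pulled toward the large value
median(x)  # 3, unaffected by how large the extreme value is
sd(x)      # spread measure that uses every value, including the extreme one
IQR(x)     # spread of the middle 50% of the data
```

Changing the value 40 to something even larger moves the mean and standard deviation but leaves the median and IQR unchanged.
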
### E.3

I can correctly interpret graphical displays and summary statistics of two categorical variables.

- Know how stacked bar charts are created.
- Be able to read stacked bar charts.
- Be able to calculate and compare conditional percentages or proportions.
    - This includes calculating/identifying the appropriate conditional percentages or proportions (which requires correct identification of the response and explanatory variables).
- Be able to comment on whether there appears to be an association between the two variables.

#### Example question

Intro statistics students conducted a survey as part of their final project. The questions asked included:

- How would you rate yourself politically?
- How would you describe your diet?

Below is a stacked bar chart of the 289 responses.

<img src="https://hackmd.io/_uploads/Bkw-hCYCT.png" width="80%"/>

Describe what this plot is showing about the association between politics and diet.

### E.4

I can correctly interpret graphical displays and summary statistics of one categorical and one quantitative variable.

- Compare and contrast shapes, centers (choosing the appropriate measure), and spreads (choosing the appropriate measure) for the quantitative variable across the levels of the categorical variable in *context*.
    - This also includes commenting on whether there appears to be an association between the two variables.

#### Example question

Below are boxplots displaying the relationship between vitamin use and the concentration of retinol (a micronutrient) in the blood for a sample of $n = 315$ individuals. Does there seem to be an association between these two variables? Briefly justify your answer.

<img src="https://hackmd.io/_uploads/HJ9q2AtR6.png" width="80%"/>

### E.5 (core)

I can correctly interpret graphical displays and summary statistics of two quantitative variables.

- Be able to read a scatterplot.
- Comment on the trend, shape, strength, and unusual features of the relationship between the two variables in *context*.
#### Example question

The scatterplot below displays data collected from a random sample of 500 New York taxi trips. The data come from the New York City Taxi and Limousine Commission's database of yellow-taxi trips, which contains a number of variables including distance (in miles) and total cost of the trip (in dollars). Describe the association between the distance and total cost of a taxi ride in New York.

<img src="https://hackmd.io/_uploads/HJOba0YAp.png" width="60%"/>

## Simple Linear Regression

### R.1

Given R output, I can write the equation of the fitted linear regression model using proper statistical notation.

- Know the difference between $y$ and $\widehat{y}$ and which one to use.
- Know how to identify the slope and intercept on R output.
- Correctly "assemble" the fitted equation.

#### Example problem

Meadowfoam is a small plant found growing in moist meadows of the U.S. Pacific Northwest. Researchers reported the results from one study in a series designed to find out how to elevate meadowfoam production to a profitable crop. In a controlled growth chamber, they focused on the effects of two light-related factors: light intensity and the timing of the onset of the light treatment. Below are results from a quick regression analysis in R.

```
Call:
lm(formula = Flowers ~ Intensity, data = meadowfoam)

Coefficients:
(Intercept)    Intensity
   77.38500     -0.04047
```

Write down the equation of the fitted regression line using proper notation.

### R.2 (core)

Given R output, I can appropriately interpret the coefficients of a simple linear regression model.

- Be able to write a one-sentence interpretation of the y-intercept in context.
- Think critically about whether the y-intercept makes sense in context.
- Be able to write a one-sentence interpretation of the slope in context.

#### Example problem

Meadowfoam is a small plant found growing in moist meadows of the U.S. Pacific Northwest.
Researchers reported the results from one study in a series designed to find out how to elevate meadowfoam production to a profitable crop. In a controlled growth chamber, they focused on the effects of two light-related factors: light intensity and the timing of the onset of the light treatment. Below are results from a quick regression analysis in R.

```
Call:
lm(formula = Flowers ~ Intensity, data = meadowfoam)

Coefficients:
(Intercept)    Intensity
   77.38500     -0.04047
```

1. Write a one-sentence interpretation of the slope, in context.
2. Does the intercept make sense in the context of the problem? Explain briefly.

### R.3

Given R output, I can use the fitted least-squares regression line to estimate the response value for a given value of the explanatory variable.

#### Example problem

Meadowfoam is a small plant found growing in moist meadows of the U.S. Pacific Northwest. Researchers reported the results from one study in a series designed to find out how to elevate meadowfoam production to a profitable crop. In a controlled growth chamber, they focused on the effects of two light-related factors: light intensity and the timing of the onset of the light treatment. Below are results from a quick regression analysis in R.

```
Call:
lm(formula = Flowers ~ Intensity, data = meadowfoam)

Coefficients:
(Intercept)    Intensity
   77.38500     -0.04047
```

Use the fitted regression line to predict the number of flowers per plant when light intensity is 450 $\mu$mol/$m^2$/sec.

### R.4

I understand when it is appropriate to use a simple linear regression model to make a prediction.

- What should be true about the relationship between y and x?
- Think about whether it is "safe" to make the prediction based on the value of x compared to the observed range of the predictor.

#### Example problem

A regression model was used to predict penguin heart rate as a function of duration of dive (in minutes). Below is a scatterplot of the observed data with the fitted regression line superimposed.
<img src="https://hackmd.io/_uploads/ByyW0g5Ra.png" width="60%"/>

Should you use this model to predict a penguin's heart rate for a 20 minute dive? Justify your answer using statistical reasoning.

### R.5

I can check whether a simple linear regression model is appropriate for the data at hand.

- Know the conditions for simple linear regression to be appropriate (valid) and how to check them:
    - A linear relationship between x and y
    - Independent observations in the data set
    - The points should be normally distributed about the line at every value of x
- Know how to read a residual plot and what condition(s) can be checked from it.
- Know how to read a histogram of the residuals and what condition(s) can be checked from it.
- Think critically about the data collection process.

#### Example problem

A regression model was used to predict penguin heart rate as a function of duration of dive (in minutes). Below are the residual plot and a histogram of the residuals. Does the regression model adequately describe the association between heart rate and dive duration? Explain your reasoning using specific evidence from these plots.

<img src="https://hackmd.io/_uploads/rygxae5CT.png" width="80%"/>

## Hypothesis Testing

### HT.1

I can compare and contrast the null distribution and the population distribution.

- Where should the null distribution be centered?
- Suppose that the null distribution were displayed as a dotplot. What would a single "dot" represent?
- Where should the population distribution be centered?
- Suppose that the population distribution were displayed as a dotplot. What would a single "dot" represent?

#### Example problem

Below are two histograms. One is a population distribution and the other is a null distribution for the test of the following hypotheses: $H_0: \mu = 85$ vs. $H_a: \mu > 85$. Which is which? Support your answer with statistical justification.
![image](https://hackmd.io/_uploads/HyIN_4iRT.png)

### HT.2 (core)

Given a research question, I can state appropriate hypotheses using both words and correct statistical notation.

- Be able to identify the parameters of interest for one- and two-sample problems.
- Know what statistical notation is used for each parameter.
- Know the notation used for the null and alternative hypotheses.
- Use correct equalities/inequalities in the hypotheses.

#### Example problem

The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was designated an official heart transplant candidate. Some patients got a transplant and some did not. Patients in the treatment group got a transplant and those in the control group did not. Of the 34 patients in the control group, 4 survived to the end of the study. Of the 69 people in the treatment group, 24 survived to the end of the study.

Clearly state the hypotheses being tested.

### HT.3 (core)

Given a set of hypotheses, test statistic, and randomization distribution, I can calculate a p-value.

- Understand the link between the alternative hypothesis and the p-value calculation.
- Know how to calculate a p-value as a proportion if you are given the randomization distribution.
- Know the bounds of a p-value, and check your work.
- Know the difference between a one- and two-sided p-value.

#### Example problem

The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was designated an official heart transplant candidate. Some patients got a transplant and some did not. Patients in the treatment group got a transplant and those in the control group did not. In the study, the difference in the proportion of subjects surviving in the treatment and the control groups (treatment - control) is 0.23.
Below is a randomization distribution (comprised of 250 simulated statistics) that can be used to conduct this hypothesis test. Calculate the p-value using this distribution. Be sure to show your work.

<img src="https://hackmd.io/_uploads/ByJtxb9Ca.png" width="80%"/>

### HT.4 (core)

Given a p-value, I can discern the plausibility of a specified hypothesis.

- Be able to interpret the p-value as strength of evidence against the null hypothesis.
- Be able to apply the decision rule if you are given the type I error rate, $\alpha$.

#### Example problem

The US Environmental Protection Agency has set the action level for lead contamination of drinking water at 15 ppb (parts per billion). Samples are regularly tested to ensure that mean lead contamination is not above this level. Suppose we are testing:

$H_0: \mu = 15$ vs. $H_a: \mu > 15$

Suppose that researchers calculated a p-value of 0.05. What does this tell you about the strength of evidence relating to the hypotheses?

### HT.5 (core)

Given the results of a hypothesis test, I can interpret the decision in context for a two-sample problem.

- Given a p-value, be able to make a decision using either strength of evidence or the decision rule.
- Provide a conclusion phrased in terms of an appropriate hypothesis that includes the p-value and the strength of evidence in *context*.
- Be able to phrase conclusions in terms of parameters.

#### Example problem

The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was designated an official heart transplant candidate. Some patients got a transplant and some did not. Patients in the treatment group got a transplant and those in the control group did not. In the study, the difference in the proportion of subjects surviving in the treatment and the control groups (treatment - control) is 0.23. Suppose that researchers calculated a p-value of 0.02.
What does this tell you about the strength of evidence relating to the hypotheses?

### HT.6

Given a set of hypotheses, a problem statement, and data (sample mean and sample standard deviation), I can calculate the test statistic for a single population mean.

- Know the formula for the (standardized) test statistic for a single mean.

#### Example problem

Every year, the United States Department of Health and Human Services releases to the public a large data set containing information on births recorded in the country. In this problem you will work with a random sample of 1,000 cases from the data set released in 2014. You might have heard that human gestation is typically 40 weeks; however, a friend mentions that recent increases in cesarean births are likely to have decreased the length of gestation. Below are summary statistics of the gestation length from the random sample of 1,000 people.

<img src="https://hackmd.io/_uploads/HyeoX5i0T.png" width="55%"/>

Calculate the standardized test statistic for this situation.

### HT.7

Given a set of hypotheses, a problem statement, and data, I can calculate the test statistic for a single population proportion.

- Know the formula for the (standardized) test statistic for a single proportion.

#### Example problem

An insurance company checks records on 582 accidents selected at random and notes that teenagers were at the wheel in 91 of them. Calculate the standardized test statistic that can be used to determine whether less than 20% of auto accidents involve teenage drivers.

### HT.8

Given a test statistic, I can describe how to find the p-value for a hypothesis test of a single population proportion.

- Know what probability distribution the test statistic for a single proportion follows.
- Be able to sketch and clearly label a picture indicating what areas under the curve correspond to the p-value.
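
As a rough illustration of how the shaded areas in such a picture translate into a calculation, the sketch below finds one- and two-sided p-values from a standard normal distribution using base R's `pnorm()`. The value `z <- 1.5` is a hypothetical test statistic chosen only for illustration; it does not come from any problem in this guide.

```r
# Hypothetical standardized test statistic (for illustration only)
z <- 1.5

# H_a: p > p_0  -> area to the right of z
p_upper <- pnorm(z, lower.tail = FALSE)

# H_a: p < p_0  -> area to the left of z
p_lower <- pnorm(z)

# H_a: p != p_0 -> area in both tails, beyond -|z| and +|z|
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)
```

The same pattern applies to a t-based test by swapping `pnorm(z)` for `pt(t, df)` with the appropriate degrees of freedom.
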
#### Example problem

Suppose that you wish to test $H_0 : p = 0.2$ vs $H_a : p \ne 0.2$ using the sample results from a random sample of size $n = 1000$. You have calculated the test statistic $z=4.74$. Assume that all conditions necessary for inference are met. Describe how to find the p-value for this hypothesis test.

### HT.9

Given a test statistic, I can describe how to find the p-value for a hypothesis test of a single population mean.

- Know what probability distribution the test statistic for a single mean follows.
- Know how to calculate degrees of freedom.
- Be able to sketch and clearly label a picture indicating what areas under the curve correspond to the p-value.

#### Example problem

A random sample of 48 students at a large university reported getting an average of 7 hours of sleep on weeknights, with standard deviation 1.62 hours. Assuming that all conditions for inference are met, describe how you can calculate the p-value for the following hypotheses if the test statistic is $t=-4.28$.

$H_0: \mu = 8$ vs. $H_a: \mu < 8$.

### HT.10 (core)

Given the results of a hypothesis test, I can interpret the decision in the context of the problem for a one-sample problem.

- Given a p-value, be able to make a decision using either strength of evidence or the decision rule.
- Provide a conclusion phrased in terms of an appropriate hypothesis that includes the p-value and the strength of evidence in *context*.
- Be able to phrase conclusions in terms of the parameter.

#### Example problem

The US Environmental Protection Agency has set the action level for lead contamination of drinking water at 15 ppb (parts per billion). Samples are regularly tested to ensure that mean lead contamination is not above this level. Suppose we are testing:

$H_0: \mu = 15$ vs. $H_a: \mu > 15$

Suppose that researchers calculated a p-value of 0.02. State the conclusion of this hypothesis test in the context of the problem.
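
For reference, the standardized test statistics named in HT.6 and HT.7 both follow the pattern "statistic minus null value, divided by the standard error". Written out with the "SE (Test)" entries from the quiz 3 formula table, and assuming the usual notation $\bar{x}$ for the sample mean and $\mu_0$, $p_0$ for the null values, they would look like this:

$$
\text{test statistic} = \frac{\text{statistic} - \text{null value}}{\text{SE}}, \qquad
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \ \ (df = n-1), \qquad
z = \frac{\widehat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}
$$
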
## Confidence Intervals

### CI.1

I can compare and contrast the sampling distribution and the population distribution.

- Where should the sampling distribution be centered?
- Suppose that the sampling distribution were displayed as a dotplot. What would a single "dot" represent?
- Where should the population distribution be centered?
- Suppose that the population distribution were displayed as a dotplot. What would a single "dot" represent?

#### Example problem

In the 2016 Olympic Men's Marathon, 140 athletes finished the race. Below are summary statistics of those finish times (in minutes).

| n | mean | median | standard deviation |
|--|----|-----|----|
| 140 | 142.367 | 140.765 | 7.723 |

Suppose that you draw a random sample of size $n=15$ from these 140 marathon times. Describe how the sampling distribution of the sample mean from a random sample of size $n=15$ compares to the distribution of all marathon times. Be sure to comment on the shape, center, and spread.

### CI.2 (core)

Given a bootstrap distribution, I can calculate a confidence interval using either the plug-in or percentile method.

- Be able to apply the formula for the plug-in method for a 95% confidence interval: $\text{statistic} \pm 2 \cdot \text{SE}$.
- Know when the plug-in method is appropriate (in case you need to choose the method).
- Given percentiles from a bootstrap distribution, know how to find a percentile bootstrap confidence interval.

#### Example problem

Lead in groundwater poses a serious public-health problem. A study conducted in Minnesota measured the water quality of 895 randomly selected wells. One of the contaminants measured was lead concentration (in ppb). Percentiles of the bootstrap distribution for the sample mean are provided. Find a 97% confidence interval for the mean lead concentration in Minnesota wells.

<img src="https://hackmd.io/_uploads/S1P9-ZcR6.png" width="90%"/>

### CI.3 (core)

Given a confidence interval, I can interpret it in context for a one-sample problem.
- Be able to write a one-sentence interpretation.
- Use appropriate terminology for a confidence interval; wording matters!
- Appropriately communicate what the parameter is, in context.

#### Example problem

Lead in groundwater poses a serious public-health problem. A study conducted in Minnesota measured the water quality of 895 randomly selected wells. One of the contaminants measured was lead concentration (in ppb). A statistician calculated an 89% confidence interval to be (0.82, 1.8). Interpret this confidence interval in the context of the problem.

### CI.4 (core)

Given a confidence interval, I can interpret it in context for a two-sample problem.

- Be able to write a one-sentence interpretation.
- Use appropriate terminology for a confidence interval; wording matters!
- Appropriately communicate what the parameter is, in context.

#### Example problem

A group of researchers who are interested in the possible effects of distracting stimuli during eating, such as an increase or decrease in the amount of food consumption, monitored food intake for a group of 44 patients who were randomized into two equal groups. The treatment group ate lunch while playing solitaire, and the control group ate lunch without any added distractions. Patients in the treatment group ate 52.1 grams of biscuits, with a standard deviation of 45.1 grams, and patients in the control group ate 27.1 grams of biscuits, with a standard deviation of 26.4 grams. The researchers found a 95% confidence interval for the difference in mean biscuit consumption to be (6.41, 43.59). Interpret this confidence interval in the context of the problem.

### CI.5

I can describe the concept of confidence.

- What is our confidence in, the observed interval or the process used to create the interval?

#### Example problem

According to a survey by the UCLA Higher Education Institute, 69 percent of the first year college students in the sample reported feeling homesick.
A 95% confidence interval for this proportion is given by (67%, 71%). Explain to a friend who has not taken Stat 120 why we use the words "confidence" or "sure" rather than "probability" or "chance" when interpreting this confidence interval.

### CI.6 (core)

I can use a confidence interval to discern the plausibility of a specific claim.

- What does a confidence interval communicate about the parameter?
- Are values inside the interval plausible?
- Are values outside the interval plausible?
- How does the confidence level relate to the type I error rate if used to test a specific claim?

#### Example problem

An insurance company checks records on 582 accidents selected at random and calculated a 95% confidence interval for the proportion of all auto accidents that involve teenage drivers to be (0.128, 0.189). A politician urging tighter restrictions on drivers' licenses issued to teens says, "In one of every five auto accidents, a teenager is behind the wheel." Do the insurance company's findings support or contradict this statement? Explain.

### CI.7

I can calculate a theory-based confidence interval for a population proportion.

- Know the formula for a normal-based confidence interval for a population proportion and how to apply it.
- Know how to find the critical value, $z^*$, from a standard normal distribution. (You should be able to read R output, or clearly communicate what $z^*$ is.)

#### Example problem

An insurance company checks records on 582 accidents selected at random and notes that teenagers were at the wheel in 91 of them. Calculate a 92% confidence interval for the proportion of all auto accidents that involve teenage drivers. (Plug in completely, but do not simplify.)

```{r}
qnorm(.90)  # 1.281552
qnorm(.92)  # 1.405072
qnorm(.95)  # 1.644854
qnorm(.96)  # 1.750686
qnorm(.97)  # 1.880794
qnorm(.98)  # 2.053749
qnorm(.99)  # 2.326348
```

### CI.8

I can calculate a theory-based confidence interval for the population mean.
- Know the formula for a t-based confidence interval for a population mean and how to apply it.
- Know how to calculate degrees of freedom.
- Know how to find the critical value, $t^*$, from a t-distribution. (You should be able to read R output, or clearly communicate what $t^*$ is.)

#### Example problem

Lead in groundwater poses a serious public-health problem. A study conducted in Minnesota measured the water quality of 895 randomly selected wells. One of the contaminants measured was lead concentration (in ppb). Below are summary statistics from this random sample calculated via `favstats()`.

```
  min   Q1 median   Q3    max     mean       sd   n missing
 0.02 0.09   0.23 0.61 210.29 1.268246 9.253287 895       0
```

Calculate a 95% confidence interval for the mean lead concentration in Minnesota wells. (Plug in completely, but do not simplify.)

### CI.9

I can describe the impacts of changing the sample size and confidence level on the width of a confidence interval.

- Know the formulas for confidence intervals.
- Know how the sample size impacts the standard error.
- Know how the confidence level impacts the critical value ($z^*$ or $t^*$) or what quantiles are needed from a percentile bootstrap interval.

#### Example problem

Suppose that we have calculated a 99% confidence interval for the proportion of University of Minnesota students graduating from a Minnesota high school. If we plan to construct a second 99% confidence interval based on a new sample, how can we reduce the margin of error?
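
To experiment with the ideas in CI.9, the sketch below computes the margin of error of a normal-based interval for one proportion, using the "SE (CI)" formula from the quiz 3 table. The helper names `z_star()` and `margin()` and the value `p_hat <- 0.6` are hypothetical, introduced only for illustration; only base R's `qnorm()` is used.

```r
# Hypothetical sample proportion (for illustration only)
p_hat <- 0.6

# Critical value z* for a confidence level: the two tails share 1 - level
z_star <- function(level) qnorm(1 - (1 - level) / 2)

# Margin of error = z* times the standard error of the sample proportion
margin <- function(p_hat, n, level) {
  z_star(level) * sqrt(p_hat * (1 - p_hat) / n)
}

margin(p_hat, n = 100, level = 0.95)
margin(p_hat, n = 400, level = 0.95)  # compare with n = 100
margin(p_hat, n = 100, level = 0.99)  # compare with level = 0.95
```

Note that `z_star(0.92)` is `qnorm(0.96)`, which matches the 1.750686 listed in the CI.7 table of `qnorm()` values.
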