---
title: "Case Study: Paper Helicopters"
author: 'Brian Dailey, Zihan Zhang, Yang Gao'
date: 'November 13, 2019'
output:
  pdf_document:
    toc: yes
    toc_depth: '2'
---
```{r Front Matter, include=FALSE}
# clean up & set default chunk options
rm(list = ls())
knitr::opts_chunk$set(echo = FALSE)
# packages
library(readr)
library(dplyr)
library(pastecs)
library(knitr)
library(mosaic)
# user-defined functions
# inputs
heli = read.delim("helicopter_data.txt", header = TRUE)
```
# Project Description
A designed experiment was conducted to test factors that could affect the in-air duration (response variable: time) of paper helicopters designed and built by study groups in STAT470W.
The study is based on a factorial design generated in Minitab. The budget was 40 paper helicopters, with six control variables (wing length, body width, body length, number of paper clips, folded wings, and taped wings). The experiment employed a half-fraction factorial design of resolution VI. Thirty-two unique factor combinations were built, and only one observation could be collected for each combination.
To compensate for the lack of replication and to test the linearity of the relationship between the response and the control variables, center points were added to the experiment. There were four center-point combinations, each with two replicates (8 paper helicopters in total).
The helicopters were dropped from the bridge of the Huck Life Science building, and the time aloft was recorded in seconds. The goal of the experiment was to find the combination of factors that maximizes flight time.
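The run counts above can be checked with a short sketch in base R. It assumes the sixth factor is generated as the product of the other five (the usual generator for a resolution VI half-fraction) and that the four center-point combinations come from crossing the two categorical factors; both are assumptions, since the Minitab worksheet itself is not shown here.

```r
# Sketch of a 2^(6-1) half-fraction in coded units (-1 / +1).
# Assumption: taped_wings is generated as the product of the other five
# factors; the actual generator used in Minitab is not shown in this report.
frac <- expand.grid(
  wing_length  = c(-1, 1),
  body_length  = c(-1, 1),
  body_width   = c(-1, 1),
  paper_clips  = c(-1, 1),
  folded_wings = c(-1, 1)
)
frac$taped_wings <- with(
  frac,
  wing_length * body_length * body_width * paper_clips * folded_wings
)

# Assumption: center points set the four numeric factors to 0 and cross the
# two categorical factors, with two replicates of each combination.
center <- expand.grid(
  wing_length = 0, body_length = 0, body_width = 0, paper_clips = 0,
  folded_wings = c(-1, 1), taped_wings = c(-1, 1)
)
center <- center[rep(seq_len(nrow(center)), each = 2), ]
rownames(center) <- NULL

design <- rbind(frac, center[names(frac)])
c(factorial = nrow(frac), center = nrow(center), total = nrow(design))
```

Under these assumptions the factorial portion has 32 runs and the center points add 8, which matches the budget of 40 helicopters.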
## Research Questions
Our ultimate goal in this research is to optimize the helicopter design for flight time. We decided to study the effect of every factor to see how it affects the in-air duration of our paper helicopter. Additionally, based on our results, we set out to identify the three most important factors.
**Question 1: Among all the factors in our data set, what are the three most important factors that would make the flight time of paper helicopters longer?**
## Statistical Questions
To break down the research question and clarify our research procedure, we expanded it into the following two questions:
**Question 1.1: What three main effects have the most significant impact on the response?**
**Question 1.2: Do any interactions have a significant impact on the response?**
## Variables
We analyzed the following variables: wing length, body length, body width, paper clips, folded wings, and taped wings as explanatory variables, and flight time as the response variable. Note that each ordinal explanatory variable has three levels (-1, 0, and 1), where 0 is the midpoint between the low and high levels. The categorical variables, however, have only two levels (yes and no), since they have no midpoint and no linear relationship can be established between a categorical variable and flight time.
| Variable     | Description                                                             | Type        | Levels                      |
|--------------|-------------------------------------------------------------------------|-------------|-----------------------------|
| Wing Length  | The length of the wing, measured in millimeters.                        | Ordinal     | -1: 120mm, 0: 95mm, 1: 70mm |
| Body Length  | The length of the body, measured in millimeters.                        | Ordinal     | -1: 120mm, 0: 95mm, 1: 70mm |
| Body Width   | The width of the body, measured in millimeters.                         | Ordinal     | -1: 50mm, 0: 43mm, 1: 35mm  |
| Paper Clips  | The number of paper clips attached to the bottom of the helicopter body. | Ordinal     | -1: Three, 0: Two, 1: One   |
| Folded Wings | Whether or not the helicopter has folded wings.                         | Categorical | -1: Yes, 1: No              |
| Taped Wings  | Whether or not the helicopter's wings are taped.                        | Categorical | -1: Yes, 1: No              |
| Flight Time  | Response variable. The helicopter's time aloft.                         | Continuous  | Measured in seconds         |
Table 1: The table describes each variable, including its description, type, and levels. The last variable is the response; the rest are explanatory.
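Because the analysis works in coded units, it can help to translate coded levels back to the physical settings in Table 1. The sketch below is illustrative only; `level_key` and `decode()` are hypothetical names, not part of the analysis code.

```r
# Physical value for each coded level, copied from Table 1.
level_key <- list(
  wing_length = c("-1" = 120, "0" = 95, "1" = 70),  # millimeters
  body_length = c("-1" = 120, "0" = 95, "1" = 70),  # millimeters
  body_width  = c("-1" = 50,  "0" = 43, "1" = 35),  # millimeters
  paper_clips = c("-1" = 3,   "0" = 2,  "1" = 1)    # number of clips
)

# Hypothetical helper: look up the physical value for coded levels.
decode <- function(factor_name, coded) {
  unname(level_key[[factor_name]][as.character(coded)])
}

decode("wing_length", c(-1, 0, 1))
decode("paper_clips", 1)
```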
# Exploratory Data Analysis (EDA)
```{r desc, warning = FALSE,results='asis'}
# Basic stat description of the response data.
var_des = stat.desc(heli[,c("Y")])
kable(var_des[-c(2,3,7,10,11,12)],caption = "Variable Description",digits = 3)
```
```{r Outlier Detection, eval = FALSE}
# Outlier Detection
boxplot.stats(heli$Y)$out
```
Table 2: The table provides numerical description of the response variable.
The table above summarizes the flight time. There are no abnormal values or outliers in this dataset.
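For readers unfamiliar with the outlier rule, `boxplot.stats()` flags values that lie more than 1.5 times the hinge spread beyond the hinges and returns them in its `out` component. A minimal sketch on made-up numbers (these are not the study's flight times):

```r
# Hypothetical flight times in seconds, with one deliberately extreme value.
times <- c(3.1, 3.5, 3.8, 4.0, 4.2, 4.4, 4.6, 9.9)

bs <- boxplot.stats(times, coef = 1.5)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # values flagged as outliers; here only 9.9

# Without the extreme value nothing is flagged.
boxplot.stats(times[-8])$out
```

An empty `out` vector for the response is what supports a statement that no outliers are present.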
To answer the main research question of which factors have a significant effect on flight time, we created preliminary boxplots comparing the response at each level of the explanatory factors. Some notable results follow:
```{r EDA-boxplots-1, fig.cap="Figure 1: Flight time at each level of Wing Length"}
# Boxplots for wing length
boxplot(Y~wing_length,data=heli, main = "Wing Length")
```
Looking at the plot (Figure 1), we can see a potential linear relationship between wing length and flight duration. At level -1 (120mm), the durations are visibly larger than at the center points and at level 1 (70mm). Therefore, wing length is likely to be a significant factor for time aloft.
```{r EDA-boxplots-2, fig.cap="Figure 2: Flight time at each level of Paper Clips"}
# Boxplots for Paper Clips
boxplot(Y~paper_clips,data=heli,main = "Paper Clips")
```
From Figure 2, it is worth noting that although the medians of level 1 (one paper clip) and level -1 (three paper clips) are very close, the distribution at level -1 is more spread out than at level 1, especially in the lower percentiles. Overall, paper helicopters with one paper clip performed more consistently, with slightly longer durations. We may also need to test for interaction effects that could explain the large variability in the data.
```{r EDA-boxplots-3, fig.cap="Figure 3: Flight time at each level of Folded Wings"}
# Boxplots for Folded Wings
boxplot(Y~folded_wings,data=heli,main = "Folded Wings")
```
Figure 3 shows a prominent difference depending on whether the helicopter's wings are folded. At level 1 (without folded wings), the helicopters stayed in the air longer, while those with folded wings performed worse. This matches our expectation, since we guessed that folded wings would reduce air resistance and let air flow past the helicopter at a faster rate.
```{r EDA-boxplots-4, fig.cap="Figure 4: Flight time at each level of Taped Wings"}
# Boxplots for Taped Wings
boxplot(Y~taped_wings,data=heli,main = "Taped Wings")
```
Looking at Figure 4, it is clear that helicopters without taped wings (level 1) stay aloft longer than those with taped wings (level -1).
Overall, these four factors appear significant from the plots. The boxplots for body length and body width did not show any sign of significance (they are shown in the appendix). In addition, the normal probability plot and Pareto chart (see the appendix) show some outlying effects and coefficients that exceed the critical value. This also signals that significant explanatory variables exist in the dataset.
# Statistical Analysis
To see which factors positively affect our paper helicopter's flight time, we first fit a linear regression model with our six factors to the factorial-design data.
To find the active effects, we examined the statistical output from the full model. The effects with a significant impact on the response are listed below.
| Factors      | P-value  | Factors                 | P-value  |
|--------------|----------|-------------------------|----------|
| folded_wings | 0.000686 | taped_wings             | 0.000862 |
| wing_length  | 5.52e-09 | wing_length:paper_clips | 0.000256 |
| paper_clips  | 0.008614 | body_width:folded_wings | 0.033660 |
*Table 3: Significant factors from the full model*
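The fitted linear relation, in coded units, for wing length, taped wings, paper clips, and their two-way interactions can be expressed as follows:
$$
\begin{aligned}
\hat{Y} ={}& 4.12 - 0.5075\,\text{wing\_length} + 0.1755\,\text{taped\_wings} + 0.145\,\text{paper\_clips} \\
&- 0.0244\,(\text{wing\_length} \times \text{taped\_wings}) + 0.2231\,(\text{wing\_length} \times \text{paper\_clips}) \\
&+ 0.0669\,(\text{taped\_wings} \times \text{paper\_clips})
\end{aligned}
$$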
As we can see, the effects listed in Table 3 have p-values less than 0.05, indicating statistical significance. In the normal qq-plot for the full model (see Appendix Figure 5), roughly 10 points lie far from the line, which suggests about 10 active effects. Given the agreement between the model output and the qq-plot, we conclude that there are roughly 10 active effects in the full model of our data set.
Second, to identify the three most important factors, we created a Pareto chart in Minitab for our factorial design without the center-point effect (see Appendix Figure 6). The Pareto chart shows which factors had the biggest impact on our paper helicopter's flight time. Guided by the Pareto chart, we used backward elimination to remove the factors with p-values greater than 0.05, that is, those that were not significant. After removing the insignificant factors, we were left with four factors: wing length, paper clips, folded wings, and taped wings, all of which have a significant impact in our model.
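The elimination step can be sketched in R as follows. This is only one way to do it: it assumes p-values come from the F tests reported by `drop1()`, which never drops a main effect while an interaction containing it remains in the model. The data are synthetic, and `backward_eliminate()` is a hypothetical helper rather than the procedure actually run for this report.

```r
# Synthetic replicated 2^3 design (NOT the study's data): only x1 matters.
set.seed(470)
toy <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1), x3 = c(-1, 1))
toy <- toy[rep(seq_len(nrow(toy)), times = 2), ]
toy$y <- 4 + 0.5 * toy$x1 + rnorm(nrow(toy), sd = 0.05)

# Hypothetical helper: repeatedly drop the least significant eligible term
# until every eligible term has a p-value at or below alpha.
backward_eliminate <- function(fit, alpha = 0.05) {
  repeat {
    tab <- drop1(fit, test = "F")
    pvals <- setNames(tab[["Pr(>F)"]], rownames(tab))
    pvals <- pvals[names(pvals) != "<none>"]
    if (length(pvals) == 0 || max(pvals) <= alpha) break
    worst <- names(which.max(pvals))
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

start <- lm(y ~ (x1 + x2 + x3)^2, data = toy)
final <- backward_eliminate(start)
attr(terms(final), "term.labels")  # terms that survive elimination
```

Applied to the study, the starting model would be the full factorial fit and the surviving terms would be compared against Table 4.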
| Factors                 | P-value  | Factors     | P-value  |
|-------------------------|----------|-------------|----------|
| wing_length             | 9.19e-11 | paper_clips | 0.008664 |
| folded_wings            | 0.000514 | taped_wings | 0.000668 |
| wing_length:paper_clips | 0.000161 |             |          |
*Table 4: Significant factors from the intermediate model*
According to the summary of the reduced model, all the selected variables remain significant. We also noticed that wing length and paper clips have a significant interaction, with coefficient 0.22312. It can be interpreted as follows: holding the other factors constant, with shorter wings (70mm) and fewer paper clips (one), the flight time increases by 0.22312 seconds on average. We suspect that the interaction effect is a random occurrence driven by the number of paper clips, since we concluded that helicopters with longer wings tend to perform better. More data and further investigation are needed on this matter.
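In coded units, an interaction term adds its coefficient when the two factors share a sign and subtracts it when their signs differ. The sketch below tabulates that contribution for the reported coefficient, using the level labels from Table 1; it illustrates the arithmetic only and is not output from the fitted model.

```r
# Reported wing_length:paper_clips coefficient from the reduced model.
b_int <- 0.22312
coded <- c(-1, 1)

# Contribution of the interaction term at each pair of coded levels.
contrib <- b_int * outer(coded, coded)
dimnames(contrib) <- list(
  wing_length = c("-1 (120mm)", "1 (70mm)"),
  paper_clips = c("-1 (three)", "1 (one)")
)
contrib
```

The table shows that long wings combined with one clip carry a negative interaction contribution, which is one reason the interaction deserves the further investigation mentioned above.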
Therefore, we decided to keep wing length and paper clips and to eliminate either folded wings or taped wings. Based on the model summaries, folded_wings had a very low p-value and is a categorical variable, so we were confident that paper helicopters with unfolded wings have longer in-air time and that no further investigation of this factor is required. In addition, after reviewing the boxplots and Pareto chart again, we decided to keep taped wings in our final model for later study. With this choice, our final model is summarized below:
| Factors | P-value | Factors |P-value|
| -------- | -------- | -------- | -------- |
| wing_length | 3.3e-09 | taped_wings|0.00409|
| paper_clips | 0.02915 | wing_length:paper_clips|0.00132|
*Table 5: Significant factors from the final model*
# Recommendations
### Statistical Questions
**Question 1.1 What three main effects have the most significant impact on the response?**
For our first statistical question, we found that wing length, paper clips, taped wings, and folded wings were all significant factors for our response, but we decided to eliminate folded wings and move forward with the other three factors.
**Question 1.2 Do any interactions have a significant impact on the response?**
For our second statistical question, we found one significant interaction, between wing length and paper clips.
### Research Question
**Question: Among all the factors in our data set, what are the three most important factors that would make the flight time of paper helicopters longer?**
For our research question, based on the p-values from our final model, we conclude that wing length, paper clips, and taped wings were the three most important factors in making our paper helicopters fly longer and are worth further study.
Based on our results, we would choose the paper helicopter with longer wings (120mm), no taped wings, and fewer paper clips (1).
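As a rough check on a recommended setting, the fitted equation reported in the Statistical Analysis section can be evaluated at coded levels. The sketch below hard-codes the rounded coefficients from that equation; `predict_time()` is a hypothetical helper, and its values are point predictions without any measure of uncertainty.

```r
# Evaluate the reported fitted equation at coded factor levels.
# Coefficients are the rounded values printed in the report.
predict_time <- function(wing_length, taped_wings, paper_clips) {
  4.12 - 0.5075 * wing_length + 0.1755 * taped_wings + 0.145 * paper_clips -
    0.0244 * wing_length * taped_wings +
    0.2231 * wing_length * paper_clips +
    0.0669 * taped_wings * paper_clips
}

# All eight corner settings of the three factors, ranked by prediction.
corners <- expand.grid(
  wing_length = c(-1, 1), taped_wings = c(-1, 1), paper_clips = c(-1, 1)
)
corners$predicted <- with(
  corners, predict_time(wing_length, taped_wings, paper_clips)
)
corners[order(-corners$predicted), ]

# Recommended design: long wings (-1), no tape (1), one paper clip (1).
predict_time(wing_length = -1, taped_wings = 1, paper_clips = 1)
```

With these rounded coefficients the recommended setting evaluates to about 4.82 seconds. The long-wing, untaped corner with three clips comes out within about 0.02 seconds of it, so the choice of clip count rests on a small difference.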
# Resources
## Tools and packages
R was used to analyze the data, and its functions can be used to reproduce the results. The code used to transform the data and the outputs from the analysis are provided in the Technical Appendix. The open-source R software can be downloaded at: https://www.r-project.org
Minitab was used to create our fractional factorial design and to determine which levels of resolution were possible for our study. Minitab is commercial software; more information is available at: https://www.minitab.com/en-us/
To fit our factorial design with a linear model, lm() was used in R. This function is part of the stats package, which is included with the base R installation and does not need to be installed separately. More information on this function can be found at: https://www.rdocumentation.org/packages/stats/versions/3.6.1
To check normality for residuals in the model, qqnorm() was used in R. It is a generic function whose default method produces a normal QQ plot of the values in y. More information on this function can be found at: https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/qqnorm
## References and Resources
Statistics Solutions. (2016). Assumptions of Multiple Linear Regression. Retrieved October 2018, from http://www.statisticssolutions.com/assumptions-of-multiple-linear-regression/
Rose, Oliver, Averill Law, and David Kelton. (2008). Experiment planning: Factorial design, factor analysis. Retrieved from https://www.net.in.tum.de/pub/simulationstechnik/ws20102011/ST_WS2010_Ch5_ExperimentPlanning.pdf
RStudio Team. (2015). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. Retrieved from http://www.rstudio.com/
# Additional Considerations
There are several additional considerations regarding this study, its results, and its scope of inference. First, the paper helicopter data used in this analysis were not collected with the specific goal of answering our research questions. Although the statistical analysis was based on a subset of the data and the conclusions were drawn appropriately from those outputs, this limitation should be kept in mind.
In terms of limitations of the recommendations, the sample size was small and the data were collected only from our own experiment at one location. As a result, our sample may not be representative of the entire paper helicopter population. Also, since we collected data from each type of paper helicopter only once, run-to-run error within each type may exist but could not be measured.
Further, the effects of all six factors in our data set were studied, but other factors that we did not study may also affect flight time, such as the paper material, a taped body, and the conditions at the drop location. Thus, the study could be improved by accounting for more information.
In the future, there is certainly opportunity for further study of what makes paper helicopters fly longer. First, if resources allowed, more types of paper helicopters could be used in the experiment instead of just 40. Data from more types of paper helicopters would give a wider and clearer view of the problem and of the conclusions that can be drawn. Additionally, the analysis could be made more precise by dropping each type of paper helicopter more than once. Including replicates in future studies would help reduce error and give more tenable support for the conclusions.
# Technical Appendix
```{r box_notS}
par(mfrow=c(1,2))
boxplot(Y~body_length,data=heli,main = "Body Length")
boxplot(Y~body_width,data=heli,main = "Body Width")
```
*Figure 4: Boxplots for Body Length and Body Width*
Looking at the boxplots for body length and body width, we see no prominent differences between the levels. Therefore, our preliminary assumption is that body length and body width are not factors that affect the response.
```{r nomal_full}
full_model = lm(Y~.^6, data = heli)
qqnorm(full_model$coefficients[-1],pch=19);qqline(full_model$coefficients[-1])
```
*Figure 5: Normal qq-plot for the full model*
We can see from the normal qq-plot that there are roughly 12 active effects in the full model.
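A normal plot of effects is easier to read when each point carries its term name, so the points that fall off the line can be identified directly. The sketch below uses a synthetic, unreplicated 2^3 design in which only factor `A` is active; the data and factor names are hypothetical.

```r
# Synthetic unreplicated 2^3 design (NOT the study's data): only A is active.
set.seed(1)
toy <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
toy$y <- 4 + 0.5 * toy$A + rnorm(nrow(toy), sd = 0.05)

# Saturated model: 8 runs and 8 coefficients, so no residual degrees of freedom.
fit <- lm(y ~ A * B * C, data = toy)
est <- coef(fit)[-1]  # drop the intercept

# qqnorm() returns the plotted coordinates in the original order,
# which lets us attach the matching term name to each point.
qq <- qqnorm(est, pch = 19, main = "Normal plot of effects")
qqline(est)
text(qq$x, qq$y, labels = names(est), pos = 4, cex = 0.8)

# Term with the largest absolute estimate.
names(est)[which.max(abs(est))]
```

The same labeling could be applied to `full_model$coefficients[-1]` to see which terms are counted as active.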

*Figure 6: Pareto Plot for the full model without center point from Minitab*
The raw statistical output from the full model is provided below:
```{r model_full}
model1 = lm(Y~.^4, data = heli)
summary(model1)
```
*Figure 7: Full model results*
```{r model_inter}
model2 = lm(Y~(wing_length+paper_clips+folded_wings+taped_wings)^2, data = heli)
summary(model2)
```
*Figure 8: Intermediate model results*
Combining the normal qq-plot with the full model results, we see that the active effects identified by each agree closely. Based on the Pareto plot without the center-point effect and the output from the intermediate model, we chose wing length, taped wings, and paper clips as our three main factors.
```{r residual_final}
model3 = lm(Y~(wing_length+taped_wings+paper_clips)^2, data = heli)
par(mfrow=c(2,2))
plot(model3)
```
*Figure 9: Final model assumption check*
For our final linear regression model, we checked the LINE assumptions: linearity, independence, normality, and equal variance. The residual plots show no clear pattern and no sign of autocorrelation, and the normal Q-Q plot is roughly straight, so we conclude that the assumptions are reasonably met.
```{r normal_final}
model3 = lm(Y~(wing_length+taped_wings+paper_clips)^2, data = heli)
qqnorm(model3$coefficients[-1],pch=19);qqline(model3$coefficients[-1])
```
*Figure 10: Normal qq-plot for the final model*
We can see from the normal qq-plot that there are roughly 4 active effects in the final model.
Also provided is the statistical output from the final model:
```{r model_final}
model3 = lm(Y~(wing_length+taped_wings+paper_clips)^2, data = heli)
summary(model3)
```
*Figure 11: Final model results*
### R Script
```{r ref.label=c('Front Matter', 'desc', 'Outlier Detection', 'EDA-boxplots-1','EDA-boxplots-2','EDA-boxplots-3','EDA-boxplots-4','box_notS','nomal_full','model_full','model_inter','residual_final','normal_final','model_final'), echo=TRUE, eval=FALSE}
# Reprinted code chunks used previously for analysis
```