---
title: "Paper Helicopter - 2"
author: 'Zihan Zhang, Yang Gao (yug61), Brain Dailey'
date: 'December 13, 2019'
output:
  pdf_document:
    toc: yes
    toc_depth: '2'
---
```{r Front Matter, include=FALSE}
# clean up & set default chunk options
rm(list = ls())
knitr::opts_chunk$set(echo = FALSE)
# packages
library(tidyverse) # for example
library(mosaic) # for example
# user-defined functions
# inputs
heli_ccd = read.delim("heliccd.txt",header = TRUE)
```
# Project Description
A designed experiment was conducted to find the optimal values of the two selected factors (wing length and body length), which our previous analysis identified as significantly affecting the in-air duration (response variable: time) of the paper helicopters designed and built by study groups in STAT470W.
To find the helicopter design with the longest flight time, the experiment used a two-factor central composite design (CCD) with 13 runs to estimate the response surface. The helicopters were dropped from the bridge of the Huck Life Science building, and the time aloft was recorded in seconds. Each helicopter was dropped 3 times, and the average of these 3 drops was used in the study. The goal of the experiment was to find the value of each selected factor that maximizes the flight time.
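As a rough illustration of how a two-factor CCD reaches 13 runs, the sketch below builds the design points in millimeters. The center (95 mm), the factorial step (25 mm), and the axial distance ($\sqrt{2}$ steps) are assumptions chosen to reproduce the five levels used in this study (60, 70, 95, 120, 130 mm). The 5 center runs follow from 13 total runs minus 4 corner and 4 axial runs. The actual run sheet was generated by Minitab and may differ in run order.

```r
# Sketch of a two-factor central composite design in natural units (mm).
# Assumed values (not read from the Minitab worksheet): center, step, alpha.
center <- 95        # center level of both factors
step   <- 25        # distance from the center to a factorial (corner) level
alpha  <- sqrt(2)   # axial distance in units of `step`

# 4 factorial (corner) runs
factorial_pts <- expand.grid(
  winglength = center + c(-1, 1) * step,
  bodylength = center + c(-1, 1) * step
)

# 4 axial (star) runs, one factor at a time moved to +/- alpha steps
axial_pts <- data.frame(
  winglength = center + c(-alpha, alpha, 0, 0) * step,
  bodylength = center + c(0, 0, -alpha, alpha) * step
)

# 5 center runs
center_pts <- data.frame(
  winglength = rep(center, 5),
  bodylength = rep(center, 5)
)

# Round to whole millimeters, as a helicopter would be cut
ccd <- round(rbind(factorial_pts, axial_pts, center_pts))

nrow(ccd)                      # 13 runs in total
sort(unique(ccd$winglength))   # 60 70 95 120 130
```

The axial levels come out at about 59.6 mm and 130.4 mm before rounding, which is why the design uses 60 mm and 130 mm as its extremes.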
## Research Questions
Our ultimate goal in this research is to optimize the helicopter design for flight time. We set out to determine numerical values for the factors selected in the previous analysis and to find the combination of values that maximizes the in-air duration of our paper helicopter.
**Question 1:**
What wing length and body length combination will produce the best paper helicopter with the smallest variance between repeated measures?
## Statistical Questions
**Question 1.1:**
Which terms in the following equation are the most important for the optimal helicopter?
$$
\begin{aligned}
time = {} & \alpha + \beta_1\,\text{WingLength} + \beta_2\,\text{WingLength}^2 + \beta_3\,\text{BodyLength} \\
& + \beta_4\,\text{BodyLength}^2 + \beta_5\,\text{BodyLength} \times \text{WingLength}
\end{aligned}
$$
## Variables
We have three variables of interest in this experiment: wing length, body length, and flight time (the response). All other variables were held constant: every helicopter in this study had one paper clip, a 42 mm body width, unfolded wings, and an untaped body and wings.
All levels of wing length and body length were generated by Minitab.
| Variable    | Description                                              | Type                       | Levels                              |
|-------------|----------------------------------------------------------|----------------------------|-------------------------------------|
| Wing Length | The length of the wing, measured in millimeters.         | Quantitative (explanatory) | 60 mm; 70 mm; 95 mm; 120 mm; 130 mm |
| Body Length | The length of the body, measured in millimeters.         | Quantitative (explanatory) | 60 mm; 70 mm; 95 mm; 120 mm; 130 mm |
| Flight Time | Response variable. The time the helicopter stays aloft.  | Quantitative (response)    | Continuous, in seconds              |
*Table 1: The table provides information on each of the variables described including a description, type, and levels. The flight time is our response variable while wing length and body length are our explanatory variables.*
# Exploratory Data Analysis (EDA)
First, we checked the data: there are no missing values, and the flight times appear approximately normally distributed (plots are in the Technical Appendix).
Because the key question for this analysis is how wing length and body length affect the flight time of the paper helicopters, we first looked at boxplots of time against wing length and body length:
```{r boxplots,echo=FALSE}
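# side-by-side boxplots of flight time at each level of the two factors
par(mfrow = c(1, 2))
boxplot(time ~ winglength, data = heli_ccd, main = "Flight Time by Wing Length",
        xlab = "Wing length (mm)", ylab = "Flight time (s)")
boxplot(time ~ bodylength, data = heli_ccd, main = "Flight Time by Body Length",
        xlab = "Body length (mm)", ylab = "Flight time (s)")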
```
*Figure 1: Boxplots showing the time aloft at each level of wing length and body length*
For wing length, there is a strong positive linear relationship, and no outliers were detected. For body length, the plot shows no clear linear relationship: flight time did not change noticeably with body length. However, there are some outliers.
Next, we examined a Pareto chart to get a preliminary look at the main effects and interaction effects in the data.

*Figure 2: Pareto chart of the standardized effects ($\alpha = 0.05$)*
As we expected, the Pareto chart shows that the main effect of wing length dominates the flight time. Body length and the other terms, including the interaction, were not significant.
Overall, the EDA indicates that wing length is positively correlated with flight time.
# Statistical Analysis
To answer the research question of how to improve the paper helicopter's performance, we first fit a model to our variables. The table below gives the coded coefficients and p-values:
| Term                      | Coef   | P-Value |
|---------------------------|--------|---------|
| Constant                  | 4.792  | < 0.001 |
| wing length               | 0.904  | < 0.001 |
| body length               | -0.051 | 0.687   |
| wing length\*wing length  | 0.084  | 0.663   |
| body length\*body length  | -0.196 | 0.326   |
| wing length\*body length  | -0.405 | 0.141   |
*Table 2: Coded coefficients for the fitted model*
Regression equation in uncoded units:
$$
\begin{aligned}
time = {} & -1.23 + 0.0435\,\text{wing length} + 0.0591\,\text{body length} + 0.000067\,\text{wing length}^2 \\
& - 0.000157\,\text{body length}^2 - 0.000324\,\text{wing length} \times \text{body length}
\end{aligned}
$$
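The uncoded equation can be evaluated directly. The sketch below wraps it in a small helper, `predict_time`, which is a hypothetical name introduced here for illustration; the coefficients are the rounded values reported above, so predictions are approximate.

```r
# Predicted flight time (seconds) from the uncoded regression equation.
# Coefficients are the rounded values reported in the text (assumption:
# rounding error is negligible for illustration purposes).
predict_time <- function(winglength, bodylength) {
  -1.23 +
    0.0435   * winglength +
    0.0591   * bodylength +
    0.000067 * winglength^2 -
    0.000157 * bodylength^2 -
    0.000324 * winglength * bodylength
}

predict_time(95, 95)    # center of the design, about 4.78 s
predict_time(130, 60)   # longest wing, shortest body, about 6.01 s
```

The prediction at the center point (about 4.78 s) is close to the constant of the coded model (4.792); the small gap comes from rounding the uncoded coefficients.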
Wing length is the only statistically significant variable in this model, with a p-value close to zero. In the uncoded equation, the coefficients for wing length and wing length squared are both positive, which means that, holding body length constant, a longer wing generally produces a longer time aloft.
Even though the p-value for body length is not significant, we still need to consider its effect: the coefficient is 0.0591 for body length and -0.000157 for body length squared, and the interaction with wing length is also negative (-0.000324). As body length increases, the negative squared term grows faster than the positive linear term, and the negative interaction term grows with wing length. Therefore, we conclude that body length is negatively associated with flight time for helicopters with long wings.
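One way to see how the body length effect behaves is to differentiate the uncoded equation with respect to body length. This is a worked illustration using the rounded coefficients above, not additional Minitab output:

$$
\frac{\partial\, time}{\partial\, \text{body length}} = 0.0591 - 2(0.000157)\,\text{body length} - 0.000324\,\text{wing length}
$$

At a wing length of 130 mm and a body length of 60 mm, the slope is $0.0591 - 0.01884 - 0.04212 \approx -0.002$ seconds per millimeter, so a longer body slightly shortens the flight. At a wing length of 60 mm and the same body length, the slope is $0.0591 - 0.01884 - 0.01944 \approx 0.021$, so the sign of the body length effect depends on the wing length.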

*Figure 3: Contour plot for time*
As shown in the plot, the longest flight time appears in the top left corner, where the wing length is at its maximum value in this experiment (130 mm) and the body length is at its shortest, around 60 mm. However, according to the contour plot, the flight time did not increase much as body length decreased, especially when the wing length was short (less than 100 mm). This made us question whether body length has a significant effect on flight time.

*Figure 4: Contour plot for standard deviation*
The combinations with the lowest standard deviation (sd) have low values of both wing length and body length. The bottom right and top left corners have the highest sd, which means that flight time varies the most when one of the variables is high and the other is low.

*Figure 5: Interaction plot*
The interaction plot shows that when the wing length is short, a longer body increases the helicopter's in-air duration, whereas with a longer wing length, a longer body decreases it. However, the best-performing helicopters always have the longest wing length, so it is appropriate to choose a longer wing length for the optimal helicopter.

*Figure 6: Optimization plot based on the CCD analysis*
According to the optimization plot, the maximum flight time occurs when the wing length is 130 mm and the body length is 60 mm. However, as discussed previously, a long wing combined with a short body produces relatively high variance. We decided to focus on the highest mean flight time rather than on lowering the variance, since we could not achieve both goals with the data we had.
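The optimum reported above can be cross-checked with a simple grid search over the experimental region. This is only a sketch: it uses the rounded uncoded coefficients, a hypothetical helper `predict_time`, and an assumed 5 mm grid, and it is not how Minitab's response optimizer works internally.

```r
# Predicted flight time (seconds) from the rounded uncoded equation.
predict_time <- function(winglength, bodylength) {
  -1.23 + 0.0435 * winglength + 0.0591 * bodylength +
    0.000067 * winglength^2 - 0.000157 * bodylength^2 -
    0.000324 * winglength * bodylength
}

# Evaluate the fitted surface on a 5 mm grid covering 60-130 mm (assumed grid).
grid <- expand.grid(
  winglength = seq(60, 130, by = 5),
  bodylength = seq(60, 130, by = 5)
)
grid$time <- predict_time(grid$winglength, grid$bodylength)

# Row with the largest predicted flight time inside the region
best <- grid[which.max(grid$time), ]
best   # wing length 130 mm, body length 60 mm
```

The best grid point sits on the corner of the region, which agrees with the optimization plot and also indicates that the maximum is constrained by the range of the experiment.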
It is worth noting that the values given in the plot may not produce the highest flight time among all possible combinations, since the data did not capture the vertex of the response surface. We therefore need to be cautious: it is unlikely that flight time will keep increasing steadily as we increase the wing length. Our final helicopter was designed based on the 13 runs.
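A rough way to check where the stationary point of the fitted surface lies is to write the quadratic model in matrix form, $time = b_0 + x^\top b + x^\top B x$, and solve for the point where the gradient is zero. The sketch below does this with the rounded uncoded coefficients, so the location is only approximate and should be read as an illustration rather than a Minitab result.

```r
# First-order coefficients (wing length, body length), rounded values from the text
b <- c(0.0435, 0.0591)

# Second-order matrix: squared terms on the diagonal,
# half of the interaction coefficient off the diagonal
B <- matrix(c( 0.000067,     -0.000324 / 2,
              -0.000324 / 2, -0.000157),
            nrow = 2, byrow = TRUE)

# Stationary point: gradient b + 2 B x = 0  =>  x = -0.5 * solve(B, b)
stationary <- -0.5 * solve(B, b)
names(stationary) <- c("winglength", "bodylength")
stationary

# Eigenvalues of B: all negative = maximum, all positive = minimum,
# mixed signs = saddle point
eig <- eigen(B, symmetric = TRUE)$values
eig
```

With these rounded coefficients the eigenvalues have opposite signs and the stationary point falls outside the 60 mm to 130 mm range of both factors, which is consistent with the statement that the experiment did not capture a vertex inside the region studied.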
# Recommendations
### Research Question
**Question: What wing length and body length combination will produce the best paper helicopter with the smallest variance between repeated measures?**
In summary, based on the analysis, the final optimal helicopter should be designed in the following way:
| Factor      | Value  |
|-------------|--------|
| wing length | 130 mm |
| body length | 60 mm  |
| paper clip  | one    |
| taped wings | no     |
| taped body  | no     |
*Table 3: The final values for the optimally designed paper helicopter.*
# Resources
## Tools and packages
R was used to clean the data and perform basic mathematical calculations. The functions can be used to reproduce the results. The code used to transform the data and the outputs from the analysis are provided in the Technical Appendix. The open-source R software can be downloaded from https://www.r-project.org
Minitab was used to generate the plots in the EDA section and to analyze the data. Its functions can also be used to determine which levels of resolution are possible for a study. Minitab is commercial software; it is available from https://www.minitab.com/en-us/
## References and Resources
RStudio Team (2015). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. URL: http://www.rstudio.com/

Olawoye, Babatunde (2016). A Comprehensive Handout on Central Composite Design (CCD). URL: https://www.researchgate.net/publication/308608329_A_COMPREHENSIVE_HANDOUT_ON_CENTRAL_COMPOSITE_DESIGN_CCD

Contour Plot, NCSS Statistical Software, Chapter 127. URL: https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Contour_Plots.pdf

Pareto Chart Basics, Minitab 18. URL: https://support.minitab.com/en-us/minitab/18/help-and-how-to/quality-and-process-improvement/quality-tools/supporting-topics/pareto-chart-basics/

# Additional Considerations
There are several additional considerations regarding this study, its results, and its scope of inference. First, it is important to note that the paper helicopter data used in this analysis were not collected with the specific goal of answering our research question. Although the statistical analysis was completed on a subset of the data and the conclusions were drawn appropriately from those outputs, this limits how directly the results address the research question.
In terms of limitations to the recommendations, first, the sample size was small, and the data were collected only from our own experiment in one location. This time the sample was even smaller than before, with only 13 runs. As a result, our sample may not be representative of the entire paper helicopter population.
Second, when designing the experiment, we were concerned only with how to change the selected factors' values to increase the in-air duration, and we were unable to reduce the variance at the same time. Therefore, even though the longest in-air duration of our optimally designed paper helicopter was good (2nd place among the 8 groups), the variance (0.612) of its 4 repeated drops was fairly high compared with other groups. This may make the average vary too much from one set of drops to the next.
In the future, there is certainly more opportunity to find values of the selected factors that lengthen the flight time of paper helicopters. If conditions allow, paper helicopters with longer wings could be used in the experiment. According to our optimization plot (Figure 6), we did not find the vertex of the wing length curve. Data gathered from helicopters with longer wing lengths, combined with corresponding body lengths, would give a much wider and clearer view of the response surface and of the conclusions that can be drawn from it.
# Technical Appendix
**Assumptions check:**
```{r datadist,echo=FALSE}
hist(heli_ccd$time)
```
*Figure 7: Histogram of flight time used to check the normality assumption*
The histogram suggests that the flight times are approximately normally distributed.
```{r normality,echo=FALSE}
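# full second-order model: main effects, interaction, and squared terms
mod = lm(time ~ winglength * bodylength + I(winglength^2) + I(bodylength^2), data = heli_ccd)
par(mfrow = c(1, 2))
# normal Q-Q plot of the residuals with a reference line
qqnorm(mod$residuals, main = "Residual Normality Plot")
qqline(mod$residuals)
# residuals in observation order, to look for trends or changing spread
plot(mod$residuals, main = "Residual Plot")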
```
*Figure 8: Residual plots used to check the assumptions*
As the residual plots show, the normality assumption is met and there is no trend. The variance appears constant.
**Raw Data**

*This is the raw data with 13 samples we used in this study.*
### R Script
#### Most of the analysis used Minitab's built-in functions. We used R only for the boxplots and assumption checking.
```{r ref.label=c('Front Matter', 'boxplots', 'datadist', 'normality'), echo=TRUE, eval=FALSE}
# Reprinted code chunks used previously for analysis
```