# Chi Square and Fisher’s Exact Test
###### tags: `R` `Statistics` `Chi-Square` `Fisher's Exact Test` `p-value` `Alpha Inflation`
## Heart Dataset
**Data Set Information:** This dataset contains the medical records of **303 patients who had heart failure**, collected during their follow-up period, where each patient profile has **13 clinical features**.
* age: age in years
* sex: sex
    * Value 0: female
    * Value 1: male
* cp: chest pain type
    * Value 0: typical angina
    * Value 1: atypical angina
    * Value 2: non-anginal pain
    * Value 3: asymptomatic
* trestbps: resting blood pressure (in mm Hg on admission to the hospital)
* chol: serum cholesterol in mg/dl
* fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
* restecg: resting electrocardiographic results
    * Value 0: normal
    * Value 1: having ST-T wave abnormality
    * Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
* thalach: maximum heart rate achieved
* exang: exercise-induced angina
    * Value 0: no
    * Value 1: yes
* oldpeak: ST depression induced by exercise relative to rest (continuous value)
* slope: the slope of the peak exercise ST segment
    * Value 0: upsloping
    * Value 1: flat
    * Value 2: downsloping
* ca: number of major vessels (0-3) colored by fluoroscopy
* thal: thalassemia
    * Value 1: normal
    * Value 2: fixed defect
    * Value 3: reversible defect
* target: heart disease
    * Value 0: no
    * Value 1: yes
**Original dataset version:** Tanvir Ahmad, Assia Munir, Sajjad Haider Bhatti, Muhammad Aftab, and Muhammad Ali Raza: "Survival analysis of heart failure patients: a case study". PLoS ONE 12(7), e0181001 (2017).
<br/>
## Chi Square and Fisher's Exact Test
* First, put the heart file (`heart.txt`) into the **data** folder.
* Then import it into R.
```r=
# load library
library(ggplot2) # for plotting
# import heart.txt (tab-separated, with a header row) and name it data
data <- read.table("data/heart.txt", header = TRUE, sep = "\t")
# check the structure of the dataset
str(data)
```
The human sex ratio is close to 1:1, with a slight bias towards males at birth. For the world population as a whole, the ratio is about **101 males to 100 females (2018 est.)**. Let us use the chi-square goodness-of-fit test to check whether the sex ratio in our data fits that of the world population.
<br/>
```r=
# goodness-of-fit test
# usage: chisq.test(x = observed counts, p = expected probabilities)
# count males (sex is coded 1 = male, so the sum is the number of males)
male <- sum(data$sex)
# count females (everyone who is not male)
female <- length(data$sex) - male
# calculate the expected probabilities from the 101:100 ratio
male_p <- 101 / (101 + 100)
female_p <- 100 / (101 + 100)
# perform the chi-square goodness-of-fit test
chisq.test(x = c(male, female), p = c(male_p, female_p))
```
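To see what `chisq.test()` computes in the goodness-of-fit case, the statistic can be reproduced by hand. The following is a minimal sketch: the counts in `observed` are made up for illustration and are not taken from `heart.txt`.

```r=
# Hypothetical counts (NOT from heart.txt), used only for illustration
observed <- c(male = 60, female = 40)
expected_p <- c(101, 100) / (101 + 100)

# Expected counts under the null hypothesis
expected <- sum(observed) * expected_p

# Chi-square statistic: sum of (O - E)^2 / E, with k - 1 degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# The built-in test should agree with the manual calculation
result <- chisq.test(x = observed, p = expected_p)
c(manual = chi_sq, builtin = unname(result$statistic))
c(manual = p_value, builtin = result$p.value)
```

Both pairs of numbers should match, which confirms that the test compares the observed counts with the counts expected under the 101:100 ratio.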
<br/>
* To run a **chi-square test of independence** with `chisq.test()`, we first need to make a contingency table.
```r=
# build a contingency table by hand (toy counts, to show the structure)
Sex <- matrix(c(1, 2, 3, 4), ncol = 2, byrow = TRUE)
colnames(Sex) <- c("E", "L")
rownames(Sex) <- c("M", "F")
Sex <- as.table(Sex)
Sex
# make a table from the data
my_table1 <- table(data$sex, data$cp)
my_table1
# make a table with defined labels
my_table2 <- table(Sex = data$sex, Angina_type = data$cp)
my_table2
# change row names
rownames(my_table2) <- c("female", "male")
my_table2
# change column names
colnames(my_table2) <- c("Typical", "Atypical", "Non-anginal pain", "Asymptomatic")
my_table2
# perform chi-square test of independence
chisq.test(my_table1)
chisq.test(my_table2)
# the shortcut
chisq.test(table(data$sex, data$cp))
# perform Fisher's exact test
fisher.test(my_table2)
# let us make a graph; convert the numeric codes to factors first
data$sex <- as.factor(data$sex)
data$cp <- as.factor(data$cp) # so that scale_x_discrete() can label the categories
ggplot(data = data, aes(x = cp, fill = sex)) + geom_bar()
# minor adjustment: side-by-side bars and readable x-axis labels
ggplot(data = data, aes(x = cp, fill = sex)) +
  geom_bar(position = "dodge") +
  scale_x_discrete(
    labels = c("Typical", "Atypical", "Non-anginal", "Asymptomatic")
  )
```
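A common reason to run Fisher's exact test instead of the chi-square test is that some expected cell counts are small. The sketch below shows one way to check this. The counts are hypothetical (not from `heart.txt`), and the "any expected count below 5" cutoff is a widely used rule of thumb, assumed here for illustration rather than a strict requirement.

```r=
# Hypothetical 2 x 4 table of counts (NOT from heart.txt)
counts <- matrix(c(12, 3,  9, 2,
                   30, 4, 15, 5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Sex = c("female", "male"),
                                 Angina_type = c("Typical", "Atypical",
                                                 "Non-anginal", "Asymptomatic")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
expected

# Rule of thumb (an assumption of this sketch): use Fisher's exact test
# when any expected count is below 5, otherwise use the chi-square test
if (any(expected < 5)) {
  test_result <- fisher.test(counts)
} else {
  test_result <- chisq.test(counts)
}
test_result$method
test_result$p.value
```

The same expected counts are also available from `chisq.test(counts)$expected`, so the manual `outer()` step is only there to make the formula visible.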
<br/>
## Alpha Inflation
* A multiple comparison procedure runs several tests on the means of the experimental conditions.
* For example, comparing three groups requires three tests: group A versus group B, group B versus group C, and group A versus group C.
* The set of comparisons is called a **family**.
* The type I error that occurs when the whole family is compared is called the **family-wise error (FWE)**.
* Inflated α = 1 − (1 − α)<sup>N</sup>, where N = number of hypotheses tested
* If we perform 20 hypothesis tests, is a significance threshold of 0.05 still acceptable?
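The inflation formula can be evaluated directly to see how quickly the error rate grows. This is a minimal sketch that assumes the tests are independent; the variable names are chosen here for illustration.

```r=
alpha <- 0.05
n_tests <- c(1, 3, 20)

# Probability of at least one false positive across N independent tests
inflated_alpha <- 1 - (1 - alpha)^n_tests
round(data.frame(n_tests, inflated_alpha), 3)

# Per-test threshold that keeps the family-wise error near alpha for 20 tests
alpha / 20
```

With three tests the family-wise error is already about 0.14, and with 20 tests it is about 0.64. Dividing α by the number of tests gives a per-test threshold of 0.0025 for 20 tests, which is the idea behind the Bonferroni correction.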
```r=
# Let us use p.adjust() to adjust p-values
# p.adjust(p, method = p.adjust.methods, n = number of comparisons)
# p.adjust.methods: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
p.adjust(0.05, method = "holm", n = 20)
p.adjust(0.05, method = "hochberg", n = 20)
p.adjust(0.05, method = "hommel", n = 20)
p.adjust(0.05, method = "bonferroni", n = 20)
p.adjust(0.01, method = "bonferroni", n = 20)
p.adjust(0.005, method = "bonferroni", n = 20)
p.adjust(0.0025, method = "bonferroni", n = 20)
```
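In practice, `p.adjust()` is usually given the whole vector of p-values from a family of tests, in which case `n` defaults to the length of that vector. The sketch below compares three methods side by side; the p-values are made up for illustration.

```r=
# Hypothetical raw p-values from five tests (made up for illustration)
p_raw <- c(0.001, 0.012, 0.020, 0.045, 0.200)

adjusted <- data.frame(
  raw        = p_raw,
  bonferroni = p.adjust(p_raw, method = "bonferroni"),
  holm       = p.adjust(p_raw, method = "holm"),
  BH         = p.adjust(p_raw, method = "BH")
)
adjusted

# How many tests stay below 0.05 under each column?
colSums(adjusted < 0.05)
```

Bonferroni and Holm both control the family-wise error, and Holm's adjusted p-values are never larger than Bonferroni's. "BH" controls the false discovery rate instead, which is a different and less strict criterion.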