# Research Group Discussion No. 10
###### tags: `R` `Statistics` `workshop`
## Dataset
**Data Set Information:**
This dataset contains the medical records of **229 geriatric patients who visited the WF ER**. Each patient profile has **22 clinical variables**.
<br/>
**Attribute Information:**
* Age: year
* Sex: male = 1
* Admission: admitted = 1
* ICU: ICU admission; admitted = 1
* LOS: length of stay; day
* DEMENTIA: dementia (+) = 1
* CVA: cerebrovascular accident (CVA) (+) = 1
* Liver_d: liver disease (+) = 1
* DM: diabetes mellitus (DM) (+) = 1
* CKD: chronic kidney disease (CKD) (+) = 1
* tumor: any cancer (+) = 1
* ISAR: Identification of Seniors At Risk (ISAR) score
* CCI: The Charlson Comorbidity Index
* Katz: Katz Index of Independence
* AD8: Dementia Screening Interview
* SOF: The study of osteoporotic fractures (SOF) index
* MNA: The Mini Nutritional Assessment (MNA)
* BSRS-5: The 5-item Brief Symptom Rating Scale
* x72hrs_return: ER return within 72 hours
* x30D: 30-day mortality
* x30R: ER return within 30 days
* x30A: hospital admission within 30 days
## Some Useful R Code and Packages
```r=
# load libraries
library(pwr)      # for power analysis
library(openxlsx) # for reading and writing Excel files
library(dplyr)    # for data transformation
library(ggplot2)  # for plotting
# import xxxx.txt (tab-separated) and name it data
data <- read.table("data/xxxx.txt", header = TRUE, sep = "\t")
# import xxxx.xlsx and name it data1
data1 <- readWorkbook("data/xxxx.xlsx")
# check the structure of the dataset
str(data)
# write a table to an xlsx file (my_table is a table you have already created with table())
write.xlsx(my_table, file = "table.xlsx", colNames = TRUE, rowNames = TRUE)
```
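The `pwr` package is loaded above for power analysis. As a minimal sketch of how it might be used, the block below estimates the sample size for a two-sample t-test and then the power for a fixed group size. The effect size `d = 0.5` and the group size of 50 are hypothetical values chosen for illustration, not numbers from the workshop dataset.

```r
library(pwr)

# Sample size per group for a two-sample t-test
# (d = 0.5 is a hypothetical "medium" effect size, not taken from the data)
res_n <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")
n_per_group <- ceiling(res_n$n)  # round up to whole patients
n_per_group

# Power achieved with a fixed (hypothetical) group size of 50
res_power <- pwr.t.test(n = 50, d = 0.5, sig.level = 0.05,
                        type = "two.sample", alternative = "two.sided")
res_power$power
```

Leave exactly one of `n`, `d`, `power`, and `sig.level` unspecified; `pwr.t.test()` solves for the missing one.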
<br/>
### Basic R functions
```r=
# calculate mean, SD, quantile
mean(data$x)
sd(data$x)
quantile(data$x)
summary(data$x)
# The Shapiro-Wilk Test For Normality
shapiro.test(data$x)
# perform a two-sample t-test (Welch's t-test by default)
t.test(data$a, data$b)
# perform the Mann-Whitney U test (Wilcoxon rank-sum test)
wilcox.test(data$a, data$b)
# make a contingency table
my_table <- table(data$a, data$b)
# make a table with defined labels
my_table2 <- table(Sex = data$a, Angina_type = data$b)
# change row names
rownames(my_table) <- c("female","male")
# change column names
colnames(my_table) <- c("Typical","Atypical","Non-anginal pain", "Asymptomatic")
# perform Chi-square test of independence
chisq.test(my_table)
# Regression
# build a model (univariate linear)
model <- glm(y ~ x, data = data, family = gaussian)
summary(model)
# build a model (univariate logistic)
model <- glm(y ~ x, data = data, family = binomial)
summary(model)
# helper that returns the odds ratio, 95% CI, and p-value of a univariate logistic model
my_glm <- function(x, y) {
  model <- glm(y ~ x, family = binomial)
  p <- summary(model)$coefficients[2, 4]   # p-value of the predictor
  or <- exp(coefficients(model))[[2]]      # odds ratio of the predictor
  CI <- unname(exp(confint(model))[2, ])   # 95% CI, computed only once
  final <- c(or, CI, p)
  return(final)
}
```
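To illustrate how a helper like `my_glm()` might be applied to several predictors at once, here is a minimal sketch on simulated data. The data frame `sim`, its values, and the choice of `Admission` as the outcome are assumptions for illustration. The helper is repeated, with named output, so the block runs on its own.

```r
# Simulated data (hypothetical values, not the workshop dataset)
set.seed(1)
n <- 200
sim <- data.frame(
  Age = round(rnorm(n, mean = 78, sd = 7)),
  Sex = rbinom(n, 1, 0.5),
  DM  = rbinom(n, 1, 0.3)
)
# The simulated outcome depends on Age only
sim$Admission <- rbinom(n, 1, plogis(-8 + 0.1 * sim$Age))

# Same idea as my_glm() above, repeated here with named output
my_glm <- function(x, y) {
  model <- glm(y ~ x, family = binomial)
  ci <- suppressMessages(exp(confint(model)))[2, ]
  c(OR = exp(coef(model))[[2]], lower = ci[[1]], upper = ci[[2]],
    p = summary(model)$coefficients[2, 4])
}

# Run one univariate logistic regression per predictor
predictors <- c("Age", "Sex", "DM")
or_table <- t(sapply(predictors, function(v) my_glm(sim[[v]], sim$Admission)))
round(or_table, 3)
```

Each row of `or_table` holds the odds ratio, its 95% confidence interval, and the p-value for one predictor, which is a convenient shape for exporting with `write.xlsx()`.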
<br/>
----------------------------------------------
[Code for Table 1 (class 7)](https://hackmd.io/J2Iin4hwQeKLbQ5F1pZZ9A)
----------------------------------------------
### dplyr
* Each dplyr verb takes a data frame as input and returns a new data frame
* Comparisons: >, >=, <, <=, !=, and ==
* Logical operator: & (and), | (or), and ! (not)
<br/>
**filter():** Pick observations by their values

```r=
# find male patients with heart failure
m_hf <- filter(data, sex == 1, target == 1)
str(m_hf)
# find patients with thalassemia
thal_p1 <- filter(data, thal == 2 | thal == 3)
str(thal_p1)
```
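The comparison and logical operators listed above can be combined in several equivalent ways inside `filter()`. The following is a small sketch on a toy data frame; `toy` and its values are hypothetical, and only the column names `sex` and `thal` are borrowed from the examples above.

```r
library(dplyr)

# Toy data frame (hypothetical values)
toy <- data.frame(
  id   = 1:6,
  sex  = c(1, 0, 1, 1, 0, 1),
  thal = c(2, 3, 1, NA, 2, 3)
)

# Conditions separated by commas are combined with & (and)
a <- filter(toy, sex == 1, thal == 2)
b <- filter(toy, sex == 1 & thal == 2)

# %in% is a compact alternative to chaining == with | (or)
c1 <- filter(toy, thal == 2 | thal == 3)
c2 <- filter(toy, thal %in% c(2, 3))

# ! negates a condition; rows where the condition is NA are dropped
d <- filter(toy, !(thal == 1))
d$id
```

Note that `filter()` keeps only rows where the condition is `TRUE`, so the patient with a missing `thal` value is excluded from `c1`, `c2`, and `d` alike.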
<br/>
**arrange():** Reorder the rows

```r=
# arrange in ascending order
data_arr <- arrange(data, thal)
# in descending order
data_arr <- arrange(data1, desc(thal))
```
<br/>
**select():** Pick variables by their names

```r=
# pick age, sex and ca columns
age_sex_ca <- select(data, age, sex, ca)
# pick the columns from cp to fbs
cp_to_fbs <- select(data, cp:fbs)
# remove the columns from cp to fbs
no_cp_to_fbs <- select(data, -(cp:fbs))
# rename the restecg column to ekg
new_data <- rename(data1, ekg = restecg)
```
<br/>
**mutate():** Create new variables
**transmute():** Keep only the new variables

```r=
# add new columns age_sex and cp_fbs
new_columns <- mutate(data, age_sex = age - 10 * sex, cp_fbs = cp + fbs)
# save only the new columns
new_data <- transmute(data, age_sex = age - 10 * sex, cp_fbs = cp + fbs)
```
<br/>
**summarize():** Collapse many values into a single summary
**group_by():** Operate group by group

```r=
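# summarize the mean and SD of age and the total number of patients
summarize(data, age_mean = mean(age), sd = sd(age), n = n())
# group by sex and cp, then summarize each group
group <- group_by(data, sex, cp)
summarize(group, age_mean = mean(age), sd = sd(age), n = n())
# use count
data %>% count(sex, cp)
data %>% count(sex, target)
# separate the data by sex
# %>% is the pipe operator: it passes the left-hand side as the first argument of the next function
# (heights is assumed to be a data frame whose sex column is coded "Female"/"Male")
female <- heights %>% filter(sex == "Female")
male <- heights %>% filter(sex == "Male")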
```
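The verbs above can be chained with the pipe into a single readable pipeline. Here is a minimal sketch on a toy data frame; `toy`, its values, and the `los` column are hypothetical and used only for illustration.

```r
library(dplyr)

# Toy data frame (hypothetical values)
toy <- data.frame(
  sex = c(1, 1, 0, 0, 1, 0),
  age = c(70, 80, 65, 75, 90, 85),
  los = c(3, 10, 2, 5, 14, 7)
)

# filter -> group_by -> summarize, read top to bottom
by_sex <- toy %>%
  filter(age >= 70) %>%   # keep patients aged 70 or older
  group_by(sex) %>%       # one group per sex
  summarize(age_mean = mean(age), los_median = median(los), n = n())
by_sex

# The same computation without the pipe, read inside out
by_sex_nested <- summarize(group_by(filter(toy, age >= 70), sex),
                           age_mean = mean(age), los_median = median(los), n = n())
```

Both versions return one row per sex; the piped form avoids intermediate objects such as `group` and keeps the steps in the order they are executed.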
<br/>
## Today's Workshop
:::info
Today, we will use the geriatric dataset from our ER to perform data analysis. Our goal is to **create a Table 1 from the dataset** and to **perform some statistical analyses (correlation and regression)**.
:::
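As a rough sketch of what these steps could look like, the block below builds a simulated data frame that reuses a few variable names from the attribute list (`Age`, `Sex`, `CCI`, `LOS`, `Admission`). All values are simulated, and the choice of grouping variable, Spearman correlation, and predictors is an assumption for illustration, not the workshop's prescribed analysis.

```r
# Simulated stand-in for the geriatric dataset (values are NOT real data)
set.seed(10)
n <- 229
geri <- data.frame(
  Age = round(rnorm(n, mean = 80, sd = 7)),
  Sex = rbinom(n, 1, 0.5),
  CCI = rpois(n, lambda = 2)
)
geri$LOS <- rpois(n, lambda = 3 + geri$CCI)
geri$Admission <- rbinom(n, 1, plogis(-1 + 0.4 * geri$CCI))

# Table 1 style summary: group sizes and means by admission status
group_n <- table(Admission = geri$Admission)
table1 <- aggregate(cbind(Age, CCI, LOS) ~ Admission, data = geri, FUN = mean)
table1

# Correlation: Spearman is used here because CCI and LOS are skewed counts
ct <- cor.test(geri$CCI, geri$LOS, method = "spearman", exact = FALSE)
ct$estimate

# Regression: multivariable logistic model for admission, reported as odds ratios
fit <- glm(Admission ~ Age + Sex + CCI, data = geri, family = binomial)
exp(coef(fit))
```

With the real data, replace `geri` with the imported data frame and pick the tests according to the distribution of each variable, for example after checking `shapiro.test()`.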
<br/>