# Research Group Discussion No. 10

###### tags: `R` `Statistics` `workshop`

## Dataset

**Data Set Information:**

This dataset contains the medical records of **229 geriatric patients who visited WF ER**. Each patient profile has **22 clinical variables**.

<br/>

**Attribute Information:**

* Age: year
* Sex: male = 1
* Admission: admitted = 1
* ICU: ICU admission; admitted = 1
* LOS: length of stay; day
* DEMENTIA: dementia (+) = 1
* CVA: CVA (+) = 1
* Liver_d: liver disease (+) = 1
* DM: DM (+) = 1
* CKD: CKD (+) = 1
* tumor: any cancer (+) = 1
* ISAR: Identification of Seniors At Risk (ISAR) score
* CCI: The Charlson Comorbidity Index
* Katz: Katz Index of Independence
* AD8: Dementia Screening Interview
* SOF: The Study of Osteoporotic Fractures (SOF) index
* MNA: The Mini Nutritional Assessment (MNA)
* BSRS-5: The 5-item Brief Symptom Rating Scale
* x72hrs_return: 72 hours ER return
* x30D: 30 days mortality
* x30R: 30 days ER return
* x30A: 30 days hospital admission

## Some Useful R Code and Packages

```r=
# load libraries
library(pwr)       # for power analysis
library(openxlsx)  # for opening and saving Excel files
library(dplyr)     # for data transformation
library(ggplot2)   # for graphs

# import xxxx.txt file and name it data
data <- read.table("data/xxxx.txt", header = TRUE, sep = "\t")

# import xxxx.xlsx file and name it data1
data1 <- readWorkbook("data/xxxx.xlsx")

# check the structure of the dataset
str(data)

# write to an xlsx file (my.table is the data frame you want to export)
write.xlsx(my.table, file = "table.xlsx", colNames = TRUE, rowNames = TRUE)
```

<br/>

### Basic R functions

```r=
# calculate mean, SD, quantiles
mean(data$x)
sd(data$x)
quantile(data$x)
summary(data$x)

# the Shapiro-Wilk test for normality
shapiro.test(data$x)

# perform a t-test
t.test(data$a, data$b)

# perform a Mann-Whitney U test
wilcox.test(data$a, data$b)

# make a table
my_table <- table(data$a, data$b)

# make a table with defined labels
my_table2 <- table(Sex = data$a, Angina_type = data$b)

# change row names
rownames(my_table) <- c("female", "male")

# change column names
colnames(my_table) <- c("Typical", "Atypical", "Non-anginal pain", "Asymptomatic")

# perform a Chi-square test of independence
chisq.test(my_table)

# Regression
# build a model (univariate linear)
model1 <- glm(y ~ x, data = data, family = gaussian)
summary(model1)

# build a model (univariate logistic)
model2 <- glm(y ~ x, data = data, family = binomial)
summary(model2)

# make a function for getting the results from a logistic regression model
# returns: odds ratio, lower and upper 95% CI limits, p-value
my_glm <- function(x, y) {
  model <- glm(y ~ x, family = binomial)
  p <- summary(model)$coefficients[2, 4]  # p-value for x
  or <- exp(coefficients(model))[[2]]     # odds ratio for x
  ci <- exp(confint(model))[2, ]          # 95% CI for the odds ratio
  final <- c(or, ci, p)
  return(final)
}
```

<br/>

----------------------------------------------

[Code for Table 1 (class 7)](https://hackmd.io/J2Iin4hwQeKLbQ5F1pZZ9A)

----------------------------------------------

### dplyr

* Each verb takes a data frame as input and returns a data frame
* Comparisons: `>`, `>=`, `<`, `<=`, `!=`, and `==`
* Logical operators: `&` (and), `|` (or), and `!` (not)

Note: the column names in the examples below (`age`, `sex`, `cp`, `fbs`, `restecg`, `thal`, `ca`, `target`) do not come from the geriatric dataset above; adapt them to your own data.

<br/>

**filter():** Pick observations by their values

![](https://i.imgur.com/DQQQ11i.png)

```r=
# find male with heart failure
m_hf <- filter(data, sex == 1, target == 1)
str(m_hf)

# find patient with thalassemia
thal_p1 <- filter(data, thal == 2 | thal == 3)
str(thal_p1)
```

<br/>

**arrange():** Reorder the rows

![](https://i.imgur.com/P9FLbyg.png)

```r=
# arrange in ascending order
data_arr <- arrange(data, thal)

# in descending order
data_desc <- arrange(data, desc(thal))
```

<br/>

**select():** Pick variables by their names

![](https://i.imgur.com/0QVUQxX.png)

```r=
# pick age, sex and ca columns
age_sex_ca <- select(data, age, sex, ca)

# pick the columns from cp to fbs
cp_to_fbs <- select(data, cp:fbs)

# remove the columns from cp to fbs
no_cp_to_fbs <- select(data, -(cp:fbs))

# rename the restecg column
new_data <- rename(data, ekg = restecg)
```

<br/>

**mutate():** Create new variables

**transmute():** Keep the new variables only

![](https://i.imgur.com/XR8MpYf.png)

```r=
# add new columns age_sex and cp_fbs
new_columns <- mutate(data, age_sex = age - 10 * sex, cp_fbs = cp + fbs)

# save only the new columns
new_data <- transmute(data, age_sex = age - 10 * sex, cp_fbs = cp + fbs)
```

<br/>

**summarize():** Collapse many values into a summary

**group_by():** Operate group by group

![](https://i.imgur.com/QZkJ5us.png)

```r=
# summarize the mean of age, SD and total pt number
summarize(data, age_mean = mean(age), sd = sd(age), n = n())

# group by sex and cp
group <- group_by(data, sex, cp)
summarize(group, age_mean = mean(age), sd = sd(age), n = n())

# use count
data %>% count(sex, cp)
data %>% count(sex, target)

# separate the data by sex (male = 1)
# %>% is the pipe
female <- data %>% filter(sex == 0)
male <- data %>% filter(sex == 1)
```

<br/>

## Today's Workshop

:::info
Today, we will use the geriatric dataset from our ER to perform data analysis. Our goal is to **create a Table 1 from the dataset** and to **perform some statistical analysis (correlation and regression)**.
:::

<br/>
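
As a starting point for the workshop goals above, here is a minimal sketch of a Table 1, a correlation test, and a logistic regression. The real data file is not part of these notes, so the code simulates a stand-in data frame called `demo` (a hypothetical name). Its columns `Age`, `Sex`, `CCI`, `LOS`, and `Admission` follow the attribute list, but the values are random and carry no clinical meaning. Grouping by `Admission`, correlating `CCI` with `LOS`, and the choice of predictors are illustrative assumptions, not the analysis plan of the workshop.

```r
library(dplyr)

# Simulated stand-in for the geriatric dataset (random values, NOT real patient data)
set.seed(10)
n_patients <- 229
demo <- data.frame(
  Age = round(rnorm(n_patients, mean = 80, sd = 7)),
  Sex = rbinom(n_patients, 1, 0.5),  # male = 1
  CCI = rpois(n_patients, 2)         # Charlson Comorbidity Index
)
demo$LOS <- rpois(n_patients, lambda = 3 + demo$CCI)                # length of stay, days
demo$Admission <- rbinom(n_patients, 1, plogis(-1 + 0.3 * demo$CCI)) # admitted = 1

# Table 1: baseline characteristics by admission status
table1 <- demo %>%
  group_by(Admission) %>%
  summarize(
    n = n(),
    age_mean = mean(Age),
    age_sd = sd(Age),
    male_pct = 100 * mean(Sex),
    cci_median = median(CCI)
  )
print(table1)

# Correlation: Spearman rank correlation between CCI and LOS
cor_res <- cor.test(demo$CCI, demo$LOS, method = "spearman", exact = FALSE)
print(cor_res$estimate)

# Regression: multivariable logistic model for admission
fit <- glm(Admission ~ Age + Sex + CCI, data = demo, family = binomial)

# Odds ratios with Wald 95% confidence intervals and p-values
or_table <- cbind(
  OR = exp(coef(fit)),
  exp(confint.default(fit)),
  p = summary(fit)$coefficients[, 4]
)
print(round(or_table, 3))
```

To try this on the real data, replace `demo` with the imported data frame (for example `data1` from `readWorkbook()`). Note that `confint.default()` gives Wald intervals, whereas `confint()` as used in `my_glm()` gives profile-likelihood intervals, so the two can differ slightly.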