
# Workshop

###### tags: `R` `Statistics` `workshop`

## Dataset

**Data Set Information:**

This dataset contains the medical records of **66 patients who had hepatic portal venous gas (HPVG)**. Each patient profile has **13 clinical variables**.

**Attribute Information:**

* Patient No: id
* Sex (0:Female; 1:Male): sex
* Age (years): age
* Symptom onset to ED presentation (hours): p_time
* Body temperature (℃): bt
* Pulse rate (bpm): pr
* Respiratory rate (breaths/min): rr
* Mean arterial pressure (mmHg): bp
* Rapid Acute Physiology Score: RAPS
* Rapid Emergency Medicine Score: REMS
* Modified Early Warning Score: MEWS
* Management (0:Conservative; 1:Surgery): management
* ED presentation to operation (hours): op_time
* End outcome (0:Survival; 1:Death): outcome

[PLoS ONE 12(9): e0184813.](https://doi.org/10.1371/journal.pone.0184813)

## Some Useful R Code and Packages

```r=
# load libraries
library(pwr)       # for power analysis
library(openxlsx)  # for opening and saving Excel files
library(dplyr)     # for data transformation
library(ggplot2)   # for graphs

# import the xxxx.txt file and name it data
data <- read.table("data/xxxx.txt", header = TRUE, sep = "\t")

# import the xxxx.xlsx file and name it data1
data1 <- readWorkbook("data/xxxx.xlsx")

# check the structure of the dataset
str(data)

# write a table (my_table, created in the next section) to an xlsx file
write.xlsx(my_table, file = "table.xlsx", colNames = TRUE, rowNames = TRUE)
```

### Basic R function

```r=
# calculate mean, SD, quantiles
mean(data$x)
sd(data$x)
quantile(data$x)
summary(data$x)

# the Shapiro-Wilk test for normality
shapiro.test(data$x)

# perform a t-test
t.test(data$a, data$b)

# perform a Mann-Whitney U test
wilcox.test(data$a, data$b)

# make a table
my_table <- table(data$a, data$b)

# make a table with defined labels
my_table2 <- table(Sex = data$a, Angina_type = data$b)

# change row names
rownames(my_table) <- c("female", "male")

# change column names
colnames(my_table) <- c("Typical", "Atypical", "Non-anginal pain", "Asymptomatic")

# perform Chi-square test of independence
chisq.test(my_table)
```

### dplyr

* Takes a data frame and returns a data frame
* Comparisons: `>`, `>=`, `<`, `<=`, `!=`, and `==`
* Logical operators: `&` (and), `|` (or), and `!` (not)

**filter():** Pick observations by their values

![](https://i.imgur.com/DQQQ11i.png)

```r=
# find male with heart failure
m_hf <- filter(data, sex == 1, target == 1)
str(m_hf)

# find patient with thalassemia
thal_p1 <- filter(data, thal == 2 | thal == 3)
str(thal_p1)
```

**arrange():** Reorder the rows

![](https://i.imgur.com/P9FLbyg.png)

```r=
# arrange in ascending order
data_arr <- arrange(data, thal)

# in descending order
data_arr <- arrange(data, desc(thal))
```

**select():** Pick variables by their names

![](https://i.imgur.com/0QVUQxX.png)

```r=
# pick age, sex and ca columns
age_sex_ca <- select(data, age, sex, ca)

# pick the columns from cp to fbs
cp_to_fbs <- select(data, cp:fbs)

# remove the columns from cp to fbs
no_cp_to_fbs <- select(data, -(cp:fbs))

# rename the restecg column
new_data <- rename(data, ekg = restecg)
```

**mutate():** Create new variables

**transmute():** Keep the new variables only

![](https://i.imgur.com/XR8MpYf.png)

```r=
# add new columns age_sex and cp_fbs
new_columns <- mutate(data, age_sex = age - 10 * sex, cp_fbs = cp + fbs)

# save only the new columns
new_data <- transmute(data, age_sex = age - 10 * sex, cp_fbs = cp + fbs)
```

**summarize():** Collapse values into a summary

**group_by():** Operate group by group

![](https://i.imgur.com/QZkJ5us.png)

```r=
# summarize the mean of age, SD and total patient number
summarize(data, age_mean = mean(age), sd = sd(age), n = n())

# group by sex and cp, then summarize each group
group1 <- group_by(data, sex, cp)
summarize(group1, age_mean = mean(age), sd = sd(age), n = n())

# use count
data %>% count(sex, cp)
data %>% count(sex, target)

# separate the data by sex (0: Female, 1: Male)
# %>% is the pipe
female <- data %>% filter(sex == 0)
male <- data %>% filter(sex == 1)
```

### ggplot2

```r=
# Histogram
ggplot(data = df, aes(x = x)) +
  geom_histogram(binwidth = 1)

# Dotplot (factor() makes a 0/1-coded x variable discrete)
ggplot(data = df, aes(x = factor(x), y = y)) +
  geom_dotplot(binaxis = 'y', stackdir = 'center', stackratio = 0.5, dotsize = 0.3)

# Box Plot
ggplot(data = df, aes(x = factor(x), y = y)) +
  geom_boxplot() +
  scale_x_discrete(labels = c("0" = "Female", "1" = "Male"))

# Bar Plot
ggplot(data = data, aes(x = factor(x), fill = factor(a))) +
  geom_bar()

ggplot(data = data, aes(x = factor(x), fill = factor(a))) +
  geom_bar(position = "dodge") +
  scale_x_discrete(labels = c("a1", "a2", "a3", "a4"))
```

## Today's Workshop

:::info
PLOS journals require authors to make all data necessary to replicate their study’s findings publicly available without restriction at the time of publication. As a result, we can easily download the raw data of a published article. Today, we will use a dataset from the PLOS ONE website to perform the analysis. Our goal is to **re-create Table 1 from the dataset** and to **identify any statistical misuse**.
:::

---

### Rapid Emergency Medicine Score: A novel prognostic tool for predicting the outcomes of adult patients with hepatic portal venous gas in the emergency department

### Table 1

![](https://i.imgur.com/AAUXdMG.png)

### Methods

:::success
**Statistical Analysis:** Numerical and categorical variables are shown as mean ± SD, and frequencies are displayed as percentages (%). Univariate analyses were applied to study the association between predictors and mortality, while categorical and numerical variables were analyzed with a chi-square test and two-sample t-test respectively.
:::

---
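
### A possible starting point

The comparisons described in the Methods (numerical variables by outcome, categorical variables by outcome) could be approached roughly as sketched below. This is only an illustration under stated assumptions: `hpvg` is a hypothetical data frame filled with **simulated** values, not the published records, and only its column names follow the attribute list in the Dataset section. It uses base R so that it runs without extra packages. The idea is to check the distribution of each numerical variable before choosing between a t-test and a Mann-Whitney U test, and to check the expected cell counts before relying on a chi-square test.

```r
# Simulated stand-in for the workshop data. These are NOT the published values;
# only the column names are taken from the attribute list.
set.seed(42)
n <- 66
hpvg <- data.frame(
  sex     = rbinom(n, 1, 0.6),          # 0: Female, 1: Male
  age     = round(rnorm(n, 70, 12)),    # years
  p_time  = round(rexp(n, 1 / 24), 1),  # hours, right-skewed on purpose
  outcome = rbinom(n, 1, 0.4)           # 0: Survival, 1: Death
)

# Mean and SD of age in each outcome group, formatted for a table
mean_sd <- function(x) sprintf("%.1f (SD %.1f)", mean(x), sd(x))
age_by_outcome <- tapply(hpvg$age, hpvg$outcome, mean_sd)

# Shapiro-Wilk p-value of p_time within each outcome group
shapiro_p <- tapply(hpvg$p_time, hpvg$outcome,
                    function(x) shapiro.test(x)$p.value)

# Two-sample t-test for a roughly normal variable,
# Mann-Whitney U test for a skewed one
t_res <- t.test(age ~ outcome, data = hpvg)
w_res <- wilcox.test(p_time ~ outcome, data = hpvg, exact = FALSE)

# Categorical variable: inspect the expected counts first, then pick
# Fisher's exact test if any expected count is below 5
sex_tab  <- table(Sex = hpvg$sex, Outcome = hpvg$outcome)
expected <- suppressWarnings(chisq.test(sex_tab))$expected
cat_res  <- if (any(expected < 5)) fisher.test(sex_tab) else chisq.test(sex_tab)
```

Printing `age_by_outcome`, `shapiro_p`, `t_res`, `w_res` and `cat_res` shows the pieces needed for one row of a Table 1 style summary. With the real dataset, replace `hpvg` with the imported data frame and repeat the same checks for each variable.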