# Regression

###### tags: `R` `statistics` `regression`

## Dataset

**Data Set Information:**

This dataset contains the medical records of **66 patients who had hepatic portal venous gas (HPVG)**. Each patient profile has **13 clinical variables**.

<br/>

**Attribute Information:**

* Patient No: `id`
* Sex (0: Female; 1: Male): `sex`
* Age (years): `age`
* Symptom onset to emergency department (ED) presentation (hours): `p_time`
* Body temperature (℃): `bt`
* Pulse rate (bpm): `pr`
* Respiratory rate (breaths/min): `rr`
* Mean arterial pressure (mmHg): `bp`
* Rapid Acute Physiology Score: `RAPS`
* Rapid Emergency Medicine Score: `REMS`
* Modified Early Warning Score: `MEWS`
* Management (0: Conservative; 1: Surgery): `management`
* ED presentation to operation (hours): `op_time`
* End outcome (0: Survival; 1: Death): `outcome`

[PLoS ONE 12(9): e0184813.](https://doi.org/10.1371/journal.pone.0184813)

## Regression (Correlation and Plot)

We will use the PLOS ONE dataset for the analysis. First, we test the correlation between pulse rate (`pr`) and respiratory rate (`rr`), then visualize their relationship with a scatter plot and a fitted regression line.
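Pearson's r, which `cor.test()` estimates, is the covariance of the two variables divided by the product of their standard deviations, and the accompanying significance test uses `t = r * sqrt(n - 2) / sqrt(1 - r^2)` with `n - 2` degrees of freedom. The minimal base-R sketch below checks both quantities against `cor.test()`. The vectors `x` and `y` are made-up values for illustration only, not taken from the PLOS ONE data.

```r
# Synthetic example vectors (NOT from pone_data.txt); values are invented
x <- c(88, 95, 103, 110, 120, 131)   # e.g. pulse rates
y <- c(16, 18, 19, 22, 24, 27)       # e.g. respiratory rates

# Pearson r: sum of cross-products over the square root of the product of
# the sums of squares (equivalently cov(x, y) / (sd(x) * sd(y)))
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# cor.test() reports the same estimate plus a t-test of H0: rho = 0
ct <- cor.test(x, y, method = "pearson")
r_builtin <- unname(ct$estimate)

# t statistic on n - 2 degrees of freedom
n <- length(x)
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)

print(c(manual = r_manual, builtin = r_builtin))
print(c(t_manual = t_manual, t_builtin = unname(ct$statistic)))
```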
```r
# load libraries
library(ggplot2)              # for graphs
library(PerformanceAnalytics) # for chart.Correlation()

# import the tab-delimited pone_data.txt file and name it data
data <- read.table("data/pone_data.txt", header = TRUE, sep = "\t")

# check the structure of the dataset
str(data)

# Pearson correlation test between pulse rate and respiratory rate
cor.test(data$pr, data$rr, method = "pearson")

# scatter plot with a least-squares regression line (no confidence band)
ggplot(data = data, aes(x = pr, y = rr)) +
  geom_point(size = 3, shape = 16) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

### Plotting Symbols

The `shape` argument of `geom_point()` (like `pch` in base R graphics) selects one of the following symbols:

![](https://i.imgur.com/4A1TF7r.png)

<br/>

## Regression (Modeling, Simple Linear Regression)

```r
# build a simple linear regression model of rr on pr
model <- lm(rr ~ pr, data = data)
summary(model)

# the same model fitted as a GLM with a Gaussian family (identity link)
model1 <- glm(rr ~ pr, data = data, family = gaussian)
summary(model1)

# predict rr for pulse rates of 103 and 120 bpm
new_data <- data.frame(pr = c(103, 120))
predict(model1, newdata = new_data, type = "response")
```

## Regression (Modeling, Multiple Linear Regression and Logistic Regression)

```r
# Spearman correlation matrix with histograms and scatter plots
# (chart.Correlation() expects all columns to be numeric)
chart.Correlation(data, histogram = TRUE, method = "spearman", pch = 19)

# build a multiple linear regression model
model2 <- glm(rr ~ pr + bp, data = data, family = gaussian)
summary(model2)

# build multiple logistic regression models for the end outcome
# (0: survival; 1: death), each with rr, age and bp plus one severity score
model3 <- glm(outcome ~ rr + age + bp + RAPS, data = data, family = binomial)
summary(model3)

model4 <- glm(outcome ~ rr + age + bp + REMS, data = data, family = binomial)
summary(model4)

model5 <- glm(outcome ~ rr + age + bp + MEWS, data = data, family = binomial)
summary(model5)
```
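With a single predictor, the least-squares coefficients that `lm()` reports have a closed form: slope = `cov(pr, rr) / var(pr)` and intercept = `mean(rr) - slope * mean(pr)`. A `glm()` with `family = gaussian` uses the identity link, so it estimates the same coefficients as `lm()`, which is why `model` and `model1` above should agree. The sketch below checks both points, and the `predict()` step, on a small invented data frame (`toy` is hypothetical, not the study data).

```r
# Synthetic illustration only; values are made up, not from pone_data.txt
toy <- data.frame(
  pr = c(80, 92, 100, 108, 115, 124, 130),
  rr = c(15, 17, 18, 21, 22, 24, 27)
)

# Closed-form ordinary least squares for one predictor
b1 <- cov(toy$pr, toy$rr) / var(toy$pr)   # slope
b0 <- mean(toy$rr) - b1 * mean(toy$pr)    # intercept

fit_lm  <- lm(rr ~ pr, data = toy)
fit_glm <- glm(rr ~ pr, data = toy, family = gaussian)  # identity link by default

# Predictions at new pulse rates are simply b0 + b1 * pr
new_pr <- data.frame(pr = c(103, 120))
pred_manual <- b0 + b1 * new_pr$pr
pred_glm <- unname(predict(fit_glm, newdata = new_pr, type = "response"))

print(coef(fit_lm))
print(c(b0 = b0, b1 = b1))
print(pred_glm)
```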
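Setting `method = "spearman"` in the `chart.Correlation()` call above requests Spearman's rank correlation, which is Pearson's r computed on the ranks of the data. It only assumes a monotonic relationship and is less sensitive to outliers than Pearson's r. A minimal sketch with invented vectors (not from the dataset, with an outlier deliberately placed in `x`) confirms the rank identity using base R's `cor()`:

```r
# Synthetic vectors (NOT from the dataset); the last x value is an outlier
x <- c(2, 4, 5, 7, 9, 40)
y <- c(1, 3, 2, 6, 8, 9)

# Spearman's rho as computed by base R
rho_builtin <- cor(x, y, method = "spearman")

# The same quantity: Pearson's r on the ranks (rank() averages ties)
rho_manual <- cor(rank(x), rank(y), method = "pearson")

# Pearson's r on the raw values, which the outlier pulls around
r_pearson <- cor(x, y, method = "pearson")

print(c(spearman = rho_builtin, pearson_on_ranks = rho_manual, pearson = r_pearson))
```

With no ties, the same value follows from `rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))`, where `d` is the per-observation difference in ranks; here `sum(d^2) = 2` and `n = 6`, giving about 0.943.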
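In `model3` to `model5`, `family = binomial` fits a logistic regression with the default logit link, so each coefficient printed by `summary()` is the change in the log-odds of death per one-unit increase in that predictor, holding the others fixed. Exponentiating a coefficient turns it into an odds ratio. The sketch below shows this on a small synthetic data frame (`toy`, with made-up `age` and `outcome` values; not the study data) using base R only. `confint.default()` gives Wald intervals; `confint()` on a `glm` gives profile-likelihood intervals instead.

```r
# Synthetic binary outcome (NOT the study data): 1 = death, 0 = survival
toy <- data.frame(
  age     = c(45, 52, 60, 63, 67, 70, 72, 75, 78, 81, 84, 88),
  outcome = c(0,  0,  0,  1,  0,  0,  1,  0,  1,  1,  1,  1)
)

fit <- glm(outcome ~ age, data = toy, family = binomial)  # logit link by default

# Coefficients are on the log-odds scale; exp() turns them into odds ratios
odds_ratio <- exp(coef(fit))

# Wald 95% CI: estimate +/- 1.96 * SE on the log-odds scale, then exponentiate
ci <- exp(confint.default(fit))

# Predicted probability of the outcome for a hypothetical 70-year-old
p70 <- predict(fit, newdata = data.frame(age = 70), type = "response")

print(cbind(OR = odds_ratio, ci))
print(p70)
```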