# Research Group Discussion NO. 11 ###### tags: `R` `Statistics` `Survival Analysis` `Kaplan-Meier plot` `The Cox regression model` ## Dataset The **lung dataset** is available from the survival package in R. Data from The North Central Cancer Treatment Group. **Attribute Information:** * inst: Institution code * time: Survival time in days (follow up time) * status: censoring status 1=censored, 2=dead * age: Age in years * sex: Male=1 Female=2 * ph.ecog: ECOG performance score (0=good 5=dead) * ph.karno: Karnofsky performance score (bad=0-good=100) rated by physician * pat.karno: Karnofsky performance score as rated by patient * meal.cal: Calories consumed at meals * wt.loss: Weight loss in last six months ## Survival Analysis We’ll use two R packages: * survival for computing survival analyses * survminer for summarizing and visualizing the results of survival analysis ```r= # Install the packages install.packages(c("survival", "survminer")) # Load the packages library("survival") library("survminer") # Load the lung data data("lung") ``` ### Kaplan-Meier Method ```r= ########## Kaplan-Meier Method ############ # Creates a survival object for making a model formula. Surv(lung$time, lung$status) # Generate the parameters (create a model) survfit(Surv(time, status) ~ 1, data = lung) fit <- survfit(Surv(time, status) ~ 1, data = lung) fit # plot the graph ggsurvplot( fit, xlab = "Days", ylab = "Overall survival probability" ) # Change color, linetype by strata, risk.table color by strata ggsurvplot( fit, pval = TRUE, conf.int = TRUE, risk.table = TRUE, # Add risk table surv.median.line = "hv", # Specify median survival ggtheme = theme_bw(), # Change ggplot2 theme palette = c("#2E9FDF") ) # Use summary to estimating 1 year survival summary(survfit(Surv(time, status) ~ 1, data = lung), times = 365) # Comparing survival times between groups fit2 <- survfit(Surv(time, status) ~ sex, data = lung) ggsurvplot( fit2, pval = TRUE, conf.int = TRUE, risk.table = TRUE, # Add risk table risk.table.col = "strata", # Change risk table color by groups linetype = "strata", # Change line type by groups surv.median.line = "hv", # Specify median survival ggtheme = theme_bw(), # Change ggplot2 theme palette = c("#E7B800", "#2E9FDF") ) # Different way to plot the graph ggsurvplot( fit2, pval = TRUE, conf.int = TRUE, risk.table = TRUE, # Add risk table risk.table.col = "strata", # Change risk table color by groups linetype = "strata", # Change line type by groups surv.median.line = "hv", # Specify median survival ggtheme = theme_bw(), # Change ggplot2 theme palette = c("#E7B800", "#2E9FDF"), fun = "cumhaz" # or "event" ) # Log-Rank test comparing survival curves survdiff(Surv(time, status) ~ sex, data = lung) # extracting p-value from a survdiff object diff_sex <- survdiff(Surv(time, status) ~ sex, data = lung) p_value <- 1 - pchisq(diff_sex$chisq, length(diff_sex$n) - 1) p_value ``` ### Cox Regression Model ```r= ############# The Cox regression model ################ # create a mode (univariate) cox_fit <- coxph(Surv(time, status) ~ sex, data = lung) cox_fit summary(cox_fit) # Hazard ratios: The exponentiated coefficients (exp(coef) = exp(-0.53) = 0.59), # Confidence intervals of the hazard ratios. # being female (sex=2) reduces the hazard by a factor of 0.59 # Being female is associated with good prognostic. # create a mode (multivariate) res.cox2 <- coxph(Surv(time, status) ~ sex + age, data = lung) # plot the graph ggforest(res.cox2) ```