# Research Group Discussion NO. 11
###### tags: `R` `Statistics` `Survival Analysis` `Kaplan-Meier plot` `The Cox regression model`
## Dataset
The **lung dataset** is available from the survival package in R. Data from The North Central Cancer Treatment Group.
**Attribute Information:**
* inst: Institution code
* time: Survival time in days (follow up time)
* status: censoring status 1=censored, 2=dead
* age: Age in years
* sex: Male=1 Female=2
* ph.ecog: ECOG performance score (0=good 5=dead)
* ph.karno: Karnofsky performance score (bad=0-good=100) rated by physician
* pat.karno: Karnofsky performance score as rated by patient
* meal.cal: Calories consumed at meals
* wt.loss: Weight loss in last six months
## Survival Analysis
We’ll use two R packages:
* survival for computing survival analyses
* survminer for summarizing and visualizing the results of survival analysis
```r=
# Install the packages
install.packages(c("survival", "survminer"))
# Load the packages
library("survival")
library("survminer")
# Load the lung data
data("lung")
```
### Kaplan-Meier Method
```r=
########## Kaplan-Meier Method ############
# Creates a survival object for making a model formula.
Surv(lung$time, lung$status)
# Generate the parameters (create a model)
survfit(Surv(time, status) ~ 1, data = lung)
fit <- survfit(Surv(time, status) ~ 1, data = lung)
fit
# plot the graph
ggsurvplot(
fit,
xlab = "Days",
ylab = "Overall survival probability"
)
# Change color, linetype by strata, risk.table color by strata
ggsurvplot(
fit,
pval = TRUE,
conf.int = TRUE,
risk.table = TRUE, # Add risk table
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#2E9FDF")
)
# Use summary to estimating 1 year survival
summary(survfit(Surv(time, status) ~ 1, data = lung), times = 365)
# Comparing survival times between groups
fit2 <- survfit(Surv(time, status) ~ sex, data = lung)
ggsurvplot(
fit2,
pval = TRUE,
conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF")
)
# Different way to plot the graph
ggsurvplot(
fit2,
pval = TRUE,
conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"),
fun = "cumhaz" # or "event"
)
# Log-Rank test comparing survival curves
survdiff(Surv(time, status) ~ sex, data = lung)
# extracting p-value from a survdiff object
diff_sex <- survdiff(Surv(time, status) ~ sex, data = lung)
p_value <- 1 - pchisq(diff_sex$chisq, length(diff_sex$n) - 1)
p_value
```
### Cox Regression Model
```r=
############# The Cox regression model ################
# create a mode (univariate)
cox_fit <- coxph(Surv(time, status) ~ sex, data = lung)
cox_fit
summary(cox_fit)
# Hazard ratios: The exponentiated coefficients (exp(coef) = exp(-0.53) = 0.59),
# Confidence intervals of the hazard ratios.
# being female (sex=2) reduces the hazard by a factor of 0.59
# Being female is associated with good prognostic.
# create a mode (multivariate)
res.cox2 <- coxph(Surv(time, status) ~ sex + age, data = lung)
# plot the graph
ggforest(res.cox2)
```