# Gona Berisha and Miriam Ermer - TNF
## **Exercise 1 - Pubmed**
### *1.* *How many papers are published per year?*
> * 1.625.383 (2020)
> * 1.407.938 (2019)
> * 1.345.385 (2018)
> * ca. 1.459.569 per year on average (2018-2020)
>Source: PubMed
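
The average above can be re-derived from the three yearly counts. A minimal sketch in R; the variable names are only illustrative:

```r
# Yearly PubMed paper counts as listed above (2018-2020)
papers_per_year <- c(`2018` = 1345385, `2019` = 1407938, `2020` = 1625383)

# Arithmetic mean over the three years
mean_papers <- mean(papers_per_year)
round(mean_papers)
```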
### *2. How many journals are there?*
> Approximately 30.000 records are included in the PubMed journal list which is updated daily and includes all MEDLINE® titles as well as other non-MEDLINE titles in PubMed.
>Source: PubMed
### *3. How does PubMed deal with first name and surname?*
> * Enter the author’s last name and initials without punctuation in the search box, and click Search
> * If you only know the author’s last name, use the author search field tag [au], e.g., brody[au]
> * Names entered using either the lastname+initials format (e.g., smith ja) or the full name format (john a smith) and no search tag are searched as authors as well as collaborators
> * Enter a full author name in natural or inverted order, e.g., julia s wong or wong julia s.
>Source: PubMed
### *4. What can be found under "Advanced"?* *Which search fields are available here? How do you combine the search terms?*

* makes it easier to conduct more elaborate searches
* helps speed up some aspects of searching
* Search Fields: Author, Date, Volume, Issue, Title...
* Add fields with AND (e.g. author ==AND== protein name ==AND== year)
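
Combining fields with AND can be sketched as building a single query string. The example below assumes the PubMed field tags `[au]` (author), `[ti]` (title) and `[dp]` (date of publication); the search terms themselves are hypothetical placeholders:

```r
# Hypothetical search terms; replace them with your own
terms <- c(
  "smith ja[au]",  # author field tag
  "TNF[ti]",       # title field tag
  "2020[dp]"       # date of publication field tag
)

# Join the individual terms with the boolean operator AND
query <- paste(terms, collapse = " AND ")
query
```

The resulting string can be pasted into the PubMed search box as it is.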
## **Exercise 2 - Your database**
*Search for alternative names, gene name etc. for your protein/gene. Create your own 'database' according to the following scheme and extend this table according to your needs.*
| Gene name (human) | Alternative gene names (human) | Protein name (human) | Gene name (all species) | Protein name (all species) |
|:--------------------------------------------------:|:-----------------:|:---------------------:|:-----------------------:|:--------------------------:|
| TNF | DIF; TNFA; TNFSF2; TNLG1F; TNF-alpha; TNFBR; TNFR2 | tumor necrosis factor | TNFSF2; TNFA | tumor necrosis factor |
## **Exercise 3 - Research your Gene/Protein**
### *1. How many papers are there on your protein/gene and when was the first one published? Was there a peak, was the protein "hip" for a while?*
The first paper on TNF in PubMed dates from 1975. In total, 187.384 papers on TNF are listed in PubMed. The number of publications peaked in 2020.

### *2. Which author has been working with the protein for a long time and is perhaps a luminary/authority?*
### *3. Are there reviews of your protein in review articles?*
Yes, PubMed lists 13.579 review articles on TNF. This high number suggests that many groups are working on the topic.
### *4. How do you find out whether a journal is considered "good" or "obscure"?*
A common way to judge whether a journal is considered "good" or "obscure" is its impact factor, which states how often articles of the journal are cited on average per year. It gives a quantitative basis for comparing journals within a subject category: the higher the impact factor, the higher the journal ranks in that category. By this measure, one journal is rated above another if its impact factor is higher.
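
As a worked example, the commonly used two-year impact factor for a year divides the citations received in that year by items from the two preceding years by the number of citable items published in those two years. A minimal sketch in R; all numbers are invented for a hypothetical journal:

```r
# Hypothetical journal: both numbers are invented for illustration
citations_2021_to_2019_2020 <- 1200  # citations received in 2021 by items published in 2019-2020
citable_items_2019_2020     <- 300   # articles and reviews published in 2019-2020

# Two-year impact factor for 2021
impact_factor_2021 <- citations_2021_to_2019_2020 / citable_items_2019_2020
impact_factor_2021
```

Because this is an average over all articles of a journal, it says nothing about how often an individual paper is cited.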
### *5. Find out if there are articles about your protein in "Nature", "Cell", "Science", "eLIFE", "Nature Cell Biology", "Current Biology".*
* Nature: 152 articles
* Cell: 107 articles
* Science: 73 articles
* eLIFE: 29 articles
* Nature Cell Biology: 26 articles
* Current Biology: 36 articles
### *6. Which paper on your protein/gene has the most references and is it an important paper in the field?*
The impact factor cannot answer this on its own: it states how often the articles of a journal are cited on average per year, so it describes the journal rather than a single paper. Finding the most referenced paper requires citation counts per article, which EuropePMC provides (see question 7).
### *7. Compare your results with an alternative literature search database (EuropePMC). Can you find the results again? What is the biggest difference?*
* difference in the number of published articles on TNF: **22.140**
* first published paper in EuropePMC is from the year **1981**
* peak of articles in the year **2020** (similar to PubMed)
* unlike in PubMed, EuropePMC lets you easily search for the most referenced article:
> Beg AA, Baltimore D. An essential role for NF-kappaB in preventing TNF-alpha-induced cell death. Science (New York, N.Y.). 1996 Nov;274(5288):782-784. DOI: 10.1126/science.274.5288.782. PMID: 8864118.

* difference in the amount of reviews: **1.384**
* differences in number of articles in...
    * Nature: **22** articles
    * Cell: **30** articles
    * Science: **21** articles
    * eLIFE: **7** articles
    * Nature Cell Biology: **7** articles
    * Current Biology: **6** articles
:::info
The biggest difference between PubMed and EuropePMC is the number of published articles. This in turn affects all the other numbers.
:::
## Exercise 4 - Introduction to **TNF**
### General
* Tumor necrosis factor
* inflammatory marker (https://pubmed.ncbi.nlm.nih.gov/34808003/) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631033/) (https://www.nature.com/articles/s41584-021-00639-6)
* has a major protective role in infectious diseases
* TNF alpha has been identified as an important mediator of chronic immuno-inflammatory diseases (https://pubmed.ncbi.nlm.nih.gov/7660686/)
* TNF-alpha is a pleiotropic cytokine (https://pubmed.ncbi.nlm.nih.gov/34795583/)(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631033/)
* produced under various inflammatory conditions by multiple cell types
* a soluble form (acts as a ligand) and a membrane-bound form (can act as either a ligand or a receptor)
* TNF might promote the extra-adrenal production of immunoregulatory glucocorticoids and inhibit haematopoiesis
### Structure
### Functions
* plays important roles in diverse cellular events such as cell survival, proliferation, differentiation, and death (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631033/)
* TNF plays an important role in the inflammatory acute phase response and in the activation of the immune system (https://pubmed.ncbi.nlm.nih.gov/9127637/)
* The pro-inflammatory and anti-inflammatory properties of TNF are largely segregated by the capacity of this cytokine to bind to TNF receptor 1 (TNFR1) and TNFR2, respectively (https://www.nature.com/articles/s41584-021-00639-6)
* TNF can induce multiple downstream signalling pathways
* TNF can either promote or suppress immunity through its differential effects on lymphocytes. The pro-inflammatory effects of TNF result from the co-stimulation of T cells, mainly via TNFR2.
* strong effect on Treg cells
* TNF also inhibits the differentiation of T helper 17 (TH17) cells by increasing IL-2 production and decreases IL-17 production by conventional T cells and effector Treg cells via the activation of TNFAIP3. This mechanism might explain the increase in numbers of TH17 cells described in Tnfrsf1a-knockout mice or after treatment with TNF inhibitors in mouse models of RA and psoriasis.

(https://www.nature.com/articles/s41584-021-00639-6)
### Role in diseases
* TNF was detected in the joints of patients with rheumatoid arthritis (RA) (https://www.nature.com/articles/s41584-021-00639-6)
* overexpression of TNF induces autoimmune arthritis
* TNF inhibitors are widely used and have greatly improved the medical care of patients with RA (https://www.nature.com/articles/s41584-021-00639-6)
* One-third of patients with RA have to stop taking these drugs (TNF inhibitors) within the first year because of insufficient efficacy or adverse events (https://www.nature.com/articles/s41584-021-00639-6)
* Some patients do not respond to TNF inhibitors and others develop paradoxical autoimmune exacerbations (https://www.nature.com/articles/s41584-021-00639-6)
* can be explained by the immunoregulatory role of TNF (https://www.nature.com/articles/s41584-021-00639-6)
* Most studies that have analysed both blood and synovial fluid of patients with RA concluded that the proportion of Treg cells was higher in synovial fluid than in blood and remained stable over time in individual patients.
* The synovial fluid of patients with RA contains high amounts of IL-6, TNF and IFNγ, low levels of IL-17A, IL-10 or IL-13, and does not contain IL-1
* Which of these factors is responsible for the increased proportion and activation of synovial Treg cells remains unclear. However, IL-6 is not likely to be involved because this cytokine (which is produced by joint fibroblasts) induces the transdifferentiation of Treg cells into highly pathogenic TH17 cells in a mouse model of autoimmune arthritis, a phenomenon that might also take place in patients with RA. IL-6 also induced the proteasomal degradation of FOXP3 and loss of the suppressive activity of Treg cells. Therefore, the activation and/or expansion of Treg cells in the synovial fluid of patients with RA is likely to be caused by high local levels of TNF.
* Early work showed that Treg cells from patients with RA obtained before the initiation of TNF inhibitor therapy had a poor capacity to suppress cytokine production by conventional T cells and that the suppressive activity of these Treg cells was restored following anti-TNF treatment. (https://www.nature.com/articles/s41584-021-00639-6)
* In the RA immune-environment, TNF-α has been shown to have an influential and extensive but as yet poorly understood effect on Treg function in vivo, and undoubtedly an important role in the treatment of RA. Interestingly, the high levels of TNF-α found in RA patients appear to interfere with the mechanisms controlling the suppressive function of Tregs. [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6410649/]
* Rheumatoid arthritis (RA) is an autoimmune disorder that manifests itself as a chronic inflammation of the lining of the joints, with significant morbidity and mortality rates if left untreated. RA is characterized by synovial inflammation and hyperplasia (swelling), autoantibody production (rheumatoid factor (RF) and anti-citrullinated protein antibody (ACPA)), cartilage and bone destruction, and systemic features, including cardiovascular, pulmonary, psychological, and skeletal disorders. [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6410649/]
* High levels of TNF-α are found in both the serum and synovial fluid of RA patients, so this may be one of the factors leading to impaired Treg function
* These studies suggest that TNF-α plays an important role in the inhibition of FoxP3+ Treg suppressive function, particularly in suppressing inflammation.
### Introduction
#### Big Picture Sentences
1. What happens if your immune system works against you and your own body attacks itself?
2. ==With a prevalence rate of approximately 1% the autoimmune disease rheumatoid arthritis is affecting the lives of a lot of people worldwide.==
3. ==Imagine having hands, but not being able to hold a simple glass of water.==
4. If rheumatoid arthritis is left untreated, the great grim reaper rips you out of your life faster than you might think.
5. "Your body is a reflection of your lifestyle." However, the autoimmune disorder rheumatoid arthritis does not spare anyone, not even the most seemingly healthy people like the Australian Open winner Caroline Wozniacki.
#### Transcript table
| Field | Value |
|:-----:|:-----:|
| Gene ID | 7124 |
| Transcript | NM_000594.4 |
| Length (nt) | 1678 |
| Protein | NP_000585.2 |
| Length (aa) | 233 |
| Organism | Homo sapiens |
| Location | 6p21.33 |
| Exon count | 4 |
| Coding exons | 4 |

This gene has 1 transcript (splice variant), 1 gene allele, 246 orthologues, 8 paralogues and is associated with 4 phenotypes.





```r=
#Exercise 1
library(tidyverse)
library(ggplot2)
library(rmarkdown)
library(magrittr)
library(ggpol)
library(scales)
library(ggpubr)
library(gridExtra)
Cancer<-read_tsv("Cancertyp.tsv")
write_tsv(Cancer,"Cancertyp.tsv", na="") #write_tsv() has no row.names argument
view(Cancer)
#Standardized incidence rates from cancer (mirrored)
ggplot(Cancer, aes(x=cancer_type, y=incidence)) +
geom_bar(data=Cancer[Cancer$sex=="male",], aes(y=incidence, fill=sex), stat="identity") +
geom_bar(data=Cancer[Cancer$sex=="female",], aes(y=-incidence, fill=sex), stat="identity")+
geom_hline(yintercept=0, colour="white", lwd=1)+
coord_flip(ylim=c(-120,120))+
labs(title = "Standardized incidence rates from cancer",
subtitle = "(number of cases due to cancer per 100.000 inhabitants in Germany, 2016)")+
theme_light()+
labs(caption = "
Source: R. Koch-Institute, „Cancer in Germany | 2015/2016“, p. 163.")+
labs(y="incidence/100.000 population", x="cancer type")+
scale_fill_manual(values = c("#EC9BB0", "#9BB8ED"))+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_y_continuous(limits = c(-120, 120),
breaks = seq(-120,120,20),
labels = abs(seq(-120, 120, 20)))
#Standardized incidence rates from cancer (stacked)
ggplot(Cancer)+
geom_bar(aes(y = cancer_type,
x= incidence,
fill = sex),
stat="identity")+
scale_fill_manual(values = c("female"= "#EC9BB0", "male"="#9BB8ED"))+
theme_light()+
xlab("incidence/100,000 population")+
ylab("cancer type")+
labs(caption = "
Source: R. Koch-Institute, „Cancer in Germany | 2015/2016“, p. 163.")+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
labs(title = "Standardized incidence rates from cancer",
subtitle = "(number of cases due to cancer per 100.000 inhabitants in Germany, 2016)")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_x_continuous(expand = c(0,0),
limits = c(0,120))
#Exercise 2
library(tidyverse)
library(ggplot2)
library(rmarkdown)
lung<-read_tsv("Lung.tsv")
write_tsv(lung,"Lung.tsv", na="") #write_tsv() has no row.names argument
view(lung)
#Distribution of cancer load
ggplot(lung)+
geom_histogram(aes(x = Cancer_load,
fill=sex),
binwidth=1)+ #bin width is a parameter of the geom, not an aesthetic
scale_fill_manual(values = c("female"= "#EC9BB0", "male"="#9BB8ED"))+
theme_light()+
xlab("cancer load")+
ylab("number of patients")+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="none",
panel.spacing = unit(1, "cm"),
plot.margin = unit(c(0.7,0.7,0.7,0.7),"cm"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
labs(caption = "
Source: Dr. Helena Jambor")+
ggtitle("Distribution of cancer load")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
facet_grid(cols=vars(sex))+
scale_x_continuous(expand = c(0,0),
limits = c(0,160))+
scale_y_continuous(expand = c(0,0),
limits = c(0,40))
#Distribution of ECOG
ggplot(lung)+
geom_histogram(aes(x = ECOG,
fill=sex))+
scale_fill_manual(values = c("female"= "#EC9BB0", "male"="#9BB8ED"))+
theme_light()+
xlab("ECOG")+
ylab("number of patients")+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="none",
panel.spacing = unit(0.7, "cm"),
plot.margin = unit(c(0.7,0.7,0.7,0.7),"cm"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
labs(caption = "
Source: Dr. Helena Jambor")+
ggtitle("Distribution of ECOG")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
facet_grid(~sex)+
scale_x_continuous(expand = c(0,0),
limits = c(0,5))+
scale_y_continuous(expand = c(0,0),
limits = c(0,110))
#Correlation between ECOG and cancer load
ggplot(lung)+
geom_point(aes(x = Cancer_load,
y= ECOG,
fill=sex,
color=sex))+
scale_color_manual(values = c("female"= "#EC9BB0", "male"="#9BB8ED"))+
theme_light()+
xlab("cancer load")+
ylab("ECOG")+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
labs(caption = "
Source: Dr. Helena Jambor")+
ggtitle("Correlation between ECOG and cancer load")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_x_continuous(expand = c(0,0),
limits = c(0,155))+
scale_y_continuous(expand = c(0,0),
limits = c(0,5))
#Exercise 3
library(tidyverse)
library(ggplot2)
library(rmarkdown)
library(survival)
library(ggfortify)
library(survminer)
library(data.table)
library(patchwork)
view(gbsg) #opens the data set in a new tab and shows the complete table with the raw data
head(cancer) #returns the first rows of the table
tail(cancer) #returns the last rows of the table
str(cancer) #compactly displays the internal structure of the table (whether a column is numeric or character/text,
#how many rows and columns are in the table)
summary(cancer) #shows the median, mean, min and max of each column except sex, and the 1st and 3rd quartiles
AB<-survfit(Surv(cancer$time,cancer$status==2)~1)
plot(AB)
survfit(Surv(cancer$time,cancer$status == 2)~1) #n events median 0.95LCL 0.95UCL
#[1,] 228 165 310 285 363
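#Sketch (addition, not part of the original analysis): the median survival time can be
#read from the fit instead of being copied by hand from the printout above.
#Assumes the 'lung' data from the survival package (the same data as 'cancer');
#'km_lung' and 'median_days' are illustrative names.
library(survival)
km_lung<-survfit(Surv(time, status == 2) ~ 1, data = survival::lung)
median_days<-summary(km_lung)$table[["median"]] #median survival time in days
median_days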
#Comparison of survival between lung and breast cancer patients
stlung<-survfit(Surv(cancer$time,cancer$status==2)~1)
plot1<-autoplot(stlung,
censor.shape = 20,
censor.size = 1,
surv.colour = "#9BB8ED")+
theme_light()+
xlab("time [days]")+
ylab("survival probability")+
labs(caption = "
Source: Loprinzi CL. Laurie JA. Wieand HS. Krook JE. Novotny PJ. Kugler JW. Bartel J. Law M. Bateman M. Klatt NE. et al.
Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer
Treatment Group. Journal of Clinical Oncology. 12(3):601-7, 1994.")+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
ggtitle("Kaplan-Meier Survival Analysis for lung cancer patients")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_x_continuous(expand = c(0,0),
limits = c(0,1050))+
scale_y_continuous(expand = c(0,0),
limits = c(0,1.1))+
geom_hline(yintercept=0.5,linetype='dashed')+
geom_vline(xintercept=310, linetype='dashed')
stbrust<-survfit(Surv(gbsg$rfstime,gbsg$status==1)~1)
plot2<-autoplot(stbrust,
surv.colour = "#9BB8ED",
censor.shape = 20,
censor.size = 1)+
theme_light()+
xlab("time [days]")+
ylab("survival probability")+
labs(caption = "
Source: Patrick Royston and Douglas Altman, External validation of a Cox prognostic model: principles and methods.
BMC Medical Research Methodology 2013, 13:33")+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
ggtitle("Kaplan-Meier Survival Analysis for breast cancer patients")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_x_continuous(expand = c(0,0),
limits = c(0,2800))+
scale_y_continuous(expand = c(0,0),
limits = c(0,1.1))+
geom_hline(yintercept=0.5,linetype='dashed')+
geom_vline(xintercept=1809, linetype='dashed')
plot1+plot2
#Comparison of survival with and without hormone treatment in breast cancer
breastc<-survfit(Surv(rfstime,status)~hormon, data=gbsg)
pvalue<-surv_pvalue(breastc)
autoplot(breastc,
censor.shape = 20,
censor.size = 1)+
theme_light()+
xlab("time [days]")+
ylab("survival probability")+
labs(caption = "
Source: Patrick Royston and Douglas Altman, External validation of a Cox prognostic model: principles and methods. BMC Medical
Research Methodology 2013, 13:33")+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
ggtitle("Kaplan-Meier Survival Analysis for breast cancer patients")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_fill_manual(values = c("0"= "#EC9BB0", "1"="#9BB8ED"))+
scale_color_manual(values = c("0"= "#EC9BB0", "1"="#9BB8ED"))+
scale_x_continuous(expand = c(0,0),
limits = c(0,2800))+
scale_y_continuous(expand = c(0,0),
limits = c(0,1.1))+
annotate("text", x=750, y=0.25, label= "p = 0.0034", size=4, family="Calibri")+
geom_hline(yintercept=0.5,linetype='dashed')+
geom_vline(xintercept=c(1525,2019), linetype='dashed')
#Comparison of survival between female and male lung cancer patients
lungc<-survfit(Surv(time,status)~sex, data=cancer)
surv_pvalue(lungc)
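#Sketch (addition): the log-rank p-value reported by surv_pvalue() can be cross-checked
#with survdiff() from the survival package. Assumes the 'lung' data from the survival
#package (the same data as 'cancer'); 'lr' and 'p_logrank' are illustrative names.
library(survival)
lr<-survdiff(Surv(time, status) ~ sex, data = survival::lung)
p_logrank<-1 - pchisq(lr$chisq, df = length(lr$n) - 1) #chi-square test with (groups - 1) degrees of freedom
signif(p_logrank, 2)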
autoplot(lungc,
censor.shape = 20,
censor.size = 1)+
theme_light()+
xlab("time [days]")+
ylab("survival probability")+
labs(caption = "
Source: Loprinzi CL. Laurie JA. Wieand HS. Krook JE. Novotny PJ. Kugler JW. Bartel J. Law M. Bateman M. Klatt NE. et al.
Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer
Treatment Group. Journal of Clinical Oncology. 12(3):601-7, 1994.")+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
ggtitle("Kaplan-Meier Survival Analysis for lung cancer patients")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_fill_manual(values = c("1"= "#9BB8ED", "2"="#EC9BB0"))+
scale_color_manual(values = c("1"= "#9BB8ED", "2"="#EC9BB0"))+
scale_x_continuous(expand = c(0,0),
limits = c(0,1050))+
scale_y_continuous(expand = c(0,0),
limits = c(0,1.1))+
geom_hline(yintercept=0.5,linetype='dashed')+
geom_vline(xintercept=c(270,426.5), linetype='dashed')+
annotate("text", x=135, y=0.25, label= "p = 0.0013", size=4, family="Calibri")
#Exercise 4
library(tidyverse)
library(ggplot2)
library(rmarkdown)
library(survival)
library(ggfortify)
library(survminer)
library(data.table)
library(directlabels)
library(ggrepel)
library(dplyr)
death<-read_tsv("death.tsv")
write_tsv(death,"death.tsv", na="") #write_tsv() has no row.names argument
view(death)
#Number of cancer deaths by type, World, 2019
death$cancer_type<-factor(death$cancer_type,levels=death$cancer_type[order(death$cancer_deaths)])
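#Sketch (addition): how the line above orders the bars, shown on toy data with invented values.
#'toy' is an illustrative name and not part of the analysis.
toy<-data.frame(cancer_type = c("A", "B", "C"), cancer_deaths = c(30, 10, 20))
toy$cancer_type<-factor(toy$cancer_type, levels = toy$cancer_type[order(toy$cancer_deaths)])
levels(toy$cancer_type) #levels are now sorted by ascending number of deaths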
ggplot(data = death, aes(x=cancer_type, y=cancer_deaths, fill ="black"))+
geom_bar(stat="identity")+
coord_flip()+
theme_light()+
xlab("cancer type")+
ylab("cancer deaths")+
labs(caption = "
Source: IHME, Global Burden of Disease")+
scale_fill_manual(values=c("#9BB8ED"))+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position = "none",
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"),
plot.margin = unit(c(0.7,0.7,0.7,0.7),"cm"))+
ggtitle("Number of cancer deaths by type, World, 2019")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold")) +
geom_text(aes(label=cancer_deaths), vjust=0.5, hjust=-0.3, color="black", size=3, family="Calibri")+
scale_y_continuous(expand = c(0,0),
limits = c(0,60000))
#Cancer deaths from 1990 - 2019
time<-read_tsv("cancer_death_over_time.tsv")
write_tsv(time,"cancer_death_over_time.tsv", na="") #write_tsv() has no row.names argument
view(time)
ggplot(time)+
geom_line(aes(y = lung_cancer,
x= Jahr,
colour = Entity),
stat="identity",
size = 1)+
scale_color_manual(values = c("#fbb4ae","#b3cde3","#ccebc5","#decbe4"), name="continent")+
labs(caption = "
Source: IHME, Global Burden of Disease")+
labs(x="year",
y="cancer deaths",
title = "Cancer deaths from 1990 - 2019")+
theme_light()+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_x_continuous(expand = c(0,0),
limits = c(1990,2020))+
scale_y_continuous(expand = c(0,0),
limits = c(0,1100000))
#Age-standardized cancer death rate from 1990 - 2019
age<-read_tsv("age.tsv")
write_tsv(age,"age.tsv", na="") #write_tsv() has no row.names argument
view(age)
ggplot(age)+
geom_line(aes(y = Age_standardized,
x= Year,
colour = Entity),
stat="identity",
size = 1)+
scale_color_manual(values = c("#fbb4ae","#b3cde3","#ccebc5","#decbe4"), name="continent")+
labs(x="year",
y="cancer death rate",
title = "Age-standardized cancer death rate from 1990 - 2019")+
labs(caption = "
Source: Dr. Helena Jambor")+
theme_light()+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_x_continuous(expand = c(0,0),
limits = c(1990,2020))+
scale_y_continuous(expand = c(0,0),
limits = c(0,180))
#Not age-standardized cancer death rate from 1990 - 2019
age<-read_tsv("age.tsv")
write_tsv(age,"age.tsv", na="") #write_tsv() has no row.names argument
view(age)
ggplot(age)+
geom_line(aes(y = All_Ages_Rate,
x= Year,
colour = Entity),
stat="identity",
size = 1)+
scale_color_manual(values = c("#fbb4ae","#b3cde3","#ccebc5","#decbe4"), name="continent")+
labs(x="year",
y="cancer death rate",
title = "Not age-standardized cancer death rate from 1990 - 2019")+
labs(caption = "
Source: Dr. Helena Jambor")+
theme_light()+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_x_continuous(expand = c(0,0),
limits = c(1990,2020))+
scale_y_continuous(expand = c(0,0),
limits = c(0,250))
#Comparison of the three rates in the European Region
age_eu<-read_tsv("age_eu.tsv")
write_tsv(age_eu,"age_eu.tsv", na="") #write age_eu (not age) back, otherwise the file is overwritten with the wrong table
view(age_eu)
ggplot(data = age_eu, aes(x=Year, y=Value)) +
geom_line(aes(colour=incidence))+
scale_color_manual(values = c("#fbb4ae","#b3cde3","#ccebc5"),
name=" ")+
labs(x="year",
y="relative change [%]",
title = "Change in three measures of cancer mortality, World, 1990 - 2019")+
labs(caption = "
Source: IHME, Global Burden of Disease")+
theme_light()+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position="right", aspect.ratio=1,
legend.title = element_text(family="Calibri", size=10, face="bold"),
legend.text = element_text(size=10, family="Calibri"),
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"))+
theme(plot.title=element_text(family="Calibri", size=10, face="bold"))+
scale_x_continuous(expand = c(0,0),
limits = c(1990,2020))+
scale_y_continuous(expand = c(0,0),
limits = c(-20,30))+
geom_point(aes(colour=incidence))
#Exercise 6
library(Seurat)
library(tidyverse)
pbmc.data<-Read10X(data.dir="C:/Users/miria/Documents/Beuth/2.Semester/Bioinformatik")
pbmc<-CreateSeuratObject(counts=pbmc.data,project = "pbmc", min.cells=3, min.features = 200)
pbmc
pbmc[["percent.mt"]]<-PercentageFeatureSet(pbmc, "^MT-")
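#Sketch (addition): what percent.mt means, shown on a toy count matrix with invented numbers.
#'toy_counts', 'is_mt' and 'toy_percent_mt' are illustrative names, not Seurat objects.
toy_counts<-matrix(c(5, 0,
15, 10,
80, 90),
nrow = 3, byrow = TRUE,
dimnames = list(c("MT-CO1", "MT-ND1", "CD3E"), c("cell1", "cell2")))
is_mt<-grepl("^MT-", rownames(toy_counts)) #genes whose name starts with "MT-"
toy_percent_mt<-100 * colSums(toy_counts[is_mt, , drop = FALSE]) / colSums(toy_counts)
toy_percent_mt #share of mitochondrial counts per cell in percent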
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
ncol=3,pt.size=0.01)
pbmc
pbmc<-subset(pbmc, subset=nFeature_RNA>200 & nFeature_RNA<2500 & percent.mt<20)
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
ncol=3,pt.size=0.01)
pbmc
pbmc<-NormalizeData(pbmc,normalization.method = "LogNormalize")
pbmc<-FindVariableFeatures(pbmc,nfeatures = 2000)
pbmc<-ScaleData(pbmc)
pbmc<-RunPCA(pbmc,features = VariableFeatures(object=pbmc))
DimPlot(pbmc,reduction="pca")
FeaturePlot(pbmc, features = c("CD3E", "CD19"),reduction = "pca") #clustered by similarity of gene expression
#NK cells
FeaturePlot(pbmc, features = c("COX6A2", "ZMAT4", "APOC2"),reduction = "pca")
#monocytes
FeaturePlot(pbmc, features = c("C1QC", "C1QB", "C1QA"),reduction = "pca")
#plasmacytoid dendritic cells
FeaturePlot(pbmc, features = c("LRRC26", "TPM2", "ASIP"),reduction = "pca")
#myeloid dendritic cells
FeaturePlot(pbmc, features = c("MRC1", "CRIP3", "CD1E"),reduction = "pca")
#naive B cells
FeaturePlot(pbmc, features = c("KCNG1", "GPR17", "APBB2"),reduction = "pca")
#memory B cells
FeaturePlot(pbmc, features = c("TAS1R3", "ARL14", "APOD"),reduction = "pca")
#basophils
FeaturePlot(pbmc, features = c("HDC", "CPA3", "MS4A3"),reduction = "pca")
#eosinophils
FeaturePlot(pbmc, features = c("HDC", "CEBPE", "ALOX15"),reduction = "pca")
#CD8
FeaturePlot(pbmc, features = c("CFAP97D2", "CD248", "MXRA8"),reduction = "pca")
#CD4
FeaturePlot(pbmc, features = c("FHIT", "ADAM23", "NEFL"),reduction = "pca")
ElbowPlot(pbmc)
pbmc<-FindNeighbors(pbmc,dims=1:10)
pbmc<-FindClusters(pbmc,resolution = 0.2)
pbmc<-RunUMAP(pbmc,dims = 1:10)
DimPlot(pbmc, reduction = "umap")
#marker genes of cluster 4 (run before renaming, because the identity "4" no longer exists afterwards)
FindMarkers(pbmc,ident.1 = "4", only.pos = TRUE) #clustered by genes that are expressed at all
pbmc<-RenameIdents(pbmc,
"0"= "T cells",
"1"= "T cells",
"5" = "T cells",
"2"= "monocytes/granulocytes",
"3"= "B cells",
"4"= "B cells")
#Results of the identification of the cell types using different cell markers
# 0 = T cells
# 1 = T cells
# 2 = monocytes and granulocytes
# 3 = B cells (memory and naive), (plasmacytoid DC)
# 4 = B cells (memory and naive), (plasmacytoid DC)
# 5 = T cells, NK cells
#Exercise 5
library(tidyverse)
bad<-read_tsv("bad.tsv")
write_tsv(bad,"bad.tsv", na="") #write_tsv() has no row.names argument
view(bad)
#How much money do you spend on your significant other?
bad$amount_of_money<-factor(bad$amount_of_money,levels=bad$amount_of_money)
ggplot(data = bad, aes(x=amount_of_money, y=relative_amount, fill ="black"))+
geom_bar(stat="identity")+
coord_flip()+
theme_light()+
xlab("amount of money [$]")+
ylab("relative amount [%]")+
labs(caption = "
Source: imgur")+
scale_fill_manual(values=c("#9BB8ED"))+
theme(axis.text=element_text(size=10, family="Calibri"),
axis.title=element_text(size=10,face="bold", family="Calibri"),
text=element_text(family="Calibri", size=10),
legend.position = "none",
plot.caption.position = "panel",
plot.caption = element_text(hjust = 0, family="Calibri", size=8, color = "grey"),
plot.margin = unit(c(0.7,0.7,0.7,0.7),"cm"))+
ggtitle("How much money do you spend on your significant other?")+
theme(plot.title=element_text(family="Calibri", size=10, face="bold")) +
geom_text(aes(label=relative_amount), vjust=0.5, hjust=-0.3, color="black", size=3, family="Calibri")+
scale_y_continuous(expand = c(0,0),
limits = c(0,30))
```