# BioCommons Single-Cell Course
This code loads and processes the seurat object
```
## Load data
pbmc.data <- Read10X(data.dir = "data/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
## Filtering
pbmc$percent.mt <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc_unfiltered <- pbmc
pbmc <- subset(pbmc,subset =nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
## Normalise & Scale
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 1e4)
pbmc <- FindVariableFeatures(pbmc, selection.method = 'vst', nfeatures = 2000)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
## Dimension Reduction
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- RunUMAP(pbmc, dims = 1:10)
```
<br>
<br>
## Challenge: Plot favourite gene
My favourite gene does not appear to be in the dataset
```
"CMYC" %in% rownames(pbmc)
```
But you can check the HGNC id for your favourite gene:
https://www.genecards.org/
```
FeaturePlot(pbmc, features = "MYC")
```
<br>
<br>
## Challenge: The meta.data slot
```
pbmc$samplename <- rep("thisSample", dim(pbmc@meta.data)[1])
pbmc$samplename <- "thisSample"
```
<br>
<br>
## Challenge: Filter the genes
https://swbioinf.github.io/scRNAseqInR_Doco/qc.html#challenge-filter-the-cells
Different number of cells lost depending on filtering
```
pbmc_qctest <- subset(pbmc_unfiltered,
subset = nFeature_RNA > 200 &
nFeature_RNA < 2500 & percent.mt < 8 )
ncol(pbmc_unfiltered)-ncol(pbmc)
ncol(pbmc_unfiltered)-ncol(pbmc_qctest)
```
Compare plots before and after filtering
```
plot1 <- FeatureScatter(pbmc_unfiltered, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc_unfiltered, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot3 <- FeatureScatter(pbmc_unfiltered, feature1 = "nFeature_RNA", feature2 = "percent.mt")
plot5 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot6 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
## Compare before & after!
plot1 + plot5
plot2 + plot6
```
<br>
<br>
## Challenge: PC genes
We want to plot the genes that contribute the most to PC1 or PC2 (i.e. have the largest weighting/loading), which are the genes here:
```
VizDimLoadings(pbmc, dims = 1:2, reduction = 'pca')
```
Instead of manually entering the gene names we can access the gene loadings through the Seurat object
```
names(attributes(pbmc))
names(pbmc)
pbmc$pca
names(attributes(pbmc$pca))
top_PC1_genes <- sort(abs(pbmc$pca@feature.loadings[,1]), decreasing = TRUE)[1:6]
top_PC2_genes <- sort(abs(pbmc$pca@feature.loadings[,2]), decreasing = TRUE)[1:6]
FeaturePlot(pbmc, reduction="pca", features = names(top_PC1_genes))
FeaturePlot(pbmc, reduction="pca", features = names(top_PC2_genes))
```
<br>
<br>
## Challenge: UMAP
Here, we wanted to see the impact of changing the UMAP parametres. We can investigate this using a function instead of saving a new Seurat object for each run.
```
umapFunction <- function(sce_x, seed_x){
sce_x <- RunUMAP(sce_x, dims = 1:10, seed=seed_x)
plot_x <- DimPlot(sce_x,reduction="umap")+labs(title="UMAP")
return(plot_x)
}
umapFunction(pbmc,0.1)
umapFunction(pbmc,50)
umapFunction(pbmc,11)
```
This blog post may help you better understand UMAP and tSNE:
https://pair-code.github.io/understanding-umap/
<br>
<br>
## Challenge: Try different cluster settings
Again, we wrote a function to change the parametres used in clustering.
```
clustFunction <- function(seurat_x,dims_n,k_n, res_x){
seurat_x <- FindNeighbors(seurat_x, dims = 1:dims_n, k.param =k_n)
seurat_x <- FindClusters(seurat_x, resolution =res_x)
return(DimPlot(seurat_x,reduction="umap"))
}
```
```
library(ggplot2)
clust_plot_A <- DimPlot(pbmc,reduction="umap", group.by="RNA_snn_res.0.5")
clust_plot_B <- clustFunction(pbmc,7,20,5)+labs(title = "B")
clust_plot_C <- clustFunction(pbmc,10,30,0.8)+labs(title = "C")
clust_plot_D <- clustFunction(pbmc,5,30,0.1)+labs(title = "D")
(clust_plot_A + clust_plot_B) / (clust_plot_C + clust_plot_D)
```
Alternative solution posted by another tutor, from the slack:
```
pbmc <- FindClusters(pbmc, resolution = c(0.5, 0.1,0.2,0.3,0.4,0.8,1,1.2,5))
DimPlot(pbmc, reduction = "umap", group.by =c("RNA_snn_res.0.5","RNA_snn_res.0.1", "RNA_snn_res.1" ,"RNA_snn_res.5"))
```
<br>
<br>
## Additional things
To learn more about the raw files for 10X, droplet-based scRNA-seq data:
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices
Example of downloadable single-cell data:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161529
The authors of this paper/dataset also wrote a paper about their QC:
https://www.nature.com/articles/s41597-022-01236-2
And provided the code:
https://github.com/yunshun/HumanBreast10X/tree/main/RCode
And the Seurat data objects on figshare:
https://doi.org/10.6084/m9.figshare.17058077
If you're interested in computational biology and aren't yet a member, I would recommend getting involved with:
* Australilan Bioinformatics And Computational Biology Society (ABACBS)
https://www.abacbs.org/
* Or COMBINE, the student subcommittee of ABACBS
https://www.combine.org.au/