BioCommons Single-Cell Course

# BioCommons Single-Cell Course This code loads and processes the seurat object ``` ## Load data pbmc.data <- Read10X(data.dir = "data/pbmc3k/filtered_gene_bc_matrices/hg19/") pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200) ## Filtering pbmc$percent.mt <- PercentageFeatureSet(pbmc, pattern = "^MT-") pbmc_unfiltered <- pbmc pbmc <- subset(pbmc,subset =nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5) ## Normalise & Scale pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 1e4) pbmc <- FindVariableFeatures(pbmc, selection.method = 'vst', nfeatures = 2000) all.genes <- rownames(pbmc) pbmc <- ScaleData(pbmc, features = all.genes) ## Dimension Reduction pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc)) pbmc <- RunUMAP(pbmc, dims = 1:10) ``` ## Challenge: Plot favourite gene My favourite gene does not appear to be in the dataset ``` "CMYC" %in% rownames(pbmc) ``` But you can check the HGNC id for your favourite gene: https://www.genecards.org/ ``` FeaturePlot(pbmc, features = "MYC") ``` ## Challenge: The meta.data slot ``` pbmc$samplename <- rep("thisSample", dim(pbmc@meta.data)[1]) pbmc$samplename <- "thisSample" ``` ## Challenge: Filter the genes https://swbioinf.github.io/scRNAseqInR_Doco/qc.html#challenge-filter-the-cells Different number of cells lost depending on filtering ``` pbmc_qctest <- subset(pbmc_unfiltered, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 8 ) ncol(pbmc_unfiltered)-ncol(pbmc) ncol(pbmc_unfiltered)-ncol(pbmc_qctest) ``` Compare plots before and after filtering ``` plot1 <- FeatureScatter(pbmc_unfiltered, feature1 = "nCount_RNA", feature2 = "percent.mt") plot2 <- FeatureScatter(pbmc_unfiltered, feature1 = "nCount_RNA", feature2 = "nFeature_RNA") plot3 <- FeatureScatter(pbmc_unfiltered, feature1 = "nFeature_RNA", feature2 = "percent.mt") plot5 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt") plot6 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA") ## Compare before & after! plot1 + plot5 plot2 + plot6 ``` ## Challenge: PC genes We want to plot the genes that contribute the most to PC1 or PC2 (i.e. have the largest weighting/loading), which are the genes here: ``` VizDimLoadings(pbmc, dims = 1:2, reduction = 'pca') ``` Instead of manually entering the gene names we can access the gene loadings through the Seurat object ``` names(attributes(pbmc)) names(pbmc) pbmc$pca names(attributes(pbmc$pca)) top_PC1_genes <- sort(abs(pbmc$pca@feature.loadings[,1]), decreasing = TRUE)[1:6] top_PC2_genes <- sort(abs(pbmc$pca@feature.loadings[,2]), decreasing = TRUE)[1:6] FeaturePlot(pbmc, reduction="pca", features = names(top_PC1_genes)) FeaturePlot(pbmc, reduction="pca", features = names(top_PC2_genes)) ``` ## Challenge: UMAP Here, we wanted to see the impact of changing the UMAP parametres. We can investigate this using a function instead of saving a new Seurat object for each run. ``` umapFunction <- function(sce_x, seed_x){ sce_x <- RunUMAP(sce_x, dims = 1:10, seed=seed_x) plot_x <- DimPlot(sce_x,reduction="umap")+labs(title="UMAP") return(plot_x) } umapFunction(pbmc,0.1) umapFunction(pbmc,50) umapFunction(pbmc,11) ``` This blog post may help you better understand UMAP and tSNE: https://pair-code.github.io/understanding-umap/ ## Challenge: Try different cluster settings Again, we wrote a function to change the parametres used in clustering. ``` clustFunction <- function(seurat_x,dims_n,k_n, res_x){ seurat_x <- FindNeighbors(seurat_x, dims = 1:dims_n, k.param =k_n) seurat_x <- FindClusters(seurat_x, resolution =res_x) return(DimPlot(seurat_x,reduction="umap")) } ``` ``` library(ggplot2) clust_plot_A <- DimPlot(pbmc,reduction="umap", group.by="RNA_snn_res.0.5") clust_plot_B <- clustFunction(pbmc,7,20,5)+labs(title = "B") clust_plot_C <- clustFunction(pbmc,10,30,0.8)+labs(title = "C") clust_plot_D <- clustFunction(pbmc,5,30,0.1)+labs(title = "D") (clust_plot_A + clust_plot_B) / (clust_plot_C + clust_plot_D) ``` Alternative solution posted by another tutor, from the slack: ``` pbmc <- FindClusters(pbmc, resolution = c(0.5, 0.1,0.2,0.3,0.4,0.8,1,1.2,5)) DimPlot(pbmc, reduction = "umap", group.by =c("RNA_snn_res.0.5","RNA_snn_res.0.1", "RNA_snn_res.1" ,"RNA_snn_res.5")) ``` ## Additional things To learn more about the raw files for 10X, droplet-based scRNA-seq data: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices Example of downloadable single-cell data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161529 The authors of this paper/dataset also wrote a paper about their QC: https://www.nature.com/articles/s41597-022-01236-2 And provided the code: https://github.com/yunshun/HumanBreast10X/tree/main/RCode And the Seurat data objects on figshare: https://doi.org/10.6084/m9.figshare.17058077 If you're interested in computational biology and aren't yet a member, I would recommend getting involved with: * Australilan Bioinformatics And Computational Biology Society (ABACBS) https://www.abacbs.org/ * Or COMBINE, the student subcommittee of ABACBS https://www.combine.org.au/