ESRs Winter Workshop ☔

Day 3 (04.12.2018) - Day 4 (05.12.2018)
Teacher: Maja Kuzman
University of Zagreb
Faculty of Science, Division of Biology, Bioinformatics group

The basics


Introduction

  • R
  • R markdown, R notebook
Vectors:
  • subsetting, recycling
  • Vector types: numeric, logical, character, complex
  • Some operations: +, -, /, *, %/%, %%, ==, !=
  • Some functions: any, all, example, help, ?, ??, sum, sd, mean, factorial, abs
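A short base R sketch of the vector topics above: subsetting, recycling, the listed operators and a few of the listed functions. The vector x and its values are made up for illustration.

```r
# Toy vector used for all examples below
x <- c(10, 20, 30, 40, 50)

# Subsetting (R indexes from 1)
x[2]          # 20
x[c(1, 3)]    # 10 30
x[-1]         # 20 30 40 50: a negative index drops that element
x[x > 25]     # 30 40 50: logical subsetting

# Recycling: the shorter vector is repeated along the longer one
y <- 1:6 + c(10, 100)
y             # 11 102 13 104 15 106

# Arithmetic and comparison operators
17 %/% 5      # 3: integer division
17 %% 5       # 2: remainder (modulo)
x == 30       # FALSE FALSE TRUE FALSE FALSE
x != 30       # TRUE TRUE FALSE TRUE TRUE

# any() / all() summarise a logical vector
any(x > 45)   # TRUE
all(x > 5)    # TRUE
c(sum(x), mean(x))  # 150 30

# The four basic vector types
class(1.5)    # "numeric"
class(TRUE)   # "logical"
class("a")    # "character"
class(2+3i)   # "complex"
```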
Matrices:
  • subsetting, basic operations
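A minimal sketch of matrix subsetting and basic operations in base R; the 2 x 3 matrix is a toy example. Note the difference between element-wise * and the matrix product %*%.

```r
# matrix() fills column by column unless byrow = TRUE
m <- matrix(1:6, nrow = 2)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

dim(m)        # 2 3
m[2, 3]       # 6: row 2, column 3
m[1, ]        # 1 3 5: whole first row
m[, 2]        # 3 4: whole second column
m[, c(1, 3)]  # 2 x 2 sub-matrix

m * 2         # element-wise: every entry doubled
m + m         # element-wise sum
t(m)          # transpose, 3 x 2
p <- m %*% t(m)   # matrix product, 2 x 2
p
#      [,1] [,2]
# [1,]   35   44
# [2,]   44   56
```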
Functions:
  • Basic function syntax, environments, return(), recursion
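A sketch of the function topics above. The names raise_to, fact and make_counter are hypothetical, chosen only for illustration; fact() is a recursive version of the built-in factorial().

```r
# A function with a default argument and an explicit return()
raise_to <- function(x, p = 2) {
  return(x^p)
}
raise_to(3)          # 9: uses the default p = 2
raise_to(2, p = 5)   # 32

# Without return(), a function returns the value of its last expression.
# Recursion: fact() calls itself until it reaches the base case n <= 1
fact <- function(n) {
  if (n <= 1) return(1)
  n * fact(n - 1)
}
fact(5)              # 120, the same as factorial(5)

# Environments: each call gets its own environment; names not found there
# are looked up in the environment where the function was defined
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # <<- assigns to 'count' in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()            # 1
counter()            # 2
```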
Flow control in R:
  • if, if-else, ifelse, for, while, break, next
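The control-flow constructs listed above, sketched on a toy vector. The key difference to notice: if/else tests a single value, while ifelse() works element by element.

```r
x <- c(-2, 5, 0, 8, -1)

# ifelse() is vectorised: it tests every element at once
signs <- ifelse(x >= 0, "non-negative", "negative")

# for loop with next (skip this iteration) and break (leave the loop)
total <- 0
for (v in x) {
  if (v < 0) next       # skip negative values
  if (v > 6) break      # stop at the first value above 6
  total <- total + v
}
total                   # 5: only 5 and 0 are added before the loop stops at 8

# while loop: repeat while the condition is TRUE
n <- 1
while (n < 100) {
  n <- n * 3
}
n                       # 243

# if / else on a single logical value
if (total > 3) {
  message("total is greater than 3")
} else {
  message("total is at most 3")
}
```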
Lists:
  • basic operations; accessing elements, list structure
  • lapply, sapply, tapply, by, do.call
  • lists as function parameters
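A base R sketch of list access and the apply family listed above. The list l, the vectors values and groups, and the data frame df are toy data; do.call() shows one common way a list is used as a function parameter.

```r
l <- list(a = 1:3, b = c(2.5, 3.5), c = 10)
str(l)              # structure: names, types and lengths of the elements
l$a                 # 1 2 3: element by name
l[["b"]]            # 2.5 3.5: same as l$b
l["b"]              # a list of length 1 that contains b, not b itself

lapply(l, sum)      # always returns a list
s <- sapply(l, sum) # simplifies to a named vector: a = 6, b = 6, c = 10

# tapply(): apply a function to groups of a vector
values <- c(1, 2, 3, 4, 5, 6)
groups <- c("x", "y", "x", "y", "x", "y")
tapply(values, groups, mean)   # x = 3, y = 4

# by(): the same idea for the rows of a data frame
df <- data.frame(v = values, g = groups)
by(df, df$g, function(d) sum(d$v))   # x: 9, y: 12

# Lists as parameters: do.call() passes list elements as arguments
arg_list <- list(1:10, trim = 0.1)
do.call(mean, arg_list)              # 5.5, same as mean(1:10, trim = 0.1)
do.call(rbind, list(1:3, 4:6))       # binds the vectors into a 2 x 3 matrix
```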
Factors:
  • structure, adding elements, adding new levels
  • Conversion to numeric
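A sketch of the factor topics above, with toy values. The last lines show the usual pitfall: as.numeric() on a factor returns the internal level codes, not the numbers written in the labels.

```r
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))
str(f)             # compact summary of the levels and the codes
levels(f)          # "low" "medium" "high"
as.integer(f)      # 1 3 2 1: the integer codes stored internally
table(f)           # counts per level

# Assigning a value that is not a level yields NA (with a warning),
# so add the new level first, then the element
levels(f) <- c(levels(f), "very high")
f[5] <- "very high"

# Conversion to numeric: as.numeric() returns the codes, not the labels
g <- factor(c("10", "5", "20"))   # levels sort as text: "10" "20" "5"
as.numeric(g)                     # 1 3 2 (codes)
as.numeric(as.character(g))       # 10 5 20
as.numeric(levels(g))[g]          # 10 5 20, the idiom recommended in ?factor
```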

Data manipulation


Data frames

  • reading in: read.table, read.csv, read.delim (tab-separated files)
  • basics: subset() and []
  • merge, order, unique
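A minimal base R sketch of the data frame operations above. The table is made-up toy data read from a string via the text argument; with a real file you would pass a path instead (samples.csv is a hypothetical name).

```r
# Toy table; with a real file this would be read.csv("samples.csv")
samples <- read.csv(text = "id,tissue,reads
s1,liver,120
s2,brain,80
s3,liver,95
s2,brain,80")

samples <- unique(samples)          # drops the duplicated s2 row
nrow(samples)                       # 3

# Two ways to filter rows and choose columns
liver <- subset(samples, tissue == "liver", select = c(id, reads))
samples[samples$reads > 90, c("id", "reads")]

# Sort by number of reads, largest first
samples[order(samples$reads, decreasing = TRUE), ]

# Join with a second table on the shared "id" column
meta <- data.frame(id = c("s1", "s2", "s3"), donor = c("A", "B", "A"))
merged <- merge(samples, meta, by = "id")
```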

Package: dplyr

  • filter, slice, select
  • %>% (pipe), grouping with group_by
  • summarise, arrange, lead, lag, n, count
  • mutate, mutate_all, transmute

Package: data.table

  • i: selecting rows
  • j: selecting columns; returning a list with list() or .()
  • by: grouping
  • operations on columns
  • Adding new columns
  • .N, .I, .GRP
  • keys
  • .SD, .SDcols
  • {}: suppressing intermediate output
  • merge
  • rolling joins (roll)
  • foverlaps

Regular expressions

  • grep
  • Special characters: ^ $ \ . + * ?
  • Special brackets: [], (), {}; backreferences: \1
  • stringr package
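A sketch of the regular expression topics above using only base R (grep, grepl, sub); stringr is not used here. The ids vector holds toy strings. Inside an R string the backslash itself must be escaped, so the regex \. is written "\\." and the backreference \1 is written "\\1".

```r
ids <- c("ENSG01", "chr1:1000-2000", "ENSG02", "geneX")

grep("^ENSG", ids)                 # 1 3: positions of the matches
grep("^ENSG", ids, value = TRUE)   # "ENSG01" "ENSG02"
grepl("[0-9]+", ids)               # TRUE TRUE TRUE FALSE

# '.' matches any character; escape it as "\\." to match a literal dot
grepl("a.b", c("a.b", "axb"))      # TRUE TRUE
grepl("a\\.b", c("a.b", "axb"))    # TRUE FALSE

# ? makes the previous character optional, {} sets a repetition count
grepl("colou?r", c("color", "colour"))           # TRUE TRUE
grepl("^A{2,3}$", c("A", "AA", "AAA", "AAAA"))   # FALSE TRUE TRUE FALSE

# () captures groups; \\1, \\2, \\3 insert the captured text
sub("^(chr[0-9XY]+):([0-9]+)-([0-9]+)$", "\\1 from \\2 to \\3", ids[2])
# "chr1 from 1000 to 2000"
```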


Package: tidyr

What is clean data?
  • rows = observations
  • columns = attributes
How to clean up messy data:
  • Spread: each column should hold a single attribute (fixes variables stored in rows)
  • Gather: column headers should be variable names, not values
  • Separate: split busy columns that hold several values
  • Merge multiple tables (baseR, dplyr, data.table)

Data visualization


  • Examples: plots in base R vs. ggplot2

Some useful graphs:

  • Scatter plot
  • Q-Q plot
  • Histograms
  • Density plots
  • Correlation matrix (package: corrplot)
  • Heatmap (package: pheatmap)

Package: ggplot2

  • Basics: ggplot(dataframe, aes(x,y))
  • Different layer examples: geom_point(), geom_histogram(), geom_smooth(), geom_bar(), geom_boxplot(), geom_density()
  • Groupings: group, fill, facets
  • Other: titles, axes, legends, colors, themes

Interactive graphs

  • Interactive graph examples with ggplotly (package: plotly)
  • shiny

Advanced topics: Bioconductor


Package: Biostrings

  • BSgenome
  • Get sequence by GRanges - getSeq
  • Useful functions: complement, reverseComplement, reverse, c
  • subseq, Views
  • alphabetFrequency (single letters), dinucleotideFrequency, trinucleotideFrequency, oligonucleotideFrequency
  • translate, consensusMatrix, matchPattern, pairwiseAlignment
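To make clear what some of the Biostrings functions above compute, here is a plain base R sketch on a toy sequence. It does not use Biostrings: complement_dna and reverse_string are hypothetical helper names, and table() merely stands in for alphabetFrequency and dinucleotideFrequency. In practice the Biostrings functions, which work on DNAString objects, are the better choice.

```r
dna <- "ATGCGTTA"   # toy sequence

# complement: swap A<->T and C<->G
complement_dna <- function(s) chartr("ACGT", "TGCA", s)
# reverse: reverse the order of the characters
reverse_string <- function(s) paste(rev(strsplit(s, "")[[1]]), collapse = "")

complement_dna(dna)                   # "TACGCAAT"
reverse_string(dna)                   # "ATTGCGTA"
reverse_string(complement_dna(dna))   # "TAACGCAT", the reverse complement

# Single-letter counts (what alphabetFrequency reports for A, C, G, T)
table(strsplit(dna, "")[[1]])         # A: 2, C: 1, G: 2, T: 3

# Dinucleotide counts over overlapping windows of width 2
n <- nchar(dna)
table(substring(dna, 1:(n - 1), 2:n))
```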

Package: ShortRead

  • Handling FASTQ reads
  • Handling alignments

Package: biomaRt

  • Choosing a mart (version, type, organism): useEnsembl
  • Choosing a dataset: listDatasets, useDataset, listAttributes
  • Querying the data: getBM

Package: GenomicRanges and IRanges

  • Defining IRanges and accessing elements
  • Some functions: reduce, disjoin, findOverlaps, countOverlaps, coverage
  • Defining GRanges and applying functions on them
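The idea behind reduce and coverage can be sketched in base R on toy closed intervals [start, end]. This is not the IRanges implementation: reduce_ranges and coverage_vec are hypothetical names, and the sketch assumes integer positions. Like reduce with its default settings, it merges ranges that overlap or directly touch; IRanges coverage returns an Rle, which rle() on the plain vector imitates.

```r
# Toy ranges, closed intervals [start, end]
starts <- c(1, 5, 13, 20)
ends   <- c(10, 12, 15, 25)

# reduce-like merge: sort by start, then merge ranges that overlap or touch
reduce_ranges <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  res_start <- start[1]
  res_end <- end[1]
  for (i in seq_along(start)[-1]) {
    k <- length(res_end)
    if (start[i] <= res_end[k] + 1) {
      res_end[k] <- max(res_end[k], end[i])   # extend the current merged range
    } else {
      res_start <- c(res_start, start[i])     # start a new merged range
      res_end <- c(res_end, end[i])
    }
  }
  data.frame(start = res_start, end = res_end)
}
reduce_ranges(starts, ends)
#   start end
# 1     1  15
# 2    20  25

# coverage-like count: how many ranges cover each position 1..max(end)
coverage_vec <- function(start, end) {
  cov <- integer(max(end))
  for (i in seq_along(start)) {
    idx <- start[i]:end[i]
    cov[idx] <- cov[idx] + 1L
  }
  cov
}
cv <- coverage_vec(starts, ends)
cv[5:12]     # 2 2 2 2 2 2 1 1
rle(cv)      # run-length form, similar in spirit to an Rle
```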

☃ ☃ ❄ ❄ In your free time: ❄ ❄ ☃ ☃

Rpic
Emoji data science in R: A tutorial by Hamdan Azhar
