**NBIS Workshop**
# R Foundations for Life Scientists 2020
Welcome to the *R Foundations for Life Scientists* workshop organised by National Bioinformatics Infrastructure, Sweden.
This document is a live record of what we are doing and learning during the workshop. As either a student or a teacher, you are encouraged to write your comments, questions and feedback.
## Useful resources
- **Course website**: https://nbisweden.github.io/workshop-r/2011
- **HackMD (this document):** https://hackmd.io/479oLTV3SwOu_h-UK0sS_g?both
- **Zoom link:** https://uu-se.zoom.us/j/64914582551 (passcode has been sent to you).
:::info
**Note!** Zoom link is active from 09:00-17:00 (GMT+1) every day from Mon, 02 to Fri, 06 Nov 2020.
:::
## Working on the cloud
This entire workshop can be run on a prepared cloud instance. This means no local setup of any kind is required on your system. All you need to do is log in through your web browser.
**Cloud login for students**:
http://rcourse.130.239.81.214.xip.io/
username: <part of your email before @>
password: rcourse12345
This is experimental and we hope this works without issues. In case we run into issues, you will use your own system (or if you prefer to work on your own system, you are welcome to do so). This requires installation of [relevant packages](https://nbisweden.github.io/workshop-r/2011/home_precourse.html) and dependencies. If you need assistance, please contact us.
## Introduction
Here, we write a few words about ourselves to get to know each other a bit better.
#### Teachers and Teaching Assistants
* **Marcin Kierczak** - Responsible for the course content, using R for hacking genomes since 2009. Interested in: stats, genetics, digital signal processing, theory of programming, software design and development strategies. Organizer of R Foundations and RaukR Summer School. Uppsala-based.
* **Sebastian DiLorenzo** - R user since 2011. Uppsala-based bioinformatician interested in way too many things with not enough time. Interested in: stats, plots, package development, unnecessarily nested for-loops and more than I should write here. A huge fan of googling problems and not giving up until page 3. Organizer of R Foundations and RaukR Summer School.
* **Roy Francis** - Responsible for some of the course content, course repo, website and TA for this workshop. Using R for bioinformatics, data wrangling, statistical analyses, graphics and interactive applications. Located in Uppsala.
* **Lucile Soler** - I will be helping as a TA. Interested in: genome annotation, comparative genomics, genomics. Like Sebastian, I am a fan of googling problems (a skill always good to improve). Bioinformatician based in Uppsala.
* **Dimitris Bampalikis**
* **Per Unneberg**
* **Jakub Orzechowski Westholm** - TA for this course. Using R since 2004, for gene expression, chromatin analysis, machine learning, integration of various -omics data and other things. Located in Stockholm.
* **Mun-Gwan Hong** - A TA for this course. A 'tidyverse' user. Interested in pQTL, affinity proteomics & GWAS. Working in Stockholm.
* **Lokeshwaran Manoharan** - I will be helping in this course as a TA. Very much interested in genomics of non-model organisms and microbes in particular. Located in Lund.
* **Payam Emami** - Bioinformatician at NBIS working on proteomics, metabolomics, big data analysis, statistics and cloud computing. Responsible for cloud infrastructure on this course.
#### Students
* **Marcin Kierczak** - I program since I turned 12, in R since 2009. I teach programming to learn even more and every time, I learn **A LOT** from students and fellow teachers!
* **Luca Torello Pianale** - PhD student from Chalmers working on microbial robustness for the production of biofuels and a passion for plants.
* **Andreas Carlström** - I'm a PhD student from the Dept of Biochemistry and Biophysics at Stockholm University. Working on regulation of gene expression in mitochondria using classical biochemistry/molecular biology but also some structural work using cryo-EM. Would like to learn how to use R in order to better analyse data from protein mass spectrometry and for data visualization.
* **Sophie Tronnet** - I'm a Postdoc at Umeå University, working on host-pathogen interactions. I'm a beginner in R programming, and programming in general. I would like to learn R for gene expression data handling and representation.
* **Morteza Aslanzadeh** - PhD student from Stockholm university. My research area focuses on gene expression covariances in single cells. For this, I identify sets of genes that covary strongly in their expression, and then I apply an unbiased screening approach to reveal which main regulator causes the covariances for the given set.
## Day 0 - Example
Here, you see how we can ask questions and what an answer may look like.
### Questions and Issues
- [ ] **I wrote `y <- 10`. Is it equivalent to writing `y = 10`?**
Not quite! At the top level both assign a value to a name, so most of the time it won't make a difference. However, `<-` is the dedicated assignment operator, while `=` inside a function call binds a value to an argument instead of creating a variable. You should use `<-` everywhere except in function calls, where you use `=`, like: `my_awesome_function(parameter = 42)`. Writing `=` instead of `<-` for assignment is also considered **bad style** and you risk being laughed at by some more experienced R coders.
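A small sketch of where the two operators really differ, namely inside a function call. The `local()` wrapper is only there so the example does not touch your workspace; the names `res`, `m1` and `m2` are made up for illustration.

```r
res <- local({
  # `=` inside a call names an argument; no variable `x` is created
  m1 <- mean(x = c(1, 2, 3))
  after_equals <- exists("x", inherits = FALSE)

  # `<-` inside a call creates `y` first and then passes its value on
  m2 <- mean(y <- c(1, 2, 3))
  after_arrow <- exists("y", inherits = FALSE)

  c(after_equals = after_equals, after_arrow = after_arrow)
})
print(res)
```

Here `after_equals` is `FALSE` and `after_arrow` is `TRUE`: only the arrow left a variable behind.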
## Monday - Day 1
### Questions & Issues
- [x] **How do I use Python in RMarkdown?**
You simply write: `{python}` instead of `{r}`. You can see Python objects in R as `py$some_python_name`. See more details [here](https://rstudio.github.io/reticulate/#python-in-r-markdown).
**NOTE!** If you are using `renv` in a project where you also use `reticulate`, you have to install all Python modules before you can use them. For instance, I need `matplotlib`, so I write in my R chunk:
```
library("reticulate") # load reticulate
path_to_python <- "/usr/bin/python3"
use_python(path_to_python)
renv::use_python(python = path_to_python)
py_install("matplotlib") # install matplotlib
```
- [x] **Why are integers like `var <- 1` stored as double?**
This is the default way R treats numeric variables. Even if the value is a whole number, it will be stored as double. You need to explicitly cast with `as.integer(var)` to store it as integer. A shortcut is `var <- 1L`.
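A quick way to check this yourself is `typeof()`. The variable names below are made up for illustration.

```r
a <- 1              # plain numeric literal: stored as double
b <- 1L             # the L suffix creates an integer
d <- as.integer(a)  # explicit cast from double to integer

typeof(a)
typeof(b)
typeof(d)
```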
- [x] **When should I use explicitly the character vector `(character <- 'abc')`? and when can I use a "random" variable that I choose `(text1 <- 'abc')`**
It does depend on the context of your program. Whenever you assign a value within double `"` or single `'` quotes, you explicitly create a character vector. In the example you provided above, both "character" and "text1" are just variable names and they do not affect how the data is stored. If you want to convert a numeric vector, like `my_vector <- c(3.14, 2.71)`, to a character vector, then you need an explicit type cast: `my_new_vector <- as.character(my_vector)`.
It is **not a good idea** to use `character <- 'abc'` because *character* is already the name of a base R function (it is not a reserved word, so R will not stop you). How do you know this? Just type **character** into the console and press enter. If it DOESN'T give an error, then the name is already in use. So use some other variable name than *character*.
- [x] **I want to know more about the special symbols like `Inf` and `NaN`.**
Visit [this](https://stat.ethz.ch/R-manual/R-patched/library/base/html/is.finite.html) resource.
- [x] **I want to better understand the three-valued logic implemented in R (TRUE, FALSE, NA).**
You can read more in this [tutorial](https://riptutorial.com/r/example/12423/true-false-and-or-na).
### Feedback
* Video quality was bad.
## Tuesday - Day 2
### Questions & Issues
- [x] **How do I download lecture slides?**
If you want to download the slides as PDF, install the R package `pagedown`. Then run it in R, providing the full URL to the slides: `pagedown::chrome_print("https://nbisweden.github.io/workshop-r/2011/slide_r_elements_3.html")`
:::info
**Note!** You need to have Google Chrome installed for this to work.
:::
- [x] **Is it possible to define the number of columns and rows of the matrix (not equal)? Say, 2 columns and 3 rows or 3 columns and 2 rows?**
Matrices do not have to be square, but all columns have to be of the same length and all rows also. You cannot have one column with 5 rows and another with 7, unless you fill in the first column with `NA` values to match the length of the second.
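As a minimal sketch, the `nrow` and `ncol` arguments of `matrix()` set the shape directly; the object name `m` is just an example.

```r
m <- matrix(1:6, nrow = 3, ncol = 2)  # 3 rows, 2 columns, filled column by column
dim(m)

# the transpose has 2 rows and 3 columns
dim(t(m))
```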
- [x] **In the truth table, what is the logic behind AND and OR ? I mean, between NA and FALSE, for example, why does it decide to "take" the NA, when using OR, but FALSE when using AND?**
```
> outer(x, x, "&")
       <NA> FALSE  TRUE
<NA>     NA FALSE    NA
FALSE FALSE FALSE FALSE
TRUE     NA FALSE  TRUE
> outer(x, x, "|")
      <NA> FALSE TRUE
<NA>    NA    NA TRUE
FALSE   NA FALSE TRUE
TRUE  TRUE  TRUE TRUE
```
When you use `FALSE | TRUE`, you are asking the question: is any of the options true? The answer is yes, one of them is true, so you get TRUE. When you use `FALSE & TRUE`, you are asking whether both options are true, and they are not. So you get FALSE.
When you do `FALSE | NA`, you are asking: is any of the options true? There is not enough information to answer that because there is a missing value, so you get NA. With `FALSE & NA`, you are asking whether both are true. One of them is already FALSE, so the answer is FALSE no matter what the missing value is.
Asking the question in your mind usually helps to resolve these things that seem rather confusing.
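The four cases with a missing value can be tried directly in the console. The variable names are made up for illustration.

```r
or_unknown  <- FALSE | NA  # NA: the answer depends on the missing value
or_known    <- TRUE  | NA  # TRUE: one TRUE is enough for OR
and_known   <- FALSE & NA  # FALSE: one FALSE is enough for AND
and_unknown <- TRUE  & NA  # NA: the answer depends on the missing value

c(or_unknown, or_known, and_known, and_unknown)
```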
:::info
**Common mistake.** This is also a common mistake when we write or speak. We often say *and/or*, which from the point of view of logic is redundant. It is enough to say *or*, because:
* 0 OR 0 is 0,
* 0 OR 1 is 1,
* 1 OR 0 is 1, but
* 1 OR 1 is also **1**.
We confuse it with what in logic is called **XOR** or *exclusive or*, defined as:
* 0 XOR 0 is 0,
* 0 XOR 1 is 1,
* 1 XOR 0 is 1 but
* 1 XOR 1 is **0**.
:::
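In R, the two truth tables above correspond to the `|` operator and the base function `xor()`. A small sketch, with made-up vector names:

```r
a <- c(FALSE, FALSE, TRUE, TRUE)
b <- c(FALSE, TRUE, FALSE, TRUE)

or_result  <- a | b       # inclusive OR: TRUE when at least one is TRUE
xor_result <- xor(a, b)   # exclusive OR: TRUE when exactly one is TRUE

data.frame(a, b, or = or_result, xor = xor_result)
```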
- [x] **How does `order()` work here?**
```
> X <- matrix(sample(1:9, size = 9), nrow = 3)
> X
     [,1] [,2] [,3]
[1,]    7    9    8
[2,]    1    6    4
[3,]    5    3    2
> order(X[,2])
[1] 3 2 1
```
`order()` gives you the positions of the elements in the order in which the vector would be sorted. In this example, we are asking to order the second column: 9, 6, 3. When you sort these values, the element in the last position comes first (3, at position 3), then 6, which is at position 2, and then 9, which is at position 1. The result is 3, 2, 1.
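The same idea on a plain vector, as a small sketch: indexing a vector with its own `order()` result gives the sorted vector. The names `v` and `ord` are just examples.

```r
v <- c(9, 6, 3)
ord <- order(v)  # positions of the values, from the smallest to the largest
ord
v[ord]           # same result as sort(v)
```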
- [x] **In the output of the code below, I get 8 rows. Why?**
```
expand.grid(height = seq(120, 121),
weight = c('1-50', '51+'),
sex = c("Male","Female"))
```
Because `height` has 2 possible values, `weight` also has 2 and there are 2 options for `sex` as well. Hence, we have 2 * 2 * 2 = 8 possible combinations of all these values. `expand.grid()` lists all those possible combinations.
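You can verify the count by storing the result and asking for its number of rows. The name `grid` is made up for illustration.

```r
grid <- expand.grid(height = seq(120, 121),
                    weight = c("1-50", "51+"),
                    sex = c("Male", "Female"))
nrow(grid)  # one row per combination: 2 * 2 * 2
```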
- [x] **What does `%/%` do?**
It performs a normal division but drops the remainder, i.e. it is an *integer division*. (If you want to round the result instead, a more common approach is `round(4.2/2)`.) Its companion is the *modulo* operator `%%`, which returns the remainder. Example: 190 minutes have passed since some full hour, how many full hours have passed? `190 %/% 60` gives 3, so 3 full hours have passed (and `190 %% 60` gives the remaining 10 minutes).
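The minutes example can be written out as follows; the variable names are made up for illustration.

```r
minutes <- 190
full_hours   <- minutes %/% 60  # integer division: whole hours
left_minutes <- minutes %% 60   # modulo: what remains after the whole hours

full_hours
left_minutes
```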
- [x] **In the S3 class definition slide, it is written that "We cannot enforce that numbers will contain numeric values and letters will contain only characters". Could you explain?**
Yes, what we meant is that in an S3 class there is no easy way of making sure that the S3 class object stores the type of data we intended. We can, e.g., make a named list (== S3 class) to store only, say, ages, but there is no automatic type checking. A user can put ages in as a character vector like `c("two", "five")` and the S3 system will not complain. The problem is that if you also have a generic function for this S3 class, it expects numbers and it will crash or do something silly when given characters.
In contrast, in S4 class objects you say that you expect a particular type of data for the slot, e.g. numbers and if someone attempts at storing characters in that slot, the S4 class system will return an error immediately.
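A minimal sketch of this difference. The class names `ages` and `Ages` and the slot `years` are made up for illustration; `setClass()` and `new()` come from the `methods` package that ships with R.

```r
library(methods)

# S3: just a list with a class attribute, nothing checks the content
ages_s3 <- structure(list(years = c("two", "five")), class = "ages")

# S4: the slot type is declared and checked when the object is created
setClass("Ages", representation(years = "numeric"))
ok <- new("Ages", years = c(2, 5))

# storing characters in a numeric slot fails; we capture the error message
bad <- tryCatch(new("Ages", years = c("two", "five")),
                error = function(e) conditionMessage(e))
bad
```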
- [x] **S4 class method concept is not clear. Could you describe briefly? Thanks.**
OK, so let's do the toolbox analogies:
* an S3 class is a list with a name, so a box with a name. You can take such a box, label it *shoes* but put marbles into it. The box will not complain. On top of this, you have another box, labelled *clean.shoes*, which contains a tool for cleaning shoes (a generic function). Now, if you take *clean.shoes* and take out the content of the *shoes* box, it will work as long as the box really contains the right data type, i.e. shoes. If someone put marbles there, your tool will either crash or be useless.
* an S4 class is a big box labelled *shoes* with two drawers:
    - *shoe storage*, which is made so that only shoes fit there (built-in type checking), and
    - *shoe maintenance tools*, which contains various tools (functions) that will work with your data, like the *clean* tool.
So, an S4 class is way better because it makes sure you put data of the right type into its slots and it also has the collection of tools (a.k.a. methods) to work with the data stored in objects of this class. It is much safer.
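The shoe box could be sketched in code like this. Everything here (`Shoes`, its slots, `cleanShoes`) is a made-up illustration, not course material; `setGeneric()` and `setMethod()` are the standard way to attach a tool to an S4 class.

```r
library(methods)

# the box: slots with declared types
setClass("Shoes", representation(size = "numeric", clean = "logical"))

# the tool: a generic function plus a method that works on Shoes objects
setGeneric("cleanShoes", function(obj) standardGeneric("cleanShoes"))
setMethod("cleanShoes", "Shoes", function(obj) {
  obj@clean <- TRUE
  obj
})

pair <- new("Shoes", size = 42, clean = FALSE)
pair <- cleanShoes(pair)
pair@clean
```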
- [x] **I understood the logic in the `order()` function: it gives the position of the ordered elements. But I don't understand how the next command below it, `X[ord,]`, works.**
```
> X <- matrix(sample(1:9, size = 9), nrow = 3)
> X
     [,1] [,2] [,3]
[1,]    7    9    8
[2,]    1    6    4
[3,]    5    3    2
> ord <- order(X[,2])
> ord
[1] 3 2 1
> X[ord,]
```
All right! Let's try to dissect it a little:
* The expression `X[ ,2]` extracts the second column of `X`: `[1] 9 6 3`
* `order(X[ ,2])` returns `[1] 3 2 1` because the smallest value (3) is at position 3, the next one (6) at position 2 and the largest (9) at position 1.
* After assigning `ord <- order(X[ ,2])`, the vector with this `[1] 3 2 1` order will be called `ord`.
* Now, you write `X[ord, ]`, meaning you want to order the rows of the whole X matrix according to `ord`, so the third row will become the first, the second will stay and the first will become the last:
```
> X[ord, ]
     [,1] [,2] [,3]
[1,]    5    3    2
[2,]    1    6    4
[3,]    7    9    8
```
- [x] **In line with the previous question, exercise 3 says 'Swap the positions of column 1 and 3 in the matrix X', and the solution is `X[,c(3,2,1)]`. Why?**
Because we want the columns in this new 3,2,1 order. Thus, we use the `X[rows_I_want, columns_I_want]` form with `rows_I_want` left empty, meaning we want all the rows, and `columns_I_want` set to the `c(3,2,1)` vector, which says the old 3rd column is the new 1st and the old 1st is the new 3rd.
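Both kinds of reordering, rows by `order()` and columns by an explicit index vector, can be tried on a fixed copy of the matrix from the questions (typed in by hand here instead of using `sample()`, so the result is reproducible). The names `rows_sorted` and `cols_swapped` are made up.

```r
X <- matrix(c(7, 1, 5,   # column 1
              9, 6, 3,   # column 2
              8, 4, 2),  # column 3
            nrow = 3)

rows_sorted  <- X[order(X[, 2]), ]  # rows reordered by the values in column 2
cols_swapped <- X[, c(3, 2, 1)]     # all rows, columns in the order 3, 2, 1

rows_sorted
cols_swapped
```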
### Feedback
* Everything was great today. Thanks!
* Very helpful lectures.
* Very good lab!
## Wednesday - Day 3
### Questions & Issues
- [x] **You said to use Linux terminal to debug large files. I am using Windows, how shall I go about it?**
You can use [PuTTY](https://www.chiark.greenend.org.uk/~sgtatham/putty/), which is basic and minimal and has few dependencies. For a full-featured terminal, we generally use MobaXterm, one of the few full-featured free options. In addition to SSH, it has an embedded SFTP client for uploading and downloading files.
- [x] **I need to read XLS files from a spreadsheet into R, how?**
Currently the best options for Excel files are the `readxl` and `writexl` packages. They have minimal dependencies and do not depend on `rJava` either.
- [x] **Can I use `rev()` or `sort()` to swap columns in a matrix/data frame?**
Not really, only in the special case where you want to reverse the order of all columns or sort all of them, say, alphabetically.
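A sketch of those special cases next to a general reordering, applied to the column names of a made-up data frame `df`:

```r
df <- data.frame(b = 1:2, c = 3:4, a = 5:6)

reversed <- df[, rev(names(df))]    # special case: reverse all columns
sorted   <- df[, sort(names(df))]   # special case: alphabetical order
custom   <- df[, c("c", "a", "b")]  # any other order: list the columns yourself

names(reversed)
names(sorted)
names(custom)
```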
- [x] **Is there any way to paste all vectors within a data frame without explicitly writing them as `paste(dfr$vector1, dfr$vector2, (…))`, say if you want to paste 20 or something...**
`apply( iris[ , ] , 1 , paste , collapse = "-" )`
This example concatenates all columns in the `iris` data frame separated by `-`.
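An alternative sketch that avoids `apply()`: `do.call()` hands every column of the data frame to `paste()` as a separate argument. The data frame and its column names are made up for illustration.

```r
df <- data.frame(id = c("s1", "s2"),
                 tissue = c("liver", "brain"),
                 rep = c(1, 2))

# equivalent to paste(df$id, df$tissue, df$rep, sep = "-")
pasted <- do.call(paste, c(df, sep = "-"))
pasted
```

This assumes no column is itself called `sep` or `collapse`, since those names would be taken as arguments of `paste()`.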
- [x] **For identifying the longest word, one can also use the length function?**
What you need is probably the `nchar()` function. See this example;
```
x <- c("hello","how","are","you")
length(x)
sapply(x, nchar)
```
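To actually pick out the longest word, one option is to combine `nchar()` with `which.max()`. A small sketch (with ties, `which.max()` returns the first of the longest words):

```r
x <- c("hello", "how", "are", "you")

n_words <- length(x)            # number of elements in the vector
n_chars <- nchar(x)             # number of characters in each element
longest <- x[which.max(n_chars)]

n_words
longest
```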
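- [x] **Hello, I usually use the `read_excel` function to import my data, but every time I close R and open it again I have to import a random table with the `import dataset` command for R to "recognise" the function again. Do you know how to debug this?**
This looks like a package-loading issue: packages are not re-attached automatically when you restart R, and the `import dataset` dialog most likely loads `readxl` for you. Try calling `library(readxl)` at the top of your script. If that does not help, it may be a good reason to e-mail the package developers or to consider using functions from a different package to load your data.
Sounds weird! I can recommend using the `readxl::read_xlsx()` function, which works without attaching the package first.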
- [x] **What does *Encode all NA values as “missing”, at export.* mean?**
I think it's phrased wrong. It should be "the string to use for missing values in the data." It describes what your missing data should look like in the exported table. So the default `na="NA"` means missing data will look like NA in the output table. For example, `na="-9"` will encode the missing data as -9 in the output table. It is useful when you know that your exported table will go into another software with special requirements.
- [x] **The syntax of `gsub()` is quite clear. However, I don't quite understand how the pattern is recognized?**
`?regex` has a quite good explanation, but it is rather lengthy. Start with the bracket pair `[` and `]`, and the character classes `[:alnum:]`, `[:alpha:]` and `[:digit:]`.
Also, look at the [common matching patterns](https://www.bookofnetwork.com/images/r-images/op57.png) and a more complex example – [dissected e-mail address matching pattern](https://www.computerhope.com/jargon/r/regular-expression.gif).
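A few small sketches of how patterns are matched by `gsub()`. The input strings are made up for illustration; note that a character class like `[:digit:]` has to sit inside another pair of brackets.

```r
# remove every digit
no_digits <- gsub("[[:digit:]]", "", "sample_01_rep2")

# replace every lowercase vowel with an underscore
no_vowels <- gsub("[aeiou]", "_", "chromosome")

# remove a prefix: letters at the start (^) followed by an underscore
no_prefix <- gsub("^[[:alpha:]]+_", "", "gene_BRCA2")

no_digits
no_vowels
no_prefix
```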
- [x] **Is it possible to access the column values by their names instead of numbers in a dataframe?**
`df[, c("ColA", "ColB")]`
It works for row names, too.
- [x] **Why are we converting most things to string with `str()` in R?**
`str()` has nothing to do with strings: it shows the **structure** of an R object. When we do want to convert different data types into one common type, string, we use `as.character()`.
- [x] **In the complex list, someone wants to access any position element(i.e. sub-list, integer), we can use i.e. list[position], but why list[[position]] giving same result and what does [[position]] mean?**
A really good question! Let's create a list:
```
l <- list(a = 123, b = c(3:7), d = "Hello world!")
```
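Now, you can access the data in a slot, e.g. `b`, in 3 ways: `l$b`, `l[2]` and `l[[2]]`, which are [almost equivalent](http://cran.r-project.org/doc/manuals/R-lang.html#Indexing). The difference is that `l[[2]]` and `l$b` return the element itself (here an integer vector), while `l[2]` returns a list of length one that contains it. This is also why single brackets can select several elements at once, e.g. `l[c(1,3)]` for `a` and `d`, whereas `l[[c(1,3)]]` will not do that (it tries to index recursively and fails here).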
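The difference is easy to see by checking the class of each result. A short sketch using the same list; the names `single`, `double` and `two_slots` are made up.

```r
l <- list(a = 123, b = c(3:7), d = "Hello world!")

single <- l[2]      # a list of length one that still wraps b
double <- l[[2]]    # the content of b itself

class(single)
class(double)
identical(double, l$b)

two_slots <- l[c(1, 3)]  # single brackets can return several slots
names(two_slots)
```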
- [x] **I am not able to understand how to get the summary of exons in total and in each chromosome from the *Drosophila* data. Could you please explain?**
* A quick look at the *Drosophila* data imported into a data frame shows that `d.gtf$Chromosome` is a `factor`. For factors, we can ask what their possible levels are with `levels(d.gtf$Chromosome)`. In this case the levels are chromosome names.
> **BEWARE!** The older versions of R provide `read.table()` with the default behavior of reading characters as **factors**, while the newer versions (R 4.0.0 onwards) changed the default to reading characters as `character`! If you are getting **NULL** in your answer, either use `stringsAsFactors=T` in `read.table()` or use `unique(d.gtf$Chromosome)` instead of `levels()` (and `length(unique(d.gtf$Chromosome))` to count the chromosomes).
* To get the per-chromosome summary, we need to use the `aggregate` function. Read more about it in `?aggregate`. So the code will be:
```
aggregate(d.gtf$Feature, by=list(d.gtf$Chromosome), summary)
```
where `summary` function will be applied to `d.gtf$Feature` (exon, intron, etc.) in a by chromosome manner.
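If you only need the exon counts, `table()` is a simpler alternative. The sketch below uses a tiny made-up data frame `d` that only mimics the `Chromosome` and `Feature` columns of the real data:

```r
d <- data.frame(Chromosome = c("2L", "2L", "X", "X", "X"),
                Feature = c("exon", "gene", "exon", "exon", "gene"),
                stringsAsFactors = TRUE)

# exons in total
total_exons <- sum(d$Feature == "exon")

# exons on each chromosome
exons_per_chr <- table(d$Chromosome[d$Feature == "exon"])

total_exons
exons_per_chr
```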
- [x] **Design a S3 class that should hold information on human proteins. The data needed for each protein is:**
* The gene that encodes it
* The molecular weight of the protein
* The length of the protein sequence
* Information on who and when it was discovered
* Protein assay data
```
protein.list<-list(Gene='ser',
mw=214,
length=121,
dis_name='Paul',
dis_year=1998,
protein_assay=250)
class(protein.list)
class(protein.list)<-'S3.protein.list'
S3.protein.list<-function(x){
cat('Gene': x$Gene,'\n'
cat('mw': x$mw, '\n')
}
```
What is wrong with the code?
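Seems like you are lacking a `)` for the first `cat`.
And it should also be: `cat('Gene:', x$Gene, '\n')`. Move the colon inside the quotes and separate the arguments with a comma!
For `print()` to find your function, it also has to be named `print.<class name>`.
This works for me (also improved the readability of the code :-):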
```
protein.list <- list(Gene='ser',
mw=214,
length=121,
dis_name='Paul',
dis_year=1998,
protein_assay=250)
class(protein.list)
class(protein.list) <- 'S3_protein_list'
class(protein.list)
print.S3_protein_list <- function(x) {
cat('Gene:', x$Gene,'\n')
cat('mw:', x$mw, '\n')
}
print.S3_protein_list(protein.list)
print(protein.list)
```
- [x] **When running the following code:**
```
url <- 'http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data'
abalone <- read.table(url, header=FALSE, sep=',')
head(abalone)
```
**I do not see use of `'` used in the syntax. Why is it so?**
I am not sure I got the question right, but `'` is used as a quote. The `sep=` argument takes a character (or a vector of characters) as argument. You need to quote strings by using either `'` or `"`. You also need to pass the URL as a string to the `read.table()` function. That's why you have quotes in the first line of your code.
- [x] **How to visualize the 'NULL' data from the dataset? In addition, is there a way to count the real or `null` events in sorted way?**
Which lab are you referring to?
There is actually no such question in the lab session, but I was curious about "Extract all the NULL data from a data frame" and "How to count the frequency of occurrence of a certain element in the data frame".
OK, I see! Well, `NULL` is a special value and represents a NULL (empty) object in R. A data frame with some data will not be NULL, and a vector cannot contain NULL elements: they are simply dropped when the vector is created (missing values are represented by `NA` instead), see e.g.:
```
x <- sample(c(0,1,NULL), size=10, replace=T)
x
```
When you would like to count occurrences of, say, 1 in a data frame, you can:
```
col1 <- sample(c(0,1,3), size = 10, replace=T)
col2 <- sample(c(0,1,3), size = 10, replace=T)
df <- data.frame(a = col1, b=col2)
sum(df == 1)
```
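To count all values at once and get them sorted by frequency, `table()` and `sort()` can be combined. A small sketch with made-up data; it also shows that `NULL` is dropped while `NA` is kept and can be counted:

```r
v <- c(0, 1, 3, 1, 1, 0)
counts <- sort(table(v), decreasing = TRUE)  # most frequent value first
counts

with_null <- c(0, 1, NULL)  # NULL disappears: only two elements remain
with_na   <- c(0, 1, NA)    # NA stays as a placeholder for a missing value
length(with_null)
sum(is.na(with_na))
```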
- [x] **I have downloaded the file `Drosophila_melanogaster.BDGP6.86.gtf.gz` from the link mentioned in the course. I stored it on the desktop but I could not read the data. Why is that?
Note: I copied and pasted the suggested code as well, but it shows the error `No such file/directory`**
It must be a path error. Contact one of the TAs to get help in a breakout room!
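A few base R calls that can help to track down a path error. The location used here (`~/Desktop`) is only an assumption; replace it with wherever you saved the file.

```r
# assumed location of the downloaded file; adjust it to your system
gtf_path <- file.path("~", "Desktop", "Drosophila_melanogaster.BDGP6.86.gtf.gz")

getwd()                # the folder that relative paths are resolved from
path.expand(gtf_path)  # the full path R will actually try to open
found <- file.exists(gtf_path)

if (!found) message("File not found: check the folder and the file name.")
```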
### Feedback
## Thursday - Day 4
### Questions & Issues
- [x] **When I load "tidyverse" in RStudio it gives me this: Is it fine?**
```
-- Conflicts ---------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
```
Yes, no problem, unless you use `filter` or `lag` from the `stats` package. Those two functions in the `stats` package are not frequently used. So, no worries in most cases.
The message means: "A new package is loaded which has functions with the same names as those in other packages (here `stats`). So, please be aware that those functions in the `stats` package will be masked."
> You can always use a specific package/function combo with `package::function`, e.g. `stats::filter()` to use the filter function from the `stats` package. In your case, just calling `filter()` will implicitly use `dplyr::filter()`
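
As a small sketch, this is how the masked `stats` version can still be called explicitly. Here it computes a 3-point moving average over a made-up vector, which is what `stats::filter()` is typically used for:

```r
x <- 1:5

# explicit prefix: this is the stats version, even when dplyr is loaded
smoothed <- stats::filter(x, rep(1 / 3, 3))
smoothed
```

The first and last values are `NA` because there is no full window of three values around them.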
- [x] **What is the difference between `%T>%` and `%>%`?**
`%T>%` is for plotting or printing. More info is available in help, `?'%T>%'`.
For long lines linked with pipes, it is really useful for checking the intermediate status of a variable (or a column in a tibble).
> The `%T>%` pipe should perhaps be called the `%Y>%` pipe, as that better illustrates what it does. If you write the pipe
`x %T>% f() %>% g()`, typically the `f(x)` function does not return anything useful, like the `print()` function. But your `g()` needs some data to work with, so `x` has to go to `f(x)` and also "jump" to `g(x)`. So, writing `%T>%` actually creates two flows, `x %>% f()` and `x %>% g()`, but in one line...
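
A minimal sketch of the two flows, assuming the `magrittr` package is installed (the `requireNamespace()` check simply skips the example otherwise). The name `total` is made up.

```r
total <- NA
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)
  # print() shows the vector as a side effect,
  # while sum() still receives the original vector
  total <- c(1, 2, 3) %T>% print() %>% sum()
}
total
```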
- [x] **In the code below:**
```
data <- data.frame(cbind(X=x, Y=y, R=r), row.names=names)
plot(data[,1:2])
```
**What is `cbind()` doing here? Without using `cbind()`, I got the same result...**
That is a good point! `cbind` is not really necessary here. Thanks for spotting this!
- [x] **Running `plot(data[,1:2], type='n')`. After using this command, I got error message *object of type 'closure' is not subsettable*. Even though I used `dev.off` in the previous graph to close.**
>What are you getting when you type `class(data)` and `dim(data)`?
`class(data)`: data frame and `dim`: `20 3`. Luckily I fixed the problem now. I just closed RStudio and started it again. It worked. Thanks.
> The *object of type 'closure' is not subsettable* typically means you are trying to subset something that is not subsettable, like a function. A good example given by Jenny Bryan from RStudio is that you write the following code:
```
dfa <- data.frame(x = 1:8, y = 8:1)
df$x
```
> What happened here is that you confused names and in the second line you thought your newly created data frame is called `df` when in fact it is called `dfa`. On the other hand, `df` is a pre-loaded function for getting F-distribution values (`stats::df()`). So, what you tried to do was to subset (i.e. retrieve some data) from a function as if it was a `data.frame`. Hence the error.
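
You can reproduce the situation safely by capturing the error instead of letting it stop your code. A small sketch; `f` and `res` are made-up names and `f` is just a copy of the `stats::df()` function:

```r
f <- stats::df  # a function (a "closure"), not a data frame

# trying to take a column from a function fails; we keep the error message
res <- tryCatch(f$x, error = function(e) conditionMessage(e))
res
```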
- [x] **I am trying to make a plot using some of my old data. I want the x-axis to go from -2 to +2, but there seems to be some problem. Could you identify what it is?**
```
plot(data[,2:3],
type = 'n',
xaxt = 'n',
yaxt = 'n',
xlab = "log2(fold change)",
ylab = "-log10(p-value)", frame.plot=F)
# x and y axis
coord.x <- seq(-2,2, by=1)
axis(side = 1, at = coord.x)
```

You also need to specify the `xlim = c(-2, 2)` parameter in your `plot`. Otherwise, R takes the limits not from the `axis()` call but from the values provided as data. It is logical: `plot` is not aware of what you **will** put as values for `axis()` as it cannot read your mind or future code :-)
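A sketch of the corrected call with a few made-up data points (`fc` and `p` are invented values, not your data). The `pdf(NULL)` line only keeps the example from opening a plot window; drop it and the `dev.off()` line to see the figure.

```r
pdf(NULL)  # null device, so the example runs without opening a window

fc <- c(-1.5, -0.3, 0.2, 0.8)  # made-up log2 fold changes
p  <- c(3, 0.5, 0.2, 2)        # made-up -log10 p-values

plot(fc, p,
     xlim = c(-2, 2),          # the x range is fixed here, before axis() is drawn
     xaxt = "n",
     xlab = "log2(fold change)",
     ylab = "-log10(p-value)",
     frame.plot = FALSE)
axis(side = 1, at = seq(-2, 2, by = 1))

usr <- par("usr")              # the plotting region actually used
invisible(dev.off())
usr[1:2]
```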
- [x] **In the question "How many unique routes exist?":
What is the role of ".keep_all=T" in the following code? I've read the description in "?distinct" but couldn't understand it.**
```
flights %>%
mutate(route=paste(origin,"-",dest)) %>%
distinct(route,.keep_all=T) %>%
nrow()
```
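If you skip the `.keep_all=T` parameter, all columns but `route` will be lost. With it, the other columns are kept as well (taken from the first row of each distinct route). For just counting the routes with `nrow()`, both versions give the same number.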
- [ ] **Problem with Tidyverse slides download!**
What exactly happens?

### Feedback
* Great course with excellent structure and flow, easy to follow
* Really enjoyed working with Tidyverse. It makes life easy! Really good idea that you've included Tidyverse in this course. Nice labs that help to understand and learn more. As a suggestion, it would be nice to mention the name of needed packages for each lab in the beginning of the lab.
## Friday - Day 5
### Materials for today's Topic of Choice. You chose `graphics`!
* [Glamour of Graphics](https://rstudio.com/resources/rstudioconf-2020/the-glamour-of-graphics/) lecture by Will Chase at `rstudio::conf 2020`.
* An example of creating a plot according to the concepts introduced by Will Chase. [Marcin's implementation](http://www.kierczak.net/website/blog/2020-02-05_gglamour/).
* An example of a coherent [design system](https://www.behance.net/gallery/79724387/DrWhy-Explainers-for-ML-models-data-visualization) by Hanna Piotrowska & Przemek Biecek for their work on R packages for xAI and xML (x=explainable).
* An excellent book by Claus O. Wilke: [Fundamentals of Data Visualization](https://clauswilke.com/dataviz).
* The excellent source of examples – [R Graph Gallery](https://www.r-graph-gallery.com).
### Questions & Issues
- [x] **How do I turn off numbers with scientific notation on plots?**
Set `options(scipen=10000)` before plotting.
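The effect of `scipen` can be seen without plotting, because the same option controls how numbers are formatted as text. A small sketch; the value 999 is just a commonly used large penalty, and the old setting is restored at the end.

```r
old <- options(scipen = 0)    # R's default penalty
default_fmt <- format(1e10)   # short scientific notation wins

options(scipen = 999)         # heavily penalise scientific notation
fixed_fmt <- format(1e10)     # fixed notation wins

options(old)                  # restore the previous setting
default_fmt
fixed_fmt
```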
- [x] **I am getting the following error: `Error: C stack usage is too close to the limit`. What is wrong?**
This has more to do with the OS than with R. In the terminal, you can use `ulimit -s` to see the stack size limit (reported in kilobytes). Set it to a larger number or to unlimited with `ulimit -s unlimited`. R probably needs to be restarted afterwards.
It can also be caused by a messy operation in R. Calling functions recursively was the issue in this case.
- [x] **Why is the color blue when we use `aes(color=Sepal.Width)`? Is that the default?**
Yes, that is the default for a continuous variable. You can use other colors if you prefer. Here is one example:
```
library(ggplot2)
library(viridis)
ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width))+
  geom_point(aes(color=Sepal.Length)) +
  geom_smooth(method="lm") +
  scale_color_viridis(option = "D")
```
- [x] **Is there a way of plotting flowcharts in R?**
You may want to check the following:
* solution proposed in this [blog entry](https://davetang.org/muse/2017/03/31/creating-flowchart-using-r/),
* another solution, proposed [here](https://scriptsandstatistics.wordpress.com/2017/12/22/how-to-draw-a-consort-flow-diagram-using-r-and-graphviz/)
* or utilize the [`ggnet`](https://briatte.github.io/ggnet/) package for this purpose.
- [x] **How to change alignment of plot title?**
Example of right justification
```
library(ggplot2)
ggplot(iris,aes(Petal.Width,Sepal.Width))+
geom_point()+
labs(title="My badass title")+
theme(plot.title=element_text(hjust=1))
```
### Feedback
* Super nice course! I really enjoyed it and found it very useful! Thanks!
* The course gave a great overview of the main principles of R, and that is very important for someone like me with minor knowledge in R and programming in general. It's also good to hear from experts that things don't always work smoothly even for them, as well as that google is an important part of the process :'), it makes R feel less intimidating. I loved the design of the course material, very neat and easy to follow. I liked that the course structure was provided in detail from the very beginning, and I also found the content of the labs very useful. Summarizing, great content, great structure and great teachers happy to help, even in these circumstances of online teaching. Thank you so much!
* Very good course. I have learnt a lot of programming concepts and how to implement them in projects. The course material with lab exercises is a perfect match to get a good understanding of the subject. Thanks!
* Really nice course. It is a very good starting point for people that have very little or no previous experience in R, or programming at all. Since I'm planning to gain experience with these tools for my Postdoc project, mainly to have more independence and have better communication with bioinformaticians, it was really helpful. Thanks for your time and dedication, and hope to come back to you at some point with further questions. You R great teachers, Thanks! (Gustavo)
* Good course, although very intense. I found the exercises of the first 3 days a bit hard, but I really enjoyed the exercises from Thursday and today as some hints were given in the guidelines. Thank you very much for your time.
* Really enjoyed taking this course! Very informative and well organized! Nice labs that helped learning a lot. It was a nice starting point for me in programming with R. Looking forward to more related courses. If you plan an advanced course with statistics and machine learning, that would be awesome!