owned this note
owned this note
Published
Linked with GitHub
---
tags: ResBaz2021
---
# SQL databases in R
## Thursday, May 20th, 2021 1\:00-3\:00
[Back to Resbaz HackMD Directory](https://hackmd.io/@ResBaz21/directory)
Descriptions Forthcoming
## Getting Started
Information here: https://github.com/pachamaltese/uarizona-workshops
---
## Introductions
Name, Affiliation, Title, Email, Social Media
- Your Name, University of Arizona, Your title, youremail@email.arizona.edu, your social media
- Julian Pistorius, University of Arizona - College of Agriculture and Life Sciences, Research Software Engineer, julianp@arizona.edu, [github@julianpistorius](https://github.com/julianpistorius/)
- Kelsey Gonzalez - UArizona, @KelseyGonzalez
- Rene Dario Herrera (they/them), University of Arizona Cancer Center, Data Analyst, renedherrera@email.arizona.edu, [twitter@reneherrera](https://twitter.com/reneherrera), [github@renedarioherrera](https://github.com/renedarioherrera), [linkedin.com@renedarioherrera/](https://www.linkedin.com/in/renedarioherrera/)
- Iris Rodden, University of Arizona, PhD student SNRE, irisr@email.arizona.edu
## Notes
> Julian Pistorius: I created a repository which you can launch in MyBinder which has Jupyter and RStudio alongside a built-in PostgreSQL server: https://github.com/julianpistorius/binder-postgres

Install and use `nycflight13` package:
```r=3.6
install.packages("nycflights13")
library(nycflights13)
flights
```
Query table using `dplyr`:
```r=3.6
library(dplyr)
flights %>% filter(month == 3)
```
Indexes: You need a unique way to identify observations
```r=3.6
# Show airlines
airlines
# Indexes
unique_index <- list(
airlines = list("carrier"),
planes = list("tailnum")
)
index <- list(
airports = list("faa"),
flights = list(
c("year", "month", "day"), "carrier", "tailnum", "origin", "dest"
),
weather = list(c("year", "month", "day"), "origin")
)
```
Install RPostgres package:
```r=3.6
install.packages("RPostgres")
library(RPostgres)
```
Create environment variables for SQL access:
```R3.6
usethis::edit_r_environ()
# and add to your environment to keep them restricted..
tutorial_db="postgres"
tutorial_user="postgres"
tutorial_pass="rezbaz2021"
tutorial_host="143.198.138.225"
```
For Binder environment:
```R3.6
usethis::edit_r_environ()
# and add to your environment to keep them restricted..
tutorial_db="postgres"
tutorial_user="postgres"
tutorial_pass=""
tutorial_host="127.0.0.1"
```
then session->restart R on toolbar
```R3.6
fun_connect <- function() {
dbConnect(
Postgres(),
dbname = Sys.getenv("tutorial_db"),
user = Sys.getenv("tutorial_user"),
password = Sys.getenv("tutorial_pass"),
host = Sys.getenv("tutorial_host")
)
}
```
And then explore tables, but this isn't the best option
```R3.6
# explore tables ----
schema <- "public"
conn <- fun_connect()
#this isn't the best
# remote_tables <- dbGetQuery(
# conn,
# sprintf("SELECT table_name
# FROM information_schema.tables
# WHERE table_schema='%s'", schema)
# )
#this is better
DBI::dbListTables(conn)
dbDisconnect(conn)
```
```r3.6
# what tables do my server find
remote_tables <- as.character(remote_tables$table_name)
remote_tables
# Which tables do I need to upload to my server
local_tables <- utils::data(package = "nycflights13")$results[, "Item"]
tables <- setdiff(local_tables, remote_tables)
```
Now to connect...
```r3.6
conn <- fun_connect()
obs_per_month <- tbl(conn, "flights") %>%
group_by(year, month) %>%
count() %>%
collect()
more_complicated <-
tbl(conn, "flights") %>%
group_by(origin) %>%
count() %>%
inner_join(tbl(conn, "airports") %>% select(faa,name), by = c("origin" = "faa")) %>%
collect()
dbDisconnect(conn)
```
If you want to see the SQL command,
```r3.6
conn <- fun_connect()
obs_per_month <- tbl(conn, "flights") %>%
group_by(year, month) %>%
count() %>%
show_query()
dbDisconnect(conn)
```
## Questions and Answers
In this section, you can post your questions and feel free to answer if you have it. Questions will be answered during or after the workshop.
1. Ask your question.
- Here is an answer
2. I get a problem:
>
```
> library(RPostgres)
Error: package or namespace load failed for ‘RPostgres’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/srv/conda/envs/notebook/lib/R/library/Rcpp/libs/Rcpp.so':
/usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /srv/conda/envs/notebook/lib/R/library/Rcpp/libs/Rcpp.so)
```
3.
## References
Possible solution to problem in Binder/Ubuntu: <https://stackoverflow.com/questions/44773296/libstdc-so-6-version-glibcxx-3-4-20-not-found>
---
:::info
**Session Feedback :mega:**
Use the link below to provide your feedback on the session:
[**Session Feedback Form**](https://forms.gle/TrnJpr9qRBEKdnVVA)
:::