owned this note
owned this note
Published
Linked with GitHub
---
tags: datalab,sqlite
---
# SQLite Hands On - Thursday, Jan 19, 2023
CTB
[toc]
## Running SQLite
### Installing SQLite3 locally
For Windows users who want to try things out locally:
* go to https://www.sqlite.org/download.html
* find "Precompiled Binaries for Windows"
* download and install [sqlite-tools-win32-x86-3400100.zip](https://www.sqlite.org/2022/sqlite-tools-win32-x86-3400100.zip)
For Mac OS X and Linux computers, sqlite3 _should_ come installed with Python. On my Mac, `sqlite3` works to run it.
### Using a binder
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ngs-docs/2023-ggg298-sourmash-compare/stable?urlpath=rstudio)
This will create an RStudio Server running in the cloud.
To get to a command line, go to Terminal...
To run sqlite3, run `sqlite3 database.sql` at the terminal!
## Introducing SQLite3
## A Hands-On Introduction to SQL with SQLite!
SQLite is the amazing little embedded database that could! It comes with Python, it’s on every Apple device ever made, and it just … plain … works. Really, really well.
You might be interested in this session (generally) and SQLite (specifically) if any or all of the following apply -
* you have too much spreadsheet/dataframe-like data to fit easily in memory;
* you’ve heard about SQL but never really used it;
* you want to access the same data from multiple programming languages (like Python and R);
* you are naturally curious;
In this session I’ll develop a simple database for storing information about a personal book collection, and we’ll explore it together. I’ll show -
* how to design a simple SQL schema
* how to add data into the database with Python, and how to retrieve it
* how to explore database contents from the command line
## Some initial hands-on
SQLite is an open source database that you can use to store data on disk and interact with/retrieve that data in simple and also complex ways.
* it's very fast in many circumstances!
* it stores structured data!
* it is very "safe" in that it is a very robust way to store data
* I have not found it to be super disk space efficient but that can be tuned
### Creating a database with a simple schema
Here's a simple schema:
```sql=
PRAGMA foreign_keys = ON;
CREATE TABLE owners (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE books (
owner_id INTEGER,
author TEXT NOT NULL,
title TEXT NOT NULL,
isbn TEXT NOT NULL,
FOREIGN KEY (owner_id) REFERENCES owners (id)
);
```
* every database consists of one or more tables (as well as views and other things);
* each table is like a dataframe
* tables can be linked - see `owner_id` above!
* pragmas configure runtime information - here we are telling sqlite3 to obey foreign key constraints (more on that in a bit)
To start with this, run `sqlite3 library.db` at the command line. You should end up with a `sqlite>` prompt.
Then copy/paste the above schema definitions.
### Adding information - when all goes well
At the sqlite prompt, run
```sql=
insert into owners (name) values ('titus');
insert into owners (name) values ('hannah');
select * from owners;
```
* owner_id is automatically generated
* this is a fast primary key to be used for various retrieval operations; guaranteed to be unique!
And now add:
```sql=
INSERT INTO books
(owner_id, author, title, isbn)
VALUES (1, 'F. Scott Fitzgerald', 'Great Gatsby', '0-1-2-3');
INSERT INTO books
(owner_id, author, title, isbn)
VALUES (1, 'Charles Dickens', 'A Tales of Two Cities', '0-1-3-5');
INSERT INTO books
(owner_id, author, title, isbn)
VALUES (2, 'Ernest Hemingway',
'The Sun Also Rises', '5-4-6-7');
SELECT * FROM books;
```
et voila!
You can change the output format for selects with `.mode`, e.g. `.mode markdown`.
You can get my books with:
```sql=
SELECT * from owners,books WHERE
owners.name='titus' AND books.owner_id=owners.id;
```
This is called an "inner join" because you are showing tables only where the rows match. You can also do an outer join, or a cross-product:
```sql=
SELECT * FROM owners, books;
```
You can also do summary operations across groups -
```sql=
select owners.name,COUNT(books.title)
FROM owners,books WHERE owners.id=books.owner_id
GROUP BY owners.id;
```
and also do sorting -
```sql=
SELECT name FROM owners ORDER BY name ASC;
```
and lots of other things. SQL is its whole own world!
### When the schema stops you
Try running:
```sql=
DELETE FROM owners WHERE name='titus'
```
it should fail with
>Runtime error: FOREIGN KEY constraint failed (19)
Same with:
```sql=
UPDATE books SET owner_id=8;
```
Basically, SQLite can provide what's called "referential integrity" to databases, preventing you from "breaking" the schema structure.
## A few more advanced things
### Looking at a database
You can retrieve the schema from a database with `.schema`. This is pretty handy :laughing:
### Talking to a database
SQLite3 is embedded. No background server is needed!
That means that you can run the following Python code and there's no need to have anything else running:
```python=
import sqlite3, pprint
# get a database connection
db = sqlite3.connect('library.db')
# get a cursor (a way to interact with database - can have multiple of these)
c = db.cursor()
# run a SQL query
c.execute('SELECT * from books,owners WHERE books.owner_id=owners.id')
# retrieve results
for row in c:
pprint.pprint(row)
```
You can have multiple independent cursors, so you can run different queries with different cursors - if you want to run nested queries, you'll need two.
sqlite3 does support transactional isolation, so you can have one connection/set of cursors with a different view of the database than another; think simultaneous bank transactions...
SQLite3 does have support for R, but I don't know R much if at all :)
## Fun facts and thoughts about SQLite
You can manage millions of rows, hundreds of GBs of data, etc. with SQLite.
SQLite is a pretty good way to store data in an efficient, query-able way that is cross-language.
SQLite can be used to store "blobs" of data (images, for example) but it somewhat defeats the purpose of it to have "opaque" objects that can't be compared/sorted/matched/displayed.