# CS186 MT2 Notes
# Lecture 10: 3/1/20
## Iterators
* Setup, init, next, close
* Pull-based computation model: next() calls propagate down the plan tree
* Encapsulation - any iterator can be input to another
* State - iterators can maintain substantial internal states
* Like partitions for sorting/hashing
## Heap Scan
* Init:
* Open heap file
* get cur_page and cur_slot
* next:
* return the record id at (cur_page, cur_slot), then advance the pointer (moving to the next page if necessary); see the sketch below
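Below is a minimal Python sketch of this pull-based iterator interface, using an in-memory list of pages as a stand-in for a heap file; the class and method names are illustrative, not the course's actual code.

```python
# Toy heap-scan iterator: setup (constructor), init, next, close.
class HeapScan:
    def __init__(self, pages):
        self.pages = pages          # setup: remember the "heap file" (list of pages)

    def init(self):
        self.cur_page = 0           # open the file, point at the first slot
        self.cur_slot = 0

    def next(self):
        # Return the record at (cur_page, cur_slot) and advance the pointer,
        # moving to the next page when the current one is exhausted.
        while self.cur_page < len(self.pages):
            page = self.pages[self.cur_page]
            if self.cur_slot < len(page):
                record = page[self.cur_slot]
                self.cur_slot += 1
                return record
            self.cur_page += 1
            self.cur_slot = 0
        return None                 # end of input

    def close(self):
        pass                        # a real system would release buffers/handles here


pages = [[("alice", 1), ("bob", 2)], [("carol", 3)]]
scan = HeapScan(pages)
scan.init()
while (r := scan.next()) is not None:
    print(r)
scan.close()
```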
## Sort (2-pass)
* Init: repeatedly call child.next() to fill up the available buffer pages, sort them, and write out sorted runs
* Finish everything up to the final merge pass
* next:
* output = min tuple across all buffers
* if last tuple in buffer, fetch next page
## Group By, Sorted input
* Init: initializes children
* next: gather all tuples with the same group-by value and aggregate them (see the sketch below)
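A small sketch of streaming aggregation over sorted input, assuming the child iterator yields `(group_key, value)` pairs already ordered by the group key and that the aggregate is SUM; all names are illustrative.

```python
# Streaming GROUP BY over sorted input: emit a group as soon as the key changes.
def group_by_sorted(sorted_child, agg=sum):
    current_key, current_vals = None, []
    for key, value in sorted_child:
        if key != current_key and current_vals:
            yield current_key, agg(current_vals)   # group boundary reached
            current_vals = []
        current_key = key
        current_vals.append(value)
    if current_vals:
        yield current_key, agg(current_vals)       # flush the last group


rows = [("a", 1), ("a", 2), ("b", 5), ("c", 3), ("c", 4)]
print(list(group_by_sorted(iter(rows))))   # [('a', 3), ('b', 5), ('c', 7)]
```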
## Join Operators
* $[R]$ = num of pages to store R
* $p_R$ = num of records per page of R
* $|R|$ = cardinality (num of records)
* $|R| = p_R *[R]$
* Simple Nested Loop Join (cross product)
* Cost: $[R] + |R|[S]$
* Changing join order can make IO cheaper
* Page Nested Loop Join
* Cost: $[R] + ([R] * [S])$
* Block/Chunk Nested Loop Join
* for each rchunk of B-2 pages of R: for each spage of S
* Cost: $[R] + \lceil [R]/(B-2) \rceil [S]$
* Index Nested Loops Join
* Cost: $[R] + |R| \cdot$ (cost to find matching tuples in S)
* Unclustered: $[R] + |R| \cdot$ (Search + HeapAccess)
* Clustered: $[R] + |R| \cdot$ Search $+$ distinct_vals(R) $\cdot$ HeapAccess
* Alt 1: cost to traverse the tree from root to leaf
* Alt 2-3: cost to look up the RID and retrieve the record
* clustered: 1IO per page
* unclustered: 1 IO per matching tuple
* Sort-Merge Join
* Equality predicate only (equi-joins, natural joins)
* 2 stages: a sort pass, then a join pass
* Sort pass: sort tuples in R and S on the join key
* Join pass: merge-scan the sorted runs, emitting tuples that match
* Keep a mark at the first match in S (to rewind when R has duplicate join values)
* Cost: (sort R) + (sort S) + $([R] + [S])$
* Sort cost: $2[R](1 + \lceil \log_{B-1} \lceil [R]/B \rceil \rceil)$
* Worst case (every tuple matches, i.e., a cross product): the merge costs $[R] \cdot [S]$
* Assume $B > \sqrt{\max([R],[S])}$ so each relation sorts in 2 passes
* Refinement:
* Read R, write out sorted runs in pass0
* Read S, write out sorted runs in pass0
* Merge R-runs and S-runs, while finding $R \bowtie S$ matches
* Avoid writing out final pass of R,S in sorting
* Cost: $3[R] + 3[S]$
* Requires roughly $B \geq \sqrt{[R]} + \sqrt{[S]}$ so all runs of R and S merge in one pass (see the worked cost example below)
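To make the formulas above concrete, here is a worked IO-cost comparison using assumed sizes ($[R] = 1000$ pages, $p_R = 100$ records/page, $[S] = 500$ pages, $B = 102$ buffer pages); the numbers are hypothetical.

```python
# Worked IO-cost arithmetic for the join formulas above.
import math

R, p_R, S, B = 1000, 100, 500, 102
card_R = p_R * R                                   # |R| = p_R * [R]

snlj = R + card_R * S                              # [R] + |R|[S]
pnlj = R + R * S                                   # [R] + [R][S]
bnlj = R + math.ceil(R / (B - 2)) * S              # [R] + ceil([R]/(B-2))[S]
smj_opt = 3 * R + 3 * S                            # optimized sort-merge join

print(f"SNLJ: {snlj:,} IOs")    # 50,001,000
print(f"PNLJ: {pnlj:,} IOs")    # 501,000
print(f"BNLJ: {bnlj:,} IOs")    # 6,000
print(f"SMJ : {smj_opt:,} IOs") # 4,500
```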
## Grace Hash Join
* Naive hash join: build an in-memory hash table on all of R, then probe with tuples of S
* Cost: $[R]+[S]$
* Memory: requires $[R] \leq B-2$ (R must fit in the in-memory hash table)
* Grace hash join decomposes the join into smaller partial joins:
* $R \bowtie_{\text{sid}} S = \bigcup_i \big(\sigma_{h(\text{sid})=i}(R) \bowtie_{\text{sid}} \sigma_{h(\text{sid})=i}(S)\big)$
* Requires equality predicate (equi-joins, natural joins)
* Stages
* Partition tuples from R,S by join key
* Build and probe a separate hash table for each partition
* Build on the R partition, probe with the matching S partition (see the sketch below)
* Total cost: $3([R]+[S])$
* Partition: $2([R]+[S])$
* Matching: $[R]+[S]$
* Memory: $[R] \leq (B-1)(B-2) \approx B^2$
* No recursive partitioning of S (only R's partitions must fit in memory)
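An in-memory sketch of the two Grace Hash Join stages (partition, then build and probe per partition); the partition count, hash function, and `(key, payload)` tuple format are assumptions, and a real implementation would spill partitions to disk.

```python
# Grace Hash Join sketch: partition both inputs, then join co-partitions.
def grace_hash_join(R, S, num_partitions=4):
    # Partitioning phase: hash both inputs on the join key into partitions.
    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for key, payload in R:
        r_parts[hash(key) % num_partitions].append((key, payload))
    for key, payload in S:
        s_parts[hash(key) % num_partitions].append((key, payload))

    # Build & probe phase: join each co-partition independently.
    results = []
    for r_part, s_part in zip(r_parts, s_parts):
        table = {}                                  # build hash table on the R partition
        for key, payload in r_part:
            table.setdefault(key, []).append(payload)
        for key, s_payload in s_part:               # probe with the S partition
            for r_payload in table.get(key, []):
                results.append((key, r_payload, s_payload))
    return results


R = [(1, "r1"), (2, "r2"), (2, "r2b")]
S = [(2, "s1"), (3, "s2")]
print(grace_hash_join(R, S))   # [(2, 'r2', 's1'), (2, 'r2b', 's1')]
```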
## Pro-cons
* Sorting pros:
* Good if input already sorted
* Not sensitive to data skew
* Hashing pros:
* For joins: number of passes depends on the size of the smaller relation
* Good if input is already hashed or the output needs to be hashed
# Lecture 11: 3/5/20
## Query Optimization
* Query parser: correctness, authorization check
* Query rewriter
* Cost based query optimizer
* 3 Parts:
* Plan space
* Cost estimation
* Search strategy
## Relational Algebra
* Selections:
* Cascade: $\sigma_{c_1 \land \ldots \land c_n}(R) \equiv \sigma_{c_1}(\ldots(\sigma_{c_n}(R))\ldots)$
* Commute: $\sigma_{c1}(\sigma_{c2}(R)) \equiv \sigma_{c2}(\sigma_{c1}(R))$
* Projections:
* Cascade: $\pi_{a_1}(R) \equiv \pi_{a_1}(\pi_{a_1,a_2}(\ldots(\pi_{a_1,\ldots,a_{n-1}}(R))\ldots))$
* Cartesian Product:
* Associative: $R \times (S \times T) \equiv (R \times S) \times T$
* Commutative: $R \times S \equiv S \times R$
* Joins, being derived from cross products, are likewise associative and commutative
* But some join orderings introduce cartesian products
## Standard Heuristics
* Selection cascade and pushdown:
* Apply selections as soon as you have the relevant columns (see the worked rewrite after this list)
* Projection cascade and pushdown
* Keep only columns you need downstream
* Avoid cartesian products
* Theta joins are better
* Not always optimal (e.g., the cartesian product of 2 small tables can be cheap)
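As a concrete instance of selection cascade plus pushdown (the relations and attribute names here are made up), a conjunctive selection over a join can be split and each conjunct pushed to the side it references:

$$\sigma_{R.a = 5 \,\land\, S.b > 7}(R \bowtie S) \;\equiv\; \sigma_{R.a = 5}\big(\sigma_{S.b > 7}(R \bowtie S)\big) \;\equiv\; \sigma_{R.a = 5}(R) \bowtie \sigma_{S.b > 7}(S)$$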
## Physical equivalences
* Base table access: heap scan, index scan
* Equijoins:
* Block Nested loop: simple, exploits extra memory
* Index Nested Loop: Good if 1 rel small, other indexed properly
* Sort-merge join: good with small memory, equal-sized tables
* Grace hash join: even better than sort-merge when one table is small
* Non-equijoins:
* Block Nested Loop
## Optimization Remarks
* Selection pushdown:
* Pushing selection to inner loop of nested loop join does not save IOs
* Join order:
* Materializing inner loops:
* Write temp table as intermediate
* PNLJ: larger relation on right side (inner loop)
* Repeated scans always have same cost, difference is left side scan
* Indexes
* Reduce the number of initial read IOs
* Inner loop operations are faster (finding matching tuples)
# Lecture 12: 3/9/20
* Selinger optimization
* Plan space is too big, must be pruned
* Ignore overpriced subtrees
* Heuristics: left-deep plans, avoid cartesian products
* Consider combination of CPU and IO costs
* Cost estimation
* Query blocks
* Try to flatten query blocks (query re-writes)
* Can decorrelate correlated nested blocks
* Within a block:
* Consider relevant access methods for each relation
* Left-deep join tree order and join methods
* Physical properties of output
* Sort order (index scan, sort)
* Hash grouping (hash)
* Certain operators require properties at input
* Ex. MergeJoin requires sorted input
* Certain operators preserve these properties from inputs
* Ex. NestedLoopJoin preserves left outer input order
* Downstream operators may require properties
## Cost Estimation
* Estimate total cost of every operation
* Estimate size of result for each operation
* Assume independence for predicates in selections and joins
* Selinger cost: num IOs + CPU-factor
* Catalogs contain stats like num tuples, num pages, min/max, num keys, index height, num disk pages in index
* Maximum output cardinality = product of input cardinalities
* Selectivity/Reduction Factor of each term
* Large selectivity = large output
## Estimation
* If missing needed stats, assume 1/10
* col=value: S = 1/NKeys(I)
* col1=col2: S = 1/MAX(NKeys(I1), NKeys(I2)) (see the sketch below)
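A small sketch of these selectivity rules combined under the independence assumption; the catalog statistics and cardinalities below are made up for illustration.

```python
# Selinger-style selectivity and output-cardinality estimation.
DEFAULT_SELECTIVITY = 1 / 10          # used when the needed statistic is missing

def sel_eq_value(n_keys=None):
    """col = value  ->  1 / NKeys(col), or 1/10 if unknown."""
    return 1 / n_keys if n_keys else DEFAULT_SELECTIVITY

def sel_eq_cols(n_keys1=None, n_keys2=None):
    """col1 = col2  ->  1 / max(NKeys(col1), NKeys(col2)), or 1/10 if unknown."""
    if n_keys1 and n_keys2:
        return 1 / max(n_keys1, n_keys2)
    return DEFAULT_SELECTIVITY

# Hypothetical example: |R| = 10,000, |S| = 2,000; join on R.sid = S.sid with
# NKeys(R.sid) = 50 and NKeys(S.sid) = 40; plus a filter R.age = 20 with
# NKeys(R.age) = 10. Predicates are assumed independent.
max_card = 10_000 * 2_000
est = max_card * sel_eq_cols(50, 40) * sel_eq_value(10)
print(est)   # 20,000,000 * (1/50) * (1/10) = 40,000.0
```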
## Histograms
* Equiwidth histograms have value domain evenly split and graph shows num values per bin
* Uniformity assumption within each bin
* Equidepth histograms make each bin hold roughly the same number of values; what is stored is the bin boundaries (see the sketch below)
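A tiny sketch of how equidepth bucket boundaries are chosen so that each bucket holds about the same number of values; the data and bucket count are made up.

```python
# Equidepth histogram: boundaries adapt to skew so buckets have equal counts.
def equidepth_boundaries(values, num_buckets):
    vals = sorted(values)
    per_bucket = len(vals) // num_buckets
    # Boundaries are the values sitting at each bucket edge.
    return [vals[i * per_bucket] for i in range(num_buckets)] + [vals[-1]]

data = [1, 1, 1, 1, 2, 2, 3, 5, 8, 13, 21, 34]
print(equidepth_boundaries(data, 4))   # [1, 1, 3, 13, 34]
```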
## Search Algorithm
* Single-table plans: base case
* Consider access paths(file scan/index)
* Multi-table plans: induction
* Dynamic programming - same subtrees
* Optimality via Bellman's principle of optimality
* Best overall plan is composed of best decisions on subplans
* Best plans of height 1 to n
* Keys = subset of tables in from clause
* Interesting orders
* Sorted by something used downstream
* ORDER BY, GROUP BY, join attributes downstream
* Avoid cartesian products
* Only consider a cartesian product when there is no usable join condition, i.e., all applicable WHERE predicates are already used up (see the DP sketch after this list)
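A toy sketch of Selinger-style dynamic programming over left-deep join orders; the cost model (charging the size of each intermediate result) and the cardinality/selectivity numbers are entirely made up. The point is the bottom-up structure: the best plan for a set of tables is the cheapest combination of a best subplan plus one more table.

```python
# Left-deep join-order DP over subsets of tables (toy cost model).
from itertools import combinations

cards = {"R": 1000, "S": 200, "T": 50}                 # base-table cardinalities (assumed)
sel = {frozenset({"R", "S"}): 0.01,                    # pairwise join selectivities (assumed)
       frozenset({"S", "T"}): 0.05,
       frozenset({"R", "T"}): 1.0}                     # no predicate -> cross product

def join_card(left_tables, left_card, t):
    card = left_card * cards[t]
    for lt in left_tables:                             # apply every applicable predicate
        card *= sel.get(frozenset({lt, t}), 1.0)
    return card

best = {frozenset({t}): (cards[t], 0.0, [t]) for t in cards}   # subset -> (card, cost, order)
tables = list(cards)
for size in range(2, len(tables) + 1):
    for subset in map(frozenset, combinations(tables, size)):
        for t in subset:                               # t joins as the new inner relation
            left = subset - {t}
            lcard, lcost, lorder = best[left]
            card = join_card(left, lcard, t)
            cost = lcost + card                        # charge the intermediate result size
            if subset not in best or cost < best[subset][1]:
                best[subset] = (card, cost, lorder + [t])

print(best[frozenset(tables)])                         # best left-deep order and its cost
```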
# Lecture 13: 3/13/20
## Parallelism
* Speed up vs scale up
* Pipeline - scales up to pipeline depth
* Partition - scales up with the amount of data
* Parallel SQL databases worked because of data independence
* 3 Architectures:
* Shared Memory - shared RAM and disks (ex. single computer)
* Shared Disk - Separate RAM but same disk
* Shared Nothing - Separate CPU, RAM, disk connected over same network
* Inter-query parallelism
* Parallel across queries
* Each query runs on sep processor (single thread per query)
* Requires parallel-aware concurrency control
* Intra-query operator
* Within a single query
* Inter-operator: between operators
* Ex. pipeline parallelism: producer and consumer operators run concurrently, each working on different records
* Bushy tree parallelism
* Intra-operator: within a single operator
* Ex. Partition Parallelism
## Parallel Data Access
* Partition data across disks/machines
* Range: good for equijoins, range, group by
* Hash: Good for equijoins, group by
* Round robin: good for spreading load uniformly
* Shared nothing benefits from good partitioning
* Parallel scans: $\sigma_p$ skip entire sites that have no tuples satisfying p
* Indexes built at each partition
* For unique-key insertion: broadcast a lookup to all partitions and collect responses to check whether any of them already holds the key
## Hashing
* Partition data to separate machines using $h_n$
* Near perfect speed and scale up
* Limitation: cannot start phase 2 until phase 1 is done
* Naive Parallel hash join:
* Phase 1: shuffle each table across machines using $h_n$ on the join key
* Phase 2: receivers proceed with a local naive hash join independently
* Parallel Grace Hash Join
* Pass 1: Parallel streaming: shuffle tuples across machines
* Do this once for each of the two relations being joined
* Pass 2: local GHJ; everything is local, so each machine works independently
* Near perfect speed-up and scale-up
* Limitation: pass 2 cannot begin until pass 1 has finished everywhere
## Sorting
* Parallel sort
* Pass 0 : shuffle data across machines by value range
* Passes 1-n done independently as single node sorting
* Data skew:
* Choose ranges to get same num of tuples in each partition
* Create histogram over random sample of data to divide
* Parallel Sort Merge Join
* Pass 0 ... n-1: parallel sorting for both R,S (runs for R and S)
* Pass n: merge join partitions locally on each node
## Parallel Aggregates
* Hierarchical aggregation
* Global/local decomposition
* Sum(S): Sum the local sums
* Count(S): Sum local counts
* Avg(S): Sum(S)/Count(S)
* Group by
* Local aggregation in hash table
* Naive Hash Group By
* Local aggregation: hash table keyed by group key $k_i$ to keep local agg
* Shuffle local aggs by hash function $h_p(k_i)$
* Compute global aggs for each key $k_i$
* Need enough memory to hold the $(k_i, agg_i)$ key/aggregate pairs (see the sketch below)
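A minimal sketch of hierarchical (local-then-global) aggregation, with "nodes" simulated as plain Python lists; the partitioned data is a made-up example.

```python
# Hierarchical aggregation: each node computes (sum, count); a coordinator combines them.
partitions = [[3, 5, 8], [1, 2], [10, 4, 6, 9]]        # one list of values per node (assumed)

local = [(sum(p), len(p)) for p in partitions]         # local aggregation at each node
global_sum = sum(s for s, _ in local)                  # Sum(S): sum the local sums
global_count = sum(c for _, c in local)                # Count(S): sum the local counts
global_avg = global_sum / global_count                 # Avg(S) = Sum(S) / Count(S)

print(global_sum, global_count, global_avg)            # 48 9 5.333...
```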
## Pipeline breakers
* Sort merge join requires sort to be complete
* GHJ can't start probing until hashtable is built
* Symmetric Pipeline Hash Join
* Each node allocates 2 hash tables, one for each side
* When an R tuple arrives, insert it into R's hash table (keyed on the join key) and probe S's hash table
* Symmetric for an arriving S tuple
* Fully streamed algorithm (see the sketch below)
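A sketch of a single-node, non-partitioned symmetric hash join over an interleaved stream of R and S tuples; the `('R'|'S', key, payload)` stream format and example data are assumptions.

```python
# Symmetric hash join: insert each tuple into its own side's table, probe the other side.
def symmetric_hash_join(stream):
    """stream yields ('R'|'S', key, payload); matches are emitted as they form."""
    tables = {"R": {}, "S": {}}
    other = {"R": "S", "S": "R"}
    for side, key, payload in stream:
        tables[side].setdefault(key, []).append(payload)       # build own side
        for match in tables[other[side]].get(key, []):          # probe the other side
            yield (key, payload, match) if side == "R" else (key, match, payload)

stream = [("R", 1, "r1"), ("S", 1, "s1"), ("S", 2, "s2"), ("R", 2, "r2")]
print(list(symmetric_hash_join(stream)))
# [(1, 'r1', 's1'), (2, 'r2', 's2')]
```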
## Alternate Schemes
* One-sided shuffle: if R is already partitioned on the join key, only shuffle S
* Local join at each node and union results
* Broadcast join: If R is small, send to all nodes that have a partition of S
* Local join at each node and union results
# Lecture 14: 3/17/20
## Concurrency Control
* Part 1: Concurrency control
* Correct/fast data access
* Part 2: Recovery
* Fault tolerant, not corrupted by crash
* Concurrency advantages
* Higher throughput (transactions/sec)
* Lower latency (response time per transaction)
* Independent of other unrelated transactions
* Can write higher level queries (SQL) and guarantee data is processed reliably
* Order matters:
* Inconsistent reads: User reads in middle of another user's transaction (not a state intended by either user)
* Lost update (using preimage): 2 users update record at same time
* Dirty reads: A user updates record while another user reads the record
## Transactions
* Transaction (xact): a sequence of multiple actions to be executed as an atomic unit
* A sequence of reads and writes of database objects
* Application view:
* Begin transaction
* Sequence of SQL statements
* End transaction
* Xact manager controls execution of transactions
* Program logic is invisible to DBMS
* DBMS can only see data reads/writes (not SQL queries)
* ACID:
* Atomicity: All actions in the xact happen, or none
* Ends as a commit: contract w/ caller of the DB
* Abort: the xact (or a system crash) may stop it after executing only some actions; those must be undone
* Consistency: If DB starts out consistent, ends up consistent at end of xact
* Consistency: set of declarative integrity constraints
* Like primary keys or foreign keys
* Transactions that violate integrity are aborted
* Isolation: Execution of each xact is isolated from that of others
* DBMS interleaves actions of many xacts
* DBMS ensures 2 xacts do not interfere
* Each xact executes as if ran by itself
* Durability: If xact commits, its effects persist
* Effects of a committed xact must survive failures
* DBMS logs all actions to undo actions of aborted transactions
* Redo actions of committed transactions not yet propagated when system crashes
## Serializability
* Transaction schedule: sequence of actions on data from 1 or more transactions
* Serial schedule: each transaction runs from start to finish without any intervening actions from other transactions
* 2 schedules equivalent if:
* Involve same transactions
* each individual transaction's actions are ordered the same way
* Both schedules leave the DB in the same final state
* Schedule S is serializable if equivalent to a serial schedule
## Conflict Serializability
* It is hard to check directly whether 2 schedules leave the DB in the same final state, so we use a stricter syntactic notion
* Has some false negatives (sacrifice some concurrency for easier check)
* 2 operations conflict if they:
* Are by different transactions
* On same object
* At least 1 of them is a write
* Order of non conflicting operations has no effect on final state
* Conflict equivalent: 2 schedules that involve the same actions and each pair of conflicting actions is ordered the same way
* Conflict serializable: S is conflict equivalent to some serial schedule (S is also serializable)
* Can swap consecutive non-conflicting operations of diff transactions
* Some serializable schedules are not conflict serializable
* Conflict Dependency Graph
* One node per xact
* Edge from $T_i$ to $T_j$ if an operation of $T_i$ conflicts with a later operation of $T_j$ ($T_i$'s operation appears earlier in the schedule)
* Edge $(T_1, T_2)$ means $T_2$ depends on $T_1$
* Schedule is conflict serializable iff the graph is acyclic (see the sketch below)
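A sketch of the conflict-serializability test: build the dependency graph from a schedule of `(xact, operation, object)` triples and check it for cycles; the example schedule is made up.

```python
# Build the conflict dependency graph and test it for cycles.
def conflict_graph(schedule):
    edges = set()
    for i, (ti, op_i, obj_i) in enumerate(schedule):
        for tj, op_j, obj_j in schedule[i + 1:]:
            if ti != tj and obj_i == obj_j and "W" in (op_i, op_j):
                edges.add((ti, tj))          # earlier conflicting op -> edge to later xact
    return edges

def has_cycle(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
    seen, on_stack = set(), set()
    def dfs(n):
        seen.add(n); on_stack.add(n)
        for m in adj.get(n, ()):
            if m in on_stack or (m not in seen and dfs(m)):
                return True
        on_stack.discard(n)
        return False
    return any(dfs(n) for n in adj if n not in seen)

# T1: R(A), W(A); T2: W(A) interleaved between T1's read and write.
schedule = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
edges = conflict_graph(schedule)
print(edges, "conflict serializable:", not has_cycle(edges))
# {('T1', 'T2'), ('T2', 'T1')} conflict serializable: False
```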
## View Serializability
* Fewer false negatives
* View equivalent schedules:
* Same initial reads
* Same dependent reads
* Same winning writes
* Blind writes can ignore intermediate writes
# Lecture 15: 3/19/20
## 2PL
* 2 Phase locking: Most common scheme, a bit pessimistic
* Sets locks for fear of conflict
* Acquisition and Release
* Rule 1: xact must obtain a shared (S) lock before reading, exclusive (X) lock before writing
* 2 xacts cannot hold conflicting locks on the same object; only 2 shared locks are compatible
* Rule 2: xact cannot get new locks after releasing any lock
* Problem: Does not prevent cascading aborts
* Lock point: the moment a xact holds its maximum number of locks (the boundary between the acquisition and release phases)
* All transactions executed with 2PL are conflict serializable
* Strict 2PL
* Cascading aborts: rollback of T1 forces rollback of T2 (which read T1's writes)
* Strict 2PL requires all locks to be released together when transaction completes (commits or aborts)
## Lock Management
* Lock protocol: if your transaction holds a lock and my transaction requests a conflicting lock, I am queued up waiting for that lock
* Lock manager maintains hashtable keyed on names of objects being locked
* Granted set: set of xacts currently granted access to the lock
* Lock mode: shared or exclusive
* Wait queue: queue of lock requests
* If no xact in granted set or wait queue with a conflicting lock, put requestor in granted set
* Lock upgrade: xact w shared lock can request upgrade to exclusive
### Deadlock
* Deadlock: cycle of xacts waiting for each other
* Deadlock detection: Waits-for graph, check for cycles
* Edge (T1, T2) means T1 waits for T2
* Shoot a transaction on the cycle
* Detection is optimistic: assume deadlocks are rare and fix them when they occur
* Prevention, avoidance, detection and resolution
* Most just use timeouts
* Prioritize lock upgrades in the wait queue (deadlocks from simultaneous upgrade requests cannot be avoided otherwise)
* Prevention (OS-style): resource ordering, but we can't impose an ordering on DBMS objects
* Deadlock Avoidance: Assign priorities based on age
* Suppose $T_i$ requests a lock that $T_j$ holds (see the decision sketch after this list)
* Wait-die: if $T_i$ has higher priority, $T_i$ waits for $T_j$; else $T_i$ aborts (dies)
* Wound-wait: if $T_i$ has higher priority, $T_j$ aborts (wounded); else $T_i$ waits
* If a xact restarts after an abort, give it its original timestamp so it ages and eventually has the highest priority
* Pessimistic: aborts transactions that may not cause deadlock
* Works because the same priority scheme applies to all lock conflicts, the DBMS can roll back transactions, and higher-priority transactions eventually get access
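A tiny sketch of the wait-die and wound-wait decisions when $T_i$ requests a lock held by $T_j$, using "older timestamp = higher priority"; the timestamps are made-up examples.

```python
# Deadlock-avoidance decisions: priority = older timestamp (smaller value).
def wait_die(ts_i, ts_j):
    # Higher-priority (older) requester waits; otherwise the requester dies.
    return "T_i waits" if ts_i < ts_j else "T_i aborts (dies)"

def wound_wait(ts_i, ts_j):
    # Higher-priority (older) requester wounds the holder; otherwise it waits.
    return "T_j aborts (wounded)" if ts_i < ts_j else "T_i waits"

print(wait_die(ts_i=5, ts_j=9))     # older requester -> T_i waits
print(wound_wait(ts_i=9, ts_j=5))   # younger requester -> T_i waits
```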
## Lock Granularity
* Lock tuples vs pages vs tables
* Fine grained means better concurrency but a lot of locks
* Smaller number of locks is easier to keep track of
* Multiple granularity locking: a transaction explicitly locks a node and implicitly locks all its descendants
* Solution: New lock modes
* Allow xacts to lock at each level with intent locks
* Before getting a S or X lock, xact must have proper intent locks on ancestors
* IS: Intent to get S lock at finer granularity
* IX: Intent to get X lock at finer granularity
* SIX: S and IX at the same time
* Intention locks allow a higher-level node to be locked in S or X mode without checking all descendants
* To get an S or IS lock on a node, the parent must hold IS or IX
* To get an X, IX, or SIX lock on a node, the parent must hold IX or SIX
* If 2 lock modes are compatible, their corresponding intent modes are also compatible (see the compatibility matrix sketch below)
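The standard multigranularity lock compatibility matrix encoded as a lookup table (NL = no lock); `compatible(requested, held)` is an illustrative helper name.

```python
# Multigranularity lock compatibility: can a request in mode `requested`
# be granted while another xact holds mode `held` on the same node?
COMPAT = {
    "NL":  {"NL": 1, "IS": 1, "IX": 1, "S": 1, "SIX": 1, "X": 1},
    "IS":  {"NL": 1, "IS": 1, "IX": 1, "S": 1, "SIX": 1, "X": 0},
    "IX":  {"NL": 1, "IS": 1, "IX": 1, "S": 0, "SIX": 0, "X": 0},
    "S":   {"NL": 1, "IS": 1, "IX": 0, "S": 1, "SIX": 0, "X": 0},
    "SIX": {"NL": 1, "IS": 1, "IX": 0, "S": 0, "SIX": 0, "X": 0},
    "X":   {"NL": 1, "IS": 0, "IX": 0, "S": 0, "SIX": 0, "X": 0},
}

def compatible(requested, held):
    return bool(COMPAT[requested][held])

print(compatible("IX", "IS"))   # True  -- two intent locks can coexist
print(compatible("X", "IS"))    # False -- X conflicts with everything except NL
```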
# Lecture 16: 3/31/20
## Database Design
* Requirements Analysis
* Conceptual Design (Object-Relational Mappings)
* Logical Design (ER --> DBMS)
* Schema Refinement (consistency, normalization)
* Physical Design (indexes, disk layout)
* Security Design
## Data Models
* Data model - collection of concepts for describing data
* Schema: description of a particular collection of data
* Relational model:
* Every relation (table, row, col) has a schema that describes the columns (names, domains)
* Levels of abstraction:
* Views - how users see data
* Conceptual schema - logical structure
* Physical schema- describes files and indexes used
### Data Independence
* Logical data independence: Maintain views when logical structure changes
* Between views and conceptual schema
* Physical data independence: Maintain logical structure when physical structure changes
* Between conceptual and physical schema
## ER Model
* Entity: real world object described by a set of attribute values
* Entity set: collection of similar entities
* Relationship: Association among 2+ entities
* Can have their own attributes
* Relationship set: collection of similar relationships
* Key constraint: 1-to-many relationship (arrow)
* Participation constraint: requires at least 1 (bold edge)
* Weak entity: Can be identified uniquely only by considering the primary key of owner entity
* Owner entity set and weak entity set must participate in 1-to-many relationship set
* Weak entity set must have total participation in the identifying relationship set
* Ternary - one relation (diamond) connected to 3 entities (rectangles)
* Aggregation - dotted box around a set
* Allows relationships to have relationships
* ER Models have lots of semantics
* Some constraints cannot be captured in ER
## Crow's Foot Notation
* $0$: 0 or more
* $|$: 1 or more
* $|0$: 1 or 0
* $||$: exactly 1
* $<$ (crow's foot): many

## Logical Design
* ER to Relational
* Entity sets --> tables
* Relationship sets --> table
* Many-to-many
* Keys for each participating entity set as foreign keys
* This set of attributes forms a superkey
* All descriptive attributes
* Weak entity primary key is the concatenation of the owner's primary key and the weak entity's partial key
* On delete cascade - when owner entity is deleted, all owned weak entities must also be deleted
# Lecture 17: 4/2/20
* Functional dependencies to avoid redundancy in schemas
* Decomposition: split cols of 1 table into 2 tables
* $X \rightarrow Y$ (X determines Y)
* FD is a constraint enforced by DB
* Reject updates that break FD
* Not learned from data
* Primary keys are special cases of FDs
* $K \rightarrow \{\text{all attributes}\}$
* Superkey: set of columns that determines all columns in its table (aka a key)
* Candidate key: minimal set of columns that determines all columns in table
## Anomalies
* Update anomaly - updating some rows but not others leaves the data inconsistent with the FD
* Insertion anomaly - insert a row, but don't know a column so we must invent a value without knowing truth
* Deletion anomaly - deleting all rows with a certain value, lose information for lookup table
* Fix: decompose the relation, making a lookup table for $X \rightarrow Y$
## Armstrong's Axioms
* $F+$: closure of F
* Set of all FDs implied by F, plus trivial dependencies
* Armstrong's Axioms (complete inference rules)
* Reflexivity: if $X \supseteq Y$, then $X \rightarrow Y$
* Augmentation: if $X \rightarrow Y$, then $XZ \rightarrow YZ$ for any $Z$
* Transitivity: if $X \rightarrow Y$ and $Y \rightarrow Z$, then $X \rightarrow Z$
* Additional rules
* Union: if $X \rightarrow Y$ and $X \rightarrow Z$, then $X \rightarrow YZ$
* Decomposition: if $X \rightarrow YZ$, then $X \rightarrow Y$ and $X \rightarrow Z$
* Cannot drop things from both sides
## Attribute Closure
* Computing the full closure $F^+$ is hard (exponential in the number of attributes)
* Typically just check if $X \rightarrow Y$ is in F+
* Procedure (see the sketch below):
* Compute the attribute closure $X^+$ of X wrt F
* $X^+$ is the set of all attributes A such that $X \rightarrow A$ is in $F^+$
* Start with $X^+ = X$; for each FD $U \rightarrow V$ in F with $U \subseteq X^+$, add V to $X^+$
* Repeat until $X^+$ stops changing
* Check whether $Y \subseteq X^+$
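A direct sketch of the attribute-closure fixed point described above; FDs are represented as `(lhs, rhs)` pairs of attribute sets, and the example FDs are made up.

```python
# Attribute closure X+ and the "is X -> Y in F+?" test built on it.
def attribute_closure(X, fds):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)     # lhs already determined, so add rhs
                changed = True
    return closure

def implies(X, Y, fds):
    """Check whether X -> Y is in F+ by testing Y ⊆ X+."""
    return set(Y) <= attribute_closure(X, fds)

fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
print(attribute_closure({"A"}, fds))        # {'A', 'B', 'C'} (in some order)
print(implies({"A", "D"}, {"E"}, fds))      # True: A -> B -> C, then CD -> E
```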
## Normal Forms
* Normal form:
* If we know certain problems are avoided/minimized
* Helps decide whether decomposing relation is useful
* 1st Normal form
* Attributes are atomic
* Boyce-Codd Normal Form (BCNF)
* R with FDs F is in BCNF if $\forall (X \rightarrow A)$ in F+
* either $A \subseteq X$ (a trivial FD) or X is a superkey for R
* R is in BCNF if the only non-trivial FDs over R are key constraints
* Every field is useful info that cannot be inferred via FDs
## Lossless Decompositions
* Decomposition - replacing a relation with multiple normalized relations
* New relation schemes have a subset of attributes of R
* Every attribute of R appears as attribute of 1+ new relations
* Problems
* 1) Lossy decomposition: impossible to reconstruct original relation
* Generates spurious tuples: we always have $r \subseteq \pi_X(r) \bowtie \pi_Y(r)$; lossy means the containment is strict
* Lossless: $\pi_X(r) \bowtie \pi_Y(r) = r$
* Decomp for R into X,Y is lossless wrt F iff $X \cap Y \rightarrow X$ or $X \cap Y \rightarrow Y$
* Basically, intersection of 2 tables must be a key of 1 of the 2 tables
* If $X \rightarrow Z$ and $X \cap Z = \emptyset$, then decomposing R into $R-Z$ and $XZ$ is lossless (because X is a superkey of XZ)
* 2) Dependency checking may require joins
* 3) Some queries become more expensive
## Dependency Preservation
* Projection of F onto X ($F_X$): the set of FDs $U \rightarrow V$ in $F^+$ with $U, V \subseteq X$
* Dependency preserving: $(F_X \cup F_Y)^+ = F^+$
* Decomposition into BCNF
* If $X \rightarrow Y$ violates BCNF, decompose R into $R-Y$ and $XY$ (lossless); see the sketch after this list
* Repeatedly doing this gives collection of relations that are BCNF
* Order of handling can lead to different outcomes
* There may not always be a dependency-preserving decomposition into BCNF
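A simplified sketch of the BCNF decomposition loop (attribute closure repeated so the snippet is self-contained). Violations are only checked against FDs in F rather than the full projected closure, so this illustrates the splitting step rather than a complete algorithm; the example relation and FDs are made up.

```python
# Split any fragment on an FD X -> Y whose left side is not a superkey of it.
def attribute_closure(X, fds):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def is_superkey(X, attrs, fds):
    return attribute_closure(X, fds) >= set(attrs)

def bcnf_decompose(attrs, fds):
    fragments = [set(attrs)]
    split = True
    while split:
        split = False
        for frag in fragments:
            for lhs, rhs in fds:
                lhs, rhs = set(lhs), set(rhs) & frag
                if lhs <= frag and rhs - lhs and not is_superkey(lhs, frag, fds):
                    fragments.remove(frag)                        # replace frag with
                    fragments += [(frag - rhs) | lhs, lhs | rhs]  # R-Y ∪ X and XY
                    split = True
                    break
            if split:
                break
    return fragments

# Hypothetical example: R(C, S, J) with FDs C -> S and J -> S.
fds = [({"C"}, {"S"}), ({"J"}, {"S"})]
print(bcnf_decompose({"C", "S", "J"}, fds))   # e.g. [{'C', 'J'}, {'C', 'S'}]
```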