# CS186 MT2 Notes

# Lecture 10: 3/1/20

## Iterators
* Setup, init, next, close
* Pull-based computation model: next() calls propagate down the operator graph
* Encapsulation - any iterator can be the input to another (see the sketch at the end of this lecture)
* State - iterators can maintain substantial internal state
  * Like partitions for sorting/hashing

## Heap Scan
* Init:
  * Open the heap file
  * Get cur_page and cur_slot
* Next:
  * Return the [cur_page, cur_slot] record id and move the pointer ahead (moving to the next page if necessary)

## Sort (2-pass)
* Init: repeatedly call child.next to fill up $B-1$ buffers, then sort
  * Finish pass 1
* Next:
  * Output = min tuple across all buffers
  * If that was the last tuple in its buffer, fetch that buffer's next page

## Group By, Sorted Input
* Init: initialize children
* Next: get all tuples with the same group-by value and aggregate them

## Join Operators
* $[R]$ = number of pages to store R
* $p_R$ = number of records per page of R
* $|R|$ = cardinality (number of records), so $|R| = p_R \cdot [R]$
* Simple Nested Loop Join (cross product)
  * Cost: $[R] + |R|[S]$
  * Changing the join order can make I/O cheaper
* Page Nested Loop Join
  * Cost: $[R] + [R][S]$
* Block/Chunk Nested Loop Join
  * For each chunk of $B-2$ pages of R: for each page of S
  * Cost: $[R] + \lceil [R]/(B-2) \rceil [S]$
* Index Nested Loop Join
  * Cost: $[R] + |R| \cdot$ (cost to find matching tuples in S)
    * Unclustered: $[R] + |R| \cdot (\text{Search} + \text{HeapAccess})$
    * Clustered: $[R] + |R| \cdot \text{Search} + \text{distinct\_vals}(R) \cdot \text{HeapAccess}$
  * HeapAccess
    * Alt 1: cost to traverse the tree from root to leaf
    * Alt 2-3: cost to look up the RID and retrieve the record
      * Clustered: 1 I/O per page of matching tuples
      * Unclustered: 1 I/O per matching tuple
* Sort-Merge Join
  * Equality predicates only (equi-joins, natural joins)
  * 2 stages:
    * Sort pass: sort the tuples of R & S by join key
    * Join pass: merge-scan the sorted partitions and emit tuples that match
      * Keep a mark on the first equality match in S
  * Cost: Sort R + Sort S + $([R] + [S])$
    * Sort: $2[R](1 + \lceil \log_{B-1} \lceil [R]/B \rceil \rceil)$
    * Worst case is a cross product: $[R] \cdot [S]$
    * Assume $B > \sqrt{\max([R],[S])}$
  * Refinement:
    * Read R, write out sorted runs in pass 0
    * Read S, write out sorted runs in pass 0
    * Merge the R-runs and S-runs while finding $R \bowtie S$ matches
    * Avoids writing out the final sorting pass of R and S
    * Cost: $3[R] + 3[S]$
    * Requires $B \geq \sqrt{[R]} + \sqrt{[S]}$

## Grace Hash Join
* Naive hash join: put all of R in an in-memory hash table, probe it with S
  * Cost: $[R]+[S]$
  * Memory: $[R] < B$
* Decompose into smaller partial joins
  * $R \bowtie_{\text{sid}} S = \bigcup_i \left(\sigma_{\text{hash(sid)}=i}(R) \bowtie_{\text{sid}} \sigma_{\text{hash(sid)}=i}(S)\right)$
* Requires an equality predicate (equi-joins, natural joins)
* Stages
  * Partition tuples from R and S by join key
  * Build and probe a separate hash table for each partition
    * Build on one partition, stream the matching partition of the other (probing) relation past it
* Total cost: $3([R]+[S])$
  * Partition: $2([R]+[S])$
  * Matching: $[R]+[S]$
* Memory: $[R] < (B-1)(B-2) \approx B^2$
  * No recursive partitioning of S

## Pros and Cons
* Sorting pros:
  * Good if the input is already sorted
  * Not sensitive to data skew
* Hashing pros:
  * For joins: the number of passes depends on the size of the smaller relation
  * Good if the input is already hashed or the output needs to be hashed
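To make the pull-based iterator interface and the simple nested loop join concrete, here is a minimal Python sketch (my own illustration, not lecture or project code; the class names `Scan` and `SimpleNestedLoopJoin` are made up). Each operator exposes `init`/`next`/`close`, and the join pulls from its two child iterators.

```python
# Minimal sketch of the pull-based iterator model; next() returns one tuple or None.

class Scan:
    """Leaf iterator over an in-memory list standing in for a heap file."""
    def __init__(self, rows):
        self.rows = rows

    def init(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None                  # exhausted
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class SimpleNestedLoopJoin:
    """For each outer tuple, rescan the inner child and emit pairs that satisfy the predicate."""
    def __init__(self, outer, inner, predicate):
        self.outer, self.inner, self.predicate = outer, inner, predicate

    def init(self):
        self.outer.init()
        self.inner.init()
        self.cur_outer = self.outer.next()

    def next(self):
        while self.cur_outer is not None:
            inner_row = self.inner.next()
            if inner_row is None:        # inner exhausted: advance outer, restart inner scan
                self.cur_outer = self.outer.next()
                self.inner.close()
                self.inner.init()
                continue
            if self.predicate(self.cur_outer, inner_row):
                return (self.cur_outer, inner_row)
        return None

    def close(self):
        self.outer.close()
        self.inner.close()


# Usage: equi-join R and S on their first field.
R = [(1, "a"), (2, "b")]
S = [(1, "x"), (2, "y"), (3, "z")]
join = SimpleNestedLoopJoin(Scan(R), Scan(S), lambda r, s: r[0] == s[0])
join.init()
while (t := join.next()) is not None:
    print(t)
join.close()
```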
# Lecture 11: 3/5/20

## Query Optimization
* Query parser: correctness, authorization check
* Query rewriter
* Cost-based query optimizer
* 3 parts:
  * Plan space
  * Cost estimation
  * Search strategy

## Relational Algebra
* Selections:
  * Cascade: $\sigma_{c_1 \land \ldots \land c_n}(R) \equiv \sigma_{c_1}(\ldots(\sigma_{c_n}(R))\ldots)$
  * Commute: $\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$
* Projections:
  * Cascade: $\pi_{a_1}(R) \equiv \pi_{a_1}(\ldots(\pi_{a_1, \ldots, a_{n-1}}(R))\ldots)$
* Cartesian product:
  * Associative: $R \times (S \times T) \equiv (R \times S) \times T$
  * Commutative: $R \times S \equiv S \times R$
* Joins are likewise associative and commutative
  * Some join orders introduce a cartesian product

## Standard Heuristics
* Selection cascade and pushdown (see the sketch at the end of this lecture):
  * Apply selections as soon as you have the relevant columns
* Projection cascade and pushdown
  * Keep only the columns you need downstream
* Avoid cartesian products
  * Theta joins are better
  * Not always optimal (e.g., when 2 tables are small)

## Physical Equivalences
* Base table access: heap scan, index scan
* Equijoins:
  * Block Nested Loop: simple, exploits extra memory
  * Index Nested Loop: good if 1 relation is small and the other is indexed properly
  * Sort-Merge Join: good with small memory and equal-sized tables
  * Grace Hash Join: even better than sort-merge when 1 table is small
* Non-equijoins:
  * Block Nested Loop

## Optimization Remarks
* Selection pushdown:
  * Pushing a selection into the inner loop of a nested loop join does not save I/Os
* Join order:
  * Materializing inner loops: write a temp table as an intermediate
  * PNLJ: put the larger relation on the right side (inner loop)
    * The repeated scans always have the same cost; the difference is the left-side (outer) scan
* Indexes
  * Reduce the number of initial read I/Os
  * Inner-loop operations are faster (finding matching tuples)
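Here is a small Python sketch of the selection-pushdown rewrite from the heuristics above: a selection sitting on top of a join is pushed onto whichever input already produces every column the predicate needs. The plan-node classes and the example tables (Sailors/Reserves) are illustrative assumptions, not a real optimizer API.

```python
# Toy logical plan nodes and a single-rule rewrite: push a selection below a join.
from dataclasses import dataclass

@dataclass
class Table:
    name: str
    columns: frozenset

@dataclass
class Join:
    left: object
    right: object
    condition: str

@dataclass
class Select:
    child: object
    predicate: str
    columns_used: frozenset      # columns the predicate references

def output_columns(plan):
    """Columns produced by a plan node."""
    if isinstance(plan, Table):
        return plan.columns
    if isinstance(plan, Join):
        return output_columns(plan.left) | output_columns(plan.right)
    return output_columns(plan.child)

def push_down(select):
    """If one join input provides all columns the predicate needs, push the selection onto it."""
    child = select.child
    if isinstance(child, Join):
        if select.columns_used <= output_columns(child.left):
            return Join(push_down(Select(child.left, select.predicate, select.columns_used)),
                        child.right, child.condition)
        if select.columns_used <= output_columns(child.right):
            return Join(child.left,
                        push_down(Select(child.right, select.predicate, select.columns_used)),
                        child.condition)
    return select                # cannot push any further

# sigma_{S.rating > 5}(Reserves join Sailors) becomes Reserves join sigma_{S.rating > 5}(Sailors)
reserves = Table("Reserves", frozenset({"R.sid", "R.bid"}))
sailors = Table("Sailors", frozenset({"S.sid", "S.rating"}))
plan = Select(Join(reserves, sailors, "R.sid = S.sid"), "S.rating > 5", frozenset({"S.rating"}))
print(push_down(plan))
```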
# Lecture 12: 3/9/20
* Selinger optimization
  * Plan space is too big, so it must be pruned
    * Ignore overpriced subtrees
    * Heuristics: left-deep plans, avoid cartesian products
  * Consider a combination of CPU and I/O costs
  * Cost estimation
* Query blocks
  * Try to flatten query blocks (query rewrites)
  * Can decorrelate correlated nested blocks
* Within a block:
  * Consider relevant access methods for each relation
  * Left-deep join tree order and join methods
* Physical properties of the output
  * Sort order (index scan, sort)
  * Hash grouping (hash)
  * Certain operators require properties at their input
    * Ex. MergeJoin requires sorted input
  * Certain operators preserve these properties from their inputs
    * Ex. NestedLoopJoin preserves the left (outer) input's order
  * Downstream operators may require these properties

## Cost Estimation
* Estimate the total cost of every operation
* Estimate the size of the result of each operation
* Assume independence for predicates in selections and joins
* Selinger cost: number of I/Os + CPU factor
* Catalogs contain stats like number of tuples, number of pages, min/max, number of keys, index height, number of disk pages in an index
* Maximum output cardinality = product of the input cardinalities
* Selectivity/Reduction Factor of each term
  * Large selectivity = large output

## Estimation
* If missing needed stats, assume 1/10 (see the sketch at the end of this lecture)
* col=value: S = 1/NKeys(I)
* col1=col2: S = 1/MAX(NKeys(I1), NKeys(I2))

## Histograms
* Equiwidth histograms split the value domain evenly; the graph shows the number of values per bin
  * Uniformity assumption within each bin
* Equidepth histograms make each bin equally deep; the graph shows where each bin's range ends

## Search Algorithm
* Single-table plans: base case
  * Consider the access paths (file scan/index)
* Multi-table plans: induction
  * Dynamic programming - reuse the best plans for shared subtrees
  * Optimality via Bellman's principle
    * The best overall plan is composed of the best decisions on its subplans
  * Build best plans of height 1 up to n
  * DP table keys = subsets of the tables in the FROM clause
* Interesting orders
  * Sorted by something used downstream
  * ORDER BY, GROUP BY, or join attributes used downstream
* Avoid cartesian products
  * Only consider one if a join condition requires it or all WHERE predicates are already used up
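A small Python sketch of the selectivity rules above (col = value → 1/NKeys, col1 = col2 → 1/MAX of the two NKeys, default 1/10 when stats are missing), combined under the independence assumption to estimate a join's output size. The example cardinalities and NKeys values are made up.

```python
# Selinger-style reduction-factor estimation, following the rules in the notes.

DEFAULT_SELECTIVITY = 1 / 10          # used when the needed statistic is missing

def sel_col_eq_value(nkeys=None):
    """col = value  ->  1 / NKeys(col), or 1/10 if unknown."""
    return 1 / nkeys if nkeys else DEFAULT_SELECTIVITY

def sel_col_eq_col(nkeys1=None, nkeys2=None):
    """col1 = col2  ->  1 / max(NKeys(col1), NKeys(col2)), or 1/10 if unknown."""
    if nkeys1 and nkeys2:
        return 1 / max(nkeys1, nkeys2)
    return DEFAULT_SELECTIVITY

def estimate_output(card_r, card_s, selectivities):
    """Maximum output cardinality (product of inputs) scaled by each term's
    reduction factor, assuming the predicates are independent."""
    est = card_r * card_s
    for s in selectivities:
        est *= s
    return est

# Example: |R| = 1000, |S| = 500, join on R.sid = S.sid (NKeys 100 and 80),
# plus a filter rating = 7 with NKeys(rating) = 10.
join_sel = sel_col_eq_col(100, 80)    # 1/100
filter_sel = sel_col_eq_value(10)     # 1/10
print(estimate_output(1000, 500, [join_sel, filter_sel]))   # 500.0
```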
# Lecture 13: 3/13/20

## Parallelism
* Speed-up vs scale-up
  * Pipeline - scales up to the pipeline depth
  * Partition - scales with the amount of data
* Parallel SQL databases worked because of data independence
* 3 architectures:
  * Shared Memory - shared RAM and disks (ex. a single computer)
  * Shared Disk - separate RAM but same disk
  * Shared Nothing - separate CPU, RAM, and disk, connected over the same network
* Inter-query parallelism
  * Parallelism across queries
  * Each query runs on a separate processor (single thread per query)
  * Requires parallel-aware concurrency control
* Intra-query parallelism
  * Within a single query
  * Inter-operator: between operators
    * Ex. pipeline parallelism: different operators work on different records at the same time
    * Bushy tree parallelism
  * Intra-operator: within a single operator
    * Ex. partition parallelism

## Parallel Data Access
* Partition data across disks/machines
  * Range: good for equijoins, range queries, group by
  * Hash: good for equijoins, group by
  * Round robin: good for spreading load uniformly
* Shared nothing benefits from good partitioning
* Parallel scans: $\sigma_p$ can skip entire sites that have no tuples satisfying p
* Indexes are built at each partition
* For unique key insertion - broadcast the lookup and collect responses to see if any site already has the key

## Hashing
* Partition data to separate machines using $h_n$
* Near perfect speed-up and scale-up
* Limitation: cannot start phase 2 until phase 1 is done
* Naive parallel hash join:
  * Phase 1: shuffle each table across machines using $h_n$
  * Receivers proceed with naive hash join independently
* Parallel Grace Hash Join
  * Pass 1: parallel streaming: shuffle tuples across machines
    * Do this twice, once for each relation being joined
  * Pass 2: local GHJ; everything is local, so every node works independently
  * Near perfect speed-up and scale-up
  * Limitation: pass 1 must finish first

## Sorting
* Parallel sort
  * Pass 0: shuffle data across machines by value range
  * Passes 1-n done independently, as single-node sorting
  * Data skew:
    * Choose ranges to get the same number of tuples in each partition
    * Create a histogram over a random sample of the data to pick the dividers
* Parallel Sort-Merge Join
  * Passes 0 ... n-1: parallel sorting for both R and S (runs for R and S)
  * Pass n: merge-join the partitions locally on each node

## Parallel Aggregates
* Hierarchical aggregation
* Global/local decomposition (see the sketch at the end of this lecture)
  * Sum(S): sum the local sums
  * Count(S): sum the local counts
  * Avg(S): Sum(S)/Count(S)
* Group by
  * Local aggregation in a hash table
* Naive Hash Group By
  * Local aggregation: hash table keyed by group key $k_i$ to keep the local aggregate
  * Shuffle local aggregates by a hash function $h_p(k_i)$
  * Compute global aggregates for each key $k_i$
  * Need enough memory to hold the $(k_i, agg_i)$ group-value pairs

## Pipeline Breakers
* Sort-merge join requires the sort to be complete
* GHJ can't start probing until the hash table is built
* Symmetric Pipeline Hash Join
  * Each node allocates 2 hash tables, one for each side
  * Given an R tuple, insert it into the R hash table by join key and probe the S hash table
  * Symmetric for a given S tuple
  * Fully streamed algorithm

## Alternate Schemes
* One-sided shuffle: if R is already partitioned on the join key, just partition S the same way
  * Local join at each node, then union the results
* Broadcast join: if R is small, send all of R to every node that has a partition of S
  * Local join at each node, then union the results
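A short Python sketch of the local/global decomposition described above for AVG: each node computes a local SUM and COUNT in one pass, and the coordinator combines them. The three hard-coded partitions are made up for the example.

```python
# Hierarchical (local/global) aggregation: AVG = sum of local sums / sum of local counts.

def local_agg(partition):
    """Runs on each node: one pass over its local partition."""
    return {"sum": sum(partition), "count": len(partition)}

def global_avg(local_results):
    """Runs at the coordinator: combine the partial aggregates."""
    total_sum = sum(r["sum"] for r in local_results)
    total_count = sum(r["count"] for r in local_results)
    return total_sum / total_count

# Three "nodes", each holding a partition of the column being averaged.
partitions = [[4, 8, 6], [10, 2], [5, 5, 5, 5]]
partials = [local_agg(p) for p in partitions]
print(global_avg(partials))   # 50 / 9 ≈ 5.56
```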
# Lecture 14: 3/17/20

## Concurrency Control
* Part 1: Concurrency control
  * Correct/fast data access
* Part 2: Recovery
  * Fault tolerant, not corrupted by a crash
* Concurrency advantages
  * Higher throughput (transactions/sec)
  * Lower latency (response time per transaction)
  * Independent of other unrelated transactions
  * Can write higher-level queries (SQL) and guarantee the data is processed reliably
* Order matters:
  * Inconsistent reads: a user reads in the middle of another user's transaction (not a state intended by either user)
  * Lost update (using the preimage): 2 users update a record at the same time
  * Dirty reads: a user reads a record while another user's uncommitted transaction is updating it

## Transactions
* Transaction (xact): a sequence of multiple actions to be executed as an atomic unit
  * A sequence of reads and writes of database objects
* Application view:
  * Begin transaction
  * Sequence of SQL statements
  * End transaction
* Xact manager controls the execution of transactions
  * Program logic is invisible to the DBMS
  * The DBMS can only see data reads/writes (not SQL queries)
* ACID:
  * Atomicity: all actions in the xact happen, or none
    * Ends as a commit: contract with the caller of the DB
    * Abort: e.g., a system crash after executing only some actions
  * Consistency: if the DB starts out consistent, it ends up consistent at the end of the xact
    * Consistency: a set of declarative integrity constraints
      * Like primary keys or foreign keys
    * Transactions that violate integrity are aborted
  * Isolation: the execution of each xact is isolated from that of others
    * DBMS interleaves the actions of many xacts
    * DBMS ensures 2 xacts do not interfere
    * Each xact executes as if it ran by itself
  * Durability: if a xact commits, its effects persist
    * Effects of a committed xact must survive failures
    * DBMS logs all actions to undo the actions of aborted transactions
    * Redo the actions of committed transactions not yet propagated to disk when the system crashes

## Serializability
* Transaction schedule: a sequence of actions on data from 1 or more transactions
* Serial schedule: each transaction runs from start to finish without any intervening actions from other transactions
* 2 schedules are equivalent if they:
  * Involve the same transactions
  * Order each individual transaction's actions the same way
  * Leave the DB in the same final state
* Schedule S is serializable if it is equivalent to some serial schedule

## Conflict Serializability
* Tricky to check whether schedules leave the DB in the same final state
* Has some false negatives (sacrifices some concurrency for an easier check)
* 2 operations conflict if they:
  * Are by different transactions
  * Are on the same object
  * Include at least 1 write
* The order of non-conflicting operations has no effect on the final state
* Conflict equivalent: 2 schedules that involve the same actions and order every pair of conflicting actions the same way
* Conflict serializable: S is conflict equivalent to some serial schedule (so S is also serializable)
  * Can swap consecutive non-conflicting operations of different transactions
  * Some serializable schedules are not conflict serializable
* Conflict Dependency Graph (see the sketch at the end of this lecture)
  * One node per xact
  * Edge from $T_i$ to $T_j$ if an operation of $T_i$ conflicts with an operation of $T_j$ and $T_i$'s operation comes earlier in the schedule
  * $(T_1, T_2)$ means $T_2$ depends on $T_1$
  * A schedule is conflict serializable iff its graph is acyclic

## View Serializability
* Fewer false negatives
* View equivalent schedules:
  * Same initial reads
  * Same dependent reads
  * Same winning writes
* Blind writes can ignore intermediate writes
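A Python sketch of the conflict-dependency-graph test described above: build an edge Ti → Tj for every conflicting pair ordered Ti before Tj, then check the graph for cycles. The (xact, operation, object) encoding of a schedule is an assumption for this example.

```python
# Conflict serializability check: build the precedence graph, then test acyclicity.

def conflict_graph(schedule):
    """schedule: list of (xact, op, obj) with op in {'R', 'W'}, in schedule order."""
    edges = set()
    for i, (ti, op_i, obj_i) in enumerate(schedule):
        for tj, op_j, obj_j in schedule[i + 1:]:
            if ti != tj and obj_i == obj_j and 'W' in (op_i, op_j):
                edges.add((ti, tj))      # Ti's conflicting op comes first
    return edges

def is_acyclic(nodes, edges):
    """Detect a cycle with DFS over the dependency graph."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
    state = {n: 0 for n in nodes}        # 0 = unvisited, 1 = in progress, 2 = done
    def dfs(n):
        state[n] = 1
        for m in adj[n]:
            if state[m] == 1 or (state[m] == 0 and not dfs(m)):
                return False             # found a back edge -> cycle
        state[n] = 2
        return True
    return all(dfs(n) for n in nodes if state[n] == 0)

# T1: R(A) W(A); T2: R(A) W(A), interleaved so both read A before either writes.
schedule = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
edges = conflict_graph(schedule)
print(edges)                             # both (T1, T2) and (T2, T1): a cycle
print(is_acyclic({"T1", "T2"}, edges))   # False -> not conflict serializable
```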
# Lecture 15: 3/19/20

## 2PL
* 2 Phase Locking: the most common scheme, a bit pessimistic
  * Sets locks for fear of conflict
  * Two phases: lock acquisition, then lock release
* Rule 1: a xact must obtain a shared (S) lock before reading and an exclusive (X) lock before writing
  * 2 xacts cannot hold locks on the same object at the same time, except 2 shared locks
* Rule 2: a xact cannot get new locks after releasing any lock
* Problem: does not prevent cascading aborts
* Lock point: the point at which the xact holds the most locks
* All transactions executed with 2PL are conflict serializable
* Strict 2PL
  * Cascading aborts: a rollback of T1 requires a rollback of T2
  * Strict 2PL requires all locks to be released together when the transaction completes (commits or aborts)

## Lock Management
* Lock protocol: if your transaction holds a lock and my transaction requests a conflicting lock, I am queued up waiting for that lock
* Lock manager maintains a hashtable keyed on the names of the objects being locked
  * Granted set: set of xacts currently granted access to the lock
  * Lock mode: shared or exclusive
  * Wait queue: queue of lock requests
* If no xact in the granted set or wait queue holds a conflicting lock, put the requestor in the granted set
* Lock upgrade: a xact with a shared lock can request an upgrade to exclusive

### Deadlock
* Deadlock: a cycle of xacts waiting for each other
* Deadlock detection: waits-for graph, check for cycles
  * Edge (T1, T2) means T1 waits for T2
  * Shoot a transaction on the cycle
  * Optimistic
* Prevention, avoidance, detection and resolution
  * Most systems just use timeouts
  * Prioritize lock upgrades (deadlocks from multiple simultaneous upgrades are unavoidable)
* Prevention (OS-style): resource ordering, but we can't impose an order on DBMS files
* Deadlock avoidance: assign priorities based on age
  * If Ti wants a lock that Tj holds:
    * Wait-die: if Ti has higher priority, Ti waits for Tj; else Ti aborts
    * Wound-wait: if Ti has higher priority, Tj aborts; else Ti waits
  * If a xact restarts, give it its original timestamp so it keeps getting older (and eventually gets priority)
  * Pessimistic: aborts transactions that may not actually cause a deadlock
  * Works because the same priority scheme applies to all lock conflicts, the DBMS can roll back transactions, and higher-priority transactions eventually get access

## Lock Granularity
* Lock tuples vs pages vs tables
  * Fine-grained means better concurrency but a lot of locks
  * A smaller number of locks is easier to keep track of
* Multiple locking granularity: a transaction explicitly locks a node and implicitly locks all of its descendants
* Solution: new lock modes
  * Allow xacts to lock at each level with intent locks
  * Before getting an S or X lock, a xact must have the proper intent locks on all ancestors
  * IS: intent to get an S lock at finer granularity
  * IX: intent to get an X lock at finer granularity
  * SIX: S and IX at the same time
* Intent locks allow a higher-level node to be locked in S or X mode without having to check all descendants
  * To get an S or IS lock on a node, the parent must hold IS or IX
  * To get an X, IX, or SIX lock on a node, the parent must hold IX or SIX
  * If the underlying locks are compatible, the corresponding intent locks are compatible (see the sketch below)
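A Python sketch of the standard multigranularity compatibility matrix together with the two ancestor rules just listed; the function names and dictionary encoding are my own illustration, and the matrix is the usual IS/IX/S/SIX/X table rather than something spelled out in these notes.

```python
# Multigranularity lock modes: compatibility matrix plus the ancestor rules from the notes.

COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

# "To get S or IS on a node, the parent must hold IS or IX;
#  to get X, IX, or SIX on a node, the parent must hold IX or SIX."
REQUIRED_ON_PARENT = {
    "S":   {"IS", "IX"},
    "IS":  {"IS", "IX"},
    "X":   {"IX", "SIX"},
    "IX":  {"IX", "SIX"},
    "SIX": {"IX", "SIX"},
}

def can_grant(requested, held_by_others):
    """Grant a lock only if it is compatible with every lock already held on the object."""
    return all(requested in COMPATIBLE[held] for held in held_by_others)

def parent_ok(requested, parent_mode):
    """Check the ancestor rule before locking a child node."""
    return parent_mode in REQUIRED_ON_PARENT[requested]

print(can_grant("IX", ["IS"]))   # True: both xacts intend to lock different descendants
print(can_grant("X", ["IS"]))    # False: someone intends to read somewhere below this node
print(parent_ok("X", "IX"))      # True: IX on the table permits X on one of its tuples
```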
# Lecture 16: 3/31/20

## Database Design
* Requirements Analysis
* Conceptual Design (Object-Relational Mappings)
* Logical Design (ER --> DBMS)
* Schema Refinement (consistency, normalization)
* Physical Design (indexes, disk layout)
* Security Design

## Data Models
* Data model - a collection of concepts for describing data
* Schema: a description of a particular collection of data
* Relational model:
  * Every relation (table: rows, columns) has a schema that describes the columns (names, domains)
* Levels of abstraction:
  * Views - how users see the data
  * Conceptual schema - logical structure
  * Physical schema - describes the files and indexes used

### Data Independence
* Logical data independence: maintain views when the logical structure changes
  * Between views and the conceptual schema
* Physical data independence: maintain the logical structure when the physical structure changes
  * Between the conceptual and physical schemas

## ER Model
* Entity: a real-world object described by a set of attribute values
* Entity set: a collection of similar entities
* Relationship: an association among 2+ entities
  * Can have their own attributes
* Relationship set: a collection of similar relationships
* Key constraint: 1-to-many relationship (arrow)
* Participation constraint: requires at least 1 (bold edge)
* Weak entity: can be identified uniquely only by considering the primary key of its owner entity
  * Owner entity set and weak entity set must participate in a 1-to-many relationship set
  * Weak entity set must have total participation in the identifying relationship set
* Ternary - one relationship (diamond) connected to 3 entities (rectangles)
* Aggregation - dotted box around a set
  * Allows relationships to have relationships
* ER models have lots of semantics
  * Some constraints cannot be captured in ER

## Crow's Foot Notation
* $0$: 0 or more
* $|$: 1 or more
* $|0$: 1 or 0
* $||$: exactly 1
* $<-$: many

![](https://i.imgur.com/FfPpt9e.png)

## Logical Design
* ER to relational
* Entity sets --> tables
* Relationship sets --> tables
  * Many-to-many:
    * Keys of each participating entity set as foreign keys
      * This set of attributes forms a superkey
    * Plus all descriptive attributes
* A weak entity's primary key is the concatenation of its partial key and the owner's primary key
  * On delete cascade - when the owner entity is deleted, all owned weak entities must also be deleted

# Lecture 17: 4/2/20
* Functional dependencies (FDs) help avoid redundancy in schemas
  * Decomposition: split the columns of 1 table into 2 tables
* $X \rightarrow Y$ (X determines Y)
  * An FD is a constraint enforced by the DB
    * Reject updates that break the FD
    * Not learned from the data
* Primary keys are special cases of FDs
  * $K \rightarrow \{\text{all attributes}\}$
* Superkey: a set of columns that determines all columns in its table (aka a key)
* Candidate key: a minimal set of columns that determines all columns in the table

## Anomalies
* Update anomaly - an update makes the FD inconsistent
* Insertion anomaly - we insert a row but don't know one column's value, so we must invent a value without knowing the truth
* Deletion anomaly - deleting all rows with a certain value loses the information a lookup table would keep
* Fix: chop the relation to make a lookup table for X ($X \rightarrow Y$)

## Armstrong's Axioms
* $F^+$: closure of F
  * The set of all FDs implied by F, plus trivial dependencies
* Armstrong's Axioms (complete inference rules)
  * Reflexivity: if $X \supseteq Y$, then $X \rightarrow Y$
  * Augmentation: if $X \rightarrow Y$, then $XZ \rightarrow YZ$ for all Z
  * Transitivity: if $X \rightarrow Y$ and $Y \rightarrow Z$, then $X \rightarrow Z$
* Additional rules
  * Union: if $X \rightarrow Y$ and $X \rightarrow Z$, then $X \rightarrow YZ$
  * Decomposition: if $X \rightarrow YZ$, then $X \rightarrow Y$ and $X \rightarrow Z$
  * Cannot drop things from both sides

## Attribute Closure
* Computing the full set of FDs $F^+$ is hard (exponential in the number of attributes)
* Typically we just check whether a given $X \rightarrow Y$ is in $F^+$
* Procedure (see the sketch at the end of these notes):
  * Compute the attribute closure of X ($X^+$) wrt F
    * $X^+$ is the set of all attributes A such that $X \rightarrow A$ is in $F^+$
  * Start with $X^+ = X$; whenever some FD's left side is contained in $X^+$, add its right side to $X^+$
  * Repeat until no change
  * Check whether Y is in $X^+$

## Normal Forms
* Normal form:
  * If a relation is in one, we know certain problems are avoided/minimized
  * Helps decide whether decomposing a relation is useful
* 1st Normal Form
  * Attributes are atomic
* Boyce-Codd Normal Form (BCNF)
  * R with FDs F is in BCNF if for every $X \rightarrow A$ in $F^+$:
    * $A \subseteq X$ (a trivial FD), or X is a superkey for R
  * R is in BCNF if the only non-trivial FDs over R are key constraints
  * Every field holds useful information that cannot be inferred via FDs

## Lossless Decompositions
* Decomposition - replacing a relation with multiple normalized relations
  * Each new relation schema has a subset of the attributes of R
  * Every attribute of R appears as an attribute of 1+ of the new relations
* Problems
  * 1) Lossy decomposition: impossible to reconstruct the original relation
    * Can generate new bad data: $\pi_X(r) \bowtie \pi_Y(r) \supseteq r$
    * Lossless: $\pi_X(r) \bowtie \pi_Y(r) = r$
    * The decomposition of R into X, Y is lossless wrt F iff $X \cap Y \rightarrow X$ or $X \cap Y \rightarrow Y$
      * Basically, the intersection of the 2 tables must be a key of 1 of the 2 tables
    * If $X \rightarrow Z$ and $X \cap Z$ is empty, then decomposing R into R-Z and XZ is lossless (because X is a superkey of XZ)
  * 2) Dependency checking may require joins
  * 3) Some queries become more expensive

## Dependency Preservation
* The projection $F_X$ is the set of FDs $U \rightarrow V$ in $F^+$ (the closure of F) with U and V contained in X
* Dependency preserving: $(F_X \cup F_Y)^+ = F^+$
* Decomposition into BCNF
  * If $X \rightarrow Y$ violates BCNF, decompose R into R-Y and XY (lossless)
  * Repeatedly doing this gives a collection of relations that are in BCNF
  * The order in which violations are handled can lead to different outcomes
  * There may not always be a dependency-preserving decomposition into BCNF
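A minimal Python sketch of the attribute-closure procedure above, assuming FDs are encoded as (left-side set, right-side set) pairs; the example FDs A → B and B → C are made up.

```python
# Attribute closure: grow X+ by firing any FD whose left side is already inside X+,
# until a fixpoint; then X -> Y is in F+ exactly when Y ⊆ X+.

def attribute_closure(X, fds):
    closure = set(X)
    changed = True
    while changed:                         # repeat until no change
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs             # X determines lhs, and lhs determines rhs
                changed = True
    return closure

def implies(X, Y, fds):
    """Check whether X -> Y is in F+ by testing Y ⊆ X+."""
    return set(Y) <= attribute_closure(X, fds)

# F = {A -> B, B -> C}: then A+ = {A, B, C}, so A -> C holds but C -> A does not.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(attribute_closure({"A"}, F))         # {'A', 'B', 'C'} (set order may vary)
print(implies({"A"}, {"C"}, F))            # True
print(implies({"C"}, {"A"}, F))            # False
```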