# CS186 Final Notes # Lecture 18: 4/7/20 ## Recovery * Atomicity, Durability * Roll back changes that violate consistency * If transactions ongoing on crash, abort them * SQL has savepoints * Release savepoint, rollback to savepoint * Crashes * Operator error * Configuration error * Software failure * Hardware failure * Assume strict 2PL * Simple scheme * Dirty buffer pages stay pinnd in buffer pool * At commit, force dirty pass to disk, unpin, then commit * Doesn't work because buffer pool can fill up, DBMS can crash during unpinning ## Stealing * NO STEAL policy - * don't allow buffer pool frames with uncommitted updates to be replaced * Pinned pages limit buffer replacement * Need to UNDO * FORCE policy: Make sure every update is forced to DB before commit * Durability without REDO logging * Lots of random IOs * Need to REDO * Preferred: Steal/No-Force * NO FORCE (Durability): * Flush as little as possible in a convenient place * REDO modifications upon crash * STEAL: (Atomicity): * If xact flushed updates to DB aborts * System crashes before xact finishes * Remember old value of flushed pages to UNDO writes ## Logging * Log: Ordered list of log records to allow REDO/UNDO * <XID, pageID, offset, length, old data, new data> * Log Sequence Number (LSN): unique, increasing * Write-Ahead Logging * Write to log record before DB disk * UNDO info for atomicity * Force all log records for xact before commit * REDO info for durability * Write logs into buffer tail in RAM * flushedLSN: Largest LSN * Each data page contains a pageLSN with pointer to most recent log record for update to that page * pageLSN $\leq$ flushedLSN for page to flush to DB ## ARIES Logging * prevLSN: LSN of previous log record written by XID * xact records form backwards linked list by xid (for undo) * Log types: Update, comit, abort, checkpoint, compensation log records, end * Update records contain physical diff for REDO, UNDO * Transaction Table (in memory) * 1 entry per currently active xact * Removed when xact commits or aborts * Contains xid, status, lastLSN (most recent LSN written by xact) * Dirty page table * 1 entry per dirty page in buffer pool * recLSN: LSN of log record that first dirtied this page (recovery LSN) * Normal execution * Series of reads and writes and then commit/abort * Assume disk write are atomic, Steal no force buffer management + WAL * At commit, all log records at xact commit record are flushed to disk * Guarantees flushedLSN $\geq$ lastLSN * Abort and checkpointing * On explicit abort: * get last LSN, write abort log * Follow chain of log recs backward to prevLSN * write CLR for each undone operation * CLR has an extra field undonextLSN pointing to next LSN to undo * CLR contains REDO info, never undone * Checkpointing * DBMS periodically creats checkpoint * begin_checkpoint record: indicates when chkpt began * end_checkpoint record: current xact table dpt * fuzzy checkpoint * LSN of most recent chkpt record as master record ## Recovery * Start at checkpoint from master record * 3 Phases * Analysis - which xacts committed since chkpt * Redo all actions (redoing all history) * Undo effects of failed xacts # Lecture 19: 4/14/20 * Distributed computing: unreliable netowrks, unsynch clocks ## Distributed Concurrency Control * Shared-nothing DBMS * Usually locks are partitioned with data (nodes manage own lock table independently) * Assign a home node for the lock across nodes * Global issues: deadlock, commit/abort * Distributed Deadlock Detection * Different waits-for graphs * Periodically ship wait-for graph to a master node and union all edges for global waits-for graph * Distributed Commit * We don't know if the node is alive or dead * Messages can get interleaved or lost/very delayed * Distributed voting: nodes need to all agree to commit or abort ## 2-Phase Commit (2PC) * Phase 1: Coordinator tells participants to prepare * Participants respond w/ yes/no votes, unanimity required for yes * No = abort, all yes = commit * Phase 2: coordinator disseminates result of vote * Participants respond with Ack * When coordinator receives all Ack, transaction is complete * Requires logging for failure handling * 2PC with logging * Flush log tail with prepare(T1) and then can respond yes/no * When coordinator generates and flushes commit record, done with phase1 * Participants log and flush commit/abort record and respond with ack * Coordinator generates end record and flushes * Coordinator knows which transactions committed from log ![](https://i.imgur.com/S5wBhCW.png) ## Recovery * If xact table at coordinator says abort/commit, send the appropriate response and continue protocl * If xact table at coordinator does not have anything, send abort * (coordinator had crashed before writing commit/abort and participant crashed) * Coordinator recovery * Commit and end: nothing * commit: rerun phsae2 * abort: nothing * Participant recovery * Nothing: nothing (presumed abort) * Pepare & commit: Send ack to coordinator * Prepare: send inquiry to coordinator * Abort: nothing (presumed abort) ## 2PC and Strict 2PL * Densely ordered point to point messages * Receiver can detect out of order and buffer messages * Commit: * When participant processes commit req, has all the locks already * flush log records and drop locks atomically * Abort: * Abort locally autonomously, no cascade * Log appropriately to 2PC and local undo, dropping locks atomically * Availability * If a node goes down, other nodes are in limbo making data unavailable * If don't hear from participant, respawn participant on new hardware and recover from log as a clone * Dead coordinator: 2PC can't deal with this