# CS186 Final Notes
# Lecture 18: 4/7/20
## Recovery
* Atomicity, Durability
* Roll back changes that violate consistency
* If transactions ongoing on crash, abort them
* SQL has savepoints
* Release savepoint, rollback to savepoint
* Crashes
* Operator error
* Configuration error
* Software failure
* Hardware failure
* Assume strict 2PL
* Simple scheme
* Dirty buffer pages stay pinnd in buffer pool
* At commit, force dirty pass to disk, unpin, then commit
* Doesn't work because buffer pool can fill up, DBMS can crash during unpinning
## Stealing
* NO STEAL policy -
* don't allow buffer pool frames with uncommitted updates to be replaced
* Pinned pages limit buffer replacement
* Need to UNDO
* FORCE policy: Make sure every update is forced to DB before commit
* Durability without REDO logging
* Lots of random IOs
* Need to REDO
* Preferred: Steal/No-Force
* NO FORCE (Durability):
* Flush as little as possible in a convenient place
* REDO modifications upon crash
* STEAL: (Atomicity):
* If xact flushed updates to DB aborts
* System crashes before xact finishes
* Remember old value of flushed pages to UNDO writes
## Logging
* Log: Ordered list of log records to allow REDO/UNDO
* <XID, pageID, offset, length, old data, new data>
* Log Sequence Number (LSN): unique, increasing
* Write-Ahead Logging
* Write to log record before DB disk
* UNDO info for atomicity
* Force all log records for xact before commit
* REDO info for durability
* Write logs into buffer tail in RAM
* flushedLSN: Largest LSN
* Each data page contains a pageLSN with pointer to most recent log record for update to that page
* pageLSN $\leq$ flushedLSN for page to flush to DB
## ARIES Logging
* prevLSN: LSN of previous log record written by XID
* xact records form backwards linked list by xid (for undo)
* Log types: Update, comit, abort, checkpoint, compensation log records, end
* Update records contain physical diff for REDO, UNDO
* Transaction Table (in memory)
* 1 entry per currently active xact
* Removed when xact commits or aborts
* Contains xid, status, lastLSN (most recent LSN written by xact)
* Dirty page table
* 1 entry per dirty page in buffer pool
* recLSN: LSN of log record that first dirtied this page (recovery LSN)
* Normal execution
* Series of reads and writes and then commit/abort
* Assume disk write are atomic, Steal no force buffer management + WAL
* At commit, all log records at xact commit record are flushed to disk
* Guarantees flushedLSN $\geq$ lastLSN
* Abort and checkpointing
* On explicit abort:
* get last LSN, write abort log
* Follow chain of log recs backward to prevLSN
* write CLR for each undone operation
* CLR has an extra field undonextLSN pointing to next LSN to undo
* CLR contains REDO info, never undone
* Checkpointing
* DBMS periodically creats checkpoint
* begin_checkpoint record: indicates when chkpt began
* end_checkpoint record: current xact table dpt
* fuzzy checkpoint
* LSN of most recent chkpt record as master record
## Recovery
* Start at checkpoint from master record
* 3 Phases
* Analysis - which xacts committed since chkpt
* Redo all actions (redoing all history)
* Undo effects of failed xacts
# Lecture 19: 4/14/20
* Distributed computing: unreliable netowrks, unsynch clocks
## Distributed Concurrency Control
* Shared-nothing DBMS
* Usually locks are partitioned with data (nodes manage own lock table independently)
* Assign a home node for the lock across nodes
* Global issues: deadlock, commit/abort
* Distributed Deadlock Detection
* Different waits-for graphs
* Periodically ship wait-for graph to a master node and union all edges for global waits-for graph
* Distributed Commit
* We don't know if the node is alive or dead
* Messages can get interleaved or lost/very delayed
* Distributed voting: nodes need to all agree to commit or abort
## 2-Phase Commit (2PC)
* Phase 1: Coordinator tells participants to prepare
* Participants respond w/ yes/no votes, unanimity required for yes
* No = abort, all yes = commit
* Phase 2: coordinator disseminates result of vote
* Participants respond with Ack
* When coordinator receives all Ack, transaction is complete
* Requires logging for failure handling
* 2PC with logging
* Flush log tail with prepare(T1) and then can respond yes/no
* When coordinator generates and flushes commit record, done with phase1
* Participants log and flush commit/abort record and respond with ack
* Coordinator generates end record and flushes
* Coordinator knows which transactions committed from log

## Recovery
* If xact table at coordinator says abort/commit, send the appropriate response and continue protocl
* If xact table at coordinator does not have anything, send abort
* (coordinator had crashed before writing commit/abort and participant crashed)
* Coordinator recovery
* Commit and end: nothing
* commit: rerun phsae2
* abort: nothing
* Participant recovery
* Nothing: nothing (presumed abort)
* Pepare & commit: Send ack to coordinator
* Prepare: send inquiry to coordinator
* Abort: nothing (presumed abort)
## 2PC and Strict 2PL
* Densely ordered point to point messages
* Receiver can detect out of order and buffer messages
* Commit:
* When participant processes commit req, has all the locks already
* flush log records and drop locks atomically
* Abort:
* Abort locally autonomously, no cascade
* Log appropriately to 2PC and local undo, dropping locks atomically
* Availability
* If a node goes down, other nodes are in limbo making data unavailable
* If don't hear from participant, respawn participant on new hardware and recover from log as a clone
* Dead coordinator: 2PC can't deal with this