## Guidelines for Design of Schemas
#### Design Theory
Grouping attributes together to form a good relational schema.
### Informal Guidelines
* **Semantics** of attributes is clear
* Reducing **redundant** information
* Reducing **NULL** values
* Disallowing the possibility of generating **spurious tuples**
    * Tuples that arise from an incorrect join of tables (joining on attributes that are not a primary key / foreign key pair)
### Features Of a Good Relational Schema
#### **Guideline 1: Easy Interpretability**
Each tuple in a relation should represent one entity or relationship instance.
* Attributes of different entities shouldn't be mixed in the same relation.
* Only foreign keys should be used to refer to other entities.
* Entity and relationship attributes should be kept apart as much as possible.
#### **Guideline 2: Must not suffer from anomalies**
* Insert anomaly
* Update anomaly
* Delete anomaly
#### **Guideline 3: Avoid usage of NULL values**
* Attributes that are frequently NULL can be placed in a separate relation together with the primary key.
* NULLs waste space and make the meaning of the attributes harder to interpret.
#### **Guideline 4: Lossless Join Condition must be satisfied**
* No spurious tuples must be generated on natural join of any relations.

* Decompositions help avoid redundancies (but not always good)
* Lossy Decomposition : Cannot reconstruct original relation (from decomposed relations)

## Functional Dependencies
- Used to specify formal measures of the goodness of relational designs
- FDs and keys are used to define **normal forms**
- Constraints are derived from the meaning and interrelations of the data attributes
*A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y, i.e., X -> Y*
An FD is a property of the attributes in the relation schema.
An FD does not hold if the relation contains tuples that violate it.


Prime attribute: a member of some candidate key.
Non-prime attribute: not a member of any candidate key.
### Types of FD
#### **Full FD**
* Determinant of FD is candidate key or super key
* Dependent can be prime or non prime

#### **Partial FD**
* A non-prime attribute is determined by only a part of a candidate key.

#### **Transitive FD**
* An attribute is functionally dependent on a non-key attribute, which is in turn dependent on the candidate key.

#### **Trivial FD**
If X is a set of attributes and Y is a subset of X, then X→Y always holds.
e.g., ABC→BC
#### **Multi Valued FD**
For each value of X there is a well-defined set of values of Y and a well-defined set of values of Z, and the set of values of Y is independent of the set of values of Z.
Written X →→ Y | Z
### Armstrong's Inference Rules
- IR1 **(Reflexive)** If **Y** $\subseteq$ **X**, then **X**$\rightarrow$**Y** always holds.
- IR2 **(Augmentation)** If **X**$\rightarrow$**Y**, then **XZ**$\rightarrow$**YZ** always holds
- IR3 **(Transitive)** if **X**$\rightarrow$**Y** and **Y**$\rightarrow$**Z** then **X**$\rightarrow$**Z** always holds.
### Additional Inference Rules
- **Decomposition** if **X**$\rightarrow$**YZ**, then **X**$\rightarrow$**Y** and **X**$\rightarrow$**Z** always holds
- **Union** if **X**$\rightarrow$**Y** and **X**$\rightarrow$**Z** then **X**$\rightarrow$**YZ** (*Inverse of decomposition*)
- **Pseudotransitivity** If **X**$\rightarrow$**Y** and **WY**$\rightarrow$**Z**, then **WX**$\rightarrow$**Z**
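
As a small worked example (using only IR1–IR3 above), the **Union** rule can be derived as follows:

$$
\begin{aligned}
&X \rightarrow Y && \text{(given)}\\
&X \rightarrow XY && \text{(IR2: augment with } X\text{)}\\
&X \rightarrow Z && \text{(given)}\\
&XY \rightarrow YZ && \text{(IR2: augment with } Y\text{)}\\
&X \rightarrow YZ && \text{(IR3: transitivity on the two previous lines)}
\end{aligned}
$$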


---
### **Closure**
- Set of FDs **F** has **F$^+$**, which is the set of FDs that can be inferred from F
- The closure of a set of attributes X *w.r.t.* F is the set X$^+$ of all attributes that are functionally determined by X
*X$^+$ is computed by repeatedly applying the inference rules to F, starting from X*


*The closure {Classid}$^+$ is the entire set of attributes of the relation CLASS, indicating that all attributes of the relation can be determined from Classid; hence Classid is a candidate key.*
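
A minimal sketch of the attribute-closure computation in Python (the FD representation and attribute names here are illustrative, not from the notes):

```python
def closure(attrs, fds):
    """Compute X+: the set of attributes functionally determined by attrs.

    attrs: iterable of attribute names, e.g. {'Classid'}
    fds:   list of (lhs, rhs) pairs of attribute sets
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the closure, pull in the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Example: Classid -> {Teacher, Room}, Teacher -> Dept
fds = [({'Classid'}, {'Teacher', 'Room'}), ({'Teacher'}, {'Dept'})]
print(closure({'Classid'}, fds))   # {'Classid', 'Teacher', 'Room', 'Dept'}
```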

### **Equivalence**
Two sets of FDs E and F are equivalent if E$^+$ = F$^+$, i.e., E covers F and F covers E.

### **Covers**
F covers G if every FD in G can be inferred from F *(G $\subseteq$ F$^+$)*
#### Minimal cover
- A minimal set of dependencies equivalent to (i.e., with the same closure as) the original set
- An irreducible set of functional dependencies
**FINDING MINIMAL COVER**
* Rewrite each FD so that its right-hand side is a single attribute: the **canonical form** (decomposition)
    - Rewrite A $\rightarrow$ BC as A $\rightarrow$ B, A $\rightarrow$ C
* Remove each **redundant functional dependency**
    - For an FD in F (e.g., A $\rightarrow$ B), check whether *A$^+$ computed with the FD equals A$^+$ computed without it*; if so, the FD is redundant
* Remove redundant attributes ( **extraneous attributes** )
    - In AB $\rightarrow$ C, if A$^+$ == (AB)$^+$ then B is extraneous; remove B (and vice versa)
    - *Shortcut*: In AB $\rightarrow$ C, if A $\rightarrow$ B holds then B is extraneous; remove it (and vice versa)
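
A rough sketch of these three steps in code, reusing the `closure` helper sketched above (the FD representation is illustrative):

```python
def minimal_cover(fds):
    """Return an approximate minimal cover of fds, given as (lhs, rhs) pairs of attribute sets."""
    # 1. Canonical form: a single attribute on each right-hand side.
    cover = [(frozenset(l), frozenset([a])) for l, r in fds for a in r]

    # 2. Remove extraneous attributes from left-hand sides.
    reduced = []
    for lhs, rhs in cover:
        lhs = set(lhs)
        for a in sorted(lhs):
            # a is extraneous if rhs is still derivable from lhs - {a} under the full FD set.
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, cover):
                lhs.discard(a)
        reduced.append((frozenset(lhs), rhs))
    cover = reduced

    # 3. Remove redundant FDs: drop an FD if it is implied by the remaining ones.
    i = 0
    while i < len(cover):
        fd = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if fd[1] <= closure(fd[0], rest):
            cover = rest          # redundant, drop it
        else:
            i += 1
    return cover

# A -> BC, B -> C, AB -> C   reduces to   A -> B, B -> C
print(minimal_cover([({'A'}, {'B', 'C'}), ({'B'}, {'C'}), ({'A', 'B'}, {'C'})]))
```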

---
## Keys
**_A key is any set of attributes whose closure derives every attribute of the relation_**
### Candidate Key
* A set of minimal attribute(s) that can identify each tuple uniquely in the given relation.
* A minimal super key is called a candidate key.
#### Finding Candidate Key
1. Attributes that do not appear on the RHS of any FD are essential attributes and are part of every candidate key
2. If the essential attributes alone **can** determine all attributes, they form the **only possible candidate key**.
3. If the essential attributes **cannot** determine all attributes, a **combination of essential and non-essential** attributes forms a candidate key ( **multiple candidate keys** are possible by taking different combinations)

#### Example with multiple candidate keys
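
A brute-force sketch of steps 1–3, reusing the `closure` helper from above; the relation and FDs below are made up to show a case with two candidate keys:

```python
from itertools import combinations

def candidate_keys(attributes, fds):
    """Return all candidate keys of a relation with the given attributes and FDs."""
    attributes = set(attributes)
    rhs_attrs = set().union(*(set(r) for _, r in fds)) if fds else set()
    essential = attributes - rhs_attrs          # never on any RHS: in every candidate key

    if closure(essential, fds) == attributes:
        return [essential]                      # the only possible candidate key

    keys = []
    rest = sorted(attributes - essential)
    for size in range(1, len(rest) + 1):
        for combo in combinations(rest, size):
            cand = essential | set(combo)
            if any(k <= cand for k in keys):    # skip supersets of keys already found
                continue
            if closure(cand, fds) == attributes:
                keys.append(cand)
    return keys

# R(A, B, C, D) with A -> B, B -> A, B -> C: candidate keys are {A, D} and {B, D}
fds = [({'A'}, {'B'}), ({'B'}, {'A'}), ({'B'}, {'C'})]
print(candidate_keys('ABCD', fds))   # [{'A', 'D'}, {'B', 'D'}] (order may vary)
```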

### Total Number of Super Keys
Let's say the candidate key of the relation R(A, B, C, D, E, F) is AB.
Then there are four remaining attributes that may be added to it.
Each of them may or may not be included (2 possibilities each).
Thus we get 2 × 2 × 2 × 2 = 16 super keys.
$No.\ of\ Super\ keys = 2^{Non\ candidate\ key\ attributes}$
[Examples for candidate key finding](https://www.gatevidyalay.com/candidate-key-finding-candidate-key-examples/)
#### Normalization:
The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
#### Normal form:
Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

---
## Normalization
Designing a relational schema such that it allows information storage **without redundancy** and facilitates easy **information retrieval**.
**2NF, 3NF, BCNF**
based on keys and FDs of a relation schema
**4NF**
based on keys and multi-valued dependencies (MVDs)
**5NF**
based on keys and join dependencies (JDs)
### 1-NF
- Atomic values: each cell in the table must contain a single, indivisible value, i.e., it should not contain multiple values or a list of values
- Unique column names
Enables easy retrieval and extensibility
*BASIC: column data must be homogeneous and column names must be unique*
- **No multi-valued columns**
- No composite attributes
- No nested relations


*Here Pnumber is not dependent on Ssn*
> If you make (Ssn, Pnumber) the primary key and create a separate tuple for every Pnumber, it is still 1NF; but splitting it into two different relations, as in (c), may be a better design than staying in 1NF
* We should take care not to put any multivalued elements in a row
* Composite values should be present as different columns
* Nested relations - you can't have a table inside a table
#### Conditions
**No table column can have tables as values**

---
### 2-NF
**Prime Attribute**: An attribute that is a member of the primary key K
**Fully Functional Dependency**: An FD X $\rightarrow$ Y where removal of any attribute from X destroys the FD.
#### Conditions
1. It is in first normal form.
2. **Every non-prime attribute is fully functionally dependent on the candidate key** (no partial dependencies)
>*partial dependency occurs when a non-key attribute in a table is functionally dependent on only a part of the candidate key, not the entire key*

* Now, to query Ename we only need to look at Ssn; Pnumber is not needed.
* Split based on fully functional dependency

---
### 3-NF
#### Conditions
* Must be in 2 NF
* **Non prime attributes cannot have transitive dependency on candidate key**

* This example is already in 2NF; to bring it to 3NF we split out the transitive dependency, so that looking up a Dnumber does not require a linear search (as a primary key it can be found via the index).
*If Dnumber weren't dependent on Ssn, it would effectively be a table within a table, i.e., a nested relation.*


---

## BCNF
**All non-trivial functional dependencies must be based on super keys (the determinant is a super key)**
#### Condition
* Must be in 3 NF
* For any dependency A $\rightarrow$ B , A must be a super key
* **If a non prime attribute derives a prime attribute, it violates BCNF**
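
The BCNF condition translates almost directly into a check using the `closure` helper from earlier (a sketch, assuming the FD set is given explicitly):

```python
def is_bcnf(attributes, fds):
    """True if every non-trivial FD lhs -> rhs has a super key as its determinant."""
    attributes = set(attributes)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                              # trivial FD, ignore
        if closure(lhs, fds) != attributes:       # lhs is not a super key
            return False
    return True

# R(A, B, C) with A -> B and candidate key {A, C}: A is not a super key, so not in BCNF
print(is_bcnf('ABC', [({'A'}, {'B'})]))   # False
```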




**NJB TEST** (nonadditive join test) - If the intersection of the two tables functionally determines the set difference of either table, the decomposition is lossless.
You can avoid running the NJB test by decomposing TEACH into TEACH $-$ Course (b) and Instructor $\cup$ Course (c), given Instructor $\rightarrow$ Course; for a violating FD X $\rightarrow$ A, this standard split into (R $-$ A) and (X $\cup$ A) is guaranteed lossless. If either result is still not in BCNF, repeat.
* **3NF decomposition always ensures dependency preservation; BCNF decomposition does not.**
* **Both 3NF and BCNF decompositions ensure a lossless join.**
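
The binary NJB test is easy to automate with `closure`; the TEACH decomposition mentioned above checks out as lossless:

```python
def lossless_binary(r1, r2, fds):
    """NJB test: {r1, r2} is lossless iff (r1 ∩ r2) -> (r1 - r2) or (r1 ∩ r2) -> (r2 - r1)."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus

# TEACH(Student, Course, Instructor) with Instructor -> Course,
# decomposed into (Instructor, Course) and (Instructor, Student)
fds = [({'Instructor'}, {'Course'})]
print(lossless_binary({'Instructor', 'Course'}, {'Instructor', 'Student'}, fds))   # True
```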
## Other Normal Forms
### 4-NF
When there are multiple independent multi-valued dependencies, they should be put into separate tables. Violation of 4NF is usually due to bad database design. The underlying constraint is called a **multivalued dependency**.
### 5-NF
A table is in 5NF when it cannot be split further without loss of information. The underlying constraint is called a **join dependency**.

---
# Transactions
A transaction is a unit of program execution that accesses and possibly updates various data items.
Two main issues to deal with:
* **Failures** of various kinds, such as hardware failures and system crashes
* **Concurrent execution** of multiple transactions
#### Begin and end transaction statements
Specify transaction boundaries; all operations between them are part of a single transaction.
* No explicit begin required by SQL
* Explicit end required
* COMMIT
* ROLLBACK
#### Access Modes
* Read-only transaction: the database operations in the transaction do not update the database.
* Read-write transaction: the database operations in the transaction update the database as well as retrieve data.
## ACID
A transaction must maintain
* **Atomicity**: all or none (restart if it fails; transactions are atomic)
    * If the transaction does not complete, the DB system restores the old values from the log to make it appear as though the transaction never executed.
* **Consistency**: the database moves from one consistent state to another
    * Enforced using constraints
* **Isolation**: concurrent execution appears serial
    * Concurrently executing transactions must not be aware of each other, i.e., intermediate results must be hidden
* **Durability**: committed changes persist even on failure
    * Ensured using the log (recovery system)




*Recovery system*
### **Transaction states**
* **Active**: brought into RAM for execution
    * The initial state; the transaction stays in this state while it is executing
* **Partially committed**: operations done but not yet committed (still in RAM)
    * After the final statement has been executed
* **Committed**: changes exist on disk
    * After successful completion
* **Terminated**: held resources are freed
    * The transaction has either committed or aborted
* **Failed**: a failed state
    * After the discovery that normal execution can no longer proceed
* **Aborted**: kill or restart
    * After the transaction has been rolled back and the database restored to its prior state

* A transaction starts in the active state. When it finishes its final statement, it enters the partially committed state.
* A hardware failure may still prevent successful completion even after the transaction has finished executing its statements.
### Concurrent Executions
Multiple transactions are allowed to run concurrently in the system.
#### Advantages:
* Increased processor and disk utilization, leading to better transaction throughput
    * Ex: one transaction can be using the CPU while another is reading from or writing to the disk
* Reduced average response time for transactions: short transactions need not wait behind long ones
#### Schedule
A sequence of instructions that specifies the chronological order in which the operations of concurrent transactions are executed.
#### Conflicting operations in a schedule
Two operations in a schedule are said to conflict if they satisfy all three of the following conditions:
* Belong to different transactions
* Access the same item X
* At least one of the operations is a write_item(X).
#### Serial Schedule
* Transactions execute serially
* Consistent
* Only 1 transaction is active at a time

#### Non Serial Schedule
* Multiple transactions execute concurrently.
* NOT always Consistent

### Serializable Schedule
**A transaction schedule is serializable if its outcome (e.g., the resulting database state) is equal to the outcome of its transactions executed serially.**


**Conflict Serializable** - If the precedence graph has a cycle, the schedule is not conflict serializable
* We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
* Conflict equivalent: Relative order of any two conflicting operations is the same in both schedules.
**View Serializable** - A schedule is view serializable if it is view equivalent to some serial schedule.
* Two schedules are said to be view equivalent if the initial reads, the read-from relationships, and the final writes are the same in both schedules.
* Conflicting operations may be reordered as long as the end state of the database is the same.
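
A sketch of the precedence-graph test for conflict serializability (the schedule encoding as (transaction, operation, item) triples is an assumption of this example):

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) triples, e.g. [('T1', 'R', 'X'), ('T2', 'W', 'X')]."""
    # Build the precedence graph: edge Ti -> Tj if an operation of Ti
    # conflicts with a later operation of Tj (same item, at least one write).
    txns = {t for t, _, _ in schedule}
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and 'W' in (op_i, op_j):
                edges.add((ti, tj))

    # A cycle in the graph means the schedule is not conflict serializable.
    graph = {t: [v for u, v in edges if u == t] for t in txns}
    state = {t: 0 for t in txns}            # 0 = unvisited, 1 = on stack, 2 = done

    def has_cycle(u):
        state[u] = 1
        for v in graph[u]:
            if state[v] == 1 or (state[v] == 0 and has_cycle(v)):
                return True
        state[u] = 2
        return False

    return not any(state[t] == 0 and has_cycle(t) for t in txns)

# R1(X) W2(X) W1(X): edges T1 -> T2 and T2 -> T1 form a cycle, so not conflict serializable
print(conflict_serializable([('T1', 'R', 'X'), ('T2', 'W', 'X'), ('T1', 'W', 'X')]))  # False
```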


#### Blind Write
If there is no read that happens prior to the first write then it is said to be a blind write.

W3(X) is a blind write, W2(X) is not.
### Recoverable Schedule
If a transaction Tj reads a data item previously written by a transaction Ti , then the commit operation of Ti appears before the commit operation of Tj.

* If T6 should abort, T7 would have read (and possibly shown to the user) an inconsistent database state.
* Hence, database must ensure that schedules are recoverable.


**Non-serializable schedules**
- may or may not be consistent
- may or may not be recoverable
### Irrecoverable Schedule
If a transaction performs a dirty read from an uncommitted transaction and commits before the transaction from which it read the value, the schedule is known as an irrecoverable schedule.

### Checking Whether a Schedule is Recoverable or Irrecoverable
#### Step 1
No dirty read operation (no transaction reads a value written by a still-uncommitted transaction): Recoverable
Dirty read operation present: Continue
#### Step 2
1. Find the order in which the transactions commit.
2. Find the order of the dirty reads (which transaction read from which).
If every reading transaction commits after the transaction it read from, the schedule is recoverable;
else it is irrecoverable.

### Cascading, Cascadeless, and Strict Schedules
#### Cascading Schedule
If the failure of one transaction causes several other dependent transactions to roll back or abort, the schedule is called a cascading schedule (cascading rollback / cascading abort).

#### Cascadeless Schedule
A cascadeless schedule allows a transaction to read only committed values, but uncommitted writes are allowed.
* Avoids cascading roll back and thus saves CPU time.

#### Strict Schedule
A strict schedule allows a data item to be read or written only after the last transaction that wrote it has committed.

---
## Concurrency Control
Procedure of managing simultaneous transactions ensuring their atomicity, isolation, consistency, and serializability.
### Lock-Based Protocols
* Allow a transaction to access a data item only if it currently holds a lock on that item; no other transaction may acquire an incompatible lock on it in the meantime.
* Trivial scheme: a transaction acquires a lock on the entire database before it starts and releases the lock after it has committed.
    * Only one transaction can execute at a time, giving poor performance.
#### Lock
* Mechanism to control concurrent access to a data item.
* Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted.
* Modes
* Exclusive (X) mode : Data item can be both **read and written**. X-lock is requested using lock-X instruction.
* Shared (S) mode. Data item can **only be read**. S-lock is requested using lock-S instruction.

*The lock-compatibility function defines whether a pair of lock modes can peacefully coexist on a shared resource or will result in blocking.*
#### How does it work?
To access a data item, transaction Ti must first lock that item. If the data item is already locked by another transaction in an incompatible mode, the concurrency-control manager does not grant the lock: **Ti is made to wait until all incompatible locks held by other transactions have been released.**
### Shared / Exclusive Lock

## Two phase Locking
- Growing Phase
    - Can obtain locks
    - Can convert a Shared lock to Exclusive (upgrade)
- Shrinking Phase
    - Can release locks
    - Can convert an Exclusive lock to Shared (downgrade)

Two-phase locking does not ensure freedom from deadlocks.
### Strict 2PL
- Prevents cascading rollback and irrecoverability
- Holds Exclusive Locks(X) until commit
### Rigorous 2PL
- Holds Exclusive (X) AND Shared Locks (S) until commit
## Implementation of Locking
* **Lock manager** can be implemented as a process that **receives messages** from transactions and **sends messages** in reply.
* Transactions can send lock and unlock requests as messages.
* Lock manager replies to request by sending **lock grant message or rollback message** (in case of deadlock)
* The lock manager maintains an in-memory data-structure called a **lock table** to record granted locks and pending requests.
* maintains a **linked list** of records, one for each request, in the order in which the requests arrived.
* Each record contains transaction that made request, lock mode requested and if it has been granted
* It uses a **hash table**, indexed on the name of a data item, to find the linked list (if any) for a data item

* A new request is added at the end of the queue and granted if it is compatible with all earlier locks
* Unlock request: the corresponding request record is deleted, and later requests are checked to see if they can now be granted
* If a transaction aborts, all waiting or granted requests of the transaction are deleted.
* An index on transactions is also maintained, so this can be done efficiently
* This algorithm guarantees **freedom from starvation** for lock requests, since a request can never be granted while a request received earlier is waiting to be granted
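
A toy sketch of such a lock table, keyed by data item with a FIFO queue of requests per item (a real lock manager also handles upgrades, deadlocks, and more lock modes):

```python
from collections import defaultdict

# Lock compatibility: only shared locks can coexist.
COMPATIBLE = {('S', 'S'): True, ('S', 'X'): False, ('X', 'S'): False, ('X', 'X'): False}

class LockTable:
    def __init__(self):
        # data item -> list of [txn, mode, granted] records, in arrival (FIFO) order
        self.table = defaultdict(list)

    def request(self, txn, item, mode):
        queue = self.table[item]
        # Grant only if compatible with every granted lock and no earlier request is waiting.
        grantable = all(COMPATIBLE[(m, mode)] for _, m, granted in queue if granted) \
                    and all(granted for _, _, granted in queue)
        queue.append([txn, mode, grantable])
        return grantable                 # False: the transaction must wait

    def release(self, txn, item):
        queue = self.table[item]
        queue[:] = [r for r in queue if r[0] != txn]
        # Wake up waiting requests in FIFO order while they remain compatible.
        for rec in queue:
            if rec[2]:
                continue
            if all(COMPATIBLE[(m, rec[1])] for _, m, granted in queue if granted):
                rec[2] = True
            else:
                break                    # preserve FIFO order: stop at the first blocked request

locks = LockTable()
print(locks.request('T1', 'A', 'S'))     # True  (granted)
print(locks.request('T2', 'A', 'X'))     # False (waits behind T1's S lock)
locks.release('T1', 'A')                 # T2's X request is now granted
```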
---
If we wish to develop protocols that are not two-phase, we need additional information on how each transaction will access the database.
There are various models that can give us this additional information, each differing in the amount of information provided.
The simplest model requires that we have prior knowledge about the order in which the database items will be accessed.
Given such information, it is possible to construct locking protocols that are not two-phase but that nevertheless ensure conflict serializability.
### Graph based Locking

### Tree Based Locking


## Deadlock
A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set.
#### Methods to deal with deadlocks
* Deadlock prevention: ensure the system never enters a deadlock state
* Deadlock recovery: allow deadlocks to happen and recover using deadlock detection and recovery schemes.
One of the two transactions must be rolled back.

#### Starvation
While other transactions keep requesting (and being granted) compatible locks on an item, a transaction waiting for an incompatible lock may be put on hold indefinitely, causing **starvation** despite repeated requests.
## Deadlock Prevention Schemes
#### Some deadlock prevention schemes
* Each transaction locks all its data items before it begins execution
* Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol).
* Performs transaction rollback instead of waiting for a lock whenever the wait could potentially result in a deadlock.
#### Drawbacks
* It is often hard to predict, before the transaction begins, what data items need to be locked.
* Data-item utilization may be very low, since many of the data items may be locked but unused for a long time.



## Deadlock Detection


*If deadlocks occur frequently, then the detection algorithm should be invoked more frequently.*
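
Detection is usually based on a wait-for graph: one node per transaction, with an edge Ti → Tj if Ti is waiting for a lock held by Tj; a cycle means deadlock. A minimal sketch (the input format is made up for illustration):

```python
def find_deadlock(wait_for):
    """wait_for: dict mapping a transaction to the set of transactions it waits on.
    Returns a list of transactions forming a cycle, or None if there is no deadlock."""
    state, stack = {}, []

    def dfs(t):
        state[t] = 'in_progress'
        stack.append(t)
        for u in wait_for.get(t, ()):
            if state.get(u) == 'in_progress':
                return stack[stack.index(u):]      # cycle found: the deadlocked transactions
            if u not in state:
                cycle = dfs(u)
                if cycle:
                    return cycle
        state[t] = 'done'
        stack.pop()
        return None

    for t in list(wait_for):
        if t not in state:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None

# T1 waits on T2 and T2 waits on T1: deadlock
print(find_deadlock({'T1': {'T2'}, 'T2': {'T1'}}))   # ['T1', 'T2']
```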
## Deadlock Recovery
- Select Victim
- Select based on minimum cost to rollback
- Rollback
- **Total** : Abort full transaction
- **Partial** : Roll the transaction back only as far as needed to release the lock that another transaction needs
- Starvation
    - The oldest transaction is never chosen as the victim (avoids starvation)
## RECOVERY SYSTEM
- A computer system, like any other device, is subject to failure from a variety of causes such as disk crashes, power outages, software errors, a fire in the machine room, and even sabotage.
- In any failure, information may be lost. Therefore, the database system must take action in advance to preserve the atomicity and durability properties of transactions.
- An integral part of a database system is a recovery scheme that can **restore the database to the consistent state that existed before the failure**.
- The recovery scheme must support **high availability** by keeping a **synchronized backup copy** of the database for use in case of machine failure or maintenance.
## FAILURE CLASSIFICATION
There are various types of failure that may occur in a system, each of which needs to be dealt with in a different manner. In this chapter, we shall consider only the following types of failure:
1. **Transaction Failure**
2. **System Crash**
3. **Disk Failure**
### TRANSACTION FAILURE
#### 1) Transaction Failure:
There are two types of errors that may cause a transaction to fail:
- **Logical error**: The transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not found, overflow, or resource limit exceeded.
- **System error**: The system has entered an undesirable state (e.g., deadlock), as a result of which a transaction cannot continue with its normal execution. The transaction, however, can be re-executed at a later time.
### SYSTEM CRASH
#### System Crash:
- A power failure or other hardware or software failure causes the system to crash.
- There are three potential causes for the loss of volatile storage and interruption of transaction processing: hardware malfunction, database software bug, or operating system issue. However, the content of non-volatile storage remains unaffected.
- **Fail-stop assumption**: The assumption that hardware errors and bugs in the software bring the system to a halt, but **do not corrupt the non-volatile storage contents**, is known as the fail-stop assumption.
- Well-designed systems have numerous internal checks, at the hardware and the software level, that bring the system to a halt when there is an error.
- Hence, the fail-stop assumption is a reasonable one.
### DISK FAILURE
#### Disk Failure:
- A head crash or similar disk failure destroys all or part of the disk storage.
- A disk block loses its content as a result of either a head crash or failure during a data-transfer operation. **Copies of the data on other disks, or archival backups on tertiary media**, such as DVDs or tapes, are used to recover from the failure.
- Destruction is assumed to be detectable: **disk drives use checksums to detect failures**.
- To determine how the system should recover from failures, we need to identify the failure modes of those devices used for storing data.
- Next, we must consider how these failure modes affect the contents of the database.
- We can then propose algorithms to ensure database consistency and transaction atomicity despite failures.
- These algorithms, known as recovery algorithms, have two parts:
1. Actions are taken during normal transaction processing to ensure that enough information exists to allow recovery from failures.
2. Actions taken after a failure to recover the database contents to a state that ensures database consistency, transaction atomicity, and durability.
## STORAGE
### Storage
- The various data items in the database may be stored and accessed in a number of different storage media. We know that storage media can be distinguished by their relative speed, capacity, and resilience against failure.
- We identified three categories of storage:
1. **Volatile storage**
2. **Non-Volatile storage**
3. **Stable storage**
- Stable storage or, more accurately, an approximation thereof, plays a critical role in recovery algorithms.
- **Volatile storage**:
- Does not survive system crashes.
- Examples: main memory, cache memory.
- **Non-volatile storage**:
- Survives system crashes.
- Examples: disk, tape, flash memory, non-volatile RAM.
- But may still fail, losing data.
- **Stable storage**:
- A mythical form of storage that survives all failures.
- Approximated by maintaining multiple copies on distinct non-volatile media.
##### Stable-Storage Implementation
- Maintain multiple copies of each block on separate disks.
- Copies can be at remote sites to protect against disasters such as fire or flooding.
- Failure during data transfer can still result in inconsistent copies:
- **Successful completion**: The transferred information arrived safely at its destination.
- **Partial failure**: A failure occurred in the midst of a transfer, and the destination block has incorrect information.
- **Total failure**: The failure occurred sufficiently early during the transfer that the destination block remains intact (destination block was never updated).
- Protecting storage media from failure during data transfer (one solution):
1. Write the information onto the first physical block.
2. When the first write successfully completes, write the same information onto the second physical block.
3. The output is completed only after the second write successfully completes.
- Copies of a block may differ due to failure during the output operation.
- To recover from failure:
1. First, find inconsistent blocks:
- Expensive solution: Compare the two copies of every disk block.
- Better solution:
- Record in-progress disk writes on non-volatile storage (Flash, Non-volatile RAM, or special area of disk).
- Use this information during recovery to find blocks that may be inconsistent, and only compare copies of these.
- Used in hardware RAID systems.
2. If either copy of an inconsistent block is detected to have an error (bad checksum), overwrite it with the other copy. If both have no errors but are different, overwrite the second block by the first block.
### Data Access
- Physical blocks are those blocks residing on the disk.
- Buffer blocks are the blocks residing temporarily in the main memory.
- Block movements between disk and main memory are initiated through the following two operations:
- `input(A)` transfers the physical block A to the main memory.
- `output(B)` transfers the buffer block B to the disk and replaces the appropriate physical block there.

- We assume, for simplicity, that each data item fits in, and is stored inside a single block.
- Each transaction Ti has its private work area in which local copies of all data items accessed and updated by it are kept.
- Ti's local copy of a data item X is called xi.
- Transferring data items between system buffer blocks and the private work area is done by:
- `read(X)` assigns the value of data item X to the local variable xi.
- `write(X)` assigns the value of local variable xi to data item X in the buffer block.
- Note: `output(BX)` need not immediately follow `write(X)`. The system can perform the output operation when it deems fit.
- Transactions
- Must perform `read(X)` before accessing X for the first time (subsequent reads can be from a local copy).
- `write(X)` can be executed at any time before the transaction commits.
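
A toy model of these operations (block names and the one-data-item-per-block layout are simplifying assumptions):

```python
# A simple model of the buffer / private-work-area protocol described above.
disk   = {'BX': 100}     # physical blocks on disk, e.g. block BX holds data item X
buffer = {}              # buffer blocks in main memory
local  = {}              # transaction Ti's private work area (x_i, ...)

def input_block(b):            # input(A): disk -> buffer
    buffer[b] = disk[b]

def output_block(b):           # output(B): buffer -> disk
    disk[b] = buffer[b]

def read(x, block):            # read(X): buffer -> local copy x_i
    if block not in buffer:
        input_block(block)
    local[x] = buffer[block]

def write(x, block):           # write(X): local copy x_i -> buffer block
    if block not in buffer:
        input_block(block)
    buffer[block] = local[x]

read('X', 'BX')                # bring X into Ti's work area
local['X'] -= 50               # local computation
write('X', 'BX')               # update the buffer block; output(BX) may happen later
print(disk['BX'], buffer['BX'])   # 100 50 -- disk is not updated until output_block('BX')
```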

## Recovery and Atomicity
- To ensure atomicity despite failures, we first output information describing the modifications to stable storage without modifying the database itself.
## Shadow-Copy
A duplicate snapshot of the entire database at a specific moment, offering consistency for backup and recovery purposes during transactions.
## Shadow Paging
Selective duplication of modified pages in a snapshot, minimizing storage usage compared to copying the entire database.
## Log Records
- The most widely used structure for recording database modifications is the log. The log is a sequence of log records, recording all the update activities in the database.
- There are several types of log records. An update log record describes a single database write. It has these fields:
- Transaction identifier, which is the unique identifier of the transaction that performed the write operation.
- Data-item identifier, which is the unique identifier of the data item written. Typically, it is the location on disk of the data item, consisting of the block identifier of the block on which the data item resides and an offset within the block.
- Old value, which is the value of the data item prior to the write.
- New value, which is the value that the data item will have after the write.
- We represent an update log record as `<Ti, Xj, V1, V2>`, indicating that transaction Ti has performed a write on data item Xj. Xj had value V1 before the write and has value V2 after the write. Other special log records exist to record significant events during transaction processing, such as the start of a transaction and the commit or abort of a transaction. Among the types of log records are:
- `<Ti start>`. Transaction Ti has started.
- `<Ti commit>`. Transaction Ti has committed.
- `<Ti abort>`. Transaction Ti has aborted.
## Log-Based Recovery
- A log is a sequence of log records. The records keep information about updated activities on the database.
- The log is kept in stable storage.
- When transaction Ti starts, it registers itself by writing a `<Ti start>` log record.
- Before Ti executes write(X), a log record `<Ti, X, V1, V2>` is written, where V1 is the value of X before the write (the old value), and V2 is the value to be written to X (the new value).
- When Ti finishes its last statement, the log record `<Ti commit>` is written.
#### Two approaches using logs:
- Immediate database modification
- Deferred database modification.
### Immediate Database Modification: Recovery
- Log records are read backward until the last checkpoint, the old values of uncommitted updates are restored, and an abort record is written for each rolled-back transaction.
## Database Modification
- The log records allow the system to undo changes made by a transaction in the event that the transaction must be aborted; they allow the system also to redo changes made by a transaction if the transaction has committed but the system crashed before those changes could be stored in the database on disk.
- In order for us to understand the role of these log records in recovery, we need to consider the steps a transaction takes in modifying a data item:
1. The transaction performs some computations in its own private part of main memory.
2. The transaction modifies the data block in the disk buffer in main memory holding the data item.
3. The database system executes the output operation that writes the data block to disk.
## Immediate Database Modification

- The immediate-modification scheme allows updates of an uncommitted transaction to be made to the buffer or the disk itself before the transaction commits.
- Update log record must be written before the database item is written.
- We assume that the log record is output directly to stable storage.
- Output of updated blocks to disk can take place at any time before or after the transaction commit.
- Order in which blocks are output can be different from the order in which they are written.
## Deferred Database Modification
- The deferred database modification scheme delays the update to disk or buffer until the transaction commits.
- Simplifies some aspects of recovery
- But has overhead of storing a local copy

## Recovery Algorithm
- A recovery algorithm must take into account a variety of factors, including:
- The possibility that a transaction may have committed although some of its database modifications exist only in the disk buffer in main memory and not in the database on disk.
- The possibility that a transaction may have modified the database while in the active state and, as a result of a subsequent failure, may need to abort.
- Because all database modifications must be preceded by the creation of a log record, the system has available both the old value prior to the modification of the data item and the new value that is to be written for the data item. This allows the system to perform undo and redo operations as appropriate.
- **The undo operation** using a log record sets the data item specified in the log record to the old value contained in the log record.
- **The redo operation** using a log record sets the data item specified in the log record to the new value contained in the log record (in case the update has not yet been applied to the database on disk).
## Concurrency Control and Recovery
- *With concurrent transactions, all transactions share a single disk buffer and a single log.*
- A buffer block can have data items updated by one or more transactions.
- **We assume that if a transaction Ti has modified an item, no other transaction can modify the same item until Ti has committed or aborted.**
(the updates of uncommitted transactions should not be visible to other transactions.)
- Can be ensured by obtaining exclusive locks on updated items and holding the locks till the end of the transaction (strict two-phase locking).
- Log records of different transactions may be interspersed in the log.
## Transaction Commit
- A transaction is said to have **committed** when its **commit log record** is output to **stable storage.**
- All previous log records of the transaction must have been output already.
- Writes performed by a transaction may still be in the buffer when the transaction commits and may be output later.
### Undo and Redo of Transactions
- `undo(Ti)`: Restores the value of all data items updated by Ti to their old values, going backward from the last log record for Ti.
- Each time a data item X is restored to its old value V, a special log record `<Ti, X, V>` is written out.
- When undo of a transaction is complete, a log record `<Ti abort>` is written out.
- `redo(Ti)`: Sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti.
- No logging is done in this case.
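
A sketch of these two operations over a simple in-memory log of `(Ti, X, V_old, V_new)` tuples (the record encoding is illustrative):

```python
def undo(ti, log, db):
    """undo(Ti): scan Ti's update records <Ti, X, V_old, V_new> backward, restoring old values."""
    for rec in reversed(list(log)):
        if rec[0] == ti and len(rec) == 4:
            _, x, v_old, _ = rec
            db[x] = v_old
            log.append((ti, x, v_old))        # compensation log record <Ti, X, V_old>
    log.append((ti, 'abort'))                 # <Ti abort>

def redo(ti, log, db):
    """redo(Ti): scan Ti's records forward, re-installing the new (or compensated) value; no logging."""
    for rec in log:
        if rec[0] == ti and len(rec) in (3, 4):
            db[rec[1]] = rec[-1]              # last field is the value to install

db = {'X': 10}
log = [('T1', 'start'), ('T1', 'X', 10, 50)]
undo('T1', log, db)     # X is restored to 10; log gains <T1, X, 10> and <T1 abort>
```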
## Recovering from Failure
- When recovering after failure:
- Transaction Ti **needs to be undone** if the log contains the record `<Ti start>`, but does not contain either the record `<Ti commit>` or `<Ti abort>`.
- Transaction Ti **needs to be redone** if the log contains the records `<Ti start>` and contains the record `<Ti commit>` or `<Ti abort>`.
- Suppose that transaction Ti was undone earlier and the `<Ti abort>` record was written to the log, and then a failure occurs.
- On recovery from the failure, transaction Ti is redone.
- Such a redo redoes all the original actions of transaction Ti, including the steps that restored old values.
- **Known as repeating history (redoing an undo, or anything already performed, again); it seems wasteful, but simplifies recovery greatly.**
## Checkpoints
- When a system crash occurs, we must consult the log to determine those transactions that need to be redone and those that need to be undone. In principle, we need to search the entire log to determine this information. There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to our algorithm, need to be redone have already written their updates into the database. Although redoing them will cause no harm, it will nevertheless cause recovery to take longer.
- To reduce these types of overhead, we introduce checkpoints.
1. Output all log records currently residing in main memory onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record `<checkpoint L>` onto stable storage where L is a list of all transactions active at the time of checkpoint.
4. All updates are stopped while doing checkpointing.
- During recovery, we need to consider only the most recent transaction Ti that started before the checkpoint and transactions that started after Ti.
- Scan backward from end of the log to find the most recent `<checkpoint L>` record.
- Only transactions that are in L or started after the checkpoint need to be redone or undone.
- Transactions that committed or aborted before the checkpoint already have all their updates output to stable storage.
- Some earlier part of the log may be needed for undo operations.
- Continue scanning backward till a record `<Ti start>` is found for every transaction Ti in L.
- Parts of the log prior to the earliest `<Ti start>` record above are not needed for recovery and can be erased whenever desired.
- The requirement that transactions must not perform any updates to buffer blocks or to the log during checkpointing can be bothersome, since transaction processing has to halt while a checkpoint is in progress. A fuzzy checkpoint is a checkpoint where transactions are allowed to perform updates even while buffer blocks are being written out. Section 19.5.4 describes fuzzy-checkpointing schemes.
### Example of Checkpoints
- Logging (during normal operation):
- `<Ti start>` at transaction start.
- `<Ti, Xj, V1, V2>` for each update, and
- `<Ti commit>` at transaction end.
- Transaction rollback (during normal operation):
- Let Ti be the transaction to be rolled back.
- Scan log backward from the end, and for each log record of Ti of the form `<Ti, Xj, V1, V2>`:
- Perform the undo by writing V1 to Xj.
- Write a log record `<Ti, Xj, V1>`.
- Such log records are called compensation log records.
- Once the record `<Ti start>` is found, stop the scan and write the log record `<Ti abort>`.
- **Redo phase**:
1. Find the last `<checkpoint L>` record, and set undo-list to L.
2. Scan forward from above `<checkpoint L>` record.
1. Whenever a record `<Ti, Xj, V1, V2>` or `<Ti, Xj, V2>` is found, redo it by writing V2 to Xj.
2. Whenever a log record `<Ti start>` is found, add Ti to undo-list.
3. Whenever a log record `<Ti commit>` or `<Ti abort>` is found, remove Ti from undo-list.
- **Undo phase**:
- Scan the log backward from the end.
1. Whenever a log record `<Ti, Xj, V1, V2>` is found where Ti is in the undo-list, perform the same actions as for transaction rollback:
- Perform undo by writing V1 to Xj.
- Write a log record `<Ti, Xj, V1>` (compensation log record).
2. Whenever a log record `<Ti start>` is found where Ti is in the undo-list:
- Write a log record `<Ti abort>`.
- Remove Ti from the undo-list.
3. Stop when the undo-list is empty (i.e., `<Ti start>` has been found for every transaction in the undo-list).
- After the undo phase completes, normal transaction processing can commence.
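
Putting the two phases together, a compact sketch of the restart recovery algorithm (the tuple-based log record encoding is an assumption of this example):

```python
def recover(log, db):
    """Restart recovery: redo phase forward from the last checkpoint, then undo phase backward.

    Assumed record shapes (illustrative): ('checkpoint', [T...]), (T, 'start'),
    (T, 'commit'), (T, 'abort'), update (T, X, V_old, V_new), compensation (T, X, V).
    """
    cps = [i for i, r in enumerate(log) if r[0] == 'checkpoint']
    cp = cps[-1] if cps else -1
    undo_list = set(log[cp][1]) if cps else set()

    # Redo phase: replay every update/compensation record after the checkpoint.
    for rec in log[cp + 1:]:
        if len(rec) >= 3:
            db[rec[1]] = rec[-1]                 # write V_new (or V for a compensation record)
        elif rec[1] == 'start':
            undo_list.add(rec[0])
        elif rec[1] in ('commit', 'abort'):
            undo_list.discard(rec[0])

    # Undo phase: scan backward, rolling back every transaction still in the undo-list.
    for rec in reversed(list(log)):
        if not undo_list:
            break
        if len(rec) == 4 and rec[0] in undo_list:
            t, x, v_old, _ = rec
            db[x] = v_old
            log.append((t, x, v_old))            # compensation log record
        elif len(rec) == 2 and rec[1] == 'start' and rec[0] in undo_list:
            log.append((rec[0], 'abort'))
            undo_list.discard(rec[0])

db = {'X': 10, 'Y': 20}
log = [('checkpoint', []), ('T1', 'start'), ('T1', 'X', 10, 50),
       ('T2', 'start'), ('T2', 'Y', 20, 70), ('T2', 'commit')]
recover(log, db)
print(db)   # {'X': 10, 'Y': 70}: committed T2 is redone, uncommitted T1 is undone
```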
### Example of Redo - Undo

## Redo Phase and Repeating History
- During the redo phase, every log record since the most recent checkpoint is replayed. This means that all update actions executed after the checkpoint, including those of incomplete transactions and actions carried out to roll back failed transactions, are repeated.
- The actions are repeated in the same order in which they were originally carried out. This process is referred to as "repeating history." While it may seem wasteful, repeating history, even for failed transactions, simplifies recovery schemes.
## Recovery Algorithm Optimization
Database commit processing is critical for ensuring data consistency and durability. However, the traditional approach of forcing log records to disk for each transaction can lead to significant overhead. To address this, database systems employ optimization techniques, including group commit.
### Group Commit Technique
- Group commit optimizes the process of committing transactions by delaying log writes until a group of transactions is ready to be committed. Instead of forcing the log as soon as an individual transaction completes, the system waits until several transactions have completed or a predefined time has passed. It then commits this group of transactions together, resulting in fewer log write operations per committed transaction.
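
A minimal sketch of the idea: buffer commit records and force them to disk as a group once enough accumulate or a short timeout expires (group size and wait time here are illustrative; a real system also blocks each transaction until its record has actually been forced):

```python
import time

def force_log_to_disk(records):
    # Stand-in for the physical log write; one write covers the whole group.
    print(f"forcing {len(records)} commit records in one write")

class GroupCommitter:
    def __init__(self, group_size=8, max_wait=0.005):
        self.pending = []                # commit log records waiting to be forced
        self.group_size = group_size
        self.max_wait = max_wait         # seconds to wait before flushing a partial group
        self.oldest = None               # arrival time of the oldest pending record

    def commit(self, txn_id):
        self.pending.append(f"<{txn_id} commit>")
        self.oldest = self.oldest or time.monotonic()
        # Flush when the group is full or the oldest record has waited long enough.
        if len(self.pending) >= self.group_size or \
           time.monotonic() - self.oldest >= self.max_wait:
            self.flush()

    def flush(self):
        if self.pending:
            force_log_to_disk(self.pending)
            self.pending.clear()
            self.oldest = None

committer = GroupCommitter(group_size=3)
for t in ('T1', 'T2', 'T3'):
    committer.commit(t)      # the third commit triggers a single forced write for all three
```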
## Optimizing Commit Processing
### Benefits of Group Commit
1. **Reduced Overhead**: Group commit significantly reduces the overhead caused by frequent log writes, improving the overall performance of the database system.
2. **Increased Transaction Rate**: The careful choice of group size and maximum waiting time ensures that blocks are efficiently utilized, allowing more transactions to be committed per second. The increase in transaction rate depends on the storage medium.
- a. **Hard Disk**: Without group commit, hard disk-based logging allows a maximum of 100 to 200 transactions per second. With group commit, this can be increased to 1000 to 2000 transactions per second.
- b. **Flash Storage**: Flash storage, with its faster write times, can support up to 10,000 transactions per second without group commit. However, with group commit, this rate can skyrocket to 100,000 transactions per second. Additionally, group commit minimizes the number of erase operations, optimizing flash storage performance.
3. **Reduced Commit Delay**: While group commit introduces a slight delay in committing transactions that perform updates, it effectively reduces the overall commit delay, especially when dealing with high rates of transaction commit.
## Programmer's Role in Optimization
Database performance can be improved by programmers through certain strategies:
- **Batch Processing**: When dealing with data loading applications, inserting each record as a separate transaction can be limiting. Instead, performing a batch of inserts as a single transaction allows log records to be written together on one page, increasing the number of inserts that can be performed per second.
**Summary**
Optimizing transaction commit processing is essential for improving the overall performance of a database system. The group commit technique efficiently reduces overhead and enhances transaction rates, particularly when dealing with high-commit-rate scenarios. Additionally, programmers can play a vital role in improving transaction commit performance by adopting batch processing strategies to maximize database throughput.
## Transaction Failures
### Dirty Reads

### Incorrect summary

### Lost Update

**T2 aborts when it finds that the value has changed unexpectedly**
### Unrepeatable Read

### Phantom Read
