# Database Management System
###### tags: `UCR`
[TOC]

## SQL
* Subqueries may have to be computed for each attribute value
    * Nested queries are difficult to optimize
* Aggregation operations
    * A significant extension of relational algebra
    * Input and output are each a single relation ![](https://i.imgur.com/vQOtz2c.png)
* Division operations ![](https://i.imgur.com/WQ5QayG.png)
* Group by
    * Produces unique groups, with no duplicate rows inside each group (relation)
    * HAVING is the group qualification operation
* Correlated subqueries in SQL are hard to evaluate

## Indexing
### Hash index (from answer)
* Constructed with a hashing function that quickly maps a search key value to a specific location in an array-like list of elements called **buckets**
    * The hash function is chosen to avoid collisions
    * If a collision occurs, a link is formed to chain the colliding records
    * If too many of these collisions occur, the number of buckets is increased
* Buckets may underfill or overfill
* Bad for range queries because the entries are not sorted

### ISAM
* Static
    * The index will not change after insertions or deletions, which causes overflow pages
    * Search may become linear instead of logarithmic (overflow pages are just appended after the leaf level)
* Occupies consecutive pages; in the examples each node holds 2 records (not necessarily 2)
* Leaves hold the data
* Less-than goes left, otherwise go right
* Built bottom up
* Space utilization of ISAM index pages is usually close to 100 percent by design

### B+ Tree
* Always balanced, no overflow pages
* Log base 2 may not be good enough, so use a multi-way tree
* Every node is guaranteed at least 50% occupancy
    * Otherwise we have to redistribute or merge entries in each node
* Pages may not be in sequential (contiguous) disk space
    * Reason: insertions and deletions reuse pages instead of allocating them in order
* The larger the fanout, the more records we can index with fewer levels
* Keep the top levels in memory so they do not need a disk access
* Insertion
    * If the node is full, split the node
    * The values pushed up are index entries, not records
    * When the root is full and we are about to insert a pointer, the split propagates up to the root and may add another level (the entry count m of the root is not restricted by d < m < 2d; it may have fewer entries, but must not overflow)
    * May have to relink leaves after insertion to maintain sequential order
* Deletion
    * If the leaf is still at least half full, done
        * Except for the root
    * Otherwise redistribute, or merge with a sibling
    * Borrow from the sibling if it has spare entries
* Bulk loading: sort the records before inserting, and always insert into the leftmost leaf while pages are empty; otherwise append to the rightmost leaf, so only the rightmost path splits

### Trick on disk (general), combined with B+ tree
* Indexing: decreases the number of I/Os
* Buffering: keep the top of the tree in the buffer
* Sequential instead of random access on disk

### Clustering
* Clustered data: the index order follows the order of the data entries
    * Alternative 1 cost: root-to-leaf traversal plus the number of leaf pages
* Unclustered data
    * The index contains no data, just pointers
    * May have multiple pointers per key
    * Alternative 2 cost: number of index pages plus number of matching data entries, so: ![](https://i.imgur.com/WzMAvB8.png)
* Data entries per data record (or more, saving record ids)
    * Alternative 1: the data record itself with the key, no pointer
    * Alternative 2: key k and one pointer
        * For example: the primary key is name, but we want to use SSN to find an employee's id (not clustered), so go to the catalog
    * Alternative 3: key k and a list of pointers
* Pass: a read or a write back to disk

## Sorting
* Reduces duplicates
* External merge sort (see the cost sketch below)
    * Cost is 2N * (number of passes), where N is the number of pages, because each pass reads and writes every page
    * In general: $\lceil\log_2 N\rceil + 1$ passes
    * With B buffers: $\lceil\log_{B-1}\lceil N/B\rceil\rceil + 1$ passes
    * Pass 0 produces runs of the buffer size B, not B-1
* Alternative 2: cost is the number of index pages plus the number of data entries
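A minimal sketch of the pass-count and I/O formulas above, assuming `N` data pages and `B` buffer pages (the function and parameter names are illustrative, not from the lecture):

```python
import math

def external_sort_cost(n_pages: int, n_buffers: int) -> dict:
    """I/O cost of external merge sort per the formulas above (a sketch)."""
    # Pass 0: sort B pages at a time in memory -> ceil(N/B) runs of size B.
    runs = math.ceil(n_pages / n_buffers)
    # Each later pass merges B-1 runs at a time (one buffer page is reserved for output).
    merge_passes = math.ceil(math.log(runs, n_buffers - 1)) if runs > 1 else 0
    passes = 1 + merge_passes
    # Every pass reads and writes each page once -> 2N I/Os per pass.
    return {"runs_after_pass0": runs, "passes": passes, "total_io": 2 * n_pages * passes}

# Example: 108 pages, 5 buffers -> 22 initial runs, 1 + ceil(log_4 22) = 4 passes, 864 I/Os.
print(external_sort_cost(108, 5))
```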
### Method
* 2-way external merge sort
    * Sort two files and merge them: divide and conquer
    * Uses 3 buffers in memory
* Sorting with a B+ tree (with an index)
    * Clustered: use the order from the index
    * Unclustered: it may be better to just go to the file directly

## Evaluation of Relational Operators
### Join
* Metric: number of I/Os
* Simple nested loops (double for loop)
    * Many I/Os
    * For each tuple of the outer relation R, scan the entire inner relation S
    * M pages in R, N pages in S
    * Cost = M + p_R * M * N, where p_R is the number of tuples per page of R
    * For the page-oriented version, cost = M + M * N
    * **Note**: the M (or N) term is there because we must also pay the I/O to read the outer pages themselves
* Block nested loops join
    * Keep a hash table over the outer block of R in memory
    * Cost: (scan of outer) + (scan of inner) * (number of outer blocks)
    * Uses the available buffer pages in memory
    * Whether the relation is clustered or not matters a lot
* Index nested loops join
    * Given a field of R, use an index on S to find the tuples of S with the same field value
    * A clustered index gives better performance
* Sort-Merge Join
    * Sort both relations, then scan each of them only once
    * Cost = M log M + N log N + (M + N); the log terms are for sorting
    * If there are duplicates, we may have to back up and rescan part of the other relation
        * The cost of the merge scan can then be as large as M*N instead of M+N
    * Refinement
        * If memory is big enough, the sort and merge can be combined into fewer passes
        * Read + write in pass 0, then read each relation once in the merging pass
        * Linear number of I/Os for the sort
        * **The important assumption**: with L = max(M, N), we need $B > \sqrt{L}$, i.e. $B^{2} > L \ge M$, so $\frac{M}{B} < B$ (all runs fit in one merge step)
    * ![](https://i.imgur.com/tjRllFO.png)
    * ![](https://i.imgur.com/zZGJbBt.png)
* Hash Join
    * 1 buffer page as input and B-1 pages as hash partitions
    * When one of the B-1 buckets is full, it is written out to its disk partition
    * First (partitioning) phase cost: 2M + 2N, because we read and write both R and S into B-1 partitions each
    * The same hash function is used for both relations, so tuples of partition R1 can only join with partition S1, never with S2, for example
    * Then join the matching partitions record by record
    * The second (probing) phase costs M + N if each R partition fits in one block (block nested loops with a single block)
    * Each partition has "size" about $\frac{M}{B-1}$ pages, since B-1 partitions are built in memory in the first phase
    * We need $\frac{M}{B-1} < B-2$, i.e. the largest partition must be smaller than the pages left for the probing block, roughly $M < B^{2}$, so one pass suffices per partition
    * As a result, the total cost is $3(M+N)$ (see the cost sketch below)
* **But why do we still need other algorithms such as sort-merge join?** Because the hash may not be uniform, so a partition can end up too large.
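A small calculator for the join I/O formulas above (a sketch; M and N are page counts, p_R is tuples per page of R, B is the number of buffer pages, and the assumption that B-2 pages hold the outer block is mine):

```python
import math

def join_io_costs(M: int, N: int, p_R: int, B: int) -> dict:
    """I/O cost of the join algorithms discussed above, in the same notation."""
    return {
        # Tuple-at-a-time nested loops: read R once, scan S for every R tuple.
        "tuple_nested_loops": M + p_R * M * N,
        # Page-oriented nested loops: scan S once per R page.
        "page_nested_loops": M + M * N,
        # Block nested loops: B-2 pages hold the outer block (1 input + 1 output page reserved).
        "block_nested_loops": M + math.ceil(M / (B - 2)) * N,
        # Sort-merge (simplified): sort both relations, then one merge pass.
        "sort_merge": int(M * math.log2(M) + N * math.log2(N) + (M + N)),
        # Hash join: partition both (read + write = 2(M+N)), then probe (M+N).
        "hash_join": 3 * (M + N),
    }

# Example: M=1000, N=500 pages, 100 tuples/page in R, B=102 buffers.
print(join_io_costs(1000, 500, 100, 102))
```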
## Query Optimization and Evaluation
* Evaluation
    * Approximate the input and output sizes
* Physical plan
    * Has an actual implementation for each operation (unlike the logical plan)
    * Maps 1-1 to a logical plan
* Logical plan
    * Can be mapped to more than one physical plan

### Optimization
* Static
    * Select the best plan up front
* Dynamic
    * Select the operator plan on the fly, while executing
* Hybrid
    * Compile with the static algorithm
    * But when the error in the estimates exceeds a threshold, reoptimize
* Prepared statements
    * Reoptimize on every rerun, or
    * Generate multiple plans for different parameter values

### Stability
* Hints
* Fixed optimizer version
* Backward-compatible plans

### Types
* Heuristics
    * Based on relational algebra
    * Steps
        * Decompose into single-variable queries
        * Substitute the values
    * Advantage: easy to implement
    * Disadvantage: difficult to generate a good plan when the query is complex
* Heuristics + cost-based join ordering
    * Advantage: finds a reasonable plan without having to perform an exhaustive search
    * Disadvantage: only looks at left-deep joins, so it may not be optimal
    * Left-deep plans can pipeline the joins

### Optimizer Generators
* Stratified search
    * Planning is done in multiple stages
    * Similar to System R
    * First rewrite the logical query with transformations, then map it to physical plans
* Unified search
    * Combines logical-to-physical and logical-to-logical transformations in one search
    * Has more transformations to consider
    * Can use memoization, as in dynamic programming

### Top down: start from the goal, which saves us from going down wrong paths
* Volcano: start with the logical plan we want our query to be
    * Drawback: there are many paths to consider, and it is hard to modify
* Cascades: instead of instantiating all possible expressions, represent redundant expressions as a group
:::info
A join B is the same as B join A; represent both as \[AB\] and ignore the implementation
:::
* Use a memo table: store all previously computed paths
    * Every sub-path of an optimal path is itself optimal
    * Example: get(A) -> (physical) Forward Scan(A) or Index Scan(A)
    * In the memo table, store the smallest cost
* Why still look at logical expressions?
    * To find the one with the fewest operators

### Search termination
* Wall-clock time
    * Stop after the optimizer runs for a specific amount of time
* Cost threshold
    * Stop after the optimizer finds a plan with a cost less than a given threshold
* Transformation exhaustion
    * On a per-group basis
* Bottom up
    * Push selections before joins, for less intermediate data
    * Aggregates: bottom-up approach
    * Selinger optimizer: can use dynamic programming to **memoize, bottom up**, but subplans may not be produced in order, so an order has to be planned within it
    * OLAP (Online Analytical Processing): many relations to join
    * Logical: which joins
    * Physical: which kind of join, with its implementation and algorithm
    * Remember the pattern of previously created logical plans
    * The subplan of an optimal plan is optimal

**Read the Cascades thesis / System R paper**

## Spatial Data
* Defined as data that covers geometric objects (points, lines, etc.)
* Occupies a certain amount of space, called its spatial extent

### R-Tree
* Representation
    * Each leaf-node entry is represented by (n-dimensional rectangle I, tuple identifier)
        * I is the smallest enclosing rectangle
    * Non-leaf entry: (bounding rectangle, child pointer)
    * Works for higher dimensions (given a bounding rectangle)
    * Minimum bounding rectangle (MBR)
        * Indexed by the coordinates of 2 corners (upper-right, lower-left, as in computer vision)
        * `(x1,x2,y1,y2)` -> pointer to the actual data
    * Search follows the intersecting rectangles and recursively goes down
* Features
    * Balanced, similar to B-trees
    * Nodes are at least half full
    * It is fine for rectangles to overlap
* Inserts
    * Insertion is done at a leaf
    * At node N, find the entry F whose rectangle needs the least enlargement to include the new entry
    * Descend until a leaf is reached
    * Parents may be updated recursively
    * If entries overlap, choose the one with the smallest "enlargement" area
    * If the node is full, split it
* Deletion (4/30): condense the tree
    * Main difference from the B+ tree: if the number of entries in a leaf falls below the threshold, the remaining entries are reinserted:
    * ![](https://i.imgur.com/p5LGj7G.png)
* Node split
    * Why split so as to get less coverage? Because it is better for search: fewer misses
    * Minimize the covering area of the newly created nodes (entries)
    * Pick seeds (for the node split), see the sketch after this section
        * Quadratic cost algorithm: PickSeeds (pick one entry and look at the remaining ones), then PickNext (pick the entry with the maximum difference in enlargement)
        * It is like clustering: a larger distance between the two groups is better
    * Linear cost algorithm (versus the quadratic algorithm, the difference described here is PickSeeds)
        * Take the highest low side (the largest lower-left coordinate) and the lowest high side (the smallest upper-right coordinate) on each axis, e.g. x or y
        * Then normalize by the width along that axis
* Overlap of directory entries affects performance
* R+ tree
    * Improves on the R-tree by having less overlap (less page overlap in an area)
    * Spends time on ordering
    * Insert and split differ from the R-tree; uses a reinsert method
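A minimal sketch of the quadratic PickSeeds step described above (Guttman's quadratic split idea), assuming 2-D rectangles given as `(x1, y1, x2, y2)`; the helper names are illustrative:

```python
from itertools import combinations

def area(r):
    (x1, y1, x2, y2) = r
    return (x2 - x1) * (y2 - y1)

def cover(a, b):
    """Smallest rectangle (MBR) enclosing rectangles a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def quadratic_pick_seeds(entries):
    """Pick the pair of entries that would waste the most area if grouped together.

    This pair seeds the two new nodes, which tends to minimize the coverage
    (and thus search misses) of the resulting split.
    """
    best_pair, best_waste = None, float("-inf")
    for a, b in combinations(entries, 2):
        waste = area(cover(a, b)) - area(a) - area(b)
        if waste > best_waste:
            best_pair, best_waste = (a, b), waste
    return best_pair

# Example: the two far-apart rectangles become the seeds.
rects = [(0, 0, 1, 1), (0.5, 0.5, 2, 2), (10, 10, 11, 11)]
print(quadratic_pick_seeds(rects))  # -> ((0, 0, 1, 1), (10, 10, 11, 11))
```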
## Paper Reading
### Access path selection (System R)
* RSS maintains the physical storage and system relations
* 4 steps, like a compiler
    * Parsing, **optimization**, code generation, execution
* Types of scans
    * Segment scan: find all tuples of a given relation by going through all of its pages
        * All non-empty pages will be touched
    * Index scan: B+-tree
        * Can search over a range
* WHERE clauses are the predicates
    * Every conjunct is called a boolean factor
* The optimizer retrieves statistics on the relations in the query and on the access paths available for each relation
* Choosing the optimal path
    * The cheapest cost for a single relation is obtained by evaluating the cost of each of its access paths
    * Group by and order by
        * Such a tuple order is an "interesting order"
        * Keep only the cheapest path for each interesting order (plus the cheapest unordered path)
* Two-way join methods
    * Nested loops join
        * No order requirement
    * Merging scan (sort-merge) join
        * Requires join-column order
        * Only applies to equi-joins
        * If there is no index on the join column, sort into a temporary list
        * More complex, but the ordering lets the merging scan do better; it remembers its position
        * Clustering matters
* N-way join: visualized as a sequence of 2-way joins (see the sketch below)
    * First a 2-way join, then the next, and so on
    * Nested loops and merging scan joins can be mixed
    * The join order matters
    * Thus: join the first k relations, fix that plan, and the method for the (k+1)-th join is independent of the first k
    * Avoid Cartesian products when the join predicates only connect some of the relations
    * Use the search tree to find the optimum while saving cost
    * Equivalence classes of join columns (remember them)
    * The optimizer stores at most (number of subsets of tables) * (number of interesting result orders) plans; with GROUP BY / ORDER BY an order is interesting, otherwise keep the unordered path, and ordered paths must be compared with the unordered path plus the cost of sorting into the proper order
    * The formulas may give the same cost, but the merge join can still be better because of clustering
* CPU cost matters too (queries can be CPU bound), so the cost formula weights CPU work as well
* Compile once and run many times
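A minimal sketch of the bottom-up, left-deep join ordering idea above: dynamic programming over subsets of relations, keeping only the cheapest plan per subset. The cost model here is a stand-in that just sums estimated intermediate sizes, not the paper's formula, and the names are mine:

```python
from itertools import combinations

def best_left_deep_order(card, sel):
    """card: {relation: cardinality}; sel: {(A, B): join selectivity} (symmetric lookup).

    Returns (cost, order, est_size) of the cheapest left-deep join order, where
    best[subset] memoizes the best plan for each subset of relations.
    """
    rels = list(card)
    best = {frozenset([r]): (0.0, [r], float(card[r])) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, k)):
            candidates = []
            for r in subset:                      # r is the relation joined last (inner)
                rest = subset - {r}
                cost, order, size = best[rest]
                f = 1.0
                for s in rest:                    # combined selectivity of r's join predicates
                    f *= sel.get((s, r), sel.get((r, s), 1.0))
                out = size * card[r] * f          # crude estimate of the join result size
                candidates.append((cost + out, order + [r], out))
            best[subset] = min(candidates)        # keep only the cheapest plan per subset
    return best[frozenset(rels)]

card = {"emp": 10_000, "dept": 100, "proj": 1_000}
sel = {("emp", "dept"): 1 / 100, ("emp", "proj"): 1 / 1_000}
print(best_left_deep_order(card, sel))
```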
### Cascades
* Uses a Last-In-First-Out stack to schedule tasks (a component of the Cascades optimizer's algorithm)
* Triggered by the task of optimizing the top group, which spawns tasks for its subgroups, and so on (top-down fashion)
* Dynamic programming is used: it checks whether the same optimization goal has been pursued already; if so, it simply returns the plan found in the earlier search
* It explores groups only for truly useful patterns, not exhaustively
* The expression pattern is logical; the substitute can be logical or physical (predicates are operators that are both logical and physical)
* Uses static hashing to detect duplicate expressions
* Transformation rule: the substitute is logical
* Implementation rule: the substitute is physical

#### Versus Volcano
* Volcano creates all possible logical expressions for a query and all its subtrees exhaustively
* Also top down

## Spatial Indexing
* Uses a SAM (spatial access method)
* Indexes both points and regions
* Minimum bounding rectangles (MBRs)

### Grid File
* Uses a grid to partition the space
* Each cell represents a page
* The directory may be larger than a memory page
* Z-ordering
    * Imposes a linear ordering on all points
    * Hilbert curve is an alternative ordering

## Spatial join
* R-tree
    * Objects are clustered, recursively
    * Join on the trees
* Seeded tree
    * Copy the top levels of an existing R-tree
    * So it has the same clustering as that index
    * Building
        * Copy the top k levels of the R-tree and place slots at the bottom level
        * Insert objects into the slots
        * Remove empty slots
    * Used when only one input set has an index ![](https://i.imgur.com/L8Kme33.png)
* Multiple Matching Join
    * Assign each object to a single bucket
    * May have to enlarge the bucket
    * Each bucket is then joined with the other input's matching buckets
    * In the multiple-assignment alternative, objects are duplicated instead
* Create the index on the fly for the non-indexed relation, or rely on partitioning (spatial hash join)
    * Partition into smaller pieces; if a partition fits in memory, fine, otherwise it does not
    * Join the intersecting partitions one by one
    * Two bucket sets, two approaches: multiple matching vs. multiple assignment (5/12)
* Size separation join
    * Multi-level grid: smaller cells on lower levels, so small objects land on lower levels; can use multiple matching
    * One cell has to be joined with cells from multiple levels
    * Objects flow from the top down to the appropriate level
    * Have to join at every level

## Nearest Neighbor Search
* Without an index, we need to search everything
* R-Tree (see the best-first sketch after this section)
    * Start from the root, create a priority queue ordered by distance, and expand until we hit a leaf node (the first data object)
    * Pruning
        * MINDIST: minimum distance to the MBR over all axes
        * MINMAXDIST
        * Downward pruning
        * Upward pruning
        * ![](https://i.imgur.com/0jlhQxq.png) o is an object, for every object in R
    * Best first
        * ![](https://i.imgur.com/tgzMGoT.png) an object is guaranteed to "exist" within the smallest upper bound (the farthest distance on the closest face)
    * Depth first: it backtracks, and terminates once it is back at the root having found the point
    * There are also two-NN variants (1:10, 5/12)
    * KNN search: check at every step
    * Global order: use only MINDIST
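A minimal best-first NN sketch over an R-tree-like structure, as described above: expand nodes from a priority queue ordered by MINDIST to the query point. The node layout and names here are an assumed toy format, not the actual R-tree page layout:

```python
import heapq
from itertools import count

def mindist2(q, mbr):
    """Squared minimum distance from query point q to an MBR given as (lows, highs)."""
    lows, highs = mbr
    return sum((lo - x) ** 2 if x < lo else (x - hi) ** 2 if x > hi else 0.0
               for x, lo, hi in zip(q, lows, highs))

def dist2(q, p):
    return sum((a - b) ** 2 for a, b in zip(q, p))

def best_first_nn(root, q):
    """Best-first NN over an assumed toy R-tree layout.

    Inner nodes: {"mbr": (lows, highs), "children": [...]}; leaf entries: {"point": ..., "object": ...}.
    The queue is ordered by MINDIST (a lower bound), so the first leaf entry popped is the nearest.
    """
    tie = count()  # tie-breaker so the heap never tries to compare dicts
    heap = [(0.0, next(tie), root)]
    while heap:
        d, _, node = heapq.heappop(heap)
        if "point" in node:
            return d, node["object"]
        for child in node["children"]:
            key = dist2(q, child["point"]) if "point" in child else mindist2(q, child["mbr"])
            heapq.heappush(heap, (key, next(tie), child))
    return None

a = {"point": (0.5, 0.5), "object": "A"}
b = {"point": (5.5, 5.5), "object": "B"}
root = {"mbr": ((0, 0), (6, 6)), "children": [
    {"mbr": ((0, 0), (1, 1)), "children": [a]},
    {"mbr": ((5, 5), (6, 6)), "children": [b]},
]}
print(best_first_nn(root, (4.5, 4.5)))  # -> (2.0, 'B')
```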
## The Skyline Problem
* Choosing the points that are not dominated on any axis

### Top-1 search
* Let P be a set of d-dimensional points; given a monotonically increasing scoring function, return the point of P with the smallest score
* It is difficult to decide on a distance/score function
* The skyline tries to provide a set of general solutions that covers all such functions

### Skyline (more dimensions)
* Paper: R-tree based, and sorting based (no index)
* Retrieve the points not dominated by any other point
    * Like KNN, the points closest to the origin
* Find the points of P that are not dominated by any other point (a point dominates another if it is equal or smaller in all dimensions and strictly smaller in at least one)
* Skyline: along this frontier, no point dominates another
* Branch-and-bound Skyline (BBS)
    * Can prune an MBR if there is already a point that dominates it (it cannot contain skyline points)
    * With an R-tree index, use a min-heap to store the MBRs
    * Put the popped points into the skyline set SKY, and prune by the dominance property
* SFS: another O(n log n) algorithm
    * Works in memory, and in 2-D the scan is not quadratic
    * Each page holds 2 records in the example
    * Sort the list by a monotonically increasing function
        * Only points earlier in that order can dominate P
    * Run passes until the overflow file is empty, then output the points kept in memory

## Top-K
* Each attribute has a weight; find the top-k records
* Naïve approach
    * Assign each record a score and sort the records
* List-based solutions
    * For each entry in the database, assign an id
    * Like an index: one sorted list per attribute
    * Naïve: go through the database and compute F(x) for every x
    * Fagin's algorithm
        * Scan the sorted lists in parallel
        * Stop when at least k elements have been seen in all lists
            * Going further down the lists does not help
            * We cannot assume the top k of every single list is enough
        * Calculate the scores of all objects seen so far
            * May have to look up the score of an element seen in some lists but not yet in the others
        * Output the top k
    * Threshold method (see the sketch after this list)
        * Keep g, the best score among objects fully scored so far (for top-k, the k-th best); g only increases
        * The objects in each list are ordered by their attribute score
        * Compute tau, the threshold, by aggregating the last value seen in each list (row by row); tau only decreases
        * Stop when g and tau meet (g >= tau; check the paper for whether the comparison is strict)
        * **Note: g will increase, tau will decrease**
    * In the lecture, at most one list is assumed unsorted/inaccessible
    * If some lists are not sorted, but their scores are accessible from an index
        * Do sorted access only on the sorted lists
        * For the threshold, add the smallest possible value for each unsorted list
        * Still add the real score from the unsorted list when computing g
    * If random access is not allowed
        * Compute a lower bound by replacing unknown attributes with 0
        * Compute an upper bound by replacing unknown attributes with the smallest value seen in that list
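A minimal sketch of the threshold method described above, for top-1 with a sum score, assuming every attribute list is sorted and random access is allowed (the list layout and names are illustrative):

```python
def threshold_top1(lists, scores):
    """lists: one descending (value, obj_id) list per attribute; scores: obj_id -> attribute tuple.

    Returns the object with the highest summed score, following the stopping rule above:
    g (best full score seen) only grows, tau (aggregate of the current row) only shrinks.
    """
    best_obj, g = None, float("-inf")
    seen = set()
    for row in zip(*lists):                      # one round of sorted access per list
        for value, obj in row:
            if obj not in seen:
                seen.add(obj)
                full = sum(scores[obj])          # random access for the remaining attributes
                if full > g:
                    best_obj, g = obj, full
        tau = sum(value for value, _ in row)     # threshold from the last value seen per list
        if g >= tau:                             # no unseen object can beat g anymore
            break
    return best_obj, g

scores = {"a": (0.9, 0.1), "b": (0.6, 0.8), "c": (0.2, 0.9)}
lists = [sorted(((v[i], o) for o, v in scores.items()), reverse=True) for i in range(2)]
print(threshold_top1(lists, scores))  # -> ('b', 1.4)
```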
## Semi-structured data
* Records can have extra data and can have missing attributes
* Can be hierarchical, such as JSON
* Can search by structure (tree structure; parse the document into its tree structure)
* From the paper, a query can be broken into smaller pieces, going from a simple query to a complicated one
* Can use lists of parents and children (for specific element sets, e.g. an XML "book" set and a "title" set, to test whether one is the parent of the other), kept sorted
* Instead of searching everything, do a join
    * A leaf node has end position minus start position equal to 1 ![](https://i.imgur.com/vGXuO9S.png)
    * The (start, end) intervals identify descendants and ancestors
    * Different tags go into different relations
    * Example from lecture: sort-merge join of book and title; it assumes both are in the same document
    * (2 loops) Loop over the parents and the children (ancestors and descendants) and find the interval of descendants for each ancestor
        * Can reuse the result of a previous descendant
    * Worst case: a lot of backtracking
    * The entries on the stack belong to the same path of the tree

## Holistic Twig join (tree-structure join)
* Problem statement
    * Given a structure, find whether it occurs in the XML
    * Decomposing it leads to too many intermediate results
* Approach: decompose per root-to-leaf path
    * *e.g.* `author - fn - jane`
    * Maintain the connections between the objects in the streams
        * Connections of objects in stacks
    * Represent the paths in a compact form
    * Find the minimum start position and push it onto its stack
        * Create a link to the item previously pushed onto the stack
    * O(input size + output size)
    * TwigStack is not on the final

## Temporal database
* A system that manages time-varying data
* Joins between intervals
* Terms
    * Transaction time: when a fact is written to the database; deletion is a logical operation
    * Valid time: when a fact is effective in the real world
        * Can also refer to the future
        * Mutable: previous records can be changed
        * Deletion in valid time is physical: we do not keep the previous state
    * Bitemporal database: at each transaction timestamp there is a valid-time DB; look at the timestamps separately
* Can use an R-tree (but the upper-level index may become too large)
* The number of changes is a good measure for how to store the data
* Additions happen at the latest timestamp; we cannot change the past
* Pure time-slice query
    * In memory it can be O(a + log2(n)), where log2(n) is a binary search, with O(n) space; a is the answer size; on disk it is O(a/B + log_B(n))
    * Log approach: scan the log, cost O(n)
    * Copy approach: query cost O(a + log n), space cost O($n^{2}$) (worst case n(n-1)/2 when one item is added at each time instant)
* Access forest
    * A descendant's interval is included in the interval of its parent
    * An array records the last object of each tree as [$time$, $object$ $id$]
    * Search for a node, i.e. at time t, which objects are alive
        * Search the array for time t: O(log(n))
        * Start from that node in its tree; the nodes on the way up are alive (the dead ones are its children, and the right siblings are objects that come from the future)
        * Backtrack through the linked list to the other trees to check which nodes are alive; once a node is not alive, do not go down into it
        * The answer lies on this path
        * Checking the trees costs O(a)
    * Records are added until the page is full; if it is full, use a new page; there is also a hashing scheme
    * On disk this is how the intervals are managed
    * u is a threshold on alive records that decides whether a page is still useful (will be searched in the future); if a page drops below u, copy its alive records into another acceptor page (copy procedure; the hashing scheme must be updated at the same time); search cost O(a/B + log_B(n))
    * Searching the alive records in a page can be answered, but with **no order**
    * Using version ids makes it more efficient

## MVBT, temporal index
* Multi-version B-tree
* Its structure is like an acyclic graph
* Weak version condition (similar to the previous threshold)
* Leaf entry: <key, in_version, del_version, info>
* Inner node entry: <router, in_version, del_version, info>, where the router is an interval
* A structure modification happens if (1) a deletion leaves the live records in a page below d, or (2) an insertion hits a nearly overflowing page
* Weak version condition: at least $\epsilon d+1$ (or d) entries alive; dead records stay physically in the page
* Strong version condition: between $(\epsilon +1)d$ and $(k- \epsilon)d$ alive entries
* Underflow: merge with a sibling
* Overflow: split
* There is some slack ($\epsilon d$) so a page does not need to be fixed immediately; splits are by version (key and version together create the interval)

## Grid file (5/5), 30-minute talk about the relevant papers
* 2 I/Os (one to find the cell in the directory on disk, a second to follow the cell's pointer to the page)
* Z-ordering can jump around -> the Hilbert curve is better for this (mapping from a high dimension to a lower-dimensional space); see the sketch below
* Seeded tree: join with a non-indexed input
    * Clustered from the root down
* Hash join
    * No index, no R-tree
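A minimal sketch of the Z-ordering idea mentioned above: interleave the bits of the coordinates so nearby points usually (but not always, hence the "jumps") get nearby 1-D keys. The bit width and names are illustrative:

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (Morton code) to impose a linear order on 2-D points."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # even bit positions come from x
        z |= ((y >> i) & 1) << (2 * i + 1)    # odd bit positions come from y
    return z

# Sorting points by their Z-value keeps most nearby points together,
# which is what lets a grid file or B+ tree store spatial data in 1-D order.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 3)]
print(sorted(points, key=lambda p: z_order(*p)))
# -> [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 3)]
```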
## Spatial join paper
* Both inputs have an index:
    1. ![](https://i.imgur.com/8tZCy78.png) both have an index, so use the R-trees to join.
* Only one input has an index:
    2. ![](https://i.imgur.com/cMWSchB.png) build an index on the fly for the other input, but copy the upper levels of the existing index (since it is built on the fly it is not an exact index, but it is efficient and we do not need to go all the way down; also, if a point of the unindexed set does not match any upper-level MBR of the R-tree, it will not match any MBR lower in that R-tree either)
        * It is not a real R-tree, it exists only for efficiency
        * Seed phase: copy the top levels of the existing index
        * Grow phase: fill the slots obtained from the previous phase
        * Clean phase: if nothing matched a slot in the seeded part, remove that slot
        * RTJ (the approach of building another full R-tree) may even be worse on the I/O side than BFJ (brute-force join, using window range queries), and of course the construction cost of RTJ is much higher
        * BFJ costs more in matching
* Neither input has an index:
    3. ![](https://i.imgur.com/P30hLRq.png) use partitioning
        * Hash into buckets, like a hash join: generate the partitions, starting from a grid
        * First approach (multiple assignment): one X cell joins one Y cell; drawback: duplicated data, so more space
        * Second approach (multiple matching): one X cell joins multiple Y cells; drawback: it enlarges the space and extends the size of the buckets
        * Multiple assignment: if objects fall in the same cell (intersect it), join them all
        * When the sizes of the two sets are very different
            * Need to join across all levels (one Y cell vs. the X cells), from top to bottom, and we can skip early if there is not much at the top
    4. ![](https://i.imgur.com/6qpIH3N.png)

## NN paper
1. ![](https://i.imgur.com/p7RS528.png): the idea is to branch and bound toward the best MBR at each step
    * Uses a list (queue) and rules 1 and 2 (downward) plus rule 3 (upward) to prune the tree
2. ![](https://i.imgur.com/kARVcxf.png): (best first) uses only MINDIST
    * It is better than 1, but if the R-tree is large, pruning only by MINDIST with a single list can easily fill up memory
    * If the one list (all entries in the list/priority queue) fits in memory without overflowing, then it is the better choice overall
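A minimal sketch of the depth-first, branch-and-bound alternative (paper 1 above), to contrast with the best-first sketch earlier: recurse into subtrees in MINDIST order and prune any subtree whose MINDIST already exceeds the best distance found, so the search backtracks instead of keeping one large queue. The node layout is the same assumed toy format as before, and the pruning here uses only MINDIST (not MINMAXDIST):

```python
def mindist2(q, mbr):
    """Squared minimum distance from query point q to an MBR given as (lows, highs)."""
    lows, highs = mbr
    return sum((lo - x) ** 2 if x < lo else (x - hi) ** 2 if x > hi else 0.0
               for x, lo, hi in zip(q, lows, highs))

def dist2(q, p):
    return sum((a - b) ** 2 for a, b in zip(q, p))

def dfs_nn(node, q, best=(float("inf"), None)):
    """Depth-first branch-and-bound NN; best is the (squared distance, object) found so far."""
    if "point" in node:                                   # leaf entry: update the current best
        d = dist2(q, node["point"])
        return (d, node["object"]) if d < best[0] else best
    def lower_bound(c):
        return dist2(q, c["point"]) if "point" in c else mindist2(q, c["mbr"])
    # Visit children closest-first; skip subtrees that cannot beat the current best.
    for child in sorted(node["children"], key=lower_bound):
        if lower_bound(child) >= best[0]:
            continue                                      # pruned: backtrack to the parent
        best = dfs_nn(child, q, best)
    return best

a = {"point": (0.5, 0.5), "object": "A"}
b = {"point": (5.5, 5.5), "object": "B"}
root = {"mbr": ((0, 0), (6, 6)), "children": [
    {"mbr": ((0, 0), (1, 1)), "children": [a]},
    {"mbr": ((5, 5), (6, 6)), "children": [b]},
]}
print(dfs_nn(root, (4.5, 4.5)))  # -> (2.0, 'B')
```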