# DB 2021/09/01
## Environment
## Database Setting
- Data Migration: Hermes
- Number of Servers: 1
- Number of RTE: 1
- Buffer Size: 524288 (the smallest size that fits the whole database in the buffer)
- Java Heap Size:
- sequencer: 16GB
- server: 16GB
- client: 8GB
    - According to the GC log, only young-generation GC (YoungGC) occurs.
- Thread Pool: 50
- COMM_BATCH_SIZE: 1
- SCHEDULE_BATCH_SIZE: 1
## Workload Setting
### Common settings
- Workload Type: YCSB-Simple (records are chosen following a Zipfian distribution)
- Number of Record(INIT_RECORD_PER_PART): 10000
- ZIPFIAN_PARAMETER: 0.99
- DIST_TX_RATE: 0
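Under these settings, the record keys accessed by each operation are drawn from a bounded Zipfian distribution over the 10,000 records. A minimal NumPy sketch of such a draw (the variable names and seed are ours, not part of the benchmark code):

```python
import numpy as np

N = 10_000   # INIT_RECORD_PER_PART
s = 0.99     # ZIPFIAN_PARAMETER

# Bounded Zipfian: P(rank i) proportional to 1 / i^s, normalized over N records.
ranks = np.arange(1, N + 1)
weights = 1.0 / ranks**s
probs = weights / weights.sum()

rng = np.random.default_rng(0)
keys = rng.choice(N, size=5, p=probs)  # e.g. 5 record ids for one transaction
```

With s = 0.99 the distribution is heavily skewed toward low-ranked (hot) records, which is the standard YCSB access pattern.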
### Varied across workload settings
- RW_TX_RATE: 0, 0.5, 1
- TX_RECORD_COUNT: 2, 10, 100
For brevity, Rec-2_RW-0 denotes the setting RW_TX_RATE = 0, TX_RECORD_COUNT = 2.
## Models
Similar to MB2 by Pavlo et al., implemented with the scikit-learn package:
1. Random Forest Regression(RFR)
2. Kernel Ridge Regression(KRR)
3. Huber Regression(HR)
4. Support Vector Regression(SVR)
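The four candidates can be instantiated with scikit-learn roughly as follows. This is a sketch: the hyperparameters shown are illustrative defaults, not the values used in these experiments, and `best_model` is a helper name of our own.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import HuberRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.svm import SVR

# Candidate regressors; hyperparameters here are illustrative assumptions.
MODELS = {
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
    "KRR": KernelRidge(alpha=1.0),
    "HR":  HuberRegressor(),
    "SVR": SVR(kernel="rbf"),
}

def best_model(X_train, y_train, X_test, y_test):
    """Fit every candidate and return (name, MAPE) of the best one."""
    scores = {}
    for name, model in MODELS.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_percentage_error(
            y_test, model.predict(X_test))
    winner = min(scores, key=scores.get)
    return winner, scores[winner]
```

Selecting the model with the lowest test MAPE per OU/workload is what produces the per-cell winners reported in the Results tables.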
## Metric
In MB2, the relative error is described as
$$
Relative \ Error = \frac{|Actual - Predict|}{Actual}
$$
However, the paper does not give a precise definition of the relative error. Since the authors implement their models with the scikit-learn package, we use **Mean Absolute Percentage Error (MAPE)** as the evaluation metric, which can be seen as a concrete definition of the relative error.
$$
MAPE(y, \hat{y}) = \frac{1}{n_{samples}} \sum_{i=0}^{n_{samples} - 1} \frac{|y_i - \hat{y}_i|}{\max(\epsilon, |y_i|)}
$$
where $y_i$ is the label of the $i$-th sample and $\hat{y}_i$ is the corresponding predicted value. $\epsilon$ is an arbitrarily small positive number substituted for the divisor to avoid division by zero when $|y_i|$ is 0.
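The formula above translates directly into NumPy; a minimal version, using machine epsilon for $\epsilon$ as scikit-learn does:

```python
import numpy as np

def mape(y_true, y_pred, eps=np.finfo(np.float64).eps):
    """Mean Absolute Percentage Error, matching the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / np.maximum(eps, np.abs(y_true))))
```

This agrees with `sklearn.metrics.mean_absolute_percentage_error`; e.g. predicting 110 and 180 for true values 100 and 200 gives a MAPE of 0.1.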
## Data Preprocessing
### Step 0. Features & Label
Follow the spec of [this](https://hackmd.io/@pywang/r1UFn0oCd#)
Label: the latency of each of the following OUs (we train one model per OU):
- Generate Execution Plan Latency
- Execute SP Arithmetic Logic Latency
- Write to Local Storage Latency
- Transaction Commit Latency
Features:
- the number of reads in the read set
- the number of writes in the write set
- the number of active threads
- thread pool size
- current CPU utilization
- number of cache reads
- number of cache inserts
- number of cache updates
- number of arithmetic operations
- number of write-back records
- number of bytes of write-back records
- number of read/write records
- number of log flush bytes
### Step 1. Drop Outliers
Since the variance is very large, we **drop** the data points **beyond mean +/- 1 standard deviation**.
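With the data in a pandas DataFrame (an assumption on our part; the column name `latency` below is hypothetical), the filter can be sketched as:

```python
import pandas as pd

def drop_outliers(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Keep only rows whose value in `col` lies within mean +/- 1 std."""
    mu, sigma = df[col].mean(), df[col].std()
    return df[df[col].between(mu - sigma, mu + sigma)]
```

For a normally distributed column this would keep roughly 68% of the rows; with the heavy-tailed latencies here it mostly trims the long tail.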
### Step 2. Sampling
We sample 10,000 transactions from each dataset and use 80% for training and 20% for testing, since training on the whole dataset takes about one night.
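A sketch of the sampling and split, with synthetic arrays standing in for the real dataset (the 13 columns correspond to the feature list in Step 0; the shapes, distributions, and seeds are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_all = rng.normal(size=(50_000, 13))   # placeholder for the 13 features of Step 0
y_all = rng.exponential(size=50_000)    # placeholder for an OU latency label

# Sample 10,000 transactions, then split 80% train / 20% test.
idx = rng.choice(len(X_all), size=10_000, replace=False)
X_train, X_test, y_train, y_test = train_test_split(
    X_all[idx], y_all[idx], test_size=0.2, random_state=0)
```

Fixing `random_state` keeps the split reproducible across the per-OU training runs.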
#### Latency Histogram of Rec-2_RW-0
1. Execute SP Arithmetic Logic OU
**Original**

**After Dropping Outliers**

**After Sampling**

2. Write to Local Storage OU
**Original**

**After Dropping Outliers**

**After Sampling**

3. Transaction Commits OU
**Original**

**After Dropping Outliers**

**After Sampling**

4. Generate Execution Plan OU
**Original**

**After Dropping Outliers**

**After Sampling**

#### Latency Histogram of Rec-10_RW-0.5
1. Execute SP Arithmetic Logic OU
**Original**

**After Dropping Outliers**

**After Sampling**

2. Write to Local Storage OU
**Original**


**After Dropping Outliers**


**After Sampling**


3. Transaction Commits OU
**Original**


**After Dropping Outliers**


**After Sampling**


4. Generate Execution Plan OU
**Original**

**After Dropping Outliers**

**After Sampling**

# Results
For each OU and workload, we only show the best model, i.e., the one with the lowest relative error (MAPE) on the testing dataset.
| OU \ Workload | Rec-2_RW-0 | Rec-2_RW-1 | Rec-2_RW-0.5 |
|:--------------------------- |:----------- |:----------- |:------------ |
| Execute SP Arithmetic Logic | RFR: 0.1109 | HR: 0.1399 | RFR: 0.1791 |
| Write to Local Storage | RFR: 0.1452 | RFR: 0.0604 | RFR: 0.1107 |
| Transaction Commits | RFR: 0.1068 | RFR: 0.1717 | RFR: 0.1340 |
| Generate Execution Plan | RFR: 0.3497 | KRR: 0.3555 | RFR: 0.2677 |
| OU \ Workload | Rec-10_RW-0 | Rec-10_RW-1 | Rec-10_RW-0.5 |
|:--------------------------- |:----------- |:----------- |:------------- |
| Execute SP Arithmetic Logic | SVR: 0.0551 | RFR: 0.0842 | RFR: 0.0908 |
| Write to Local Storage | RFR: 0.0772 | RFR: 0.0398 | RFR: 0.0873 |
| Transaction Commits | SVR: 0.0613 | RFR: 0.0876 | RFR: 0.1073 |
| Generate Execution Plan | RFR: 0.1668 | KRR: 0.1522 | RFR: 0.1234 |
| OU \ Workload | Rec-100_RW-0 | Rec-100_RW-1 | Rec-100_RW-0.5 |
|:--------------------------- |:------------ |:------------ |:-------------- |
| Execute SP Arithmetic Logic | RFR: 0.0786 | RFR: 0.0692 | RFR: 0.064 |
| Write to Local Storage | RFR: 0.0691 | RFR: 0.0276 | HR: 0.0413 |
| Transaction Commits | RFR: 0.07 | RFR: 0.0588 | RFR: 0.0605 |
| Generate Execution Plan | RFR: 0.1704 | RFR: 0.0518 | RFR: 0.0484 |
# Conclusion
1. The more records per transaction, the lower the error.
2. The higher the proportion of read-only transactions, the higher the error for the Generate Execution Plan OU.
3. All in all, except for the Generate Execution Plan OU, the OUs are predicted well.
# Appendix
## A1. MB2 results

## A2. Box Plot
### The definition of the fliers/outliers in `matplotlib.pyplot.boxplot`

### Latency Histogram of Rec-2_RW-0
#### 1. Execute SP Arithmetic Logic OU





#### 2. Write to Local Storage OU



#### 3. Transaction Commits OU



#### 4. Generate Execution Plan OU



### Latency Histogram of Rec-10_RW-0.5
#### 1. Execute SP Arithmetic Logic OU





#### 2. Write to Local Storage OU



#### 3. Transaction Commits OU



#### 4. Generate Execution Plan OU



# Note
SVR on Rec-2_RW-0.5 has extremely high error: 36.3408/31.7461
SVR on Rec-10_RW-0.5 Flush OU has extremely high error:
Add the outlier plots,
and show the accuracy for different dataset sizes.
1. Show the mean and std of each OU at different dataset sizes
2. Rename the OUs
3. Confirm whether GC occurs