碩一 - HackMD

# 碩一 ## [**hackmd語法**](https://hackmd.io/features?both) ## [MathJax semantic](https://hackmd.io/features?both#MathJax) ## Related Work * [IDS with feature selection using IGR](https://link.springer.com/chapter/10.1007/978-981-15-0199-9_62) * [An Improved Training Algorithm for Support Vector Machines](https://svms.org/training/OsFG97.pdf) * [Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines](https://www.microsoft.com/en-us/research/uploads/prod/1998/04/sequential-minimal-optimization.pdf) * [Support Vector Machine Solvers](https://www.csie.ntu.edu.tw/~cjlin/papers/bottou_lin.pdf) ## NSL-KDD [NSL-KDD in Chinese](https://mathpretty.com/10244.html) ## Data Preprocessing for NSL-KDD [Kaggle - Network Security Attack Classification](https://www.kaggle.com/code/daliahesham/network-security-attack-classification) ## libSVM sources [github](https://github.com/cjlin1/libsvm/blob/master/svm.cpp) ## About SVM ### [Practical Guide to SVC](http://www.datascienceassn.org/sites/default/files/Practical%20Guide%20to%20Support%20Vector%20Classification.pdf)(未看) [baidu](https://baike.baidu.com/item/%E6%94%AF%E6%8C%81%E5%90%91%E9%87%8F%E6%9C%BA/9683835) [support-vector-machines-succinctly](https://www.syncfusion.com/succinctly-free-ebooks/support-vector-machines-succinctly/multi-class-svms) [Lagrange duality](https://zhuanlan.zhihu.com/p/38182879) ## How to solve SVM QP problem solver - huge computational resources needed [Sequential Mimimal Optimization](https://www.microsoft.com/en-us/research/uploads/prod/1998/04/sequential-minimal-optimization.pdf) ## Genetic algorithm used in IDS * https://scholar.google.com.tw/scholar?hl=zh-TW&as_sdt=0%2C5&q=ids+genetic+algorithm&btnG=&oq=IDS+genetic * http://bit.csc.lsu.edu/~jianhua/krish-1.pdf ## 2022/11/16 * [pytorch data augmentation for fashion minst dataset](https://discuss.pytorch.org/t/data-augmentation-fashion-mnist-image/136762) * [pytorch data augmentation](https://medium.com/dejunhuang/learning-day-23-data-augmentation-in-pytorch-e375e19100c3) * [Pytorch fashion minst refer](https://pytorch.org/vision/stable/generated/torchvision.datasets.FashionMNIST.html) * [Pytorch data augmentation refer](https://pytorch.org/vision/stable/transforms.html) * SVM homework to-do * [callback function implementation](https://debuggercafe.com/using-learning-rate-scheduler-and-early-stopping-with-pytorch/) * [Pytorch ZeroToAll](https://github.com/hunkim/PyTorchZeroToAll) * [香港大學 pytorch class](https://drive.google.com/drive/folders/0B41Zbb4c8HVyUndGdGdJSXd5d3M?resourcekey=0-s90CYmIbmbqbO1Mvtwmlog) * Training forward+backward backword for weight update ## 2022/11/6 * SVM 解法1 : 二次規劃解 Lagrange Multiplier -> 資料集大量時使得Q矩陣次方增長，無法放入記憶體中，故實作中不可行 * SVM 解法2 : Sequential Mimimal Optimization - 任選兩組Lagrange Multiplier 進行最佳化，逐次進行直到所有 Lagrange Multiplier被最佳化，不須使用Q矩陣 * 使用QTS產生w與b的組合不可行，該方法需要每次迭代去計算每個訓練資料的來進行調整 ->需要巨量時間 * 問題點 * 如何定義粒子產生的超平面的最佳解與最差解 > 對每筆訓練資料(萬筆數據)去計算距離太花費時間 * 不知道如何更新機率矩陣的方向 * 不知道產生粒子的範圍 * QTS與SVM結合的方向 * 如何產生 support vector > 從訓練資料中隨機選擇 > 如何搜尋更新方向 * 可著重於QP solver的矩陣大小減少 > 選擇Lagrange Multiplier為零的行列刪除 > 屬於組合最佳化問題嗎? * 選擇 Lagrange Multiplier 的組合可能有Lagrange Multiplier為零表示不屬於support vector > 組合最佳化 ## 2022/11/5 * QTS implement to first & second heuristic in SMO * QTS to find support vectors * **?** heurisitic in grid search ## 2022/10/23 ### 高等計算機網路期中報告論文 > Enhanced Interval Trees for Dynamic IP Router-Tables #### 5 BOB For Nonintersectiong Ranges * Turn proposed enhanced interval tree into binary tree on binary tree (BOB) #### 4.2 Proposed Enhanced Interval Tree * the PTST of **[28]** with the following modifications/restrictions * Some nodes of the PTST may be empty * The total number of nodes in the PTST is at most $2n$, $n$ is the number of ranges stored in the PTST - **PTST size constraint** * the algorithm to insert a range ![](https://i.imgur.com/HZSJonw.png) * After insertion, the traditional red-black tree rebalance pass is made that may require color changes and at most one rotation * The tree structure may lead to a violation of the range allocation rule * After insertion, $|PTST| \leq 2|R|+1 < 2(|R|+1)$, where $|PTST|$ is the number of ranges stored after the insertion, which is not violated the **PTST size constraint** * Time required to insert a range is $O(height(PTST))=O(\log n)$ without a rebalancing rotation * Red Black Tree Rotation * ![](https://i.imgur.com/MRdI16r.png) * $S=\{r|r\in ranges(x)$ **and**$start(r) \leq point(y)\leq finish(r)\}$ * $ranges'(x)=ranges(x)-S$ and $ranges'(y) = ranges(y)\cup S$ * The time required doing LL or RR rotation deopends on the time taken to determine $S$, remove $S$ from $ranges(x)$, and add $S$ to $ranges(y)$ * The time for LR and RL rotations is **roughly twice** that for LL and RR rotations * Delete a Range ![](https://i.imgur.com/97rnqUi.png) ![](https://i.imgur.com/70ehEt9.png) ## 2022/10/22 ### 高等計算機網路期中報告論文 > Enhanced Interval Trees for Dynamic IP Router-Tables #### Enhanced Interval Tree Interval Trees * **[27]** Interval Tree * Each range $r$ in the left subtree of $z$ satisfies >[以下可使用線段圖及樹狀圖來表示] > $(start(r)<start(range(z))$ or $(start(r)=start(range(z))$ and $finsh(r)>finish(range(z)))$ * Each range $r$ in the right subtree of $z$ satisfies $(start(r)>start(range(z))$ or $(start(r)=start(range(z))$ and $finsh(r)<finish(range(z)))$ * prefix insertion/deletion take $O(log\ n)$ time * while longest matching-prefix with number of $k$ match the given address * it takes $O(min(n,k\log\ n))$ time to find the longest matching-prefix * **[28]** Interval Tree * a point, $point(z)$, associated with every node of the RBT * the point of the node $x$ on the left subtree rooted by $z$ satisfies $point(x)<point(z)$ * the point of the node $y$ on the right subtree rooted by $z$ satisfies $point(y)>point(z)$ * The point search tree (PTST) meet the condition above and below * $start(r)\leq point(z) \leq finish(r)$ stored in the $root$ node * $finsh(r) < point(z)$ stored in the left subtree * the remaining ranges stored in the right subtree * each node in PTST maintain two lists $left(z)$ and $right(z)$ * the longest matching-prefix can be found in $O(\log n+k)$ time * Insertion/Deletion are very expensive * Every node of the PTST in **[28]** must be nonempty * The high complexity of the update operation caused by the requirement that $ranges(z)$ be nonempty for every node $z$ * Rebalancing rotations ![](https://i.imgur.com/Vpyo1gt.png) #### Preliminaries ![Relationshops between pairs of ranges](https://i.imgur.com/1z7jXMG.png) * $r=[u,v]$ * $u$ is the start point of the range, and $v$ is the end point of the range * Example, $W=6$ with 1101\* prefix matches the address $\{110100, 110101, 110110, 110111\} = \{52,53,54,55\}$ * each range takes up to $2W-2$ prefixes to represent * Let R be a range set. $ranges(d,R)$ is the subset of ranges of $R$ that match the address $d$ * Example, When $R=\{[2,4],[1,6],[7,10]\},ranges(3)=\{[2,4],[1,6]\}$ ## 2022/10/21 ### 高等計算機網路期中報告論文 > Enhanced Interval Trees for Dynamic IP Router-Tables #### Related Work * Gupta et al. **[16]** proposed the DIR-24-8 scheme. * This scheme contains two table, **TBL24** and **TBLlong** * $2^{24}*16$ bit + $t*256$-entry blocks * The scheme accesses memory at most twice in a worst case. * [Reference](https://marco79423.net/articles/ip-lookup-%E6%BC%94%E7%AE%97%E6%B3%95-dir-24-8-basic) * The scheme Waldvogel et al.**[17]** proposed performs **a binary search** on **hash tables** sorted by **prefix length** * This scheme performs lookup in $O(logW)$ time * **[18]** developed an alternative takes $O(W\ + log\ n)$ timein lookup with a n-prefixes table * These methods above **aren't be suited in dynamic table**, since they use **expensive precomputation** * Dynamic TCAM * Shah and Gupta **[7]** takes **insert/delete** in $O(W)$ time * Basu and Narlika **[19]** proposed **a dynamic table design** for **pipelined forwarding engine** * Suri et al. **[20]** employs the B-tree for dynamic LMPTs * **longest matching-prefix lookup** performs in $O(log\ n)$ time * **Insert/Delete** takes $O(Wlog\ n)$ time * Cache misses are $O(log\ n)$ times * Sahni and Kim **[21],[22]** employ **a collection of red-black tree (CRBT)** and **an alternative one (ACRBT)** in dynamic LMPT * These schemes perform ops in $O(log\ n)$ time * The number of cache misses is also $O(log\ n)$ * ACRBT can be modified easily to **the biased-skip-list** * **[23]** obtains a biased skip list for dynamic LMPT * Lu and Sahni **[24]** provide a priority search tree to obtain $O(log\ n)$ time in routing-ops * **[25]** represented a binary trie in **highest-priority prefix-table(HPPT)** * $O(W)$ time in routing ops and cache misses * Gupta and McKeown **[26]** developed two DS, **heap on trie** and **binary search tree on trie**, for dynamic **highest-priority range-table(HPRT)** * **heap on trie(HOT)** takes $O(W)$ time on lookup and $O(Wlog\ n)$ time for insert/delete * **binary search tree on trie(BOT)** takes $O(Wlog\ n)$ time for a lookup and $O(W)$ time for an insert/delete ## 2022/10/20 ### 高等計算機網路期中報告論文 > Enhanced Interval Trees for Dynamic IP Router-Tables #### Intro * data structures for dynamic NHPRTs, HPPTs, and LMPTs * Developing the terminology in Section 3 * Review two interval tree and construct their proposed method in Section 4. * BOB in Section 5 * Ranges of different route table discussed in Section 6 * PBOB for HPPTs in Section 7 * LMPBOB for LMRTs in Section 8 * Experimental results in Section 9 #### Related Work * LMPT researches * Ruiz-Sanchez et al. review DS for static LMPTs **[4]** * Sahni et al. review DS for both dynamic and statics. **[5]** * TCAM isn't recommend **[7]** * simple and efficient solution but requiring special hardware **[8]** * EZchip Technologies claim that classifiers can forgo TCAMs in favor of commodity memory solutions **[2]** * special hardware require more power and board space * Trie-based DS for LMPTs **[9]** - **[15]** * **[9]** perfom dynamic router-table ops in $O(W)$ time * **[10]** - **[15]** optimize lookup time and memory usage, these research turned out fast lookup speed and poor speed at insert/delet time - only suitable for static router-tables ## 2022/10/17 ### 高等計算機網路期中報告論文 > Enhanced Interval Trees for Dynamic IP Router-Tables #### Abstract * interval tree DS for dynamic IP router tables - Binary tree in Binary tree (BOB) 有很多變體去因應不同的IP router tables * BOB for nonintersecting ranges rule filters & highest-prioity tie breaker Example : prefix filter with longest-prefix tie breaker * BOB with n-rule router table * match address found in $O(log^2\ n)$ time * updates and deletions in $O(log\ n)$ time * CBOB (Compact BOB) * PBOB (Prefix BOB) with all rule are prefixes in filters * $W$ is the longest prefix * match address found in $O(W)$ time * updates and deletions in $O(W)$ time * LMPBOB (longest Matching-Prefix BOB) with all rule are prefixes in filters & longest matching prefix is to be done * $W$ is the longest prefix * match address found in $O(W)$ time * updates and deletions in $O(log\ n)$ time * $O(log\ n)$ time in cache misses * BOB and PBOB perform each of the three dynamic-table operations in $O(log\ n)$ time and with $O(log\ n)$ time #### Intro * the repesentation of rule-table rule - $(F, A)$, $F$ is a filter and $A$ is an action * Action examples : dropping the packet, forwarding the packet, and reserving the bandwidth * Each filter with a range of $[ u,v ]$ is assumed, where destination address $d$ is satisified $u\leq d\leq v$ * A tie breaker is needed to select a rule from a set of matched rules since several rules may match a target address ## 2022/10/6 ### [Survey of intrusion detection systems: techniques, datasets and challenges](https://link.springer.com/content/pdf/10.1186/s42400-019-0038-7.pdf) ### 閱讀論文進度：看到 Performance metrics for IDS > Murray et al., has used GA to evolve simple rules for network traffic (Murray et al., 2014). **Every rule is represented by a genome and the primary population of genomes is a number of random rules.** Each genome is comprised of different genes which correspond to characteristics such as IP source, IP destination, port source, port destination and 1 protocol type (Hoque & Bikas, 2012). ### 編碼方式：一個染色組內含多個染色體，一個檢測規則(特徵)為一個染色體 ## 2022/10/3 > SVM is inefficient at large size problem, Sequential Minimal Optimization (SMO) to divde problem into several sub-problem ## 2022/9/30 ### 3篇論文未看 - [ ] [Intrusion Detection System Based on SVM for WLAN](https://www.sciencedirect.com/science/article/pii/S2212017312000679) - [ ] [Intrusion Detection System using Support Vector Machine and Decision Tree ](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.206.5560&rep=rep1&type=pdf) - [ ] [A novel SVM-kNN-PSO ensemble method for intrusion detection system ](https://www.sciencedirect.com/science/article/pii/S1568494615006328) ### 物聯網作業 50Startups營收預測 - [ ] [link](https://lms2020.nchu.edu.tw/course/homework/17508) ## 2022/9/26 ### Support Vector Machine Research #### 推導 ![](https://i.imgur.com/9n08VIG.png) 假設分類兩資料點x0, x1分別位於w*x-b=1, w\*x-b=-1上，求兩線的距離 x0-x1形成一向量，兩線垂直距離為該向量於w\*x-b=0之法向量上之投影，也就是(x0-x1)與w之標準化向量做點積(dot product)可得 = (x0-x1)．w/|w| = (w\*x0 - w\*x1)/|w| = 2/|w|, x0位於w\*x-b=1,x1位於w\*x-b=-1, 因此帶入後可得到(w\*x0 - w\*x1) = 1-(-1) [distance between margins derivation](https://math.stackexchange.com/questions/1305925/why-is-the-svm-margin-equal-to-frac2-mathbfw) ## 2022/9/12 ### 如何使用GNQTS實作分類器-參考機器學習的分類器 * SVM (Support vector machine) * decision tree ### 參考面向 * 編碼方式 * 適應值計算方式 * 輸出函數 [外國版類似論文-皆有提到SEKM,SEIDS,未提到SE詳細資料](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9587486) 8 :::spoiler 2022/9/7 resource arrangement 將解空間切成數塊子解空間在每個解空間撒點 vision search * Transition like crossover and mutation in genetic algorithm * ExpectedValue * Determination ::: :::spoiler 2022/7/27 ## 2022/7/27 > 需要改變產生訓練齊資料的方式，以Q#為例，如自2012/01-2022/12為實驗區間，則訓練期實驗區間為2011/01-2021/12 ### 方法一先計算總共多少月份，再以步數去產生多少組訓練期實驗區間 2011/01-2021/12為120個月分，若是Q#，則步數為三所以共有120/3=40組訓練期實驗區間 ### 怎麼產生各類型實驗區間 #### X2N - 此類型為目標區間往前推指定步數的月份作為訓練期實驗區間 #### M2N - 目標月份為假設2011/01-2020/12，訓練期實驗區間為2010/12-2020/11 使用一個月的資料進行訓練，也就是2010/12訓練2011/01 #### Q2N - 目標月份為假設2011/01-2020/12，訓練期實驗區間為2010/09-2020/11 使用三個月的資料進行訓練，也就是2010/10-12訓練2011/01 #### H2N - 目標月份為假設2011/01-2020/12，訓練期實驗區間為2010/06-2020/11 使用六個月的資料進行訓練，也就是2010/07-12訓練2011/01 #### Y2N - 目標月份為假設2011/01-2020/12，訓練期實驗區間為2010/01-2020/11 使用十二個月的資料進行訓練，也就是2010/01-12訓練2011/01 #### X# - 使用去年相同長度月份的資料進行訓練 #### M# - 使用去年同月的資料訓練，也就是2010/01訓練2011/01 #### Q# - 使用去年同月的資料訓練，也就是2010/01-03訓練2011/01-03 #### H# - 使用去年同月的資料訓練，也就是2010/01-05訓練2011/01-05 ### 步數 y=12, h=6, q=3, m=1 ### 取資料的辦法目前以記錄股票日期在csv檔中的index, 例如2019/02/01在csv檔中為第2192個位置以陣列存取 date[2192] = “2019/02/01” 從txt檔中讀取最新的日期為2020/12/31 > 目前存到vector中，可以改為array較快速嗎? ### 使用方法先決定M，Q，H，或是Y的月份長度從最新月份開始產生訓練期實驗區間 - 若為X#類型，假設目標結束月份為n，則訓練期結束月份為n-12 再根據M，Q，及H，目標起始月份為n, n-2, n-5 - 若為X2N類型，假設目標結束月份為n，則訓練期結束月份為n-1 再根據M，Q，H，及Y，目標起始月份為n, n-2, n-5, n-11 ::: ## 111/2/16 滑動視窗-股票區間擷取 - [x] 月份 - [ ] 季 - [ ] 半年 - [ ] 年 QTS+MA更改寫法與精簡 GNQTS+MA進行中 ## 111/2/14 寫滑動視窗-股票區間擷取進行實驗-看GNQTS方法，與QTS結合 ## 111/2/6 列印最佳解實驗，確認實驗結果正確列印成CSV檔時有錯，需要更改 ## 111/1/24 滑動視窗擷取測試期與訓練期區間 ## QTS結合MA ### 12/16 亂數與MA產生對齊，交易報酬的數字有些誤差 ## 股票擇時作業使用MA做股票擇時以及對答案:2603TW | 買一 | 買二 | 賣一 | 賣二 | link | | ---- | ---- | ---- | ---- | --- | | 5 | 20 | 5 | 20 | [link](https://drive.google.com/file/d/1fl2A999GbZdgPicUeDtYinMG4lbDZ846/view?usp=sharing) | | 5 | 20 | 5 | 60 | [link](https://drive.google.com/file/d/15zp3on0XCwb1VwLVISMPZwSmfrRe4JtQ/view?usp=sharing) | | 5 | 60 | 5 | 60 | [link](https://drive.google.com/file/d/1AM29fzdGM-WrKYxRVwk8mhKEn6eByz_s/view?usp=sharing) | | 6 | 18 | 6 | 18 | [link](https://drive.google.com/file/d/1xkfdP2PK6gfap6mzxhHEhoeoS0vrt-2d/view?usp=sharing) | | 6 | 50 | 6 | 50 | [link](https://drive.google.com/file/d/1kq1XjFUSlUm4jdEUvae3TuYQsISTgeBh/view?usp=sharing) | | 18 | 50 | 18 | 50 | [link](https://drive.google.com/file/d/10t9wZq5_6b0STPgWMcOfwF9Z3abjn3sT/view?usp=sharing) | | 20 | 60 | 20 | 60 | [link](https://drive.google.com/file/d/1itJJbjSy86k4jAkUJ6aYfLrPL0y6uwb6/view?usp=sharing) | ## 研究方向股票擇時 ## 2021/10/28 Quatumn Gate Tensor Practice ![](https://i.imgur.com/VxP7lFt.jpg)