# 碩一
## [**hackmd語法**](https://hackmd.io/features?both)
## [MathJax semantic](https://hackmd.io/features?both#MathJax)
## Related Work
* [IDS with feature selection using IGR](https://link.springer.com/chapter/10.1007/978-981-15-0199-9_62)
* [An Improved Training Algorithm for Support Vector Machines](https://svms.org/training/OsFG97.pdf)
* [Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines](https://www.microsoft.com/en-us/research/uploads/prod/1998/04/sequential-minimal-optimization.pdf)
* [Support Vector Machine Solvers](https://www.csie.ntu.edu.tw/~cjlin/papers/bottou_lin.pdf)
## NSL-KDD
[NSL-KDD in Chinese](https://mathpretty.com/10244.html)
## Data Preprocessing for NSL-KDD
[Kaggle - Network Security Attack Classification](https://www.kaggle.com/code/daliahesham/network-security-attack-classification)
## libSVM sources
[github](https://github.com/cjlin1/libsvm/blob/master/svm.cpp)
## About SVM
### [Practical Guide to SVC](http://www.datascienceassn.org/sites/default/files/Practical%20Guide%20to%20Support%20Vector%20Classification.pdf)(未看)
[baidu](https://baike.baidu.com/item/%E6%94%AF%E6%8C%81%E5%90%91%E9%87%8F%E6%9C%BA/9683835)
[support-vector-machines-succinctly](https://www.syncfusion.com/succinctly-free-ebooks/support-vector-machines-succinctly/multi-class-svms)
[Lagrange duality](https://zhuanlan.zhihu.com/p/38182879)
## How to solve SVM
QP problem solver - huge computational resources needed
[Sequential Mimimal Optimization](https://www.microsoft.com/en-us/research/uploads/prod/1998/04/sequential-minimal-optimization.pdf)
## Genetic algorithm used in IDS
* https://scholar.google.com.tw/scholar?hl=zh-TW&as_sdt=0%2C5&q=ids+genetic+algorithm&btnG=&oq=IDS+genetic
* http://bit.csc.lsu.edu/~jianhua/krish-1.pdf
## 2022/11/16
* [pytorch data augmentation for fashion minst dataset](https://discuss.pytorch.org/t/data-augmentation-fashion-mnist-image/136762)
* [pytorch data augmentation](https://medium.com/dejunhuang/learning-day-23-data-augmentation-in-pytorch-e375e19100c3)
* [Pytorch fashion minst refer](https://pytorch.org/vision/stable/generated/torchvision.datasets.FashionMNIST.html)
* [Pytorch data augmentation refer](https://pytorch.org/vision/stable/transforms.html)
* SVM homework to-do
* [callback function implementation](https://debuggercafe.com/using-learning-rate-scheduler-and-early-stopping-with-pytorch/)
* [Pytorch ZeroToAll](https://github.com/hunkim/PyTorchZeroToAll)
* [香港大學 pytorch class](https://drive.google.com/drive/folders/0B41Zbb4c8HVyUndGdGdJSXd5d3M?resourcekey=0-s90CYmIbmbqbO1Mvtwmlog)
* Training forward+backward backword for weight update
## 2022/11/6
* SVM 解法1 : 二次規劃解 Lagrange Multiplier -> 資料集大量時使得Q矩陣次方增長,無法放入記憶體中,故實作中不可行
* SVM 解法2 : Sequential Mimimal Optimization - 任選兩組Lagrange Multiplier 進行最佳化,逐次進行直到所有 Lagrange Multiplier被最佳化,不須使用Q矩陣
* 使用QTS產生w與b的組合不可行,該方法需要每次迭代去計算每個訓練資料的來進行調整 ->需要巨量時間
* 問題點
* 如何定義粒子產生的超平面的最佳解與最差解 > 對每筆訓練資料(萬筆數據)去計算距離太花費時間
* 不知道如何更新機率矩陣的方向
* 不知道產生粒子的範圍
* QTS與SVM結合的方向
* 如何產生 support vector > 從訓練資料中隨機選擇 > 如何搜尋更新方向
* 可著重於QP solver的矩陣大小減少 > 選擇Lagrange Multiplier為零的行列刪除 > 屬於組合最佳化問題嗎?
* 選擇 Lagrange Multiplier 的組合 可能有Lagrange Multiplier為零表示不屬於support vector > 組合最佳化
## 2022/11/5
* QTS implement to first & second heuristic in SMO
* QTS to find support vectors
* **?** heurisitic in grid search
## 2022/10/23
### 高等計算機網路 期中報告論文
> Enhanced Interval Trees for Dynamic IP Router-Tables
#### 5 BOB For Nonintersectiong Ranges
* Turn proposed enhanced interval tree into binary tree on binary tree (BOB)
#### 4.2 Proposed Enhanced Interval Tree
* the PTST of **[28]** with the following modifications/restrictions
* Some nodes of the PTST may be empty
* The total number of nodes in the PTST is at most $2n$, $n$ is the number of ranges stored in the PTST - **PTST size constraint**
* the algorithm to insert a range

* After insertion, the traditional red-black tree rebalance pass is made that may require color changes and at most one rotation
* The tree structure may lead to a violation of the range allocation rule
* After insertion, $|PTST| \leq 2|R|+1 < 2(|R|+1)$, where $|PTST|$ is the number of ranges stored after the insertion, which is not violated the **PTST size constraint**
* Time required to insert a range is $O(height(PTST))=O(\log n)$ without a rebalancing rotation
* Red Black Tree Rotation
* 
* $S=\{r|r\in ranges(x)$ **and**$start(r) \leq point(y)\leq finish(r)\}$
* $ranges'(x)=ranges(x)-S$ and $ranges'(y) = ranges(y)\cup S$
* The time required doing LL or RR rotation deopends on the time taken to determine $S$, remove $S$ from $ranges(x)$, and add $S$ to $ranges(y)$
* The time for LR and RL rotations is **roughly twice** that for LL and RR rotations
* Delete a Range


## 2022/10/22
### 高等計算機網路 期中報告論文
> Enhanced Interval Trees for Dynamic IP Router-Tables
#### Enhanced Interval Tree
Interval Trees
* **[27]** Interval Tree
* Each range $r$ in the left subtree of $z$ satisfies
>[以下可使用線段圖及樹狀圖來表示]
>
$(start(r)<start(range(z))$ or
$(start(r)=start(range(z))$ and $finsh(r)>finish(range(z)))$
* Each range $r$ in the right subtree of $z$ satisfies
$(start(r)>start(range(z))$ or
$(start(r)=start(range(z))$ and $finsh(r)<finish(range(z)))$
* prefix insertion/deletion take $O(log\ n)$ time
* while longest matching-prefix with number of $k$ match the given address
* it takes $O(min(n,k\log\ n))$ time to find the longest matching-prefix
* **[28]** Interval Tree
* a point, $point(z)$, associated with every node of the RBT
* the point of the node $x$ on the left subtree rooted by $z$ satisfies
$point(x)<point(z)$
* the point of the node $y$ on the right subtree rooted by $z$ satisfies
$point(y)>point(z)$
* The point search tree (PTST) meet the condition above and below
* $start(r)\leq point(z) \leq finish(r)$ stored in the $root$ node
* $finsh(r) < point(z)$ stored in the left subtree
* the remaining ranges stored in the right subtree
* each node in PTST maintain two lists $left(z)$ and $right(z)$
* the longest matching-prefix can be found in $O(\log n+k)$ time
* Insertion/Deletion are very expensive
* Every node of the PTST in **[28]** must be nonempty
* The high complexity of the update operation caused by the requirement that $ranges(z)$ be nonempty for every node $z$
* Rebalancing rotations

#### Preliminaries

* $r=[u,v]$
* $u$ is the start point of the range, and $v$ is the end point of the range
* Example, $W=6$ with 1101\* prefix matches the address $\{110100, 110101, 110110, 110111\} = \{52,53,54,55\}$
* each range takes up to $2W-2$ prefixes to represent
* Let R be a range set. $ranges(d,R)$ is the subset of ranges of $R$ that match the address $d$
* Example, When $R=\{[2,4],[1,6],[7,10]\},ranges(3)=\{[2,4],[1,6]\}$
## 2022/10/21
### 高等計算機網路 期中報告論文
> Enhanced Interval Trees for Dynamic IP Router-Tables
#### Related Work
* Gupta et al. **[16]** proposed the DIR-24-8 scheme.
* This scheme contains two table, **TBL24** and **TBLlong**
* $2^{24}*16$ bit + $t*256$-entry blocks
* The scheme accesses memory at most twice in a worst case.
* [Reference](https://marco79423.net/articles/ip-lookup-%E6%BC%94%E7%AE%97%E6%B3%95-dir-24-8-basic)
* The scheme Waldvogel et al.**[17]** proposed performs **a binary search** on **hash tables** sorted by **prefix length**
* This scheme performs lookup in $O(logW)$ time
* **[18]** developed an alternative takes $O(W\ + log\ n)$ timein lookup with a n-prefixes table
* These methods above **aren't be suited in dynamic table**, since they use **expensive precomputation**
* Dynamic TCAM
* Shah and Gupta **[7]** takes **insert/delete** in $O(W)$ time
* Basu and Narlika **[19]** proposed **a dynamic table design** for **pipelined forwarding engine**
* Suri et al. **[20]** employs the B-tree for dynamic LMPTs
* **longest matching-prefix lookup** performs in $O(log\ n)$ time
* **Insert/Delete** takes $O(Wlog\ n)$ time
* Cache misses are $O(log\ n)$ times
* Sahni and Kim **[21],[22]** employ **a collection of red-black tree (CRBT)** and **an alternative one (ACRBT)** in dynamic LMPT
* These schemes perform ops in $O(log\ n)$ time
* The number of cache misses is also $O(log\ n)$
* ACRBT can be modified easily to **the biased-skip-list**
* **[23]** obtains a biased skip list for dynamic LMPT
* Lu and Sahni **[24]** provide a priority search tree to obtain $O(log\ n)$ time in routing-ops
* **[25]** represented a binary trie in **highest-priority prefix-table(HPPT)**
* $O(W)$ time in routing ops and cache misses
* Gupta and McKeown **[26]** developed two DS, **heap on trie** and **binary search tree on trie**, for dynamic **highest-priority range-table(HPRT)**
* **heap on trie(HOT)** takes $O(W)$ time on lookup and $O(Wlog\ n)$ time for insert/delete
* **binary search tree on trie(BOT)** takes $O(Wlog\ n)$ time for a lookup and $O(W)$ time for an insert/delete
## 2022/10/20
### 高等計算機網路 期中報告論文
> Enhanced Interval Trees for Dynamic IP Router-Tables
#### Intro
* data structures for dynamic NHPRTs, HPPTs, and LMPTs
* Developing the terminology in Section 3
* Review two interval tree and construct their proposed method in Section 4.
* BOB in Section 5
* Ranges of different route table discussed in Section 6
* PBOB for HPPTs in Section 7
* LMPBOB for LMRTs in Section 8
* Experimental results in Section 9
#### Related Work
* LMPT researches
* Ruiz-Sanchez et al. review DS for static LMPTs **[4]**
* Sahni et al. review DS for both dynamic and statics. **[5]**
* TCAM isn't recommend **[7]**
* simple and efficient solution but requiring special hardware **[8]**
* EZchip Technologies claim that classifiers can forgo TCAMs in favor of commodity memory solutions **[2]**
* special hardware require more power and board space
* Trie-based DS for LMPTs **[9]** - **[15]**
* **[9]** perfom dynamic router-table ops in $O(W)$ time
* **[10]** - **[15]** optimize lookup time and memory usage, these research turned out fast lookup speed and poor speed at insert/delet time - only suitable for static router-tables
## 2022/10/17
### 高等計算機網路 期中報告論文
> Enhanced Interval Trees for Dynamic IP Router-Tables
#### Abstract
* interval tree DS for dynamic IP router tables - Binary tree in Binary tree (BOB)
有很多變體去因應不同的IP router tables
* BOB for nonintersecting ranges rule filters & highest-prioity tie breaker
Example : prefix filter with longest-prefix tie breaker
* BOB with n-rule router table
* match address found in $O(log^2\ n)$ time
* updates and deletions in $O(log\ n)$ time
* CBOB (Compact BOB)
* PBOB (Prefix BOB) with all rule are prefixes in filters
* $W$ is the longest prefix
* match address found in $O(W)$ time
* updates and deletions in $O(W)$ time
* LMPBOB (longest Matching-Prefix BOB) with all rule are prefixes in filters & longest matching prefix is to be done
* $W$ is the longest prefix
* match address found in $O(W)$ time
* updates and deletions in $O(log\ n)$ time
* $O(log\ n)$ time in cache misses
* BOB and PBOB perform each of the three dynamic-table operations in $O(log\ n)$ time and with $O(log\ n)$ time
#### Intro
* the repesentation of rule-table rule - $(F, A)$, $F$ is a filter and $A$ is an action
* Action examples : dropping the packet, forwarding the packet, and reserving the bandwidth
* Each filter with a range of $[ u,v ]$ is assumed, where destination address $d$ is satisified $u\leq d\leq v$
* A tie breaker is needed to select a rule from a set of matched rules since several rules may match a target address
## 2022/10/6
### [Survey of intrusion detection systems: techniques, datasets and challenges](https://link.springer.com/content/pdf/10.1186/s42400-019-0038-7.pdf)
### 閱讀論文進度:看到 Performance metrics for IDS
> Murray et al., has used GA to evolve simple rules for
network traffic (Murray et al., 2014).
**Every rule is represented by a genome and the primary population of
genomes is a number of random rules.**
Each genome is comprised of different genes which correspond to
characteristics such as IP source, IP destination, port
source, port destination and 1 protocol type (Hoque
& Bikas, 2012).
### 編碼方式:一個染色組內含多個染色體,一個檢測規則(特徵)為一個染色體
## 2022/10/3
> SVM is inefficient at large size problem, Sequential Minimal Optimization (SMO) to divde problem into several sub-problem
## 2022/9/30
### 3篇論文未看
- [ ] [Intrusion Detection System Based on SVM for WLAN](https://www.sciencedirect.com/science/article/pii/S2212017312000679)
- [ ] [Intrusion Detection System using Support Vector Machine
and Decision Tree ](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.206.5560&rep=rep1&type=pdf)
- [ ] [A novel SVM-kNN-PSO ensemble method for intrusion detection system
](https://www.sciencedirect.com/science/article/pii/S1568494615006328)
### 物聯網作業 50Startups營收預測
- [ ] [link](https://lms2020.nchu.edu.tw/course/homework/17508)
## 2022/9/26
### Support Vector Machine Research
#### 推導

假設分類兩資料點x0, x1分別位於w*x-b=1, w\*x-b=-1上,求兩線的距離
x0-x1形成一向量,兩線垂直距離為該向量於w\*x-b=0之法向量上之投影,也就是(x0-x1)與w之標準化向量做點積(dot product)可得 = (x0-x1).w/|w| = (w\*x0 - w\*x1)/|w| = 2/|w|, x0位於w\*x-b=1,x1位於w\*x-b=-1, 因此帶入後可得到(w\*x0 - w\*x1) = 1-(-1)
[distance between margins derivation](https://math.stackexchange.com/questions/1305925/why-is-the-svm-margin-equal-to-frac2-mathbfw)
## 2022/9/12
### 如何使用GNQTS實作分類器-參考機器學習的分類器
* SVM (Support vector machine)
* decision tree
### 參考面向
* 編碼方式
* 適應值計算方式
* 輸出函數
[外國版類似論文-皆有提到SEKM,SEIDS,未提到SE詳細資料](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9587486)
8
:::spoiler 2022/9/7
resource arrangement
將解空間切成數塊子解空間
在每個解空間撒點
vision search
* Transition
like crossover and mutation in genetic algorithm
* ExpectedValue
* Determination
:::
:::spoiler 2022/7/27
## 2022/7/27
> 需要改變產生訓練齊資料的方式,以Q#為例,如自2012/01-2022/12為實驗區間,則訓練期實驗區間為2011/01-2021/12
### 方法一
先計算總共多少月份,再以步數去產生多少組訓練期實驗區間
2011/01-2021/12為120個月分,若是Q#,則步數為三所以共有120/3=40組訓練期實驗區間
### 怎麼產生各類型實驗區間
#### X2N - 此類型為目標區間往前推指定步數的月份作為訓練期實驗區間
#### M2N - 目標月份為假設2011/01-2020/12,訓練期實驗區間為2010/12-2020/11
使用一個月的資料進行訓練,也就是2010/12訓練2011/01
#### Q2N - 目標月份為假設2011/01-2020/12,訓練期實驗區間為2010/09-2020/11
使用三個月的資料進行訓練,也就是2010/10-12訓練2011/01
#### H2N - 目標月份為假設2011/01-2020/12,訓練期實驗區間為2010/06-2020/11
使用六個月的資料進行訓練,也就是2010/07-12訓練2011/01
#### Y2N - 目標月份為假設2011/01-2020/12,訓練期實驗區間為2010/01-2020/11
使用十二個月的資料進行訓練,也就是2010/01-12訓練2011/01
#### X# - 使用去年相同長度月份的資料進行訓練
#### M# - 使用去年同月的資料訓練,也就是2010/01訓練2011/01
#### Q# - 使用去年同月的資料訓練,也就是2010/01-03訓練2011/01-03
#### H# - 使用去年同月的資料訓練,也就是2010/01-05訓練2011/01-05
### 步數
y=12, h=6, q=3, m=1
### 取資料的辦法
目前以記錄股票日期在csv檔中的index, 例如2019/02/01在csv檔中為第2192個位置
以陣列存取 date[2192] = “2019/02/01”
從txt檔中讀取最新的日期為2020/12/31
> 目前存到vector中,可以改為array較快速嗎?
### 使用方法
先決定M,Q,H,或是Y的月份長度
從最新月份開始產生訓練期實驗區間
- 若為X#類型,假設目標結束月份為n,則訓練期結束月份為n-12
再根據M,Q,及H,目標起始月份為n, n-2, n-5
- 若為X2N類型,假設目標結束月份為n,則訓練期結束月份為n-1
再根據M,Q,H,及Y,目標起始月份為n, n-2, n-5, n-11
:::
## 111/2/16
滑動視窗-股票區間擷取
- [x] 月份
- [ ] 季
- [ ] 半年
- [ ] 年
QTS+MA更改寫法與精簡
GNQTS+MA進行中
## 111/2/14
寫滑動視窗-股票區間擷取
進行實驗-看GNQTS方法,與QTS結合
## 111/2/6
列印最佳解實驗,確認實驗結果正確
列印成CSV檔時有錯,需要更改
## 111/1/24
滑動視窗擷取測試期與訓練期區間
## QTS結合MA
### 12/16
亂數與MA產生對齊,交易報酬的數字有些誤差
## 股票擇時作業
使用MA做股票擇時以及對答案:2603TW
| 買一 | 買二 | 賣一 | 賣二 | link |
| ---- | ---- | ---- | ---- | --- |
| 5 | 20 | 5 | 20 | [link](https://drive.google.com/file/d/1fl2A999GbZdgPicUeDtYinMG4lbDZ846/view?usp=sharing) |
| 5 | 20 | 5 | 60 | [link](https://drive.google.com/file/d/15zp3on0XCwb1VwLVISMPZwSmfrRe4JtQ/view?usp=sharing) |
| 5 | 60 | 5 | 60 | [link](https://drive.google.com/file/d/1AM29fzdGM-WrKYxRVwk8mhKEn6eByz_s/view?usp=sharing) |
| 6 | 18 | 6 | 18 | [link](https://drive.google.com/file/d/1xkfdP2PK6gfap6mzxhHEhoeoS0vrt-2d/view?usp=sharing) |
| 6 | 50 | 6 | 50 | [link](https://drive.google.com/file/d/1kq1XjFUSlUm4jdEUvae3TuYQsISTgeBh/view?usp=sharing) |
| 18 | 50 | 18 | 50 | [link](https://drive.google.com/file/d/10t9wZq5_6b0STPgWMcOfwF9Z3abjn3sT/view?usp=sharing) |
| 20 | 60 | 20 | 60 | [link](https://drive.google.com/file/d/1itJJbjSy86k4jAkUJ6aYfLrPL0y6uwb6/view?usp=sharing) |
## 研究方向
股票擇時
## 2021/10/28 Quatumn Gate Tensor Practice
