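# 2022/1/13 meeting (information theory, decision tree, lightGBM, zero trust)

## information theory

Information should have three properties:

1. Information contained in events ought to be defined in terms of some measure of the **uncertainty** of the events. (The measure of an event is determined by the probability that it occurs.)
2. **Less certain** events ought to contain more information than certain events. (The less likely the event a message describes, the more information the message carries; because such an event is not normally expected, it can be regarded as more important information.)
3. The information of unrelated/independent events taken as a single event should equal the **sum** of the information of the unrelated events.

> Define information content Q
> $Q = -\log P$, where $P$ is the probability that the event occurs.

> Entropy
> $H = \displaystyle\sum_{i=0}^n -P_i \log P_i$

In "Impact of Code Deobfuscation and Feature Interaction in Android Malware Detection", the authors use information entropy to derive a feature from the certificate, working from the frequency of each distinct value collected from the RDNs.

## decision tree

Basic algorithm:

![](https://i.imgur.com/e7yUWli.png)

The tree is built recursively: decide the branching criterion for the current node, split the current data into parts, recursively build a subtree from each subset, and return that subtree. The subtrees are combined into the larger tree. When the termination criteria are met, the leaf result is returned instead. This process involves four choices:

1. How many branches to split into
2. The branching criterion
3. The termination criteria
4. The value returned at a leaf

### C&RT

C = 2 (binary tree)

$g_t(x)$ = the $E_{in}$-optimal constant

![](https://i.imgur.com/prq07yB.png)

Besides always splitting into two parts and outputting a constant, C&RT uses a decision stump to split the current data in two based on a single feature, and uses purifying to decide where to cut. Here $|D_c \text{ with } h|$ is the weight of each part: the larger the weight, the more the purity of that part matters. For classification, purity is commonly measured with Gini.

* Gini index: $1 - \displaystyle\sum_{k=1}^K \left(\dfrac{\displaystyle\sum_{n=1}^N [y_n=k]}{N}\right)^2$

The algorithm stops when:

1. all $y_n$ the same: impurity = 0 $\implies$ $g_t(x)=y_n$
2. all $x_n$ the same: no decision stumps

![](https://i.imgur.com/1SGO2RD.png)

However, if the tree is grown all the way down (and all $x_n$ are distinct), then $E_{in}(G)=0$, which may overfit, so a regularizer is needed to keep the tree from growing too deep. The tree is pruned by removing one leaf at a time and checking the result: find the best single-leaf removal, then continue with the next leaf.

## zero trust

- https://kopu.chat/zero_trust/

Principles:

> **Never trust, always verify**
> Someone on the internal network is not necessarily who they claim to be.

> **Implement least privilege**
> Verify with only what is just sufficient.

> **Assume Breach**
> Assume a breach exists, and plan how to respond and recover.

Traditionally, access is decided by whether a user is on the internal or the external network, with a firewall separating the two. But if any single computer on the internal network is compromised, the attacker may gain access to every service on the internal network.

The biggest difference between Zero Trust and a traditional security architecture is that a user is not given full privileges just for being on the internal network. **Zero Trust has no concept of internal and external networks. It is as if every service were placed on the internet, and the user's permissions are verified on every access to any service.** It is implemented with Single Sign-On, 2;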
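
As a supplement to the information theory and C&RT sections above, the entropy formula, the Gini index, and the impurity-weighted choice of a decision stump can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `entropy`, `gini`, and `best_stump` are hypothetical, the logarithm base defaults to 2 (the notes do not fix a base), and candidate thresholds are taken at midpoints between adjacent distinct feature values, which is one common choice rather than something the notes specify.

```python
import math
from collections import Counter


def entropy(values, base=2):
    """Shannon entropy H = sum_i -P_i log P_i of the empirical distribution.

    P_i is estimated as the relative frequency of each distinct value.
    The base-2 default is an assumption; the notes leave the base open.
    """
    counts = Counter(values)
    total = len(values)
    return sum(-(c / total) * math.log(c / total, base) for c in counts.values())


def gini(labels):
    """Gini index: 1 - sum_k (fraction of labels equal to k)^2."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_stump(xs, ys):
    """Choose a threshold on one feature by minimizing size-weighted Gini impurity.

    Each side of the split is weighted by its number of examples, mirroring the
    |D_c with h| weight in the notes. Returns (threshold, weighted_impurity),
    or None when all xs are the same (no decision stump exists).
    """
    points = sorted(set(xs))
    best = None
    # Assumption: candidate thresholds are midpoints of adjacent distinct values.
    for lo, hi in zip(points, points[1:]):
        theta = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < theta]
        right = [y for x, y in zip(xs, ys) if x >= theta]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[1]:
            best = (theta, score)
    return best


if __name__ == "__main__":
    print(entropy(["a", "b", "b", "c"]))
    print(gini([0, 0, 1, 1]))
    print(best_stump([1, 2, 3, 4], [0, 0, 1, 1]))
```

Both stopping conditions from the C&RT section show up here: when all labels agree, `gini` returns 0, and when all feature values agree, `best_stump` has no candidate threshold and returns `None`.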