---
tags: Master Project, Brief
---

<!-- https://hackmd.io/UK2Tiyb-ScyI5AQdC42M0w -->
<!-- Article URL: https://hackmd.io/@poi48/B1fuYUDAE -->

# Brief 20190607: LoPub 003

## Junction tree

Still can't make sense of it QAQ

![image alt](https://media1.tenor.com/images/d19f829a6bbeb509943f286fe1c79053/tenor.gif?itemid=10207771)

## LoPub

[V-A-2] Learning the original data distribution:
- 1-dimensional distribution on each attribute independently → leads to significant degradation of utility.
- d-dimensional joint distribution → the domain of possible values grows exponentially with the number of dimensions → low scalability and signal-to-noise-ratio problems.

Find a solution that **reduces the dimensionality** while **keeping the necessary correlations**.

![](https://i.imgur.com/3IIypfL.png)

[V-A-4] **Local Data Protection.** → RR
[V-A-5] **Multi-dimensional Distribution Estimation.** → EM-based, Lasso-based, and hybrid approaches.
[V-A-6] **Dimensionality Reduction.** → junction tree
[V-A-7] **Synthesizing the New Dataset.** → Sample each low-dimensional dataset according to the connectivity of the attribute clusters and the estimated joint (or conditional) distribution on each attribute cluster.

### Local Data Protection

[V-B]
- Represent the data record as a Bloom filter.
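- Introduce uncertainty.

In particular, a Bloom filter with multiple hash functions encodes a set of data items into a bit string of predefined length.

A Bloom filter is a probabilistic data structure: it can only tell us that an element is **definitely not** in the set or **possibly** in the set.
Hash functions
[Wikipedia](https://zh.wikipedia.org/wiki/布隆过滤器) [Bloom Filters by Example](https://llimllib.github.io/bloomfilter-tutorial/)

Randomized Response

![](https://i.imgur.com/kBkmCO3.png)

f: the probability that a bit is randomized (replaced by a coin toss, so it actually ends up flipped with probability f/2).
Adjust f to tune the privacy level.
[Wikipedia](https://zh.wikipedia.org/wiki/隨機化回答)

Then release the RR-protected bit vectors: one bit vector for each item in each record.

$$
\varepsilon = 2dh \ln\frac{2-f}{f}
$$

h: number of hash functions
d: data dimension (number of attributes)

### Questions
- [ ] Evaluate LDP
- [x] Does the synthesized dataset **preserve correlations**?: Junction tree
- [ ] The correlations are used for ARM.
- [x] Does the mechanism satisfy LDP?: It uses an existing mechanism.
- [ ] Practical test: implement LoPub, feed its output into (MS-)Apriori, and compare the result with ARM on the original data.
- [ ] LoPub code
- [ ] LaTeX?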
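The Bloom filter encoding from Local Data Protection, and its "definitely not / possibly in" behaviour, can be illustrated with a minimal sketch. It assumes a bit length `m` and uses salted SHA-256 as the `h` hash functions; this is a stand-in, since the note does not say which hash functions LoPub uses, and the class, method, and item names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and h hash functions (illustrative sketch)."""

    def __init__(self, m: int, h: int) -> None:
        self.m = m
        self.h = h
        self.bits = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: derive h bit positions by salting SHA-256 with the hash index.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.h)
        ]

    def add(self, item: str) -> None:
        # Encoding only ever sets bits to 1.
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        # False → definitely not in the set; True → possibly in the set.
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(m=32, h=2)
bf.add("age=30")
print(bf.might_contain("age=30"))  # True: an added item always passes
print(bf.might_contain("age=40"))  # usually False, but can be a false positive
```

Because bits are only ever set, an added item always passes the membership check; a different item also passes if all of its h positions happen to be set already, which is the false-positive case.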
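The RR step and the privacy budget ε = 2dh ln((2 − f)/f) can be sketched as follows. This assumes the RAPPOR-style convention under which that formula holds: each bit is kept with probability 1 − f, and otherwise reported as 1 or 0 with probability f/2 each. The function names and the toy bit vectors (one per attribute, d = 2) are hypothetical, not LoPub's code.

```python
import math
import random


def randomized_response(bits: list[int], f: float, rng: random.Random) -> list[int]:
    """Perturb a bit vector with the assumed RAPPOR-style RR (0 <= f <= 1)."""
    out = []
    for b in bits:
        r = rng.random()
        if r < f / 2:
            out.append(1)  # with probability f/2: report 1
        elif r < f:
            out.append(0)  # with probability f/2: report 0
        else:
            out.append(b)  # with probability 1 - f: keep the true bit
    return out


def epsilon(d: int, h: int, f: float) -> float:
    """Privacy budget 2*d*h*ln((2 - f) / f) for d attributes and h hash functions (0 < f <= 1)."""
    return 2 * d * h * math.log((2 - f) / f)


rng = random.Random(0)
# Hypothetical record: one 8-bit Bloom filter per attribute (d = 2).
record = [[0, 1, 0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0, 1, 0]]
noisy = [randomized_response(bits_j, f=0.5, rng=rng) for bits_j in record]
print(noisy)
print(epsilon(d=2, h=2, f=0.5))  # 8 * ln(3) ≈ 8.79
```

With `f = 1` every bit becomes a fair coin and ε drops to 0; a smaller f keeps more true bits and raises ε, which is what adjusting f to tune the privacy level means here.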