---
tags: Master Project, Brief
---
<!-- https://hackmd.io/UK2Tiyb-ScyI5AQdC42M0w -->
<!-- Article URL: https://hackmd.io/@poi48/B1fuYUDAE -->
# Brief 20190607: LoPub 003
## Junction tree
I don't understand this part yet QAQ

## LoPub
[V-A-2]
Two ways of learning the original data distribution:
- Estimate a 1-dimensional distribution on each attribute independently.
→ This leads to significant degradation of utility.
- Estimate the d-dimensional joint distribution.
→ The possible domain grows exponentially with the number of dimensions.
→ This causes low scalability and signal-to-noise-ratio problems.
Goal: find a solution for **reducing the dimensionality** while **keeping the necessary correlations**.
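
To make the trade-off above concrete, here is a minimal sketch that counts how many cells each approach has to estimate. The attribute cardinalities are hypothetical values chosen for illustration, not numbers from the paper.

```python
from math import prod


def joint_domain_size(cardinalities):
    """Cells in the full d-dimensional joint distribution (product of sizes)."""
    return prod(cardinalities)


def marginal_cells(cardinalities):
    """Cells when every attribute is estimated independently (sum of sizes)."""
    return sum(cardinalities)


# Hypothetical dataset: 8 attributes with 10 possible values each.
cards = [10] * 8
print(joint_domain_size(cards))  # 100000000
print(marginal_cells(cards))     # 80
```

The joint domain is what blows up with d, which is why the note looks for low-dimensional attribute clusters instead of either extreme.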

[V-A-4] **Local Data Protection.** -> RR
[V-A-5] **Multi-dimensional Distribution Estimation.** -> EM-based, Lasso-based, hybrid approach.
[V-A-6] **Dimensionality Reduction.** -> junction tree
[V-A-7] **Synthesizing the New Dataset.** -> sample each low-dimensional dataset according to the connectivity of attribute clusters and the estimated joint (or conditional) distribution on each attribute cluster
### Local Data Protection
[V-B]
- Represent each data record as a Bloom filter.
- Introduce uncertainty.
In particular, a Bloom filter with multiple hash functions encodes a set of data items into a bit string of pre-defined length.
A Bloom filter is a probabilistic data structure: it can only tell us that an element is definitely not in the set or possibly in the set.
Hash functions
[Wikipedia](https://zh.wikipedia.org/wiki/布隆过滤器)
[Bloom Filters by Example](https://llimllib.github.io/bloomfilter-tutorial/)
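
A minimal sketch of the Bloom filter encoding described above. The filter length `m`, the hash count `h`, and the use of salted SHA-256 as the hash family are assumptions made for illustration; the paper's actual hash functions are not specified in these notes.

```python
import hashlib


def bloom_positions(value, m, h):
    """Bit positions for `value` from h salted SHA-256 hashes (assumed hash family)."""
    positions = []
    for i in range(h):
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def bloom_encode(value, m=32, h=2):
    """Encode one attribute value as an m-bit Bloom filter."""
    bits = [0] * m
    for pos in bloom_positions(value, m, h):
        bits[pos] = 1
    return bits


def maybe_contains(bits, value, h=2):
    """False means definitely absent; True means possibly present."""
    return all(bits[pos] == 1 for pos in bloom_positions(value, len(bits), h))


encoded = bloom_encode("red")
print(encoded)
print(maybe_contains(encoded, "red"))  # True
```

A lookup for a value that was never encoded can still return `True` when its positions collide with set bits, which is the "possibly in the set" case.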
Randomized Response

f: probability of flipping a bit (the coin's probability)
Tune f to adjust the privacy level.
[Wikipedia](https://zh.wikipedia.org/wiki/隨機化回答)
Then send out the RR-protected bit vector: one bit vector for each item in each record.
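
A minimal sketch of the randomized response step on a bit vector. It assumes the RAPPOR-style convention, where each bit is replaced by 1 with probability f/2, replaced by 0 with probability f/2, and kept with probability 1 - f. That convention is an assumption here; the notes only say that f is the coin probability.

```python
import random


def randomized_response(bits, f, rng=random):
    """Perturb each bit independently (assumed RAPPOR-style convention)."""
    noisy = []
    for bit in bits:
        u = rng.random()
        if u < f / 2:
            noisy.append(1)      # forced to 1 with probability f/2
        elif u < f:
            noisy.append(0)      # forced to 0 with probability f/2
        else:
            noisy.append(bit)    # kept unchanged with probability 1 - f
    return noisy


original = [0, 1, 0, 0, 1, 0, 0, 0]
print(randomized_response(original, f=0.5, rng=random.Random(0)))
```

With `f=0` the vector is sent unchanged (no privacy), and with `f=1` every bit is a fair coin toss (no utility), so f sits somewhere in between.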
$$ \varepsilon = 2dh \ln\frac{2-f}{f} $$
h: number of hash functions
d: data dimension
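
A small sketch for seeing how f, h and d move the privacy budget. It reads the formula as ε = 2dh·ln((2 − f)/f), which is an interpretation of the notation in these notes; the function name `lopub_epsilon` is hypothetical.

```python
import math


def lopub_epsilon(f, h, d):
    """Privacy budget, reading the formula as 2 * d * h * ln((2 - f) / f)."""
    if not 0 < f <= 1:
        raise ValueError("f must be in (0, 1]")
    return 2 * d * h * math.log((2 - f) / f)


# Hypothetical settings: 2 hash functions, 4 attributes.
for f in (0.25, 0.5, 0.75, 1.0):
    print(f, lopub_epsilon(f, h=2, d=4))
```

Larger f gives a smaller ε (stronger privacy), and ε grows linearly with both the number of hash functions and the number of dimensions.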
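### Questions
- [ ] Evaluate LDP
- [x] Does the synthesized dataset **preserve correlations**? : Junction tree
- [ ] The correlations are used for ARM.
- [x] Does the mechanism satisfy LDP? : It uses existing mechanisms
- [ ] Practical test: run LoPub, feed the output into (MS-)Apriori, and compare against ARM on the original data.
- [ ] LoPub code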
- [ ] LaTeX?