FYP Sampling Order

--- tags: FYP --- # FYP Sampling Order # Problem Statement Given: - a set of events $S$ - a set of labels $\Sigma$ - a labelling function $\lambda: S \rightarrow \Sigma$ - a set of dependencies $D \subseteq \Sigma \times \Sigma$ that is a symmetric and reflexive relation - a partial order $P$ on $S$ that respects $D$: $\forall s_1, s_2 \in S$, if $(f(s_1), f(s_2)) \in D$, then either $(s_1, s_2) \in P$ or $(s_2, s_1) \in P$. Also called as Mazurkiewicz partial order - a natural number $d$ representing the bug depth Main question: ::: info Given a number $d$ and a Mazurkiewicz partial order $P = (S, \leq)$. What is the smallest set of linearizations $\mathcal{H}$ of $P$ such that for every $d$-tuple $b \in \Sigma^d$, there is a linearization $l \in \mathcal{H}$ that hits $b$? ::: --- # Definitions Given $\Sigma, \lambda, P = (S, \leq)$ and $|S| = n$. Let $lin(P)$ be set of all possible linearizations of a partial order $P$. Definition 1: : The set of strings $T \subseteq \Sigma^n$ is called ***labelled realizer*** of $P$ if $\lambda^{-1}(T)$ is a cover of $P$ Here, $\lambda^{-1}(T) = \{l | \lambda(l) \in T\}$ Definition 2: : $T$ is a ***labelled hitting family*** of order $d$ if for every $\sigma \in \Sigma^d$, there is a $w \in T$ such that $\sigma$ is a subsequence of $T$, and $\lambda^{-1}(w) \in lin(P)$. # Things to Explore - [!] For any $(\sigma_1, \sigma_2) \in \Sigma$, consider subset of $P$ s.t. the labels of the events are in $\{\sigma_1, \sigma_2\}$. It must form a total order - *Proof*. Any two events are comparable - Extension to this: consider subset of $\Sigma$ such that it is also transitive. Then the above holds. - [?] Suppose we only have two labels, no dependency between the two. Let $d$ be the bug depth. How to construct the labelled hitting family? - If $d = 2$? - Make a linearization with alternating label - General case $d$? - Make a linearization with alternating label - What if $d > n/2$? - {abab,abba} (d = 3, n = 4) - Suppose we have $k$ labels, - If $d = 2$? - Put all the distinct labels first, then the rest is free - General case $d$? - ??? - [?] Suppose there are $2k$ dependencies $(\sigma_1, \sigma_2) \in D$ such that $\sigma_1 \neq \sigma_2$. If $k < n$, the worst case is when we can view the partial order with $n - k$ chains, each has equal number of elements (i.e. $\frac{n}{n - k}$) - Intuition: suppose there are three labels $\Sigma = \{a, b, c\}$ and we have $2 \cdot 2$ dependency between different labels, namely $(a, b)$, $(b, c)$ (and its symmetric counterpart). - We can get smaller upper bound for the hitting family. This has similar case as above # Findings - $d$-labelled hitting family problem is NP-hard, even for $d = 2$ - This implies that there is no FPT algorithm in $d$ - [?] Need to enumerate all possible bugs (at least...?) - [?] But maybe there is FPT algo in $|\Sigma| + d$ - Given a word $w$ that correspond to some linearization $l \in lin(P)$ of a Mazurkiewicz partial order $\tau = (P, D, \lambda)$. We can construct the partial order, given the dependency $D$ and $w$ in $O(n|\Sigma|)$ time. - There is an algorithm runs in $O(n|\Sigma|)$ to check the admissibility of a bug - We can check the admissibility of multiple bugs by forming it as a prefix trie # Interesting Questions - P1: Given $P$, $\lambda$, $T$. Check if $T$ is a labelled realizer of $P$ - P2: Given $P$, $\lambda$. Check if there is a labelled realizer of size $\leq k$ - P3: Given $P$, $\lambda$, $T$, $d$. Check if $T$ is a $d$-labelled hitting family of $P$ - P4: Given $d$, $P$, $\lambda$. Check if there is a $d$-labelled hitting family of size $\leq k$ - P5: Given $\lambda$, $P$, $D$, and a word $w$ that correspond to some linearization $l \in lin(P)$. Check whether it is sufficient that $w$ itself is needed to construct the partial order $P$. - P6: Given $\tau = (P, \lambda, D)$ and depth $d$. Count how many bugs are admissible - P7: Given a prefix trie $T = (V, E, \lambda)$, which is a rooted tree with labelling for each node, of height $h$ that represenets a set of bugs. Design an algorithm that runs in time of at most $$ |T| \cdot poly(n) $$ - P8: Given $d$, $P$, $\lambda$. Check if there is a $d$-labelled hitting family of size $\leq k$ if we consider all, except $n = |S|$, are constants. # Answers ## P1 [DEP] Given $P$, $\lambda$, $T$. Check if $T$ is a labelled realizer of $P$ Lemma 1 : For all $\sigma \in \Sigma^n$, there is at most one $l \in lin(P)$ such that $\lambda(l) = \sigma$ *Proof*. Suppose not, i.e. there are two different linearizations $l_1, l_2$ such that $\lambda(l_1) = t = \lambda(l_2)$. Denote $l[i]$ as the $i$-th element from the ordering of a linearization $l$ and $\sigma[i]$ as the $i$-th label of a word $\sigma$. There exists a smallest index $i$ such that $l_1[i] \neq l_2[i]$. Consider $P' \subseteq P$ where $P'$ consists of events that have label $\sigma[i]$. $P'$ must be a total order, since the dependency $D$ is reflexive. $P'$ must appear as a subsequence of any linearization. However, this contradicts the fact that $l_1[i] \neq l_2[i]$. Take any $t \in T$. We can easily find $l$ (if there is any) such that $\lambda(l) = t$. Consider any subsequence of $l$ such that the events have the same label. There must be exactly one ordering. Using that ordering, we can easily verify whether it is a valid linearization or not in $\mathcal{O}(n^2)$ time, where $|S| = n$. Call the set of valid linearizations as $\mathcal{L}$. After converting the set of words into set of linearizations, we can easily find the intersection in $\mathcal{O}(n^2 |T|)$ time. First, construct a set $A = S \times S$. For every $l \in \mathcal{L}$, we find the intersection of $A$ and $l$ and update the set $A$. Finally, we check whether the final result is $P$. This can be done faster by finding intersection of every two linearizations and merge them. This yield $\mathcal{O}(n^2 \lg |T|)$ time complexity. ## P2 Given $P$, $\lambda$. Check if there is a labelled realizer of size $\leq k$ We will do reduction from dimension of partial order problem. From Yannakakis paper, the problem of determining if a partial order has dimension of at most $k$ for $k \geq 3$ is NP-complete. For any partial order $P$, we can construct an equivalent Mazurkiewicz partial order, where each events have different labels and the dependency $D$ contains all comparable elements in $P$. A YES-instance for dimension problem is also a YES-instance for labelled realizer problem, as they have the same realizer of the partial order $P$. The same can be said for opposite direction. Therefore, along with **P1**, this problem is NP-complete. ## P3 [WIP] First, we can check the validity of $T$ as described in previous section. This takes $\mathcal{O}(n^2)$ time for each word in $T$. Naive way to do this is to check all possible $b \in \Sigma^d$. Each $b$ takes $\mathcal{O}(n + d)$ time to verify whether it is a subsequence of a word $t \in T$ or not. Hence this runs in $\mathcal{O}((n + d)|T| \cdot |\Sigma|^d)$ (Not sure how to prove that the lowerbound is also exponential) We will prove that the complexity to verify correctness is $\Omega(|\Sigma|^d)$ Suppose that there exists an algorithm that runs in $o(|\Sigma|^d)$. Then there must be some bug $b \in \Sigma^d$ that is not checked ## P4 We will do reduction from the original $d$-hitting family problem, which is NP-hard. It has decision problem "Is there a $d$-hitting family of size $\leq k$" For any partial order $P$, we can construct an equivalent Mazurkiewicz partial order whose labels of every event is different. A YES-instance for $d$-hitting family problem is also a YES-instance for $d$-labelled hitting family. Suppose $\mathcal{H}$ is the $d$-hitting family, then $T = \{\lambda(h) | h \in \mathcal{H}\}$ is a $d$-labelled hitting family for $P$. A YES-instance for $d$-labelled hitting family of $P$ is also a YES-instance for $d$-hitting family. Suppose $T$ is the $d$-labelled hitting family, then $\mathcal{H} = \{ \lambda^{-1}(t) | t \in T\}$ is $d$-hitting family for $P$. Hence, this problem is NP-hard ## P5 Let $P_w$ be a partial order constructed from word $w$ with dependency $D$. To prove that $w$ can construct $P$, we need to show that $P_w \subseteq P$ and $P_w \supseteq P$. It is easy to see that $P_w \supseteq P$ as $\lambda^{-1}(w)$ is a linearization of $P$. Otherwise, $w$ is not a valid linearization at the first place. We will show that $P_w \subseteq P$. Denote $s[i]$ as the $i$-th letter of a sequence $s$. Let $l \in lin(P)$ such that $\lambda(l) = w$. Pick any $i$, $j$ such that $i < j$ and $(l[i], l[j]) \in P_w$. There are two cases: - $(w[i], w[j]) \in D$. In this case, $(l[i], l[j]) \in P$ - $(w[i], w[j]) \not\in D$. We will do reduction by finding smallest $i < k < j$ such that $(l[i], l[k]) \in P_w$ and $(l[k], l[j]) \in P_w$. Eventually, we will fall into the case above. ## P6 We can use the idea of vector clocks to solve this problem. We know that for each label, it is a total order. Hence, we can view this as $|\Sigma|$ machines executing concurrently. We name the machines as labels in $\Sigma$. First, we find any valid linearization $w$ for the partial order. This correspond to one valid execution of the machines. Denote $s[i]$ as the $i$-th element of the sequence $s$. We can find the timestamp for each of the label in $w$. A timestamp is denoted as a vector of size $|\Sigma|$, where every element is a non-negative integer. We need to have a hashmap $H$ contains mapping from labels to a pair of timestamp and index. $H[\sigma]$ will contain the latest information of the event that we have encountered with label $\sigma$, i.e. the timestamp and the index of the event in $w$. 1. Initialize a hashmap $H$ with $\epsilon$ as its default value 2. For each $i = 1 \dots n$ 1. Let $t := \textbf{0}_{|\Sigma|}$ 2. If $H[w[i]] = \epsilon$, then - $t_{w[i]} := 1$ 3. Otherwise, - $t_{w[i]} := H[w[i]]_{w[i]} + 1$ 4. For each $\sigma \in \Sigma$ where $\sigma \neq w[i]$ and $H[\sigma] \neq \epsilon$ and $(\sigma, w[i]) \in D$: 1. $t_\sigma := H[\sigma]_\sigma$ 5. $H[w[i]] := t$ where: - $\epsilon$ denotes the absence of value (null) - $t_\sigma$ denotes the corresponding element for label $\sigma$ in the timestamp $t$. The algorithm above runs in $O(n|\Sigma|)$. Now, we will fix some bug $b \in \Sigma^d$. We want to check whether $b$ is admissible w.r.t. $\tau$ or not. We first construct the vector clock for each of the event. Now we partition the events based on the labels, arranged in increasing order of their vector clocks. Denote $V[\sigma]$ as a list of vector clocks in increasing order of for events with label $\sigma$, and $V[\sigma][i]$ be the $i$-th element of the list. The following algorithm checks whether the bug $b = b_1 b_2 \dots b_d$ is an admissible bug: 1. For each $\sigma \in \Sigma$: 1. $p[\sigma] = 1$ 3. For each $i := 1 \dots d$ 1. For each $(\sigma, b_i) \in D$ where $\sigma \neq b_i$ 1. If $V[\sigma][p[\sigma]] > V[b_i][p[b_i]]$: 1. $p[b_i] = p[b_i] + 1$ If any of the step above fails, that $b$ is not an admissible bug. Hence, this algorithm runs in $O(n |\Sigma|)$ time ## P7 Given a prefix trie $T = (V, E, \lambda)$, which is a rooted tree with labelling for each node, of height $d$ that represenets a set of bugs (of depth $\leq d$). Design an algorithm that runs in time of at most $$ |T| \cdot poly(n) $$ ### Solution We can use the algorithm described in P6, but we store the state of the pointers for backtracking. We can keep this information using a dynamic array, adding new element when we recurse deeper to the tree, and popping element (and restore the state of the pointers $p[]$). At each node in the trie, the pointers $p[i]$ will be increased by at most $n$ steps as we will need to check all dependencies that contains the label. Hence, the algorithm will run in $O(|T| \cdot (|\Sigma| + n))$ with $O(d|\Sigma|)$ extra space. We can optimise this further to use only $O(d)$ extra space. Notice that at any node, we only advance at most one pointer $p[]$. We need to store the information of "how much the pointer has been moved from its original position". It will be maintained in a dynamic array, and the size is at most $d$. This information can be used when we backtrack. TLDR: $O(|T| \cdot (|\Sigma| + n))$ time with $O(d)$ extra space ## P8 Given $\tau = (P, \lambda, D), d$ and $k \in \mathbb{N}$. Check if there exists $d$-labelled hitting family of size $\leq k$. ### Solution #### Case $k = 1$ Let $b = b_1 b_2 \dots b_d$ be a bug. It is admissible if and only if there exists events $e_1, e_2, e_3, \dots, e_d$ such that: - $\lambda(e_i) = b_i$ - $e_1 = first(b_1)$ - $e_d = last(b_d)$ - $e_i$ is the earliest event with label $b_i$ such that $e_{i} \not<_P e_{j}$ for $1 \leq j < i \leq d$ *Proof.* (if). It is clear that if such events exist, then we can form a linearization that has $e_1 e_2 e_3 \dots e_{d - 1} e_d$ as its subsequence by adding $(e_i, e_{i + 1})$ edge in $P$ for $1 \leq i < d$ (only if). Suppose not, i.e. there is a linearization $T$ that hits $b$. Let $e'_1 e'_2 \dots e'_d$ be subsequence of events in $T$ whose labels correspond to $b$. Repeatedly do the following: let $i$ be the first index such that $e_i \neq e'_i$. If $i = 1$, we can swap $e'_1$ with $first(b_1)$, because $first(b_1) <_P s$ for any events $s$ s.t. $\lambda(s) = b_1$. If $i = d$, we can swap $e'_d$ with $last(b_d)$, because $s <_P last(b_d)$ for any events $s$ s.t. $\lambda(s) = b_d$. If $1 < i < d$, we can swap $e'_i$ with $e_i$. Suppose not, this implies that $e'_i <_P e_i$, but this would mean that $e'_i$ is the first event with label $b_i$ such that $e'_i \not<_P e_j$ for $1 \leq j < i$, a contradiction. Now construct a new graph $G$ such that: - All edges in $P$ is in $G$ - For all admissible bugs $b = b_1 b_2 \dots b_d$, let $e_1 = first(b_1)$, $e_d = last(b_d)$, $e_i$ be the first event with label $b_i$ such that $e_i \not<_P e_j$ for all $1 \leq j < i \leq d$. Add the edges $(e_i, e_{i + 1})$ to $G$, for $1 \leq i < d$ $G$ has cycle if and only if it is a NO-instance. (only if). We will prove its contrapositive form: *If it is a YES-instance, then $G$ does not have cycle* Let $l$ be any linearization that respects $G$. Construct a graph $G_l$ that corresponds to the linear order $l$. Notice that $P \subseteq G \subseteq G_l$. Since $G_l$ does not contain cycle, so does $G$. (if). Take its contrapositive form: If $G$ does not have cycle, then it is a YES-instance We take any linearization $l$ that respects $G$. Notice that we add edges for all admissible bugs, and it will be hit in any linearization of $G$.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.