# Program obfuscation via local mixing
Let's assume that a reversible circuit with $n^*$ wires and $m^*$ gates permutes the input such that the output permutation is indistinguishable from a strong pseudo-random permutation. The authors use this property of reversible circuits to build an algorithm that satisfies ROI (ROI is also program obfuscation and is similar to iO) for all reversible circuits.
### Reversible circuits
Reversible circuits were first explored to express computation such that it can be reveresed. For example, a + b = c is not a reversible operation. Because it is not possible to derive a, b from c. However, if the circuit outputs b as an auxilary value, the computation becomes reversible. Reversible circuits apparently are also natural fit for quantum circuits (a quick search for "reversible circuits and quantum computation" will produce many useful resources). The other property that reversible circuits have, and the one we're interested in, is that if you chain enough reversible gates the reversible circuit can be called a strong pesudorandom permutation. In others words, the reversible circuits permutes the input bits such that the output is indistinguishable from the truly random permutation to an adversary with polynomial resources.
A reversible circuit is expressed as collection of $n$ parallel wires and $m$ gates and, usually, the computation flow is from left to right. Each gate mutates at least 1 wire based on signals of other wires. There're several varieties of gates but we're only interested in Toffoli gate.
Toffoli gate is defined by 2 control wires, control function, and a target wire. The gate applies the control function on the control wires and sets the target wire as XOR of output of control function by itself. For example, if a, b are the control wires, $\phi$ is the control function, and c is the target wire. Then the gate computes: $c = \phi(a, b)\oplus c$.
Observe that with 2-bit controls there can only be $2^{2^4} = 16$ control functions. We refer to these gates as the "base gates". A base gate is defined over $n$ wires but it only touches 3 wires (2 controls and 1 target) called active wires. Another observation to make is any base gate is an inverse of itself and there exist a identity base gate with control function $\phi(a, b) = 0$
**Psuedorandomness of reversible circuits and t-wise independence**
[Gow96](https://www.cambridge.org/core/journals/combinatorics-probability-and-computing/article/abs/an-almost-mwise-independent-random-permutation-of-the-cube/64D106A4BB8EFB5AAB8670952B90B76A) conjectured that reversible circuits with sufficiently many gates where each gate is a 3-bit permutation is strong-psuedo random permutation. Gowers proved that reversible circuit with $m = O(n^3t^3 \log{\epsilon^{-1}})$ is $\epsilon$ close to t-wise independence. Note that Gowers's result only applies to any 3-bit permutation and not just the base gates.
Function $F(x)$ is called $\epsilon$ close to t-wise independent if the distance between following two tuples is $\leq \epsilon$:
- $F(x_1), F(x_2), ... F(x_t)$
- $y_1, y_2, ..., y_t$
where $y_i$ is sampled from truly random permutation distribution. The distance function is defined as the function $d(p,q)$ in [BH06](https://web.cs.dal.ca/~abrodsky/papers/k2n2.pdf) on page 3.
HMMR05, HB05 later improved the bound to $m = O(n^3t^2 + n^2t \log{\epsilon^{-1}})$ and worked only with the base gates. They also conjectured that 4-wise independece suffices for cyrptographic grade pseudomness (I suppose there're attacks with 3-wise independence, which makes one think 4-wise is plausibly secure. But there's no elaboration on the conjecture in the paper).
*Note: There're other recent papers that improve the bound and provide proofs. For example, [this](https://arxiv.org/abs/2404.14648) and [this](https://arxiv.org/abs/2406.08499).*
### Random output indistinguishibility (ROI) obfuscators
Any function $O$ is ROI with inner stretch function $\xi$ for input ensemble of circuits $\mathbb{C}_{n_{\kappa}, m_{\kappa}}$ if it satisfies the following property for any reversible circuit $C \in \mathbb{C}_{n_{\kappa}, m_{\kappa}}$:
$$O(C) \approx_c \pi(\hat{C}) : \hat{C} \leftarrow \mathcal{E}_{C, \xi(\kappa, n_{\kappa}, m_{\kappa})}$$
where $\mathcal{E}_{C, m}$ denotes set of all circuits that compute permutation $C$ with $m$ gates.
Definition of ROI is different from iO. But they provide similar functionality. ROI, like iO, requires the obfuscator to output circuits that are indistinguishable from a random circuit that computes the same function (/permutation) as the input circiut $C$.
**Separable ROI obfuscators**
A ROI obfuscator is $m_{\kappa}$ left (right resp.) separable if
1. the computational complexity of $m_{\kappa}$ prefix (suffix resp.) of the obfuscated circuit is low (ie $CC(C_{[0,m^{\kappa}]}) \lt m_{\kappa}/2$)
2. Post processing algorithm is of the form $\pi(C) = \pi_1(C_{[1, m_{\kappa}]}) | \pi_2(C_{[m_{\kappa}, *]})$ (resp. $\pi(C) = \pi_1(C_{[1, -m_{\kappa}]}) | \pi_2(C_{[-m_{\kappa},*]})$)
Observer that an obfuscator $O$ is $m_{\kappa}$-right separable if $O^{\dagger}$ ($O^{\dagger} = (O(C^{\dagger}))^{\dagger}$) is $m_{\kappa}$ left separable.
An obfuscator $O$ is called $m_{\kappa}$ separable obfuscator if it is both $m_{\kappa}$ right and left separable.
### Random input & output (RIO) obfuscators
RIO is built from function(s) that satisfy its two properties. It isn't necessary for the same function to satisfy both properties because the two properties are used at different places. However, it is important to highlight here that ROI is built from RIO based on the new assumptions authors introduce. In particular, assumption 8, which covers assumptions 4&7.
**Property 1:**
Any obfuscator function $O$ statisfies property 1 on input ensemble $\mathbb{C}_{n_{\kappa}, m_{\kappa}}$ with post processing function $\pi$ and inner stretch $\xi$ if the following holds:
$$(C_1, C_2) \approx_{c} O(C_1', C_2')$$
where
$$(C_1, C_2) \leftarrow O(C); C \leftarrow_{R} \mathbb{C}_{n_{\kappa}, m_{\kappa}}$$
and
$$(C_1', C_2') = (\pi(C_1'), \pi(C_2')) \leftarrow_{R} \mathcal{E}_{P_{C'}, \xi(\kappa, n_{\kappa}, m_{\kappa})}; C' \leftarrow_{R} \mathbb{C}_{n_{\kappa}, m_{\kappa}} $$
Property 1 says that running obfuscator function $O$ twice on same circuit $C$ must output obfuscated circuits that are not co-relatable.
**Property 2:**
Any obfuscator function $O$ satisifes property 2 on input ensemble $\mathbb{C}_{n_{\kappa}, 2m_{\kappa}}$ for sufficiently large leeway $\lambda$ and fixed circuit $Z_1, Z_2 \in \mathbb{C}_{n_{\kappa}, m_{\kappa}}$ if the following holds:
$$(O(C), \hat{C}_1, \hat{C}_2) \approx_{c} (\pi(C'), \hat{C'}_1, \hat{C'}_2)$$
where
$$C \leftarrow_{R} \mathbb{C}_{n_{\kappa}, 2m_{\kappa}}; \hat{C}_1 \leftarrow_{R} \mathcal{E}_{P_{C_{[1,m_{\kappa}]}|Z_1}, m_{\kappa}\lambda};\hat{C}_2 \leftarrow_{R} \mathcal{E}_{P_{Z_2|C_{[m_{\kappa},*]}}, m_{\kappa}\lambda}$$
and
$$C' \leftarrow_{R} \mathbb{C}_{n_{\kappa}, 2m_{\kappa}}; \hat{C'}_1 \leftarrow_{R} \mathcal{E}_{P_{C'_{[1,m_{\kappa}]}|Z_1}, m_{\kappa}\lambda};\hat{C}_2 \leftarrow_{R} \mathcal{E}_{P_{Z_2|C'_{[m_{\kappa},*]}}, m_{\kappa}\lambda}$$
Property 2 says that obfuscation of the circuit must not be co-relatable with original circuit even when given oracle access to $m_{\kappa}$-gate prefix of the original circuit.
### Assumptions
Assumption 4 states that there exist $n^*, m^*$ such that a random circuit in ensemble $\mathbb{C}_{n^*, m^*}$ is strong pseudorandom permutation (SPRP).
Assumption 5 extends assumption 4. For any random circuit $Q \leftarrow_{R} \mathbb{C}_{n^*, m}$, circuit $C \leftarrow_{R} \mathcal{E}_{P_Q, m_{\kappa}}$ with fixed permutation $P_{Q}$ where $m_{\kappa} \geq m^*m$, and $m^* \leq l_{\kappa} \leq (m-1)m^*$ following two distributions are indistinguishable.
1. Oracle access to $l_{k}$-gate prefix of $C$ and the remainder of $C$
2. Oracle access to $Q_1 | C'$ and $C'^{\dagger} | Q_2$, where $Q = Q_1 | Q_2$, and $C$' is a random circuit sampled from $\mathbb{C}_{n^*, m^*}$.
Assumption 5 says that oracle access to sufficiently long prefix, with at-least $m^*$ gates (recall $m^*$ satisfies assumption 4), of circuit $C$ and the remainder where $C$ is functionally equivalent to $Q$ is indistinguishable from two SPRPs that jointly compute $Q$.
Assumption 7 is relevant for theorem 16 and is the main assumption, along with assumption 8 (assumption 8 is an extension of assumption 7). It says even if adversary is given the random circuits, either tuple $(C_{0, l_{\kappa}}, C_{l_{\kappa,}})$ or $(Q_1 | C, C^{\dagger} | Q_2)$, something like assumption 5 still holds. First observe that now any tuple given is clearly distinguishible from a random tuple. Nonethless, authors assume that the sufficient inner stretch should erase any information, other than functionality, of $Q$.
Define $Q = H_1 | H_2$ where $H_i \in \mathbb{C}_{n^*, m_{i}}$ and $l_{i, \kappa} = m_im^{\#}$. Assumption 7 states that there exists $m^{\#}$ for which the following two distributions are indisginuishable
1. $\hat{C}_1 | \hat{C}_2$ where $\hat{C}_1$ is a random circuit with $l_{1, \kappa}$ gates that computes permutation $P_{H_1 | C}$ and $\hat{C}_2$ is a random circuit with $l_{2, \kappa}$ gates that computes permutation $P_{C^{\dagger} | H_2}$.
2. A random circuit $C$ that computes permutation $Q$ with $l_{1, \kappa} + l_{2, \kappa}$ gates.
Assumption 8 combines assumption 4&7 and states that there exist $n^*, m^*$ for which assumption 4 holds. Additionally there exist $m^{\#}$ that satisfies assumption 7 with respect to SPRP of assumption 4.
### ROI for all reversible circuits from RIO
**Theorem 16**
Given functions $O_1$, $\pi$, $O_2$ such that:
- $O_1$ is an obfuscator that satisfies property 1 of RIO with input ensemble $\mathbb{C}_{n^*, m^*}$, inner stretch function $\xi(\kappa, n^*,m^*) \leq m^{\#}$ where $n^*,m^*,m^{\#}$ statisfy assumption 8, and post processing algorithm $\pi$.
- $O_2$ is an obfuscator that statisfies property 2 of RIO with input ensemble $\pi(C) | \pi^{\dagger}(C)$ where $C \leftarrow \mathbb{C}_{n^*, m^*}$.
we can build obfuscator function that satisfies ROI for all reversible circuits.
Let input circuit be a reversible circuit with each gate in the set of base gates. Theorem 16 is proven by first constructing Random identity generator (RIG). RIG is then used to construct a Gate obfuscator (GO) for any base gate. GO is used to obfuscate each gate in the input circuit. After which obfuscated circuits of the gates are concatenated together using $O_2$. The resulting obfuscated circuit statisfies ROI.
### Random identity generator (RIG)
Random identity generator (RIG) is an $m^{\#}$-separeable ROI obfuscator for identity permutation with inner stretch $2m^{\#}$. It is used to (efficiently) sample obfuscated circuit that computes identity permutation on $n^*$ input wires.
Sampling a circuit that computes identity permutation with $2m^{\#}$ gates is not efficient in practice. RIG uses $O_1$ and assumption 8 to sample such a circuit efficeintly in the following way:
1. Sample $C \leftarrow \mathbb{C}_{n^*, m^*}$
2. $C' = O(C), C'' = O(C)$
3. Output $C' | (C'')^{\dagger}$
It is clear that circuit $C' | (C'')^{\dagger}$ computes an identity permutation and has inner stretch $2m^{\#}$. Step 2 justifies why $O_1$ should satisfy property 1 of RIO. That is $C'$ and $C''$ must not be co-relteable even though they are obfuscation of the same input circuit $C$.
Assumption 8 is used to prove that (informally) $C' | (C'')^{\dagger}$ is indistinguishable from a random circuit $I$ with $2m^{\#}$ gates that computes the identity permutation.
### Gate obfuscator (GO)
Gate obfuscator (GO) is a $m^{\#}$-separeable ROI obfuscator with inner stretch $2m^{\#}$ for a single base gate $\beta$. It is built using RIG.
To obfuscate a base gate $\beta$:
1. Repeatedly sample RIG until it outputs a circuit $C$ of which first gate is the base gate $\beta$
2. Remove the first gate from $C$ and insert in its place the identity base gate
The resulting cricuit $C$ is an obfuscated circuit for base gate $\beta$.
Step 1 of GO needs to be efficient and, with the following modifications, indeed it is. Observe that circular rotation of a reversible circuit that computes identity permutation, still computes the identity permutation. Hence, modify the algorithm to find the first gate that has same control function as the base gate $\beta$ and circularly rotate the circuit s.t. the found gate is the first gate. If base gate $\beta$ has control wires $\omega_0, \omega_1$ and target wire $\omega_2$ and the first gate has control wires $\omega_0', \omega_1'$, and target wire $\omega_2'$ swap wire $\omega_0'$ with $\omega_0$, $\omega_1'$ with $\omega_1$, and $\omega_2'$ with $\omega_2$ in $C$ and output the resulting circuit.
### Soldering obfuscated circuits
After obfuscating each gate in input circuit $C$ using GO we solder them together into a final obfuscated circuit using "append and solder" operation.
To solder two different obfuscated circuit outputs of GO, $\Gamma_1$ and $\Gamma_2$. Take the $m^{\#}$-gate suffix of $\Gamma_1$ and $m^{\#}$-gate prefix of $\Gamma_2$ and feed their concatenation to $O_2$. Then output the following,
$$\Gamma_{1, [1, m^{\#}]} | O_2(\Gamma_{1, (m^{\#}, *]} | \Gamma_{2, [1, m^{\#}]}) | \Gamma_{2, (m^{\#}, *]} $$
We feed $\Gamma_{1, (m^{\#}, *]}| \Gamma_{2, [1, m^{\#}]}$ to $O_2$ which explains why $O_2$ must have input ensemble as $\pi(C_1) | \pi(C_2)^{\dagger}$ where $C_1, C_2 \leftarrow_{R} \mathbb{C}_{n^*, m^*}$. It's because GO is $m^{\#}$-separable and, hence, $m^{\#}$-gate suffix of $\Gamma_1$ and $m^{\#}$-gate prefix of $\Gamma_2$ where the first is output of $(O_1)^{\dagger}$ and the second is output of $O_1$ are post-processed versions of random circuits sampled from $\mathbb{C}_{n^*, m^*}$.
Assuming $O_1$ is secure, sub-circuits $\Gamma_{1, [1, -m^{\#}]}$ (resp. $\Gamma_{2, (-m^{\#}, *]}$) are oracle accesses to $m^{\#}$-gate prefix (resp. inverse of $m^{\#}$-gate suffix) of random identity circuit sampled by GO for $\Gamma_1$ (respectively $\Gamma_2$). Hence, $O_2$ must satisfy property 2 for RIO.
### Summary of obfuscation
Let $C$ be input reversible circuits with gates $\gamma_1 | \gamma_2 |... |\gamma_m$.
1. For each gate $\gamma_i$ sample $\Gamma_i \leftarrow GO(\gamma_i)$
2. Set output obfuscated circuit $C^O = \Gamma_1$. For $j$ in range $[2, m]$, set
$$C^O = C^O_{[1, -m^{\#}]} | O_2(C^O_{(-m^{\#}, *]} | \Gamma_{j, [1, m^{\#}]}) | \Gamma_{j, (m^{\#}, *]}$$
3. Output $C^O$
### Implementing RIO
The words local mixing in the title are due to the approach authors introduce to build an RIO that satisfies property 1 and 2 simultaneously.
Authors first observe that constructing an RIO that provably satisfies property 1 and runs in polynomial time is hard because it implies CoNP = NP (**Question: How so? In particular why having polysize witness of transformation from functionally equivalent and same size circuit $C_1$ to $C_2$ imply CoNP=NP?**). Which is why they settle for an algorithm that assures $C_1$ and $C_2$ are not efficiently discernible.
Satisfying property 1 of RIO requires transforming the input circuit with $m^{*}$ gates to a functionally equivalent circuit with $m^{\#}$ gates. To do so the algorithm first introduces redundancy to the circuit by (1) randomly isolating small sub-circuits and (2) replacing them with bigger funtionally equivalent circuits. After which the resulting circuit will have $m^{\#}$ gates. However, the resulting circuit can easily be reversed if care is not taken to hide the traces of introduced redundancy. This is acheived by again randomly selecting sub-circuits and replacing them with funtionally equivalent but different circuits of same size. The hope is that the latter stage, since sub-circuits are randomly chosen, diffuses introduced redundancy over larger and larger parts of the circuit. The first stage is referred to as "inflationary" and the second stage is referred to as as "kneading".
#### Local mixing
Inflationary stage and kneading stage only differ in size of the circuits being replaced. Otherwise, both stages take local & independent sub-section of the circuit and replace it with a different functionally equivalent circuit. A single step in either stage is referred to as local mixing step.
The step is divided into 3 main parts:
1. Selecting an independent sub-section $C^{out}$ with $\ell^{out}$ gates at random
2. Finding functionally equivalent to $C^{out}$ but different circuit $C^{in}$ with $\ell^{in}$ gates.
3. Replacing $C^{out}$ with $C^{in}$ without affecting functionality of the original circuit.
##### Step 1: Selecting $C^{out}$
Observe that naive selection of $\ell^{out}$ consecutive gates in $C$ and replacing them with $C^{in}$ can easily be reveresed if most gates in $C^{out}$ are commutitative (i.e. swappable). It's because with high probability $C^{in}$ will be some permutation with gates present in $C^{out}$ with pair of random base gate and their inverses to pad up to $\ell^{in}$ gates. For example, $\gamma_1, ..., \gamma_{\ell^{out}}, \beta, \beta^{-1}$ where $\gamma_1, ..., \gamma_{out}$ are $C^{out}$ gates and $\gamma_1, ..., \gamma_{out}, \beta, \beta^{-1}$ are $C^{in}$ gates.
Skeleton graph:
In order to avoid the obvious attack, the reversible circuit is first converted to skeleton graph which only stores dependency among the gates. In the skeleton graph each gate is a node and there exist an edge between two gates $i \lt j$ iff gate $i$ collides with gate $j$ and there exists no such gate $k$, $i \lt k \lt j$, that collides with both gate $i$ and gate $j$.
Selecting $C^{out}$ at random:
Observe that a convex subgraph with $\ell^{out}$ nodes in the skeleton graph is locally replaceable. That is, the convex subgraph can be swapped out with functionally equivalent circuit $C^{in}$ with $\ell^{in}$ gates.
Selection of random $C^{out}$ simply requires selecting a convex subgraph with $\ell^{out}$ nodes at random.
##### Step 2: Finding $C^{in}$
Simple brute force approach to find a functionally equivalent to C^out circuit $C^{in}$ with $\ell^{in}$ gates is feasible as long as $\ell^{in}$ is small enough. In our implementation, $\ell^{in}$ is fixed to 4.
##### Step 3: Replacing $C^{out}$ with $C^{in}$
Since $C^{out}$ is a convex subgraph the predecessors of $C^{out}$ and successors of $C^{out}$ are disjoint. At a high level, replacement operation writes the circuit such that gates of $C^{out}$ are placed consecutively (i.e. predecessors of $C^{out}$ are placed before $C^{out}$ and successors of $C^{out}$ are placed after $C^{out}$). It then replaces $C^{out}$ with $C^{in}$ and updates the skeleton graph with necessary edges.
To correctly update the skeleton graph efficiently two additional auxilary data structures are required: direct incoming connections (DIC) and direct outgoing connections (DOC).
DIC and DOC are maps that map gates to a set. For any gate $j$, DIC[j] maps to a set of gates that are $\lt j$ and collide with $j$. For any gate $i$, DOC[i] maps to a set of gates that are $\gt i$ and collide with $i$. Note that any two gates $i$ and $j$, for $i < j$ have an edge in skeleton graph iff they collide and DOC[i] and DIC[j] are disjoint.
Following is pseudocode of the replacement algorithm:
```python
def inplace_topological_sort(nodes: &mut [Nodes]):
'''
Topologically sort the nodes in-place as per SG(C)
'''
def find_all_predecessors(convex_set: [Nodes]):
'''
Finds union of predecessors of the nodes in the input convex_set
'''
def find_all_successors(convex_set: [Nodes]):
'''
Finds union of successors of the nodes in the input convex_set
'''
def replace_cout_with_cin(C^out, C^in, DOC, DIC):
'''
Replaces C^out with C^in
'''
# Since C^out is convex `all_predecessors` and `all_successors` must be disjoint.
all_predecessors = find_all_predecessors(C^out)
all_successors = find_all_successors(C^out)
# Find the nodes that are neither predecessor nor successors of C^out
all_outsiders = all_nodes(SG(C))
inplace_topological_sort(all_predecessors)
inplace_topological_sort(all_successors)
# Update DOC and DIC with C^in. And update SG(C) with new edges.
for gate i in C^in:
# Direct outgoing connections within C^in
for j in C^in[i+1, ]:
if i collides with j:
insert j in DOC[i]
insert i in DOC[j]
# Direct outgoing connections from C^in to C^out's successors
for gate j in all_successors:
if i collides with j:
insert j in DOC[i]
insert i in DOC[j]
# Treat all outsiders as successors
for gate j in all_outsiders:
if i collides with j:
insert j in DOC[i]
insert i in DOC[j]
for gate i in C^in:
reverse(for gate j in all_predecessors):
if i collides with j:
insert i in DOC[j]
insert j in DIC[i]
# Remove unecessary edges in SG(C).
# That is, remove any edge from j to any node in DOC[i]
# This is because *now* there exist a node between j and i's
# direct connection which collides with both, the gate i
# itself.
for node in intersection(DOC[j], DOC[i]):
SG(C).remove_edge_if_exists(j, node)
# Add an edge in SG(C) following the rules of skeleton graph.
# That is, iff DOC[i] and DIC[j] are disjoint
if is_disjoint(DOC[j], DIC[i]):
SG(C).add_new_edge(j, i)
# Find union of DOCs/DICs of C^out (excluding any vertex in C^out)
let union_cout_docs = {}
let union_cout_dics = {}
# Delete vertices in C^out from DOCs, DICs, and SG(C)
for node in C^out:
delete from SG(C)
delete from DOC including from the internal sets
delete from DIC including from the internal sets
# Re-connect missing links
for direct_inc_predecessor in union_cin_dics:
for direct_out_successor in union_cout_dics:
if is_disjoint(DOC[direct_inc_predecessor], DIC[direct_out_successor]):
SG(C).add_new_edge(direct_inc_predecessor, direct_out_successor)
```
#### RIO strategy 1 and strategy 2
Strategy 1 is simpler than strategy 2. However authors claim both strategies are equivalent. Strategy 2 is same as the one we've seen before; inflationary stage followed by kneading stage. In the inflationary stage $\ell^{out} < ell^{in}$. In the kneading stage $\ell^{out} = \ell^{in} = \ell^{knd}$ and $\ell^{knd} \geq \ell^{in}$ of inflationary stage.
Strategy 1 does not differentiates between inflationary and kneading stage. Instead at each local mixing step it randomly samples $\ell^{out}$ from the range $[2, \ell^{in}]$.
#### Concrete parameters for ROI
For strategy 2 authors set $m^* = n log^4{n}$ and $m^{\#} = n^3$. They run inflationary stage for $n^4$ iterations, and kneading stage for $n^5$ iterations.
However, for strategy 1 authors have drastically cut down iterations to $O(m \kappa)$ where $\kappa \approx n$. For example, for n = 128 only $\approx 460K$ iterations are required for security parameter for $\kappa = 128$.
## Additional resources
### The t-wise Independence of Substitution-Permutation Networks (https://eprint.iacr.org/2021/507.pdf)
- t=2, that is almost pair-wise independence, implies security against any attack that takes advantage of statistical distance between pair of outputs (for ex, linear/differenctial/truncated-differential cryptanalysis).
- There exist a class of diffrential attacks called higher order differentials that consider the k-th derivative (Lai94). Note that k-independence permutation is secure against k^{th} differential attack. More so, any pairwise (t=2) independce is secure against truncated differentials (Knu94)
- There are 3 main results:
- Theorem 3.7 & 3.8: a 3 round SPN is $\epsilon$ close to 2-wise independent where $\epsilon$ is small
- AES is almost 2-wise independent. To be specific they prove that 6 round AES on 8-bit blocks is $\epsilon=0.472$ close to pairwise independence and then use MPR amplification lemma to claim $6r$-round AES is $2^{r-1}0.472^r$ close to pairwise independence.
- An existential result that says that there exists t+1 round key alternating cipher (independent keys per round; heuristic assumption of key-scheduling is ok!) so that it is almost t-wise independent. Existential means they don't concretely define the permutations.
- Video: https://ntt-research.com/cis-stefano-tessaro-2021summit-summary/
- Another related video on t-wise independence, permutations, and block ciphers: https://www.youtube.com/watch?v=FPWPQ6L8Thk
### Pseudorandom Permutations from Random Reversible Circuits (https://arxiv.org/pdf/2404.14648)
- Depth of reversible circuit: Maximum no. of gates through which any wire passes. A gate is defined as permutation on c << n wires, where n is total no. of wires.
- Authors claim that all difficulty in theorem 9 (their main theorem) comes from "analysing the spectral gap of the natural random walk on k-tuples of strings"
- What is spectral gap of natural random walk?
- What is LP21 and how does OWF connect with Kolmogorov's complexity?
- Check papers on analysis of security properties of SPNs
- HMMR05 conjecture that any exp(-n)-approximate 4-wise independence permutation composed of reversible gates forms a pseudorandom permutation.
- HOw does statistical indistinguishbility compares to computational indistinguishibility ?
-> **Statistical**: distinguish two statistical distributions, requires as many samples than the security parameter to be able to distinguish, for example distinguish $x\in_{u}\mathbb{Z}_{2^{128}}$ from $x'\in_{u}\mathbb{Z}_{2^{128} - 1}$. Provides security against an adversary with unbounded computation.
-> **Computational**: only one sample is needed but as many operations as the security parameter are needed to be able to distinguish from the ideal distribution, for example breaking the LWE problem. Does not provide security against an adversary with unbounded computation.
- Connection between OWFs, reversible circuits, and hardness of inverting block ciphers
- Authors view problem of inverting block ciphers as finding the reversilbe circuit that computes the permutation. They call it minimum reversible circuit size problem (MRCSP), which is analogous to mininum circuit size problem (MCSP) (in KC00).
- KC00 prove that whenever the input output pairs fed to MCSP solver are computed using a pseudo-random function then MCSP solver does not run in polynomial time. Hence, exitence of OWFs precude the existence of polynomial time solver for MCSP.
- There's a theorem that says that if OWFs exist, then for a constant d, and for large enough n there exist a negl(n)pseudo-random function with boolean with gates at-most n^d.
- Authors show that we can re-write feistal network (a pseudo-random function) with n^d gate as a reversible circuit. Their contribution is to apply the permutation again on the ancilla bits so that they don't leak information about the input.
- Analogous to MCSP, a reversible circuit that computes a pseudo-random permutation with n^d gate is MRCSP hard. Hence, if OWFs exists then there exists negl(n)-pseudo random function with boolean circuit of size n^d. Thus, there exists a reversible circuit with n^d gates that computes negl(n)-pseudo random function.
- In CMR20 authors build a pseudo-random function with $2\log{n}+\log_3{n}$ gates. Based on conjecture in HMMR05 setting k = 4 with $\epsilon = 2^{-n}$ gives a pseudo-random function, of which the authors provide more evidence. Then as per theorem 9 a random brickwork reversible circuit of size $O(n^2)$ suffices to build pseudo-random function. I'm curious whether there's any to be drawnbetween theorem 9 and 3-stage construction of CMR20. In particular, CMR20 provides quantum mechanical evidence for pseudo-randmoness of 3-stage construction but no proof. Do you think there can exist a proof for pseudo-randomness of reversible circuits with strctured gates with depth $\log{n}$?
### Quantum statistical mechanics of encryption: reaching the speed limit of classical block ciphers (https://arxiv.org/abs/2011.06546)
The paper formulates attacks on fiestal block ciphers using out-of-order correlators (OTOCs) and entropy. Decay of OTOCs signal chaos in quantum systems and saturation of entropy limits information that can be extracted by an adversary. They show that for insecure instantiation of fiestal block cipher with 1, 2, or 3 rounds, OTOC does not vanishes to very small values and has significant residual entropy (i.e. distance from maximum entropy). However with 4 rounds OTOCs vanishes exponentially while entropy saturates.
Based on the above evidence they construct a 3-stage reversible circuit with $n\log{n}$ 3-bit permutation gates for which OTOCs vanish exponentially and entropy saturates. Hence, the 3-stage cipher can be claimed to a strong pseudo-random permutation secure against an adversary with polynomial no. of queries (they further claim that reversible circuits are also secure against quantum adeversaries).
The 3-stage construction consists of 1 inflationary stage with $\log{n}$ layers, followed by 1 proliferation stage with $\log{n}$ layers, and another inflationary stage with $\log{n}$ layers. Inflationary stage only consists of gates from the set of 144 inflationary gates. Proliferation stage only consits of gates from the set of 10K proliferatory gates (however one of the authors mentioned that any random 3-bit permutation gate would do as a proliferatory gate in [this video](https://www.youtube.com/watch?v=wmdtbYGkhno)).
Let $n'$ be power of 3 $\geq n$. For each layer, divide $n'$ into sets of triplets such that collection of layers within a stage forms hierarchical tree structure. For example, l1 = (0, 1, 2) (3, 4, 5) (5, 6, 7)... and l2 = (0, 3, 6) (1, 4, 7) (2, 5, 6)... etc. Assign a random inflationary/prolifatory gate to each triplet. Let $\pi$ be a random surjective map from $n'$ to $n$. Apply $\pi$ to elements of each triplets in every layer (if n' = n, then map is 1 to 1. If n' > n then repetitions are inevitable) and output the reversible circuit for the stage. Repeat the procedure for all 3 stages and output the concatenation of the 3 reversible circuits.
**Extra:**
- Looks like US gov people are interested in this - https://www.sandia.gov/app/uploads/sites/210/2022/06/PETWG-May22-v2SAND.pdf
### Synthesis of Reversible Logic Circuits (https://arxiv.org/pdf/quant-ph/0207001)
- The paper has several proofs of theorems on reversible circuits that are taken for granted else hwere in literature.
- CNT library consists of following gates: C-NOT, Toffoli, Not. A reversible circuit is called CNT-circuit is it consists only of gates available in CNT library. A permutation is called CNT-constructible if there exists a CNT-cicuit which computes the permutation.
- (Theorem 12) Every even permutation (ie $A_{2^{n}}$, set of all even permutations) on n-wires as CNT-constructible (ie can be computed by a reversible circuit composing only of gates from CNT library).
- Corollary 13: Every permutation, even or odd, over n-wire may be computed with at-most one wire of temporary storage (ie borrowed ancilla wire)
- Corollary 34: Worst case CNT-circuits on n wires require $\Omega(n2^n/\log{n})$ gates.
### Transformation rules for CNOT https://www.cecs.uci.edu/~papers/compendium94-03/papers/2002/dac02/pdffiles/28_4.pdf
- CNOTs refer to general n+1 Toffoli, CCNOTs
- Paper gives 6 transformation rules for CNOTs. For example rules to transform a circuit with 2 CNOTs into equivalent circuit with 3 CNOTs or vice versa. The goal of the paper is to produce optimal reversible circuits using CNOTs, opposite to ours. But the transformation rules are helpful for finding functionally equivalent circuits with more CNOTs.
---
<!--
Reversible circuits have this nice property that composition of m gates from the set B_n on n wires, where m is large enough, gives a strong pseudo-random permutation. There's a line of work that has made the bounds tighther overtime and have recollected them here. But, at the moment, the current best is nlogn where n is the number of wires.
**How does t-wise independence relate to gower's conjecture ? Stress that t-wise independence is no proof of being an SPRP. And there's a difference here**
Now let m* be such that the reversible circuit is SPRP (ie assumption 4 holds). We can further show that any small segment with m* gates of a sufficiently long random reversible circuit C_{n, m} with m > m* gates is indistinguishable from a random circuit with m* gates.
More formally, define C_{n, m} where m > m*. For a random i, define the m*-gate starting from gate i as C_{n, i}. Given a challenge circuit C' and oracle access to C_{n, m}, adversary cannot determine whether C' = C_{n, i} or C' = C'', where C'' is random circuit with m* gates, bettwe than random guess. This directly follows from gower's conjecture and following is the proof: TODO
Let's take a step further. Let's fix the functionlaity of the long reversivle circuit. Define a circuit Q with m' gates. Let P_{Q} be funtionality of Q. Sample long reversible circuit $C_{n, m} \leftarrow E_{P_{Q}, m}$ where m \gte m'm*. Then authors claim that any sub-circuit $G_{n, [1, \ell]}$ of C with $m* \leq \ell \leq (m' - 1)m*$ is an SPRP. This is assumption 5 (Prefixes of random circuits with fixed functionality is an SPRP).
It is important to note that assumption 5 is only plausible and does not directly follows from assumption 4. The reason for this is circuit C is a random circuit that computes a "fixed permutation" P_Q. Hence sub-circuit(s) G is statistircally far from any random circuit with \ell gates.
**Question: Why is G statistically far from a random circuit with \ell gates.**
**Question: Another important thing to note is $m^*$ needs to be big not only to statisfy assumption 4, but the set E to be big enough such that two different circuits that compute same permutation P are not statistically corretaltable. Authors mention in footnote 4 that it is the case whenever $m^*$ >= |B|**
Claim 6 directly follows from assumption 5. It says if adversary is given
(1) oracle access to C_1 and C_2 where C = C_1|C_2 is a m gate circuit which computes Q_{n, m'} and C_1 has \ell gates where \ell and other variables of assumption 5. Then C_1 and C_2 are indistinguishable from two SPRPS that jointly compute Q_{n, m'}. Check paper for proof.
Assumption 7, unlike claim 6, considers the case where the adeversary has physical access to the circuits instead of oracle access. It says that there exist m^# for which following holds for any SPRP random reversible circuit C_{n, m^*}:
$$\hat{C_1}|\hat{C_2} = \hat{C}$$
where \hat{C_1} and \hat{C_2} are sampled from E_{P_1 | C, \ell_1} and E_{P_1 | C, \ell_2} respectively where P_1 has m_1 gates, P_2 has m_2 gates, P_(P_1|P_2) = P_{Q_{n, m'}}, and Q_{n, m'} is random circuit with m' gates. \ell_1 \gte m_1m^# and \ell_2 \gte m_2m^#.
and where \hat_{C} is sampled from E_{P_1|P_2, \ell_1 + \ell_2}
It is important to note that both assumption 6 and assumption 7 are independent of assumption 4. That is, 6&7 do not follow from gower's conjecture. Authors have only called 6&7 to plausibly follow from gower's conjecture and the assumption can be based on something un-related to gower's conjecture.
Assumption 8 simply says that (1) there exist n^*, m^* that satisfy assumption 4, (2) and ther exist m^# that satisfies assumption 7 with respect to SPRP in assumption 4.
Assumption 9 is stornger variant of assumption 8 where m^* = m^#.
**TODO:** Why the case m^*=m^# is a stronger variant?
**ROI and RIO**
ROI
An obfuscator O is ROI with post-porcessing function \pi: C_{n} \rightarrow C_{n} and inner strech \xi: N^3 -> N if for all circuits with n-wires (ie circuit ensemble with n-wire circuits) following is true
$$\{ O(C) \approx_c \pi(C') : C' \leftarrow E_{C, \xi(\kappa, n, |C|)}\}$$
**The first thing that comes to mind is why there exists a post-processing function \pi? This is because ideal obfuscators imply coNP = NP; Explain it here btw**
There are two things to infer from the definition of ROI
1. ROI conincides with iO when O = \pi.
2. For the case where \xi(\kappa, n, m) - m \gte \rho(\kappa), ROI is more powerful than iO because of the set size E_{C, \xi(\kappa, n, m)} is exponential in \rho(\kappa) (claim 2).
RIO
Security of Random input and output obfuscator requires indistinguishability to hold for circuits drawn from certain specific distribution (**what is this dist in pactice?**) when the adeversary is given only the obfuscated circuit plus some limited infomration about the original circuit. This is in contract to iO where adeversary is given both the obfuscated circuit and orginal circuit.
Let's first define Random Input (RI) obfuscators. An obfuscator O with input distribution $\textbf{C}_{n, 2m}$, and output distribution D is RI if:
1. Property 1: $$(C_1, C_2) \approx_{c} O(C_1', C_2')$$
where
$$C \leftarrow_R \textbf{C}_{n, 2m}; C_1 \leftarrow O(C); C_2 \leftarrow O(C)$$
and
$$ C \leftarrow \textbf{C}_{n, 2m}; C' \leftarrow D, C' \leftarrow D$$
Property 1 means requires two obfuscated circuits of same input circuit should not look too much like. Hence, they must be indistringuishable from functionally equivalent random circuits sampled from output distribution D.
2. Property 2: Properoty 2 requires the obfuscated circuit to not be much correlateabl with the original circuit even when given oracle access to half-way functionality of the original circuit. For random inpout circuits $Z_1 \in \textbf{C}_{n, m_1}$ and Z_2 \in \textbf{C}_{n, m_2}, It says:
$$O(C), \hat{C_1}, \hat{C_2} \approx_c C', \hat{C_1}, \hat{C_2} $$
where
$$O(C) \leftarrow \textbf{C}_{n, m^*}, \hat{C_1} \leftarrow E_{P_{C_{[1, m]}|Z_1}, \lambda m_1}, \hat{C_2} \leftarrow E_{P_{Z_2|C_{[m, *]}}, \lambda m_2}$$
and
**TODO** what is lambda? ANd why \lmanda < m^#. Also note that \hat{C_1/2} because of long leeway are expected to erase information.
Observe that
1. \hat{C_1} and \hat{C_2} contain information about m-gate prefix of circuit C and suffix of the circuit C respectively.
2. $$\hat{C_1}^{\dagger} | O(C) | \hat{C_2}^{\dagger} = Z_1^{\dagger} | Z_2^{\dagger} = Z^{\dagger}$$.
Any RI obfuscator O with inner stretch $\xi: N^3 \to N$, with output distribution D, and post processing \pi: C_{n, \xi(\kappa, n, m)} \to C is an RIO obfuscator if D = \pi.
**Theorem 16, etc.**
Thoeream 16 says that if there are following three things then there exist ROI for all reversible circuits:
1. A right-sepaarable ROI for input circuit ensemble ${C_{n^*, m^*}}$ and post-processing \pi_1
2. A left-sepaarable ROI for input circuit ensemble ${C_{n^*, m^*}}$ and post-processing \pi_2
3. A RIO that satisfies properoty 2 with input ensemble $\pi_1(C_1) | \pi_2(C_2)$
Theorem 16 is proven using 3 different constructions: Random identity generator, Gate obfuscater, and Soldering obfuscator.
Random identity generator is ROI obfuscator for random circuits that compute identity function I_n. It is constructed using RIO that satisfies proerpty 1 for input enseble C_{n^*, m^*} with inner stretch \xi(\kappa, n^*, m^*) = m where m \gte m^{\#}. The construction of RIG is straightforward:
1. $C \leftarrow C_{n^8, m^*}$
2. $C_1 = O(C); C_2 = O(C)$
3. Output $C_1 | C_2^{\dagger}$
Using assumption 4 and 7, $C_1 | C_2^{\dagger}$ is shown indistinguishable from $\hat{I}_{n^{*}, [1, m]} | \hat{I}_{n^{*}, [m, *]}$ where $\hat{I}_{n^{*}, 2m} \leftarrow E_{I^{n}, 2m}$. That is, the output circuit of RIG is indistinguishbale from a random circuit with 2m gates which computes identity permutation. Hence, RIG is an ROI with inner stretch 2m and post-processing \pi' = (\pi, \pi^{\dagger})
One important thing to note is sampling reversible circuits with 2m gates for identity permutation (or any other permutation) cannot be done in poylnomial time. With assumption 7&4 we're able to sample an identity permutation with 2m gates and prove it indistinguishable from any circuit which computes identity permutation with 2m gates.
Gate obfuscator is ROI for a single gate. Following is algorithm to obfuscated gate \beta
1. Sample RIG until first gate of output circuit C is \beta.
2. Replace first gate \beta with base identity gate \beta_I and output
For GO to be efficient, RIG must output a circuit with gate \beta with probability polynoimial over \kappa (ie with relatively high probability). Security of GO is proven using security of RIG with inner strech same as RIG and modified post-processing $\pi =(\pi_1', \pi_2)$ where $\pi_1'(C) = \beta^I|\pi_1(\beta|C)$ where post-processing RIG equals $\pi_1, \pi_2$.
GO remains efficient because there are only 16 base gates. Whenever RIG outputs circuit C with first gate as original gate $\beta$, albeit different wires, swap the control and target wires of first gate in C with respective wires of original gate. This way RIG needs to sampled atmost 16 times.
Append and Solder obfuscator is ROI that takes two obfuscated circuits as inputs and outputs a obfuscated circuit that is concatenation of input circuits. Let n^*, m^*, m^\# satisfy assumption 8 and $m_{\kappa} \geq max(m^{*}, m^{\#})$, then AS requires:
1. An $m_{\kappa}$-right-separable ROI $RO_1$ with input ensemble ${C_{n^*, m}}$, inner stretch $\xi_1(\kappa, n^*, m) = m_{\kappa}m$, and post-processing $\pi_{1} = (\pi_{1, 1}, \pi_{1, 2})$
2. An $m_{\kappa}$-left-separable ROI $RO_2$ with input ensemble {C_{n^*, m}}, inner stretch \xi(\kappa, n^*, m) = m_{\kappa}m, and post-processing $\pi_{2} = (\pi_{2, 1}, \pi_{2, 2})$
3. An RIO, O, that satisfies property 2 with leeway $\lambda \leq m_{\kappa}$ and inner stretch $\xi_3(\kappa, n^{*}, 2m_{k})$ for the followig input ensemble:
$$(\pi_{1, 2}(C'_{1, 2}), \pi_{2, 1}(C'_{2, 1})): C_{1, 2}, C_{2, 1} \leftarrow \textbf{C}_{n^{*}, m^{*}}; C'_{1, 2} \leftarrow E_{C_{1, 2}, m_k}; C'_{2, 1} \leftarrow E_{C_{2, 1}, m_k}$$
AS ROI is constructed as the following:
1. $C'_{1} = RO_1(C_1); C'_2 = RO_2(C_2)$
2. Let $C'_{1, 1} = C_{1, [1, m_{\kappa}]}, C'_{1, 2} = C_{1, [-m_{\kappa}, *]}$, and $C'_{2, 1} = C_{2, [1, m_{\kappa}]}, C'_{2, 2} = C_{2, [-m_{\kappa}, *]}$
3. $G = O(C'_{1, 2} | C'_{2, 1})$
4. Output $C'_{1, 1}|G|C'_{2, 2}$
Hence, AS ROI has inner stretch: $\xi(\kappa, n^{*}, 2m) = 2m_{\kappa}m - 2m_{k} + \xi_3(\kappa, n^{*}, 2m_{\kappa})$
The proof of AS is pretty long and uses assumpption 8 and security properties of RO_1, RO_2, and RIO O.
**Questons**:
1. What remains to be seen is whether GO is m_{\kappa}-separable ROI. In particular whether m_{kappa} prefix of GO is CC m_{kappa}/2.
2. I'm confused about GO being a ROI with inner stretch m^{\#} but AS requires RO_{1/2} to have inner stretch m_{\kappa}m where $m_{\kappa} = MAX(m^{\#}, m^*)$. As per this, GO cannot be RO_{1/2}.
**Building ROI for all reversible circuits with GO and AS**
Recall that m_{\kappa}-separeable GO has inner stretch function $\xi_{GO}(\kappa, n^*, 1) = 2m_{\kappa}$ for $m_{\kappa} \geq m^{\#}$ and post-processing $\pi = (\pi_1, \pi_2)$. Hence, it satisfies the requirement of $RO_1$ and $RO_2$ of AS. Let's assume there exist RIO O' with inner stretch $\xi_{O'}(\kappa, n^{*}, 2m_{\kappa})$ that satisfies RIO requirements of AS. That is, O has input ensemble $(\pi_1(C'_{1, 2}) | \pi_1(C'_{2, 1}))$.
We can define ROI O for arbitrary length reversible circuits in circuit ensemble $C_{n^{*}, }$ as:
1. Let input circuit C be on n^{*} wires with m gates with the following gates: \gamma_0, \gamma_1, ..., \gamma_{m-1}.
2. Obfuscated each \gamma_i using GO. $$\Gamma_i = GO(\gamma_i)$$ Note that \Gamma_i has 2m_{\kappa} gates.
3. Let C' = \Gamma_0. Starting from \Gamma_1...\Gamma_{m-1}, mutate
$$C' = C'_{[1, -m_{\kappa}]} | O'(C'_{[-m_{\kappa}, *]} | \Gamma_{i, [1, m_{\kappa}]}) | \Gamma_{i, [m_{\kappa}], *}$$
4. Output C'
Observe that inner stretch of O is $2m_{\kappa} + x(m-1)$ where $x = \xi_{O'}(\kappa, n^{*}, 2m_{\kappa})$. Thus, it is linear in no. of gates in input circuit C.
---
Questions:
1. I'm failing to understand the input ensemeble of ROI which satisfies property 2 and is requried for soldering. I suspect the input ensemble is $C^1_{[-m, *]}|C^2_{[1, m]}$ for some m where $C^1 = O(C_1)$ and $C^2 = O(C_2)$ and O is ROI obfuscator which satisfies property 1. in words, the input ensemble is concatenation of second half of obfuscation of circuit $C_1$ and first half of obfuscation of circuit $C_2$. Let's assume m is half of m' where m' are no. of gates in output distribution of O. This means, in practice, the input esenble will have circuits with n^4 gates. n^4 is big and makes me think whether stronger variant assumption 8 can be used. That is set $m^* = m^{\#}$.
-----
**Rough**
It claims that following oracle access to following tow statements a two distribtuions are indistinguishable
(1) Orac
(2)
Another important thing to note is m* needs to be big not only to statisfy assumption 4, but the set E to be big enough such that two different circuits that compute same permutation P are not statistically corretaltable.
Randomly sample a really long reversibel circuit C with m \gte m'm* gates. The authors claim that any sub-circuit
It follows that inditinguishility still stand if C_{n, m} were a random reversible circuit that computes some fixed functionality. For example, if $C_{n, m} \leftarrow E_{I, 2m*}$ then C_{n, [1, m*]}, although is statstically far frotm a random reversible circuit (because sub-circuit of a circuit that computes a fixed functionalitu: identity permutation), it stands indistinguishable from a random circuit with m* gates.
Now taking a step further, authors claim that following is plausibly true.
\hat{C} = \hat{C}
that is, a random reversible circuit with 2m* gates with fixed permutation,ie identity function, is indistnguishable from a circuit which is concatenation of random reversible circuit C with inverse of another reversible circuti C' that computes same permutation as C. The reason this is plausibly true and does not follows from gower's conjecture is: both circuits compute the same fixed permutation, hence are correlatable with the permutation.
What's wonderful about reveersible circuits is that there exist long enough m# >> m* of which if you're given a short segment of size m* you cannot distinguish it from a random reversibel circuit with m* gates. This easily follows from gower's conjecture and a proof is given as footnote 3 pn page 4. (TODO: elaborate on this; prove it here)
---
**Implementation rough**
1. Sticking to a single control function is relative fast because space of total circuits is reduced.
2. What are the algorithms to find reversible circuit with fixed funtionality?
3. It takes more time when ell_in = 10 compared with when ell_in = 5. Is it because for higher values of ell_in borrowed ancilla wires become necessary?
TUrns out with more gates it seems necessary to increase many way gates. THis reduces the search space. BUt is the case that these are the only ways?
**Finding convex subgraph**
Finding convex subgraph in general is a hard problem. I'm thinking to taking the following approach.
1. Use some topological sort algorithm to sort nodes in level in binary tree. The nodes in layer >= n+1 may depend (ie may have incoming edge) from node in layer n but not vice versa.
- Reason to do this is because it prunes the DFS to check convexity of subset. For example, if we want to check whether the set convex subeset {A} still remains convex after adding B. We perform a DFS starting from A. The moment DFS comes across a node with level > level of B, we can safely terminate it.
2. To find a random convex subgraph.
- Take a random node and add it to the convex subgraph set.
- Greedily add new nodes to the set in BFS manner.
- Greedily here means that when adding A to the set. If adding A causesthe subset to become not-convex, add elements required to make the subgraph convex assuming A is in the subgraph. Abandon A when the subgraph cannot be made convex with less than T nodes.
- Start with next node in BFS; Again greedily try to add the node.
How to check convxity of set after adding a new node?
- Let L be the level of new node.
- For all the nodes in the set with higher level value start a DFS from new node until one of the nodes in the set is encountered or DFS can be safely terminated (DFS can be safely terminated if it ecnouters a node with level MAX(S), where is set of nodes on convex set). If encountered and the visted nodes are not in the set, then the subset is not conex.
Constructing random convex subgraph
Let v be a random node. Let unexplored_candidates be set of outgoing edges from v. We randomly add one of the nodes in uneplored_cnadidates to the subgraph and check whether the subgraph still remains a convex subgraph.
To check for convexity we iterate through all the nodes in the subgraph and check whether any node exists s.t. they are not in subgraph but are on some path from node in subgraph to the candidate node.
We take a greedy strategy. We first find all nodes that exist on any path from any node in the subgraph to the candidate node. Let's call this set to_add set. We check wheteher |to_add U subgrph| < T. If it is, then we add all the nodes in set to_add to the subgraph and update unexplored_candiadates with outgoing edges of newly added nodes. If not, then we mark the candidate as explored. If |subgraph| = T, return. If not, pick a new candidate from unexplored and repeat.
Above we assumed that the set (to_add U subgraph) is a convex graph. This is based on observation that if a node K exists on a path from A to U and to_add is set of all nodes in any path from A to U, then all nodes in any path from A to K are also in to_add since a path from A to K is sub-path of a path from A to U. Thus, (to_add U subgraph) is a convex graph.
-->
**13oct**
- Sampling a random subgraph with ell gates and finding a functionally equivalent reversible circuits with same no. of gates tends to get really hard as ell grows. It becomes in-feasible beyond ell=5. I suspect the reason is the following: A random subgraph has complexity gap 0. Which means the set of circuits that compute the same function as the random subgraph with same no. of gates are exponentially small with m.
- Note that this does not includes the case where we know random subgraph has CG > 0. In which case, assuming ell_in is sufficiently larger than ell_out, finding funtionally equivalent circuit is quite fast.
- The observation above makes me wonder whether the kneading stage work. At the moment, I think it will. This is because inflationary stage intentionally introduces complexity redudancy in the circuit. Thus a random subgraph will have complexity gap way higher than 0 (Question what will be the complexity gap?). This means a subgraph with ell_knd (ell_knd is way higher than ell_in,ell_out) will have CG > 0. Hence, it should be easier to find a functionally equivalent circuit ell_knd gates. However, as kneading stage progresses CG is expected to reduce below CG for any random circuit with ell_knd gates. Therefore find replacement circuits may become harder.
- Question what will be the complexity gap of random subgraph with ell_knd gates after inflationary stage?
- My initial response was that it should be proportional to ell_out - ell_in.
- Is the size of set of circuits that compute permutation P with CG(P) = x exponential in x? In other words, does it become easier to find a circuit if CG is high.
- If it is then it should be easier to find replacements for kneading stage at the start. However with each iteration the difference between a CC of a random subgraph with ell_knd gates of the circuit and CC of random circuit with ell_knd gates reduces. Thus overtime finding a replacement for a random subgraph will become harder. I'm curious beyond what point do we consider it enough and terminate the process?
**15 oct**
- The time it takes to find a circuit with \ell^in gates seems to grow with \ell^out. It seems to mostly depend on no. of acive wires in C^out, which infact grows with \ell^out (with high probability it is \approx 2\ell^out + 1). At the moment, as far as I can see, it seem to be unfeasible to find repalcement circuit C^in when \omega^out >= 7. I've asked other folks whether the way I'm sampling the circuits is correct. Whether there exist a better strategy to sample circuits.
- It will be nice idea to get a average sampling runtime estimates for various values of \ell^out and \omega^out.
- I'm also curious to abou how does the average sampling runtime changes with probability of sampling 3-way gates and 2-way gates.
- I took a random circuit with \omega^out = 6, \ell^out = 5, and tried finding a functionally equivalent circuit with \ell^in = 10. I set two_prob to 1 (ie no 3-way gates and only 2 way gates). Even after 100M rounds of randomly sampling different circuits (there were repetition) I couldn't find a funtionally equivalent circuit. This makes me think that probably the brute force approach is not the way to go. We need a better strategy.
- Just to assure that setting two_prob = 1 wasn't the cuplrit I ran again with two_prob = 0.5 (i.e. probability of sampling 2-way and 3-way gate is same) for the same parameters. The only difference in this case is several circuits are sampled more than once and all of them have 1 thing in common - all of their gates are 3-way CNOTs. I guess it is because, although the probabity of sampling 3-way is equal to 2-way, the space of reversible circuits with just 3-way gates is way smaller because a single 3-way gate decomposes into 4 2-way gates. So for \ell^out = 10 one only needs composition of at-most 3 3-way gates.
- This makes me wonder whether setting equal probability for 3-way and 2-way gates is fair.
- I changed two_prob = 0.8 but nothing changes. Still many collisions because of 0.2 chances of 3-ways gates
**17 oct**
- I treid splitting a circuit with many gates into smaller circuits with less gates and find replacement circuits for each whenever n is big (for ex, n >= 7). Well we're able to find the circuit but it is exactly as the input circuit. Splitting the circuit into less gates reduces the search space and we're able to find exactly the same circuit. For exaple with \ell^out = 7 and n = 15. I split the circuit into first 3 gates (first circuit) and last 4 gates (second circuit). Then I tried finding their funcitonally equivalent counterparts. Splliting into smaller gates reduces the search space from n(n-1)(n-2)*7 to n(n-1)(n-2)*3 and n(n-1)(n-2)*4 respc. But this still does not help becauss we increase the probability of finding the same circuits
- Let's think through what's the point of the two stages - Inflationary and kneading stage.
- Inflationary stage: This stage introduces redudancy to the circuit. It sets c * \ell^out = \ell^in. More so, finding replacement circuit for this stage, as far as I understand, is pretty easy.
- Kneading stage: This is the more problematic stage. The point of this stage is hide the introduced redudancy in the inflationary stage. It takes \ell^knd gate sub-circuits and finds their \ell^knd gate replacement. Authors claim that doing so spreads the complexity gap over larger and larger region. Their reasoning goes as follows:
- Let C^out be the \ell^knd circuit. Note that C^out may have, assuming \ell^knd > \ell^in, several C^in from previous stage within itself. Hence CC(C^out) >= \sum CC(C^in_j) where C^in_j is j^th C^in circuit. With each iteration number of C^in within any C^out will decrease beacuse of the previous iteration, hence CG(C^out) will grdually decrease and after sufficiently many iterations will equivalent to CG of any random circuit with \ell^knd gates. Hence, there's no way for an attacker distinguish any part of circuit from a random sub-circuit with \ell^knd gates. There's no thread left to start pulling from.
- I have the following quyestions:
- If at each iteration we find a funtionally equivalent circuit which just has a different shape (ie contains no information about orginal set of C^in circuits) the CG still remains equivalent to CC(C^out) >= \sum CC(C^in_j) (C^out being the circuit being replaced). This means touching a particular sub-circuit, C^out, which is a collection of several C^in of inflationary stage just once isn't enough. Hence, we need to run iterations of kneading stage quite a few times.
**18 oct**
- I noticed for the case \ell^out = 2 (or even 3/4) and \ell^in = 5 (or honestly any other > \ell^out value) the C^in contains all gates of C^out. I do not know whether this is obvious. But I expect it to make it easier to attack if redundancy is not spread out properly in the kneading stage.
Let's assume the redundacny is not spread out in the kneading stage. Then with high probability there exist a convex subcircuit with \ell^in gates and \omega^out active wires (we assume that active wires of C^in and C^out are unchaged). To reverse the mixing, the attacker first identifies a sub-circuit with \ell^in gates and \omega^out active wires and tries to find a replacement circuit with \ell^out gates. If the attacker has no other information the attack will take in the worst case $O(2^{2^{\omega^{out}}})$ iterations. But if the attacker knows that C^in contains gates of C^out with high probability then attacker needs to only try \ell^in p \ell^out (ie no. of way to select \ell^out gates from \ell^in gates).
Below are a few samples of C^out and C^in.
```
C^out
---------------
0 1 2 3 4 5 6
---------------
x I x x O I x
x x I I x O x
I x x O x x I
---------------
C^in
---------------
0 1 2 3 4 5 6
---------------
x x I I x x O
x I x x O I x
x x I I x O x
x x I I x x O
I x x O x x I
---------------
```
```
C^out
---------------
0 1 2 3 4
---------------
x O I x I
I x x I O
---------------
C^in
---------------
0 1 2 3 4
---------------
x O I x I
x I x I O
I O x I x
x I x I O
I O x I x
---------------
```
Probably this attack can be avoided if \omega^in > \omega^out (ie C^in has more active wires than C^out). But I don't know whether this is necessary to do so. Or shall we safely assume that kneading stage will spread the complexity gap throughout really well.
**Oct 24**
- Aprt from finding functionally equivalent circuit C^in, the other thing that starts to consume majority of time for bigger graphs is "finding convex subgraph".
- The existing approach to find convex subgraph was naive and did a lot of extra work. More so, it iterated over all edges that all nodes in subgraph has, which can be many if the graph is really big.
- I trimmed down the runtime with the following simple rule: Recursively add elements to the convex subset via random selection of 1 edge belonging to 1 node in the convex set. If the subset is no more convex start afresh. If the resutling subset is convex and is not of desried size, recurse to find the next element. Don't get into the trap of checking each edge of each node.
- This rule seems to work but still takes long for anything over 40K gates. I'm wondering about why it increases with no. of gates. The primary candidate for this seems to be the DFS algorithm that checks whether the potential candidate node has any path with existing nodes in the convex set which includes an external node not in the set. The runtime of DFS does depends on the graph size but can easily be pruned with some auxiliary information given that we're generally interested in small selecting convex-subgraphs (ie subgraph with at-most 4 to 5 nodes).
- What can this auxilary information be?
- Can we solve this for simple case of 2?
- I refer to the funciton that implements the new way of finding the convex set (the one that does not falls into trap of edges; the recrusive way) as `blah`. I started to test blah for different graph sizes and there's clear difference in runtime (note that `blah` is called repeteadly with a new random source node until it finds a convex subset). For m=40K on average (although per call runtime depends on node; averaged over 1000 runs) it takes 90ms. For m=20K on average it takes 23.5ms. For m=10K it takes 6ms. For m=5K it takes 1.5ms
- While chatting with Han I figured that instead of trying to find a convex subgraph of some fixed size we can instead be more flexible with the size. For example, we're mostly interested in sizes 2, 3, 4, 5 and we should select the subgraph that is either of size 2, 3, 4, or 5. It remains to check whether the function returns the subgraphs of sizes 2, 3, 4, 5 with equal probability. If it does then this is equivalent to randomly sampling a size $\in [0, 5]$ and finding a subgraph.
- **Another question popping in my mind is how efficient would this approach be for proof verification + FHE decryption.** (Making this point bold to come to later: **TODO**)
**Oct 25**
- I wanted to assign levels to each node in the graph such that a predecessor is guaranteed to have level < its successors. I took the following approach:
- Topologically sort the graph and then for each node assign the level = max(all parents' levels) + 1.
- The approach works but to topoligically sort a graph with 50K gates it takes 10s. This makes me wonder whether there're parallel algorithms for topolical sorting.
- **Why assign levels to each node**: There are two reasons
- (a) Prune DFS in "find convex graph". That is, safely terminate DFS recursion if current node's level > than candidate's level.
- (b) Fast C^out replacement with C^in: Assume there exist a hash map G_k that stores gates touching wire $k$. Then to draw edges from/to gates in C^in we just iterate through gates in G_{\alpha} where gate i in C^in touches wire \alpha and check for collision. If the colliding gate has level > C^out then drawn an outgoing edge otherwise draw an incoming edge. Note that C^out's level is a range and the colliding gate may have equal level in which case randomly make them predecessor or successor.
- Ok, looks like topological sort will hard and we cannot use it. Assuming the graph is dense, sorting takes roughly n^2.
- I wonder would it be more performant, or at-least equivalent to what we've at the moment, to pick a random subgraph and check whether it's convex. I see no difference other than we're ruling out non-convex sets much faster than before. Another side benefit is we know the topological ordering of the graph in advance. Hence, need not to worry about finding order of convex set later.
**Oct 26**
- Topological sort of skeleton graph with 50K nodes on average takes 30ms but SC version of the same graph on average takes 10s. The only difference between skeleton graph and SC graph of a circuit is skeleton does not have any transitive collision edges. I suspect the reason for a large slowdown in SC is because SC is very very dense.
- I'm thinking to divide the replacement operation into the following steps
- All immPred, and any of their predecessors, must come before C^in. Similarily all immSucc, and any of their successors must come after C^in. Hencce,
- for each immPred, find the first gate in each dependency graph of C^in it collides with. Then recrusively balance out the collisions for immPred's predecessors. For example if immPred collides with i^th gate in k^th dependency chain of C^in, then recursively find collisions for immPred's predecessors for gates [0, i) of the dependency chain.
- for each immSucc, find the last gate in each dependency graph it collides with. Then recursively balance out the collisions for immSucc's successors. For example if immSUcc collides with i^th gate in k^th dependency chain of C^in. Then recursively find collisions for immSucc's successors for [i+1, ) of the dependency chain.
- Add all the new edges to the graph
- It is possible that a immPred was connected with a immSucc via C^out but may not be connected via C^in. This incorrectly implies immSucc does not depend on immPred. But infact any immPred must come before any of the successors of C^out. Likewise any immSucc must come after any of the predecessors of C^out. To settle this case we take the following approach:
- For each such immPred, immSucc pair: for each path outgrowing from immSucc find the first gate immPred collides with. Draw an edge from immPred to that gate. Let that gate be gate i on j^th path starting from immSucc. The chain of gates on j^th path uptill, and not including, j^th gate must now be checked for collisions against predecessors of immPred. This is similar to settling dependecy chains with predecessors.
- C^in contains new gates which bring new collisions with gates which is neither a predecessor of C^out nor a successor. These are the outsiders. Find the outsiders (to find outsiders, find all predecessors and successors of C^out and remove union of predecessors, successors, and C^out from set of all nodes) and check for collision. We make the choice of making any colliding outsider a predecessor (although doing so is not necessary).
**Oct 27**
- I implemented an algorithm to settle cases where some immPred, immSucc pair are no more connected via C^in. However it turns out as the graph grows this algorithm takes really long because there are many such pairs in the kneading stage which results in many (successor) paths to investigate, which in turns results in many such dependecy chains to settle. I also added an algorithm that prunes recursive calls if same dependency chain is checked against any predecessor node more than once. But it does not help much. I parallelised settlement, but there are still cases where the algorithm takes > 5s. Check this [branch](https://github.com/gausslabs/obfustopia/tree/final1) for code.
**Oct 28**
- I went back to the naive way. That is,
- Don't use skeleton graph. Instead use SC.
- To replace C^out with C^in find all predecessors/successors of C^out. For each predecessor draw in outgoing edge to any gate it collides with in C^in. For each successor draw in incoming edge to any gate it collides with.
- For each outsider, draw an edge to any colliding gate within C^in.
- Obfuscation is in process. Still at inflationary stage (which runs for 50K times). A few observations:
- Not a single call to local_mixing returned false with max convex interation se to 10K and max replacement iterations set to 1M. Which not a single call to find_convex_subcircuit and find_replacement returned false. This is not surprising at inflationary stage. It will be nice to check whether this holds true for kneading stage.
- Majority of time per iteration is spent on 2 things:
(1) find convex subcircuit: roughly 85%.
(2) find all predecessors: roughly 10-12%
**Question**: How does find convex subcircuit time decrease with no. of cores.
**Question**: Why "find all predecessors" is more expensive than "fina all successors"? The difference is quite a bit. The later is in double digit micro-seconds whereas the latter is in double digit milli-seconds. Or am I mistaken?
- Find replacement circuit: takes 500us; Not surprising given we're at still at inflationary stage.
**Oct 29**
- **Question**: Why "find all predecessors" is more expensive than "fina all successors"?
- **Answer**: It's because the distribution of convex sub-circuits is skewed towards the end of the graph. This means it is very often the case that a convex sub-circuit has many more predecessors than successors.
- **Question**: How does find convex subcircuit time decrease with no. of cores.
- Answer: It does not improves linearly with no. of cores. However, it does improve.
- Han implemented a way to assign levels to each node in the graph. Using which we were able to prune the DFS used in find convex sub-circuit which gave a huge improvement in find convex circuit's runtime. However, the function to assign levels itself takes a lot of time. After parallelising it across cores, it takes roughly 290ms on 32 cores.
- **Question**: I wonder whether we can adjust levels in-place after replcing C^out with C^in. So that we don't we need to touch the entire graph per iteration to update levels.
- I suspected that the reason convex subcircuit is often skewed towards the end of the graph is because it is easier to find them towards the end of the graph. However, this does not seem to be the case. I restricted the start nodes (start nodes are randomly picked to start finding convex subgraph) to first quarter of the graph and observed the runtime to find a convex subcircuit. The runtime did not change but "find convex circuit" function did often find sub-circuits that has approximately equal no. of predecessors and successors.
- I compared predecessors/successors distribution of convex sub-circuit and runtime to find convex sub-circuit for a random skeleton graph against a random SC graph, both with 50K gates. On average over 10 runs, find_convex often returns convex subcircuit with even distribution for skeleton graph and un-even distribution for SC graph (I wasn't expecting this). More so, as expected, find_convex returns in 12ms for skeleton graph and 950ms for SC graph.
- This hints we must use skeleton graph for all graph related operations and use SC, or some other representation, for replacement operations.
**Oct 30**
- Since the core of program obfuscation via local mixing is constructing an RIG, we will simply narrow down to testing RIG. RIG requires to obfuscate a random reversible circuit using RIO obfuscator twice and concatenate the first output with the inverse of the second. Thus we obtain a obfuscated reversible circuit that computes the identity permutation. They are two elements here (1) RIO obfuscator (2) reversible circuit that can be guaranteed to produce permutation that are indistinguishable from a true random permutation by a polynomial time adversary.
- Reversible circuit for strong-psudo random permutation: I referred to the following paper. It shows some evidence (quantum-mehcanical which I don't happen to grasp) that a reversible circuit with 3 stages - inflationary -> proliferation -> (again) inflationary where each stage is log_2{n} layers and is tree-structured is a secure block-cipher. Hence, produces pseudo random permutation on its inputs. Following is my explaination of how to construct it:
- Construction:
- Let $L_{I0, x}, L_{P, x}, L_{I1, x}$ for x = 0, 1, 2 equal $\log{n}$ where $n$ is number of wires and $I0$ is the first inflationary stage, P is the middle proliferation stage, and I1 is the second inflationary stage. L_{y,x} represent the no. of layers in stage y.
- For each stage, iterate through the layers for $i \in [0, L_{S, x})$ where S is the stage. At layer i randomly divide n wires into sets of size 3. Each set must be unique (I realise it may not always be possible, in which case repetition is allowed to fill the set. That is, make sure that union of all set touches all wires at-least once). Also pick a gate for each such set. If S is inflationary then the gate must be picked randomly from set of 144 inflationary gates. If S is proliferation then the gate can be a random 3 bit permutation.
- Output $L_{I0, x} | L_{P, x} | L_{I1, x}$
- Notes:
- In local mixing paper, although authors have referred to the construction above to claim that m can be as low as O(n) for concreteness they set m = O(n4log{n}). I believe it's because the authors are working with just 16 base permutations, not all 3-bit permutations.
**Oct 31**
- So the new algorithm finally seems to work. It hasn't crashed for long. The problem was in the routine function. routine is supposed to handle edges from predecessors of C^out to their direct connections which they don't happen to have in the incoming graph because both the predecessor and C^out collide with the direct connection. The rule is pretty simple: Any node will only have an edge to other colliding node, if there's no other node in between which collides with both. Routine function takes this rule and applies it recursively.
- It makes two observations (1) that any predecessor of C^out will not have a direct edge with its direct connection after replacement only if C^out had a direct connection as well (2) If A^P is predecessor A where A is predecessor C^out and set of direct connections of C^out is DC. If A manages to regain path to its direct connetions in DC. Then A^P only has to care about the left over elements in DC.
- So the routine function collects union of direct connections of C^out. And passes it to its predecessors. Each predecessors, for each node in the intersection of its direct connections and C^out's directions checks whether it has path from itself to the node. If not, it then draws a direct edge. Then the predecessors chucks out the nodes at intersection and passes it over to its predecessors.
- One simple optimisation is: If a node (ie direct/indirect predecessor of C^out) has been visited once in the process. It does not need to visited twice. This is because if it receives different set of left over direct collisions of C^out from its two different successors and one of them consist a direct connection of itself but the other does not. Then the direct connection is rendered a don't care because one of the node's (direct/indirect) successor has already created a path (the successor(s) that passed over the set that does not have the direct connection) to the direct connection.
- The benefit of using skeleton graph for graph algorithms in the new algorithm is apparent. For example, at inflationary mixing step 5000 "find convex sub-circuit" using old algorithm takes (ie using SC graph) 50ms whereas with the new algorithm it takes 3ms on average. Finding predecessors and successor has sped up a lot as well. On single core "find predecessor" with new algorithm takes 4ms on average whereas with old algorithm it takes roughly 4ms on >= 32 core machine.
- The slowest part of new algorithm still remains the "routine" function. Recall that routine function is called per predecessor and the runtime per predecessor is very non-uninform. As far as I'm able to parse the traces, whenever a mixing step takes >1s it is just 2-3 predecessor routine function calls that consume the majority of the time. It seems likely that it is the "find_path" function that takes the longest. And there are few predecessors that have big intersections which leads blowing up their runtime.
- Text with Han on parallelisation:
- The new algorithm has following main functions:
1. It first finds topologically sorted predecessors of C^out
2. It then finds topologically sorted successors of C^out.
3. It then finds topologically sorted outsiders of C^out.
4. It then updates direct connections and transitive connections from C^in nodes to successors of C^out. Any two nodes will have a direct connection if they collide. Any two nodes will have a transitive connection if they collide and do not have any node in between that collides with both. Additionally all outsiders that collide are treated as successors.
5. It then updates direct connections and transitive connections from predecessors to C^in nodes. Additionally it removes any transitive edge that a predecessor, which collides with node A in C^in, may have with node A's (newly discovered) direct connections (because now node A of C^in manages that transitive connection).
6. It then calls the routine function for each immediate predecessor of C^out. I will tell you reasoning behind routine function later.
I wanted to narrow down to scope of parallelisation.
Finding topologically sorted predecessors and successors can easily be parallelised with what we've at the moment. We simply find all predecessors (using parallelised DFS) and then sort them as per graph levels. The same logic applies for topologically sorted successors.
Point (4) and (5) are pretty trivial to parallelise as well. But they happen to be very cheap. We should prioritise parallelising other things before we tackle 4,5.
The new algorithm seems to work well btw. "Find convex sub-circuit" is roughly 3-4ms after 5000 inflationary local mixing steps whereas with the old version it is 50ms.
The main bottleneck still is calling the `routine` function per immediate predecessor. For few predecessors it takes 100s of milliseconds. So I don't think we should parallelise it per predecessor. Moreover, if we parallelise per predecessor we will be visiting all predecessors again and again which we don't need to. I also suspect the majority of the time is spent in the `has_path_connecting` function. Which makes me think we should instead parallelise `has_path_connecting` across a intersection set.
It could be the case that as the graph grows a single invocation to `has_path_connecting` may take a quite a bit of time. I don't know how `has_path_connecting` behaves with big graphs. I will test it now.
At the moment, I'm parallelising `has_path_connecting` across intersection set and checking whether it gives any significant speed ups.
If parallelising `has_path_connecting` across intersection set gives significant speeds ups and runtime of `has_path_connecting` does not degrade significantly on larger graphs on average. Then we're good to go. Otherwise, we will have to implement parallel variant of `has_path_connecting` (which should be easy given we already have parallel DFS).
Then we will have to figure out some heuristic for when to use parallel `has_path_connecting` per node in any intersection set vs to parallelise `has_path_connecting` across nodes in the intersection set. A naive heuristic could be: for small graphs always parallelise across the set. For big graphs if the intersection set is small use parallel variant of `has_path_connecting` otherwise parallelise across intersection set
**Nov 4**
- We modified the the main program to run two different type of obfsucation strategies. The first is a rather simple variant of the second strategy which authors claim to use just for exposition purposes. Intuitively if you believe in strategy 2 you may believe in strategy 1 (but trust strategy 2 is the criticial piece here). Anyways following are the two strategies (I assume n <= 128)
1. Strategy 1: Fix ell^in to 4. And then randomly select any value in range [2, 4]. Set that value as ell^out. Then run the local mixing procesure for n^3 times.
2. Strategy 2: Divide the entire procedure into 2 stages - (a) infationary stage (b) kneading stage. Run inflationary stage with ell^out 2 and ell^in 4 for n^3 times. Then run the kneading stage with ell^out = ell^in = 4.
For obvious reason by default I prefer strategy 1. And if authors claim that both are equivalent, then, in practice, assuming local mixing is secure, strategy 1 makes sense.
- We sampled a random block cipher reversible circuit with n=64 (ie no. of input bits) and obfuscated the circuit by running local mixing for 4000 times (for testing purposes). We repeated the experiment for 20 times and we observed the following -
- Contrary to what I expected, local mixing step failed only for a few hundred times (roughly 300 on average). And all times it failed because it couldn't find a replacement circuit; It never failed because it couldn't find a convex sub-circuit.
- Based on this observation, I think it makes sense to simply bump up repalcement iterations (probably 1M -> 10M) and add more cores so that runtime remains unchanged.
- I also expect as the graph gets bigger (we woudln't run local mixing step for just 4K times in practice. Instead we will run it for 270K times for n = 64) finding convex subgraph will also become harder. Which is why, for the next experiment, I should also bump up convex iterations (from 1K to 10K).
**Nov 5**
- yesterday I observed that as obfuscation procedure reaches roughly 10K iterations C^out deletion starts to take occasionally take a lot of time. For example, I notice iterations where it took 40s. So I figured I should (for obvious reasons; Remember we need to run 250K iterations) fix it. I've created an issue on petgraph just to check what they maintainers of petgraph think.
- My first attempt is to never delete C^out. Instead mantain a map of removed nodes. And truncate any graph related operation (ie DFS etc.) the moment it reaches any node in the set of removed nodes. Intuitivelly this should work. But at the moment this naive approach seems to introduce cycles in the graph. I don't know what's the cause. I suspect it could be the way we 're updating the levels of the graph'
- Let's assume the naive approach of not deleting C^out introduces graph cycles. This means we cannot use any in-built graph algrothms and hence will have to switch to our implementation of topological sort (we only use in-built topological sort in our implementation) which completely ignores removed nodes. We already maintain a data structure `graph_neighbours` that stores incoming and outgoing edges of each node. For removed nodes, graph_neighbours stores empty lists for incoming and outgoing edges. If any remove node is a neighbour of any active node, then graph neighbour does not count the edge in real graph as an edge. Moreover, any algorithms which uses graph_neighbours simply filters out removed nodes. I implemented topological sort by first calculating level of each active node and then sorting the active nodes as per calculated level. Turns out multi-threaded variant of calcualting graph-level is somewhat buggy. So I ended up using single threaded version. After which obfuscation seems to run, but tests are still under-way.
- We had one obfuscation running with "direct skeleton graph node-deletion" codebase (the slow one) for sometime. I was suppose to run for 50K iterations but I had to kill it because node deletion in skeleton starts to tak >30s in some iterations. I just checked its trace. Following are a few observations:
- It ran for total 15725 iterations.
- Local mixing step returned false 412 times. Everytime because find repalcement circuit returned false. We had bumped up replacment iterations from 1M to 10M for this run. But seems like some circuits are really hard to find replacement for. At all times ell^out was either 3 or 4 (I suspect majority of the time it was 4), which intuitivelly makes sense.
- Local mixing step 14K onwards. Following things (other than C^out deletion) take majority of the time (Most to least):
- Remove edges AND Process predecessors
- Find all predecessors
"Find all predecessors" can be sped up with more cores. "Process predecessors" is already parallelised across C^in gates (which are set to 4). We can further parallelise it but doing so will significantly increase complexity and I'm not sure how worthy would it be.
"Remove edges" comprises of two parts. The first is finding what edges to remove (which is parallelised). The second is removing the edges. I figured "finding" which edges to remove takes roughly 97% of the time.
**Nov 7**
- After reducing runtime for "Process predecessors" with parallelism and "Remove edges" the new bottleneck is "adding new edges". This is due to some peculiarity of the library. We use `update_edge` because it only adds an edge if none exists between the source and the target (having duplicated edges will increase runtime of graph traversal algorithms). This takes O(k) time where k is no. of edges connected with the source. `add_edge` adds an edge irrespective of whether one exists and takes O(1) time. I suspect as the gaph grows in size, no. of edges connected per node grows as well. Hence, bigger graphs, let's say the one after 50K iteations, seems to take quite a while to add an edge (in some cases more than a few seconds!). I think getting around this problem should be easy: Maintain a cache of edges that exist in the graph. Keep the cache in sync with the graph. And only add an edge (with `add_edge`) if it does not exists in the graph. This should roughly take O(l) time where l is no. of new edges to add. Whereas using update_edge takes O(kl) time to add l edges.
- I've been wondering why do we need a graph after all. Can't we maintain maps of outgoing and incoming "edges" per node and just work with it.
because we use `update_edge` function that avoids multiple edges between two same nodes. If we were to use `add_edge`
----
<!--
### Roughy below
It is not necessary for the graph after replacement operation to be a skeleton graph. However, if
A simple sketch of the algorithm can described in terms introducing redundancy to
To build obfuscators O_1 satisfies property 1 we need to introduce sufficient redundancy into the input circuit with m^* gates such that the output circuit has m^{\#} gates where m^{\#} > m^* (authors set $m^{\#}=n^{3}$ where n = n^*) while leaving no open threads for the adversary to start unwinding from. Hence, we cannot naively introduce redundancy to the circuit because an attacker can with high probability cut down the redundancy and find unique representation of the original circuit. For example, if we add redundancy or change the structure of a local sub-section of the circuit randomly, we may with high probability end up swapping gates that commute. Hence, the adeversary can convert the output circuit to a unique representation and find the orignal circuit.
#### Skeleton graph
Which is why, we start with a unique representation of the reversible circuit. We call it "skeleton graph".
The gates in the graphs are the vertices and the edges represent collisions. We say tha two gates collide if they don't commute. Another special condition we set, and, as far I understand, is required to make skeleton graph no too dense, is any gate i can have an edge to colliding gate j iff there exist no such gate k in between that collides with both i and j.
#### Local mixing
With skeleton graph it becomes easy to not make "easy to make" mistakes. That is, not introduce redundancy or chnge the strcuture by swapping commuting gates. The next step of introduce redundancy. We refer to this as the inflationary stage. And the idea remains simeple: Find a random small locally connected, and yet replaceable, sub-section with ell^out gates in the graph and replace it with a functionally equivalent circuit with ell^in gates, where ell^in > ell^out.
However, we're not done yet. If we can intorduce redundancy to graph then why can't the attacker cut it down? Indeed they can. Define C^out to be the ell^out gate circuit and C^in to be ell^in gate repacement circuit. It is clear that CC(C^out) <= ell^out whereas CC(C) <= ell^in where C is a random circuit with ell^in gates. Which means CC(C^in) can be easily spotted by looking for circuits with ell^in gates that have CC <= ell^out. Hence, the attacker can iteratively find C^in s, brute force to find functionally equivalent C^out s and reover the original circuit.
We get rid of redundancy added in the inflationary stage in the kneading stage. At the kneading stage, instage of setting ell^out < ell^in we set ell^out = ell^in = ell^knd where ell^knd > 1 ell^in of inflationary stage. Intention with kneading stage is spread the redundacy added in the inflationary stage over larger and larger areas of the graph such that after adequate number of iterations it becomes hard to find a subgraph with ell^in (of inflationary stage) gates that has CC < ell^out (of inflationary stage).
One obvious question is how does kneading stage guarantees diffusion of complexoty over larger parts of the graph. The arugment that authors make goes as the following: **TODO**
#### Implementation
There are 3 key parts of the implementation. The first is finding a replceable sub-circuit C^out with ell^in gates. The second is finding a functionally equivalent sub-circuit to C^out, C^in with ell^in gates. The third, replacing C^out with C^in while making sure that the resulting graph is still a skeleton graph.
**Finding C^out**
Any subgraph with k vertices is called convex if any vertex on any path between any two vertices part of the subgraph is also part of the subgraph. Observe that any convex subgraph in the skeleton graph can be directly replaced with another subgraph that is functionally equivalent to the convex subgraph.
In our implementation we've taken a random brute force approach to find the candidate convex subgraph per iteration. There may be better ways but we for small values of ell^out (which should usually be the case in practice) the random approach works well enough to not be the bottleneck.
**Finding C^in**
Given C^out with ell^out gates we again take a brute force approach to find funtionally equivalent C^in with ell^in gates. Let omega^out be no. of active wires in C^out and omega^in be the no. of active wires in C^in. At the moment, we assume that there're no attacks if omega^out is equivalent to omega^in always. But we've no way of discerning whether that's true.
**Replacing C^out with C^in**
Once C^in is found we need to replace C^out with C^in. We found it hard to replace C^out with C^in efficiently without voilating rules of skeleton graph while just using the skeleton graph. Hence, we store two additional auxilary information for each gate in C: direct outgoing connections (DOCs) and direct incoming connections (DICs).
There exist a direct outgoing connection from gate i to j, where i < j, iff i collides with j. And there exist a direct incoming connection from to j from i, iff there's a direct outgoing connection from i to j. We store DOCs and DICs of gate i at DOC[i]/DIC[i]. We define skeleton graph of circtui C as SG(C).
To replace C^in wtih C^out, we first update DOC and DICs. To update DOC and DICs observe that since C^out is convex, set of all predecessors and all successors of C^out are disjoint. Hence, it is easy to write the vertices of SG(C) such that vertices of C^out are consecutive. After which we insert C^in in the place of C^out and update DOCs and DICs accordingly.
After updating DOCs and DICs, we update the skeleton graph with new edges induced by inserting C^in and deleting C^out. Doing so is simple. Add an edge from C^in gate i to successor gate j if DOC[i] and DIC[j] are disjoint. And add an edge to C^in gate i from predecessor gate j in DOC[j] to DIC[i] are disjoint
To not violate the rules of skeleton graph, we also delete edges rendered useless by C^in: edge that exist between predecessor gate i and C^in gate's direct outgoing collision gate j. Because, due to C^in, there now exist a gate between i and j that collides with both.
The trickier case is adding an edge from colliding pair of predecessor gate i to successor gate j that do not have an edge because there existed gate k in C^out that collided with both. However, aided with DOCs and DICs this case is simple to handle: for predecessor i that has direct incoming connection with C^out check whether following is true against each successor j that has direct outgoing connection from C^out: i and j collides AND DOC[i], DIC[j] are disjoint. If the condition is true, then add an edge from i to j.
The edges are added following the simple rule of skeleton graph rephrased in terms of DOC and DIC: SG(C) will have an edge between i and j iff DOC[i]
Let C be the reversible circuit and let SG(C) be the skeleton graph of C. Define DOC as the map that stores direct outgoing collisions of any gate i and define DIC as the map that stores direct incoming collisions of any gate j. Gate i will have a direct outgoing collisions with gate j, for i < j, if gate i collides with gate j. Gate j will have a direct incmoing collision with gate i, for i < j, if gate j collides with gate i. Observer that SG(C) will have en edge from i to j iff the sets DOC[i] and DIC[j] are disjoint. That is, there exist no gate between i and j that collides with both.
We use DOC and DIC as auxilary information to replace C^out with C^in while making sure that the resulting graph is skeleton graph. To make the replacement, we first delete C^out from SG(C), DOC, and DIC. We then update SG(C), DOC, and DIC with C^in. The deletion of C^out and insetion of C^in is simple given since C^out is convex. We topologically sort the skeleton graph and write it such that all gates in C^out are consecutive. We then take out C^out and insert C^in in-place. Then we (1) update DOC by making direct connections (2)
---
**Replacing circuit B with ell_out gates with functionally equivalent circuit A with ell_in gates**
I have been wondering for quite a while about how to do this efficiently (remember the circuit is quite big and we've to do this operation many many times).
Let A be the circuit that replaces circuit B. Before I outline the strategy I want to point out a few things:
1. A is functionally equivalent to B. Node with outgoing edge to any node in subgraph B must also have an outgoing edge to some node in A. This is because if gate \gamma_i outside B collides and with \gamma_j inside B and if it does not collide with any gate in A then A fails to be functionally equivalent to B.
2. Any node outside B with incoming edge from any node inside B must also have incoming edge from node inside A. This also follows from F(B) = F(A)
3. If A has more active wires than B, then replacement becomes much harder because there are new collisions. Let's say A uses wires k as borrow wire which B did not touch. Any gate that mutates k and is not predecessor nor succesor of A is not swappable with any other gate within A because it will damange the temporary storage. This means there should be new edge added either incoming to A or outgoing from A.
Now the question is how to do relace B with A without much cost? Ideally we would want it to take O(n) time.
For case \omega_in == \omega_out
1. Find all incoming predecessor of subgraph B. Remove the edges that lead them to subgraph B. Topolgically sort them and call the array Incoming.
2. Find all outgoing sucessors of subgraph B. Remove all edges from subgraph B to successors. Topologically sort them and call it the array Outgoing.
3. Call gates of the new circiut A Gates_A.
4. Construct the skelton graph of reversible circuit [Incoming | Gates_A | Outgoing]
5. Add the necessary edges to the main graph.
For case \omega_in > \omega_out
1. Run for the case \oomega_in == \omega_out
2. Find the set of gates that are neither predecessors nor successors of subgraph A. Call this set of gates O.
3. Remove all gates in O that don't touch the new wire k at all.
4. Check dependencies in O and eliminate uncessary gates. For example if \gamma_i has an edge to \gamma_j, both part of O, then remove \gamma_i because \gamma_j is sufficient to express dependency.
5. Toss a coin:
- If heads: For each gate \gamma_i in O add an outgoing edge from \gamma_i to first gate it collides in A (which will be a gate that touches wire k)
- If tails: For each gate \gamma_i in O add an incoming edge from last gate it collides with in A.
-->