# Notes on *"Drill & Join: A Method for Exact Inductive Program Synthesis"* Some notes on the ***[Drill & Join: A Method for Exact Inductive Program Synthesis](https://link.springer.com/chapter/10.1007/978-3-319-17822-6_13)*** paper by *Remis Balaniuk*. ## <span style="color:#a61e37">1. Algebra definitions</span> - **Vector space**: an algebraic structure defined by: 1. a scalar field ***K*** 1. a set of vectors ***S*** 1. two binary operations that must oblige a set of properties (**vector sum** and **multiplication of a scalar for a vector**) - **Linear combination**: it's an expression containing *sums of vectors* and *multiplications of vectors by scalars*. Given a vector space ***V***, defined on a scalar field ***K*** and a set of ***N*** vectors ***S* = {*v<sub>0</sub>*, *v<sub>1</sub>*, ..., *v<sub>N</sub>*}**, a linear combination can be represented as **a<sub>0</sub>\*v<sub>0</sub> + a<sub>1</sub>\*v<sub>1</sub> + ... + a<sub>N</sub>\*v<sub>N</sub>**, where **{*a<sub>0</sub>*, *a<sub>1</sub>*, ..., *a<sub>N</sub>*}** are scalars of the field ***K***. - **Basis of a vector space**: a set of vectors with which it's possible to generate the entire vector space using linear combinations of the vectors. - **Span of a vector space**: the set of all the vectors that can be generated using linear combinations of the vectors part of the vector space. - **Dimension of a vector space**: it's the cardinality of one of the bases that can be used to generate the entire vector space. - **Algebraic ring**: it's a fundamental algebraic structure which consists of a set equipped with two binary operations that generalize the arithmetic operations of addition and multiplication. Through this generalization, theorems from arithmetic are extended to non-numerical objects such as polynomials, series, matrices and functions. - **Boolean algebra**: it's a set with binary operations *AND* and *OR* and the binary operator *NOT*, hence satisfying the Boolean laws. - **Boolean ring**: it's a ring for which **x<sup>2</sup> = x** for all the elements **x** of the ring. It's a Boolean algebra, with ring multiplication corresponding to conjuction (*AND*) and ring addition to exclusive disjunction (*XOR*). In Logic the combination of the operators *XOR* and *AND* over the elements true/false, produce the Galois Field F<sub>2</sub>. ## <span style="color:#a61e37">2. Galois Field 2 - GF(2)</span> * Shortened as **GF(2)**, **F<sub>2</sub>** or **Z/2Z** * It's the smallest field made of two elements * The elements are called **true** (1) and **false** (0) * As every algebraic field, two operations are defined: addition and multiplication * The **addition** is carried on with the logical **XOR** operation * The **multiplication** is carried on with the logical **AND** operation * Addition has an identity element (**false**) and an inverse for each element * Multiplication has an identity element (**true**) and an inverse for every element (but **false**) ### Boolean polynomials A Boolean polynomial in ***B*** (Boolean algebra) is a string that results from a finite number of Boolean operations on a finite number of elements in ***B***. A multivariate polynomial over a ring has a unique representation as a xor-sum of monomials. This gives a normal form for Boolean polynomials: **XorSum(*J* &sub; {1, 2,.., n}, *a<sub>J</sub>* &times; MulSum(*j* &isin; *J*, *x<sub>j</sub>*))** where the *a<sub>J</sub>* &isin; *B* coefficients are uniquely determined. This representation it's called ***[Algebraic Normal Form](#3.-Algebraic-Normal-Form-(ANF))***. A Boolean function of *n* variables **&fnof;: *Z<sub>2</sub>*<sup>n</sup> &rArr; *N<sub>2</sub>*** can be associated with a Boolean polynomial by deriving an algebraic normal form. ### GF(2) operations **Addition** | + | 0 | 1 | |:-----:|:-:|:-:| | **0** | 0 | 1 | | **1** | 1 | 0 | **Multiplication** | * | 0 | 1 | |:-----:|:-:|:-:| | **0** | 0 | 0 | | **1** | 0 | 1 | ## <span style="color:#a61e37">3. Algebraic Normal Form (ANF)</span> * Called also ***Zhegalkin Normal Form*** or ***Reed-Muller Expansion*** * The entire formula always evaluates to either **true** (1) or **false** (0) * Only **AND** (between single variables) and **XOR** operations are available ### Algebraic Normal Form expansion rules **XOR (logical exclusive disjunction)** > **(1 &oplus; A) &oplus; (1 &oplus; A &oplus; B)** > &rArr; (1 &oplus; A &oplus; 1 &oplus; A &oplus; B) > &rArr; (1 &oplus; 1 &oplus; A &oplus; A &oplus; B) > &rArr; B **NOT (logical negation)** > **&not;(1 &oplus; A &oplus; B)** > &rArr; 1 &oplus; (1 &oplus; A &oplus; B) > &rArr; 1 &oplus; 1 &oplus; A &oplus; B > &rArr; A &oplus; B **AND (logical conjunction)** > **(1 &oplus; A) &and; (1 &oplus; A &oplus; B)** > &rArr; (1 &and; (1 &oplus; A &oplus; B)) &oplus; (A &and; (1 &oplus; A &oplus; B)) > &rArr; (1 &oplus; A &oplus; B) &oplus; (A &oplus; A &oplus; (A &and; B)) > &rArr; 1 &oplus; A &oplus; B &oplus; A &oplus; A &oplus; (A &and; B) > &rArr; 1 &oplus; A &oplus; A &oplus; A &oplus; B &oplus; (A &and; B) > &rArr; 1 &oplus; A &oplus; B &oplus; (A &and; B) **OR (logical disjunction)** Two rules are available: * **A &or; B = 1 &oplus; ((1 &oplus; A) &and; (1 &oplus; B))** * **A &or; B = A &oplus; B &oplus; (A &and; B)** > **(1 &oplus; A) &or; (1 &oplus; A &oplus; B)** > &rArr; (1 &oplus; A) &oplus; (1 &oplus; A &oplus; B) &oplus; ((1 &oplus; A) &and; (1 &oplus; A &oplus; B)) > &rArr; (1 &oplus; A) &oplus; (1 &oplus; A &oplus; B) &oplus; ((1 &and; (1 &oplus; A &oplus; B)) &oplus; (A &and; (1 &oplus; A &oplus; B))) > &rArr; (1 &oplus; A) &oplus; (1 &oplus; A &oplus; B) &oplus; ((1 &oplus; A &oplus; B) &oplus; (A &oplus; A &oplus; (A &and; B))) > &rArr; (1 &oplus; A) &oplus; (1 &oplus; A &oplus; B) &oplus; ((1 &oplus; A &oplus; B &oplus; A &oplus; A &oplus; (A &and; B))) > &rArr; (1 &oplus; A) &oplus; (1 &oplus; A &oplus; B) &oplus; ((1 &oplus; A &oplus; A &oplus; A &oplus; B &oplus; (A &and; B))) > &rArr; (1 &oplus; A) &oplus; (1 &oplus; A &oplus; B) &oplus; (1 &oplus; A &oplus; B &oplus; (A &and; B)) > &rArr; 1 &oplus; A &oplus; 1 &oplus; A &oplus; B &oplus; 1 &oplus; A &oplus; B &oplus; (A &and; B) > &rArr; 1 &oplus; 1 &oplus; 1 &oplus; A &oplus; A &oplus; A &oplus; B &oplus; B &oplus; (A &and; B) > &rArr; 1 &oplus; A &oplus; (A &and; B) ## <span style="color:#a61e37">4. Abstract algebra and higher order functions</span> Let *V<sup>n</sup>* be the set of all binary words of length *n*, |*V<sup>n</sup>*| = 2<sup>n</sup>. The Boolean algebra *B* on *V<sup>n</sup>* is a vector space over *Z<sub>2</sub>*. This correspondence between an algebra and our program space defines some useful properties: * the operations in a family need not be all explicitly stated; * a basis is any set of operators (operations) from which the remaining operations can be obtained by composition. A Boolean algebra may be defined from any of several different bases; * to be a basis is to yield all other operations by composition, whence any two bases must be intertranslatable; * a basis is a linearly independent spanning set; * let *v<sub>1</sub>*, .., *v<sub>m</sub>* &isin; *B* be a basis of *B*. *Span*(*v<sub>1</sub>*, .., *v<sub>m</sub>*) = &#123; *&lambda;<sub>1</sub>* &and; *v<sub>1</sub>* &oplus; .. &oplus; *&lambda;<sub>m</sub>* &and; *v<sub>m</sub>* &#124; *&lambda;<sub>1</sub>*, .., *&lambda;<sub>m</sub>* &isin; *Z<sub>2</sub>* &#125;; * the dimension *dim*(*B*) of the Boolean algebra is the minimum *m* such that *B* = *span*(*v<sub>1</sub>*, .., *v<sub>m</sub>*). ## <span style="color:#a61e37">5. The *Drill* function</span> As first thing we define a **XorSum** function in pseudocode as: ``` def XorSum(from, to, elements) { result = 0 for (i = from; i < to; i++) { result ^= elements[i] } return result } ``` We define the set ***F<sub>m</sub>*** of functions of the form **&fnof;: Z<sub>2</sub><sup>p</sup> &times; Z<sub>2</sub><sup>q</sup> &rarr; Z<sub>2</sub>** and contains Boolean functions belonging to a Boolean algebra of dimension **m** and described in polynomial form as: **&fnof;(X, Y) = *XorSum*(1, m, g<sub>i</sub>(X) &and; h<sub>i</sub>(Y))** where **g<sub>i</sub>: Z<sub>2</sub><sup>p</sup> &rArr; Z<sub>2</sub>** and **h<sub>i</sub>: Z<sub>2</sub><sup>q</sup> &rArr; Z<sub>2</sub>** are also boolean functions. The polynomial of order **n** (with **n** variables) has been rewritten by two polynomials of order **p** and **q**, where **p + q = n**. Considering a function **&fnof; &isin; *F<sub>m</sub>***, a chosen **X<sub>0</sub> &isin; Z<sub>2</sub><sup>p</sup>** and a chosen **Y<sub>0</sub> &isin; Z<sub>2</sub><sup>q</sup>** such that **&fnof;(X<sub>0</sub>, Y<sub>0</sub>) &ne; 0**, we define the ***Drill*** high-order function as: <span style="color:#1455cc">**F<sub>X<sub>0</sub>,Y<sub>0</sub></sub> = F(&fnof;(X, Y), X<sub>0</sub>, Y<sub>0</sub>) = &fnof;(X, Y) &oplus; (&fnof;(X<sub>0</sub>, Y) &and; &fnof;(X, Y<sub>0</sub>))**</span> ### Theorem If &fnof; &isin; *F<sub>m</sub>* and &fnof;(X<sub>0</sub>, Y<sub>0</sub>) &ne; 0, then *&fnof;<sup>1</sup>* = **F**(X<sub>0</sub>, Y<sub>0</sub>) is included in a space *F<sub>r</sub>* with *r* &leq; m-1. We therefore generated a new function of reduced vector space dimension. ### Proof 1. Consider *W* = *span*(*h<sub>1</sub>*, .., *h<sub>m</sub>*), which means a basis has *m* generator vectors. 2. Consequently *dim*(*W*) &leq; *m*. 3. The linear operator *h* &isin; *W* &rarr; *h*(*Y<sub>0</sub>*) is not the zero map because the hypothesis (*&fnof;*(X<sub>0</sub>, Y<sub>0</sub>) &ne; 0) forbids *h<sub>i</sub>*(*Y<sub>0</sub>*) = 0 for all *i* = 1, .., *m*.<br><br> > ***&fnof;*(X<sub>0</sub>, Y<sub>0</sub>) &ne; 0** > &hArr; *XorSum*(1, *m*, *g<sub>i</sub>*(X<sub>0</sub>) &and; *h<sub>i</sub>*(Y<sub>0</sub>)) &ne; 0 > &hArr; &forall;*i*: *g<sub>i</sub>*(X<sub>0</sub>) &and; *h<sub>i</sub>*(Y<sub>0</sub>) &ne; 0 > &hArr; &forall;*i*: *h<sub>i</sub>*(Y<sub>0</sub>) &ne; 0 4. Consequently, the vector subspace *W<sup>1</sup>* = &#123; *h* &isin; *W* &#124; *h*(*Y<sub>0</sub>*) = 0 &#125; has *dim*(*W<sup>1</sup>*) &leq; (*m* - 1). Because we know for sure that less than *m* functions *h<sub>i</sub>*(*Y<sub>0</sub>*) can lead to 0 from the step 3. 5. Notice that for all *X* &isin; *Z<sub>2</sub><sup>p</sup>* we have the *Drill* function *f<sup>1</sup>*(*X*, &middot;) &isin; *W<sup>1</sup>*.<br><br> > ***f<sup>1</sup>*(X, Y<sub>0</sub>)** > &rArr; *f*(X, Y<sub>0</sub>) &oplus; (*f*(X<sub>0</sub>, Y<sub>0</sub>) &and; *f*(X, Y<sub>0</sub>)), as per *Drill* expansion > &rArr; *f*(X, Y<sub>0</sub>) &oplus; (1 &and; *f*(X, Y<sub>0</sub>)), as *f*(X<sub>0</sub>, Y<sub>0</sub>) &ne; 0 (hence 1 in F<sub>2</sub>) by hypothesis > &rArr; *f*(X, Y<sub>0</sub>) &oplus; *f*(X, Y<sub>0</sub>) > &rArr; 0 6. Let *r* = *dim*(*W<sup>1</sup>*) and *h<sup>1</sup><sub>i</sub>*, with *i* = 1, .., *r* be a spanning set such that *W<sup>1</sup>* = *span*(*h<sub>1</sub>*, .., *h<sub>r</sub>*). For all *X* &isin; *Z<sub>2</sub><sup>p</sup>*, *f<sup>1</sup>*(X, &middot;) can be represented as a linear combination of the *h<sup>1</sup><sub>i</sub>*, the coefficients depending on *X*. In other words, there exist coefficients *g<sup>1</sup><sub>i</sub>*(*X*) such that ***f<sup>1</sup><sub>i</sub>*(*X*, &middot;) = *XorSum*(1, *r*, *g<sup>1</sup><sub>i</sub>*(*X*) &and; *h<sup>1</sup><sub>i</sub>*)**, or written differently: ***f<sup>1</sup>*(*X*, *Y*) = *XorSum*(1, *r*, *g<sup>1</sup><sub>i</sub>*(*X*) &and; *h<sup>1</sup><sub>i</sub>*(*Y*))**. ### *Drill* application example As an illustration of the application of the *Drill* function, consider the Boolean function *&fnof;*(*x*, *y*): *Z<sub>2</sub>* &times; *Z<sub>2</sub>* &rarr; *Z<sub>2</sub>* whose behaviour can be represented by the following truth table. | *x* | *y* | *&fnof;*(*x*, *y*) = *y* &or; (&not;*x* &and; &not;*y*) | |:-:|:-:|:-:| | F | F | T | | F | T | T | | T | F | F | | T | T | T | **Note**: the method does not require high-level definitions, only I/O pairs. So the illustration with the known function expression is just to show how the method works. #### Rewriting the function in ANF To ease the reasoning we can rewrite the *&fnof;*(*x*, *y*) function in ANF as: > ***&fnof;*(*x*, *y*)** > &rArr; *y* &or; (&not;*x* &and; &not;*y*) > &rArr; *y* &or; ((T &oplus; *x*) &and; (T &oplus; *y*)) > &rArr; *y* &or; ((T &and; (T &oplus; *y*)) &oplus; (*x* &and; (T &oplus; *y*))) > &rArr; *y* &or; ((T &oplus; *y*) &oplus; (*x* &oplus; (*x* &and; *y*))) > &rArr; *y* &or; (T &oplus; *y* &oplus; *x* &oplus; (*x* &and; *y*)) > &rArr; *y* &oplus; (T &oplus; *y* &oplus; *x* &oplus; (*x* &and; *y*)) &oplus; (*y* &and; (T &oplus; *y* &oplus; *x* &oplus; (*x* &and; *y*))) > &rArr; *y* &oplus; (T &oplus; *y* &oplus; *x* &oplus; (*x* &and; *y*)) &oplus; (*y* &oplus; *y* &oplus; (*x* &and; *y*) &oplus; (*x* &and; *y*)) > &rArr; *y* &oplus; T &oplus; *y* &oplus; *x* &oplus; (*x* &and; *y*) &oplus; *y* &oplus; *y* &oplus; (*x* &and; *y*) &oplus; (*x* &and; *y*) > &rArr; T &oplus; *x* &oplus; (*x* &and; *y*) The rewritten *&fnof;*(*x*, *y*) function has dimension 2 (as can be seen by the two variable components ***x*** and **(*x* &and; *y*)**) and <u>no shorter representation is possible</u>. Respecting the theorem hypothesis *&fnof;*(*x<sub>0</sub>*, *y<sub>0</sub>*) &ne; 0, we can query the oracle (the truth table in this case) and obtain: ***x<sub>0</sub>* = F** and ***y<sub>0</sub>* = F**. Consequently we can obtain the partial functions (we'll be able to obtain their representation only with new iterations of the *Drill* function or with the support of the *Join* function, but for illustration purposes we use the known Boolean expression): > ***&fnof;*(*x<sub>0</sub>*, *y*)** > &rArr; *&fnof;*(F, *y*) > &rArr; T &oplus; F &oplus; (F &and; *y*) > &rArr; T > ***&fnof;*(*x*, *y<sub>0</sub>*)** > &rArr; *&fnof;*(*x*, F) > &rArr; T &oplus; *x* &oplus; (*x* &and; F) > &rArr; T &oplus; *x* That can be used to build the full *Drill* function: > ***&fnof;<sup>1</sup>*(*x*, *y*)** > &rArr; F(*&fnof;*(*x*, *y*), *x<sub>0</sub>*, *y<sub>0</sub>*) > &rArr; *&fnof;*(*x*, *y*) &oplus; (*&fnof;*(*x<sub>0</sub>*, *y*) &and; *&fnof;*(*x*, *y<sub>0</sub>*)) > &rArr; (T &oplus; *x* &oplus; (*x* &and; *y*)) &oplus; (T &and; (T &oplus; *x*)) > &rArr; (T &oplus; *x* &oplus; (*x* &and; *y*)) &oplus; (T &oplus; *x*) > &rArr; T &oplus; *x* &oplus; (*x* &and; *y*) &oplus; T &oplus; *x* > &rArr; (*x* &and; *y*) We can see that *&fnof;<sup>1</sup>*(*x*, *y*) has dimension 1 (a single variable component **(*x* &and; *y*)**), confirming the *Drill* theorem. ## <span style="color:#a61e37">6. The *Join* function</span> Consider the set ***F<sub>m</sub>*** of Boolean functions of the form **&fnof;: Z<sub>2</sub><sup>n</sup> &rarr; Z<sub>2</sub>** and ***v<sub>1</sub>*, .., *v<sub>m</sub>* &isin; *F<sub>m</sub>*** a basis. The functions in this set can be described in polynomial form as: ***&fnof;*(*X*) = *XorSum*(1, *m*, *&lambda;<sub>i</sub>* &and; *v<sub>i</sub>*(*X*))**, where *&lambda;<sub>i</sub>* &isin; *Z<sub>2</sub>* are the coefficients. That's basically a linear combination of the generator vectors with *Z<sub>2</sub>* coefficients. Considering a function *&fnof;* &isin; *F<sub>m</sub>*, a chosen *X<sub>j</sub>* &isin; *Z<sub>2</sub><sup>n</sup>* (which is the input to a partial function) such that *&fnof;*(*X<sub>j</sub>*) &ne; 0 (because we are interested in rebuilding the partial functions obtained in a previous step, that have non-zero result) and a chosen function *v<sub>j</sub>*(*X<sub>j</sub>*) &ne; 0 (otherwise we wouldn't be able to apply the linear combination for the selected *X<sub>j</sub>*), we define the *Join* higher-order function as: <span style="color:#1455cc">**H<sub>*X<sub>j</sub>v<sub>j</sub></sub>* = H(*&fnof;*(*X*), *X<sub>j</sub>*, *v<sub>j</sub>*) = *&fnof;*(*X*) &oplus; *v<sub>j</sub>*(*X*)**</span> ### Theorem If *&fnof;* &isin; *F<sub>m</sub>*, *&fnof;*(*X<sub>j</sub>*) &ne; 0 and *v<sub>j</sub>*(*X*<sub>j</sub>) &ne; 0, then *&fnof;<sup>2</sup>* = H<sub>*X<sub>j</sub>v<sub>j</sub></sub>* &isin; *F<sub>r</sub>* and *r* &leq; *m* - 1. We therefore generated a new function of reduced vector space dimension (as in the *Drill* theorem). ### Proof 1. Consider *W* = *span*(*v<sub>1</sub>*, .., *v<sub>m</sub>*), which means a basis has *m* generator vectors. 2. Consequently *dim*(*W*) &leq; *m*. 3. The linear operator *v* &isin; *W* &rarr; *v*(*X<sub>j</sub>*) is not the zero map, otherwise *v<sub>j</sub>*(*X<sub>j</sub>*) = 0 would be against one the hyphotesis (*v<sub>j</sub>*(*X*<sub>j</sub>) &ne; 0). 4. Consequently, the vector subspace *W<sup>2</sup>* = &#123; *&fnof;* &isin; *W* &#124; *&fnof;*(*X<sub>j</sub>*) = 0 &#125; has *dim*(*W<sup>2</sup>*) &leq; *m* - 1. Because we know for sure that less than *m* functions *&fnof;*(*X<sub>j</sub>*) can lead to 0, knowing that some *&fnof;*(*X<sub>j</sub>*) &ne; 0 per hypothesis. 5. *&fnof;<sup>2</sup>* &isin; *W<sup>2</sup>*, in fact *&fnof;<sup>2</sup>* = *&fnof;*(*X<sub>j</sub>*) &oplus; *v<sub>j</sub>*(*X<sub>j</sub>*) = 0 and per hypotheses we have *&fnof;*(*X<sub>j</sub>*) &ne; 0 and *v*(*X<sub>j</sub>*) &ne; 0, which in F<sub>2</sub> means that *&fnof;*(*X<sub>j</sub>*) = 1 and *v*(*X<sub>j</sub>*) = 1. 6. Let *r* = *dim*(*W<sup>2</sup>*) and *v<sup>2</sup><sub>i</sub>*, i = 1, .., *r* be a spanning set such that *W<sup>2</sup>* = *span*(*v<sup>2</sup><sub>1</sub>*, .., *v<sup>2</sup><sub>r</sub>*). The function *&fnof;<sup>2</sup>* can be represented as a linear combination of the *v<sup>2</sup><sub>i</sub>* generator vectors. In other words, there exist coefficients *&lambda;<sup>2</sup><sub>i</sub>* such that: ***&fnof;<sup>2</sup>*(*X*) = *XorSum*(1, *r*, *&lambda;<sup>2</sup><sub>i</sub>*(*X*) &and; *v<sup>2</sup><sub>i</sub>*)**. ### *Join* application example As an illustration of the application of the *Join* function, we can consider the same Boolean function represented in the [*Drill* application example](#Drill-application-example) section. *&fnof;* belongs to a Boolean algebra of dimension 2<sup>2</sup> (as seen in the [Rewriting the function in ANF](#Rewriting-the-function-in-ANF) section) which can be defined, for instance, by the following spanning set: * *v<sub>1</sub>*(*x*, *y*) = *T* * *v<sub>2</sub>*(*x*, *y*) = *x* * *v<sub>3</sub>*(*x*, *y*) = *y* * *v<sub>4</sub>*(*x*, *y*) = *x* &and; *y* Respecting the stated hypotheses we can can pick *X<sub>j</sub>* = (T, T) and *v<sub>j</sub>* = *v<sub>2</sub>*(*x,* *y*) = *x*, such that we have: *&fnof;*(T, T) &ne; F and *v<sub>2*(T, T) &ne; F. Applying the *Join* function we obtain: > ***&fnof;<sup>2</sup>*(*x*, *y*)** > &rArr; **H**(*&fnof;*(*X*), *X<sub>j</sub>*, *v<sub>j</sub>*) > &rArr; *&fnof;*(*X*) &oplus; *v<sub>j</sub>*(*X*) > &rArr; (*y* &or; (&not;*x* &and; &not;*y*)) &oplus; *x* > &rArr; T &oplus; *x* &oplus; (*x* &and; *y*) &oplus; *x* > &rArr; T &oplus; (*x* &and; *y*) We can see that *&fnof;<sup>2</sup>*(*x*, *y*) has dimension 1 (a single variable component **(*x* &and; *y*)**), confirming the *Join* theorem. <span style="color:#fc0303">**@Rolf: The ANF expression obtained in the original paper is (*x* &and; *y*) which seem to be wrong (missing the "T &oplus;" term).**</span> ## <span style="color:#a61e37">7. The *Drill* & *Join* program synthesis method</span> *Drill* and *Join* are used to define a program synthesis method. Considering an active learning (where the algorithm can interactively query the user or some other information source, usually called oracle) framework, the input function *&fnof;*(*X*, *Y*) on **F** (*Drill* function) and the input function *&fnof;*(*X*) on **H** (*Join* function) represent an external **unknown concept** from which it is possible to obtain data by means of queries (I/O pairs). This **unknown concept** could be, for instance, a computer program that one would like to emulate (e.g. a blackbox function to be inferred) or optimize (e.g. a shellcode to be deobfuscated). ### Properies of the *Drill* and *Join* higher-order functions * **F** and **H** can be applied recursively: if *&fnof;*(*X,* *Y*) &isin; *F<sub>m</sub>* then *&fnof;<sup>1</sup><sub>1</sub>*(*X*, *Y*) = **F**(*&fnof;*(*X*, *Y*), *X<sub>0</sub>*, *Y<sub>0</sub>*) &isin; *F<sub>m-1</sub>* and *&fnof;<sup>1</sup><sub>2</sub>*(*X*, *Y*) = **F**(*&fnof;<sup>1</sup>*(*X*, *Y*), *X<sub>1</sub>*, *Y<sub>1</sub>*) &isin; *F<sub>m-2</sub>*. Similarly, if *&fnof;*(*X*) &isin; *F<sub>m</sub>* then *&fnof;<sup>2</sup><sub>1</sub>*(*X*) = **H**(*&fnof;*(*X*), *X<sub>0</sub>*, *v<sub>0</sub>*) &isin; *F<sub>m-1</sub>* and *&fnof;<sup>2</sup><sub>2</sub>*(*X*) = **H**(*&fnof;*(*X*), *X<sub>1</sub>*, *v<sub>1</sub>*) &isin; *F<sub>m-2</sub>*. Each recursion generates a new function belonging to an algebra of a lower dimension. * The recursion ends when the higher-order functions become the zero map: **F**(*&fnof;*(*X*, *Y*), *X<sub>i</sub>*, *Y<sub>i</sub>*) = 0 &iff; *&fnof;*(*X*, *Y*) = (*&fnof;*(*X<sub>i</sub>*, *Y*) &and; *&fnof;*(*X*, *Y<sub>i</sub>*)) and similarly **H**(*&fnof;*(*X*), *X<sub>i</sub>*, *v<sub>i</sub>*) = 0 &iff; *&fnof;*(*X*) = *v<sub>i</sub>*(*X*). ### Obtaining the output program Two functions derived by the unrolled recursion of the *Drill* and *Join* functions can be used to rebuild the original target function. #### *Drill* rebuilding formula The original target function *&fnof;*(*X*, *Y*) can be recreated using the partial function *&fnof;<sup>1</sup>* obtained using the *Drill* function: > ***&fnof;*(*X*, *Y*) = *XorSum*(1, *m*, *&fnof;<sup>1</sup><sub>i</sub>*(*X<sub>i</sub>*, *Y*) &and; *&fnof;<sup>1</sup><sub>i</sub>*(*X*, *Y<sub>i</sub>*))** #### *Join* rebuilding formula The original target function *&fnof;*(*X*) can be recreated using the partial function *&fnof;<sup>2</sup>* obtained using the *Join* function: > ***&fnof;*(*X*) = *XorSum*(1, *m*, *v<sub>i</sub>*(*X*))** **Note**: - if the *Drill* initial condition cannot be established (no (*X<sub>0</sub>*, *Y<sub>0</sub>*) such that *&fnof;*(*X<sub>0</sub>*, *Y<sub>0</sub>*) &ne; 0 can be found), then the target function is necessarily *&fnof;*(*X*, *Y*) = F; - if the *Join* initial condition cannot be established (no *X<sub>0</sub>* such that *&fnof;*(*X<sub>0</sub>*) &ne; 0 can be found), then the target function is the zero map. Although if given a valid *X<sub>j</sub>* no *v<sub>j</sub>* function such that *v<sub>j</sub>*(*X<sub>j</sub>*) &ne; 0 can be found, then the basis has been improperly selected; - the application of the **F** higher-order function alone requires the solution of an exponential number of subspace synthesis problems. And the **H** higher-order function may require a high cardinality basis. The two functions must be applied together to lead to proper results. ## <span style="color:#a61e37">8. Complete example of application of the *Drill* & *Join* functions</span> * Input function to be synthesized: **&fnof;(x, y) = (y &or; (&not;x &and; &not;y))** * The oracle (truth table) for the input function is available in the [*Drill* application example](#Drill-application-example) chapter. * As previously mentioned, the function is already in the minimal form, but it'll end up as an ANF expression. * The steps will involve a mix of *Drill* and *Join* phases, depending on the knowledge of the vector space bases (necessary for the *Join* phases). * **Note**: the example in the original paper is wrong because it violates the hypotheses of the *Join* theorem, selecting a basis of *v<sub>j</sub>* functions and *Z<sub>j</sub>* values that lead to *v<sub>j</sub>*(*Z<sub>j</sub>*) = 0. The same example can be found in its corrected form in the ***[Effectiveness of Synthesis in Concolic Deobfuscation](https://www.sciencedirect.com/science/article/pii/S0167404817301475)*** paper. <span style="color:#fc0303">**@Rolf: Just in case it may be worth to verify if the example in the original paper it's wrong, to me it looks like it is.**</span> ### Step by step example 1. We start applying the *Drill* function *&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*) = **F**(*&fnof;*(*x*, *y*), *x<sub>0</sub>*, *y<sub>0</sub>*) such that *&fnof;*(*x<sub>0</sub>*, *y<sub>0</sub>*) &ne; F. We can query the oracle (using random sampling or relying on the knowledge of the truth table) and choose ***x<sub>0</sub> = F*** and ***y<sub>0</sub> = F***. Note that the pairs (*x<sub>0</sub>* = F, *y<sub>0</sub>* = T) and (*x<sub>0</sub>* = T, *y<sub>0</sub>* = T) would have been valid choices too, because they would have validated the hypothesis *&fnof;*(*x<sub>0</sub>*, *y<sub>0</sub>*) &ne; F. Now that valid inputs are known we can apply the *Drill* function: > ***&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*)** > &rArr; **F**(*&fnof;*(*x*, *y*), x<sub>0</sub>, y<sub>0</sub>) > &rArr; *&fnof;*(*x*, *y*) &oplus; (*&fnof;*(*x<sub>0</sub>*, *y*) &and; *&fnof;*(*x*, *y<sub>0</sub>*)) Resulting in two partial functions ***&fnof;*(*x<sub>0</sub>*, *y*)** and ***&fnof;*(*x*, *y<sub>0</sub>*)** with truth table: | *x* | *y* | *&fnof;*(*x<sub>0</sub>* = F, *y*) | *&fnof;*(*x*, *y<sub>0</sub>* = F)| |:-:|:-:|:-:|:-:| | F | F | T | T | | F | T | T | T | | T | F | T | F | | T | T | T | F | We should then check if ***&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*) = F** for each value of *x* and *y*, to determine if *&fnof;<sup>1</sup><sub>1</sub>* is the zero map and if we should therefore stop the algorithm. | *x* | *y* | *&fnof;*(*x<sub>0</sub>* = F, *y*) &and; *&fnof;*(*x*, *y<sub>0</sub>* = F) | *&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*) | |:-:|:-:|:-:|:-:| | F | F | T | F | | F | T | T | F | | T | F | F | F | | T | T | F | T | We can see that *&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*) &ne; F for (*x* = T, *y* = T), so we didn't find the zero map. 2) We can now apply the *Join* function *&fnof;<sup>2</sup><sub>1</sub>*(*Z*) = **H**(*&fnof;*(*Z*), *Z<sub>j</sub>*, *v<sub>j</sub>*) such that *v<sub>j</sub>*(*Z<sub>j</sub>*) &ne; F and *&fnof;*(*Z<sub>j</sub>*) &ne; F. We need to select a basis for the vector space containing the ***&fnof;*(*x<sub>0</sub>*, *y*)** and ***&fnof;*(*x*, *y<sub>0</sub>*)** partial functions, for example selecting the two generator vectors ***v<sub>0</sub>*(*z*) = *T*** and ***v<sub>1</sub>*(*z*) = z** (where *z* can be either *x* or *y*). The truth table for the *v<sub>j</sub>* functions follows: | *x* | *v<sub>0</sub>*(*x*) | *v<sub>1</sub>*(*x*) | |:-:|:-:|:-:| | F | T | F | | T | T | T | Now that a basis is known we can apply the *Join* function: > ***&fnof;<sup>2</sup><sub>1</sub>*(*X*)** > &rArr; **H**(*&fnof;*(*Z*), *Z<sub>j</sub>*, *v<sub>j</sub>*) > &rArr; *&fnof;*(*Z*) &oplus; *v<sub>j</sub>*(*Z<sub>j</sub>*) Applying a first iteration of the *Join* function to the ***&fnof;*(*x<sub>0</sub>*, *y*)** function, selecting ***y<sub>0</sub>* = T** and ***v<sub>0</sub>*(*y*) = *T*** to oblige to the hypotheses (note that selecting the pair (*y<sub>0</sub>* = F, *v<sub>0</sub>*(*y*) = T) would have been a valid choice too), we obtain: | *y* | *&fnof;<sup>2</sup><sub>1,1</sub>* = **H**(*&fnof;*(*x<sub>0</sub>*, *y*), *y<sub>0</sub>*, *v<sub>0</sub>*) | |:-:|:-:| | F | F | | T | F | We can see that *&fnof;<sup>2</sup><sub>1,1</sub>* = F for all the values of *x* and *y*, so the *&fnof;<sup>2</sup><sub>1,1</sub>*(*Z*) function is the zero map and we can therefore stop the recursion and obtain the full rebuilt function (using the [*Join* rebuilding formula](#Join-rebuilding-formula)): <span style="color:#de8912">***&fnof;*(*x<sub>0</sub>*, *y*) = *v<sub>0</sub>*(*z*) = T**</span>. Applying a first iteration of the *Join* function to the ***&fnof;*(*x*, *y<sub>0</sub>*)** function, selecting ***x<sub>0</sub>* = F** and ***v<sub>0</sub>*(*x*) = *T*** to oblige to the hypotheses (note that in this case we couldn't select other values of *x* given the input function and the selected basis), we obtain: | *x* | *&fnof;<sup>2</sup><sub>1,2</sub>* = **H**(*&fnof;*(*x*, *y<sub>0</sub>*), *x<sub>0</sub>*, *v<sub>0</sub>*) | |:-:|:-:| | F | F | | T | T | We can see that *&fnof;<sup>2</sup><sub>1,2</sub>* &ne; F for some values of *x* and *y*, so the *&fnof;<sup>2</sup><sub>1,2</sub>*(*Z*) function is not the zero map. Therefore we'll need to apply a new round of recursion. 3) As we saw at the step 1, the *&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*) function didn't generate the zero map, so a second iteration of the *Drill* function is necessary. We can therefore calculate *&fnof;<sup>1</sup><sub>2</sub>*(*x*, *y*) = **F**(*&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*), *x<sub>1</sub>*, *y<sub>1</sub>*) such that *&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>*, *y<sub>1</sub>*) &ne; F. Querying the oracle (basically looking at the truth table of the *&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*) function) we obtain that (*&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*) &ne; F) &iff; (*x* = T, *y* = T). Hence we select ***x<sub>1</sub>* = T** and ***y<sub>1</sub>* = T** and obtain: > ***&fnof;<sup>1</sup><sub>2</sub>*(*x*, *y*)** > &rArr; **F**(*&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*), *x<sub>1</sub>*, *y<sub>1</sub>*) > &rArr; *&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y*) &oplus; (*&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>*, *y*) &and; *&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y<sub>1</sub>*)) Resulting in two partial functions ***&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>*, *y*)** and ***&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y<sub>1</sub>*)** with truth table: | *x* | *y* | *&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>* = T, *y*) | *&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y<sub>1</sub>* = T) | |:-:|:-:|:-:|:-:| | F | F | F | F | | F | T | F | F | | T | F | F | F | | T | T | T | T | We should then check if ***&fnof;<sup>1</sup><sub>2</sub>*(*x*, *y*) = F**, to determine if *&fnof;<sup>1</sup><sub>2</sub>* is the zero map and if we should therefore stop the algorithm. | *x* | *y* | *&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>* = T, *y*) &and; *&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y<sub>1</sub>* = T) | *&fnof;<sup>1</sup><sub>2</sub>*(*x*, *y*) | |:-:|:-:|:-:|:-:| | F | F | F | F | | F | T | F | F | | T | F | F | F | | T | T | T | F | We can see that *&fnof;<sup>1</sup><sub>2</sub>*(*x*, *y*) = F for all the values of *x* and *y*, so we found the zero map and we can stop the recursion. 4. We can now apply a new round of the *Join* function on the not yet synthesized partial functions: ***&fnof;<sup>2</sup><sub>1,2</sub>***, ***&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>*, *y*)**, ***&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y<sub>1</sub>*)**. Applying a second iteration of the *Join* function to the ***&fnof;<sup>2</sup><sub>1,2</sub>*** function, selecting ***x<sub>1</sub>* = T** and ***v<sub>1</sub>*(*x*) = *x*** to oblige to the hypotheses (note that in this case we couldn't select other values of *x* given the input function and the selected basis), we obtain: | *x* | *&fnof;<sup>2</sup><sub>2,2</sub>* = H(*&fnof;<sup>2</sup><sub>1,2</sub>*, *x<sub>1</sub>*, *v<sub>1</sub>*) | |:-:|:-:| | F | F | | T | F | We can see that *&fnof;<sup>2</sup><sub>2,2</sub>* = F for all the values of *x* and *y*, so the *&fnof;<sup>2</sup><sub>2,2</sub>*(*Z*) function is the zero map and we can therefore stop the recursion and obtain the full rebuilt function (using the [*Join* rebuilding formula](#Join-rebuilding-formula)): <span style="color:#de8912">***&fnof;*(*x*, *y<sub>0</sub>*) = *v<sub>0</sub>*(*z*) &oplus; *v<sub>1</sub>*(*z*) = T &oplus; x**</span>. To handle the partial functions ***&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>*, *y*)** and ***&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y<sub>1</sub>*)** we can use the same basis chosen to handle the ***&fnof;*(*x<sub>0</sub>*, *y*)** and ***&fnof;*(*x*, *y<sub>0</sub>*)** functions. Applying a first iteration of the *Join* function to the ***&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>*, *y*)** function, selecting ***y<sub>0</sub>* = T** and ***v<sub>0</sub>*(*y*) = *y*** to oblige to the hypotheses (note that in this case we couldn't select other values of *y* given the input function and the selected basis), we obtain: | *y* | *&fnof;<sup>2</sup><sub>1,3</sub>* = H(*&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>*, *y*), *y<sub>0</sub>*, *v<sub>0</sub>*) | |:-:|:-:| | F | F | | T | F | We can see that *&fnof;<sup>2</sup><sub>1,3</sub>* = F for all the values of *y*, so the *&fnof;<sup>2</sup><sub>1,3</sub>*(*Z*) function is the zero map and we can therefore stop the recursion and obtain the full rebuilt function (using the [*Join* rebuilding formula](#Join-rebuilding-formula)): <span style="color:#de8912">***&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>*, *y*) = *v<sub>0</sub>*(*z*) = y**</span>. Applying a first iteration of the *Join* function to the ***&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y<sub>1</sub>*)** function, selecting ***x<sub>0</sub>* = T** and ***v<sub>0</sub>*(*x*) = *y*** to oblige to the hypotheses (note that in this case we couldn't select other values of *x* given the input function and the selected basis), we obtain: | *x* | *&fnof;<sup>2</sup><sub>1,4</sub>* = H(*&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y<sub>1</sub>*), *x<sub>0</sub>*, *v<sub>0</sub>*) | |:-:|:-:| | F | F | | T | F | We can see that *&fnof;<sup>2</sup><sub>1,4</sub>* = F for all the values of *x*, so the *&fnof;<sup>2</sup><sub>1,4</sub>*(*Z*) function is the zero map and we can therefore stop the recursion and obtain the full rebuilt function (using the [*Join* rebuilding formula](#Join-rebuilding-formula)): <span style="color:#de8912">***&fnof;<sup>1</sup><sub>1</sub>*(*x*, *y<sub>1</sub>*) = *v<sub>0</sub>*(*z*) = x**</span>. **Note**: In the first *Join* iteration step applied to the *&fnof;<sup>1</sup><sub>1</sub>*(*x<sub>1</sub>*, *y*) function we may have chosen *y<sub>0</sub>* = T and *v<sub>0</sub>* = T to oblige with the hypotheses, but at the second step we would have realised that it would have been impossible to choose a *y<sub>1</sub>* value to oblige to the hypothesis, meaning that the first selection of the generator vector (*v<sub>0</sub>*) was wrong. <span style="color:#fc0303">**@Rolf: It is not clear to me if there's an optimal way to select the generator vector from the basis.**</span> 5. Now that we have the full representation of all the partial problems we can build the original target function (using the [*Drill* rebuilding formula](#Drill-rebuilding-formula)): > ***&fnof;*(*x*, *y*)** > &rArr; (*&fnof;*(*x<sub>0</sub>*, *y*) &and; *&fnof;*(*x*, *y<sub>0</sub>*)) &oplus; (*&fnof;<sup>1</sup>*(*x<sub>1</sub>*, *y*) &and; *&fnof;<sup>1</sup>*(*x*, *y<sub>1</sub>*)) > &rArr; (*T* &and; (*T* &oplus; *x*)) &oplus; (*y* &and; *x*) > &rArr; (*T* &oplus; *x*) &oplus; (*y* &and; *x*) > &rArr; *T* &oplus; *x* &oplus; (*x* &and; *y*) We can verify that the obtained ANF representation matches with the manual expansion executed in the [Rewriting the function in ANF](#Rewriting-the-function-in-ANF) section, but this time we obtained it with the recursive application of the *Drill* and *Join* functions querying the oracle in a black-box fashion.