# Formulating the Optimal Subgraph as a Continuous Optimization Problem :octopus: --- ## A Simple Example Suppose we have users who have given money to various projecs. The contributions are indicated in the labeled bipartite graph below. ![](https://i.imgur.com/RbgKiBd.jpg) Under the original LR algorithm, assuming an unlimited matching funds pool, we have **Project A:** receives $\left( \sqrt{1.0} + \sqrt{1.0} + \sqrt{0.25} \right)^2 \approx 6.25$ units **Project B:** receives $\left( \sqrt{0.75} \right)^2 \approx 0.75$ units In this very simple scenario, it isn't hard to see that if there are no other restrictions, the best **Project A** could do would be if all three users each gave their full budget of 1.0 units, in which case **A** would receive $\left( \sqrt{1.0} + \sqrt{1.0} + \sqrt{1.0} \right)^2 = 9$ units. If Project A is greedy, they might feel they are underperforming and covet B's funds. From this perspective, A might calculate $$1 - \displaystyle\frac{6.25}{9} \approx 0.305$ or that this allocation is 31% suboptimal. --- ## Reformulating Mathematically In the real world, the problem of finding optimizing this problem (i.e. how well **A** could have done) is more complicated. It's hard to know where to start. Perhaps we can start by solving what we do undestand well. We'd like to formulate the simple example above in a way that makes the mathematical structure clear. This mathematical structure can then grow to solve larger problems. For each $i \in \{1,2,3\}$, let's write each contribution from user $i$ to **A** as $a_i$, and do the same thing for **Project B.** This means we are thinking of our contributions as living in a complete bipartite graph with 6 edges, it's just that some of the edges are 0. This means that the edge-allocations can be viewed as a tuple $(a_1, a_2, a_3, b_1, b_2, b_3)$ which lives in $\mathbb{R}^6$. The amount that $A$ receives based on a particular set of allocations is a function $M: \mathbb{R}^6 \rightarrow \mathbb{R}$. We can formulate this problem as: * Find the maximum value of the function $M = \left( \sqrt{a_{1}} + \sqrt{a_2} + \sqrt{a_3} \right)^2$ Define the contribution power $\sqrt{a_i}$ = $x_i$ (:octopus: **Note:** M for *matching*, also for *maximize*) * with the **constraints** that * $a_1 + a_2 + a_3 + b_1 + b_2 + b_3 = 3.0$ (total contributions are preserved) * This implies that $x_1^2$ + $x_2^2$ + $x_3^2$ + $y_1^2$ + $y_2^2$ + $y_3^2$ = 3.0 * All variables are non-negative and do not exceed total contributions, i.e. $0 \leq a_i \leq 3.0$ and $0 \leq b_i \leq 3.0$ for $i = 1,2,3$. This can be viewed as maximizing a function $M: \mathbb{R}^6 \rightarrow \mathbb{R}$ over a certain geometric shape. The shape would be fairly straightforward to describe in a smaller number of dimensions. It may be helpful to draw these examples, or have an expert convex geometer describe them. **The assumptions are incredibly important. Let's state them explicitly.** We are allowing... 1. new edges to be created (i.e. 0-weighted edges to become non-0-weighted), 2. old edges to be destroyed (i.e. non-0-weighted become 0-weighted), 3. contribution funds to move from one user to another (which may make sense, since from a cryptocurrency perspective we do not even know if **User 1** and **User 2** are distinct). In other words, we are assuming that each of the following alternative scenarios is allowed. ![](https://i.imgur.com/6P4yLTA.jpg) ![](https://i.imgur.com/211yfEp.jpg) ![](https://i.imgur.com/kimF8zt.jpg) In the problem above, we are only assuming two things: 1. The total amount of all contributions in the entire ecosystem is preserved 2. There is at most one (edge) contribution between each user and each grant (or, with 0.0-weighted edges, exactly one edge between each user and each grant. ) --- ## The Mathematics **Under the formulation above (which relies on the assumptions listed)**, there are many techniques for approaching the problem we have described above. * **Basic Multivariable Calculus** : For certain cases, we could just use partial derivatives in a Lagrange Multiplier formulation (especially if we use the Quadratic Formulation below). This has the advantage of being doable by hand and could potentially yield proofs for certain cases. It has the disadvantage of being slow and requiring good Calculus skills (or access to something like Mathematica or Maple). * **Convex Optimization** -- both general theoretical methods (as described in the [textbook by Boyd and Vandenberghe](https://web.stanford.edu/~boyd/cvxbook/)), as well as the [dedicated Python library cvxpy](https://www.cvxpy.org/). This has the advantage of utilizing powerful tools that are well-understood, but is accessible to a narrower audience. * **Gradient Descent** - a numerical method used in similar optimization problems in Machine Learning, scales well with a large number of variables (in the thousands is OK). Convergence is well-understood. This has the advantage of using powerful tools that are well-understood (especially as the ML community grows), but will be approximate rather than exact. * **Quadratic Formulation** - if we make the change of variables $c_i = \sqrt{a_i}, d_i = \sqrt{b_i}$, then the problem can alternatively be formulated as: * Maximize the function $M = (c_1 + c_2 + c_3)^2$ with constraints * $c_1^2 + c_2^2 + c_3^2 + d_1^2 + d_2^2 + d_3^2 = 3.0$ * $0.0 \leq c_1^2 \leq 3.0$, $0.0 \leq d_1^2 \leq 3.0$, etc. The advantage here is that it is easier to square numerically than it is to take the square root. Now we have something which looks like a bunch of quadratic polynomials everywhere, allowing us to potentially take advantage of [Quadratic Programming](https://en.wikipedia.org/wiki/Quadratic_programming) (although Quadratic Programming requires a linear objective function, our objective function here will take its maximum whenever the linear input $c_1 + c_2 + c_3$ does). --- ### A Slightly Harder Problem Imagine now that instead of just considering the funds allocated to Project A, we consider **the total sum allocated to project A and project B together**. In other words, we might imagine that Users 1,2, and 3 are strategically working together to maximize the total amount that is matched. Now our function to optimize is $M_2 = \left( \sqrt{a_1} + \sqrt{a_2} + \sqrt{a_3} \right^2 + \left( \sqrt{b_1} + \sqrt{b_2} + \sqrt{b_3} \right)^2 with the same constraints as the "A-only" function M. *A first guess*: It seems reasonable that our users might just split their individual budgets between the two projects, contributing 0.5 units to each one. In that case, we would have $$M_2 = 2 \left(\sqrt{0.5} + \sqrt{0.5} + \sqrt{0.5} \right)^2 = 2 (3 \sqrt{0.5})^2 = 9$$. **A counter-intuitive observation:** The total amount received by **Project A and B together** under the basic LR algorithm is the same under both scenarios * All users give their entire budget to **A** * All users split their budgets equally between **A** and **B**. In fact, there is nothing magical about the number 0.5. Let's prove that. --- **Theorem 1:** Suppose that there are $n$ users, each of whom has a budget of 1.0 units. They are interested in obtaining the maximum total amount for A and B together, i.e. optimizing the function $$M_g = \left(\displaystyle\sum_{i=1}^n \sqrt{a_i}\right)^2 + \left( \displaystyle\sum_{i=1}^n \sqrt{b_i}\right)^2$$ with constraints $0 \leq a_i, b_i \leq 1$ for $1 \leq i \leq n$. For any $t \in [0, 1]$, if each user allocates $t$ to $A$ and $1-t$ to $B$, then the value of $M_g$ is the same. In other words, *any uniform strategy between users achieves the same total matching*. **Proof:** Supposing that each user gives $t$ to **A** and $1-t$ to **B**, the value of $M_g$ is equal to $$\left( \displaystyle\sum_{i=1}^n (\sqrt{t})\right^2 + \left(\displaystyle\sum_{i=1}^n \sqrt{b_i} \right)^2$$ which we simplify to $$\left( n \sqrt{t} \right)^2 + \left( n \sqrt{1-n} \right)^2$$ and further simplify to $$n^2 t + n^2(1-t) = n^2.$$ Since this constant value of $n^2$ is independent of $t$, this completes the proof. --- ## Personal Conclusions :octopus: I need to wrap up for the time being, but I'm leaving these thoughts. 1. We have not proven that $n^2$ is optimal. We save that for another day. 2. I:octopus: am happy with the Theorem, however small it is. It gives me my first geometric insight into the structure of this optimization space: *there exists a line segment of constant value*. 3. I:octopus: suspect that the Theorem above will generalize to $n$ users and $k$ grants, with affine combinations instead of convex ones. 4. The constraint that each user has 1.0 units can be generalized -- allowing arbitrary budgets is just an affine tranformation. 5. If these things are well-known or obvious to the community, please let me know. I just want to contribute something useful. ### Summary We are describing the problem of optimizing a utility function on a bipartite graph using continuous methods. To do this, we interpret a weighting of the edges of a complete bipartite graph as an element of $\mathbb{R}^{nm}$, where $n$ is the number of users and $m$ of projects to which they could contribute. Our purpose is threefold: - to clearly illustrate the problem for non-mathematicians (though a simplified version that needs to be refined before actually yielding insight on the CLR algorithm currently in use) - to make explicit **our** (:octopus:) assumptions and make sure they are compatible with how other interested researchers view the problem - to list possible methods for collaborators to use in future, including theorems and proofs where those yield insight. Obviously, the Gitcoin Grants CLR function has attributes extending beyond the basic one we describe here. Still, we believe that solving this simplified problem is an approachable start that will still yield insight to the more complex real-world versions. --- ![](https://i.imgur.com/rCKU7aR.png)