---
title: Swarm Bending
---


## Why Should Cartoons Have All the Fun?

How does one interact with a swarm? For a while now, I have tried to cast into existence a notion of **Swarm Bending**, inspired by the cartoon *Avatar: The Last Airbender*. In this cartoon, characters can bend the elements: air, earth, fire, and water. Now, hear me out. What if, what if, these particles were little robots??

![image](https://hackmd.io/_uploads/HJC1b1j0be.png)


The idea is fairly straightforward and not new (but the name "swarm bending" is!). One takes a hand gesture recognition system and plugs it into a dynamical system model for a swarm. The scientific goal would be to study how to map hand gestures to the parameters of the swarm model so that one gets effective performance at day-to-day tasks: like loading up your dishwasher using a swarm of robots. An important consideration in this project would be the motion prescriptions for the swarm itself: which properties of the model lead to better controllability, or **Swarm Bendability**?

The little-kid-inside-you goal is to become an expert conjurer who treats swarm bending as an art form. *To feel the force, you have to create it.*

As I tried to probe into this idea, I realized it is not too hard to implement at a small scale. Luckily, there are some readily available hand-recognition packages in Python, so the implementation difficulty of the project is just a question of will and a penchant for tomfoolery.



## Interaction Model


We have $M$ leader markers (from the hand tracker) at positions $z_1(t), \dots, z_M(t)$ and $N$ follower agents at positions $x_1(t), \dots, x_N(t)$. The followers evolve under first-order dynamics driven by two forces — inter-agent Morse interactions and leader coupling:

$$\dot{x}_i(t) = \underbrace{\sum_{j \neq i} F_{\text{morse}}(x_i - x_j)}_{\text{inter-agent}} + \underbrace{\alpha \sum_{m=1}^{M} F_{\text{leader}}(x_i - z_m)}_{\text{leader coupling}}$$

The inter-agent force derives from a **Morse potential** $U(r) = C_r e^{-l_r r} - C_a e^{-l_a r}$, giving:

$$F_{\text{morse}}(x_i - x_j) = \left( C_r l_r \, e^{-l_r r_{ij}} - C_a l_a \, e^{-l_a r_{ij}} \right) \frac{x_i - x_j}{r_{ij}}$$

where $r_{ij} = \|x_i - x_j\|$. Short-range repulsion prevents collisions while long-range attraction holds the swarm together.
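As a concrete sketch, the total Morse force on each agent can be vectorized in NumPy. The parameter values below are hypothetical placeholders, not the ones used in the experiments:

```python
import numpy as np

def morse_forces(x, C_r=1.0, l_r=2.0, C_a=0.5, l_a=1.0):
    """Total Morse force on each agent. x: (N, 2) array of positions."""
    diff = x[:, None, :] - x[None, :, :]        # x_i - x_j, shape (N, N, 2)
    r = np.linalg.norm(diff, axis=-1)           # pairwise distances r_ij
    np.fill_diagonal(r, np.inf)                 # no self-interaction
    mag = C_r * l_r * np.exp(-l_r * r) - C_a * l_a * np.exp(-l_a * r)
    return np.sum((mag / r)[:, :, None] * diff, axis=1)
```

Masking the diagonal with `np.inf` makes an agent's force on itself vanish without any special-casing.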
The leader interaction takes the same exponential form:

$$F_{\text{leader}}(x_i - z_m) = \sigma \, C_l \, l_l \, e^{-l_l \|x_i - z_m\|} \frac{x_i - z_m}{\|x_i - z_m\|}$$

where $\sigma = +1$ for repulsion (agents flee the hand) and $\sigma = -1$ for attraction (agents chase the hand). The gain $\alpha$ controls how strongly the leaders influence the swarm relative to inter-agent forces.

The domain is $\Omega = [-1, 1]^2$ with reflective boundary conditions, and agent velocities are clamped at $v_{\max}$ to prevent discontinuous jumps.
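A minimal time-stepping sketch, assuming explicit Euler integration (the post doesn't specify the integrator) and hypothetical leader parameters; `inter_agent` is whatever inter-agent force function you plug in:

```python
import numpy as np

def leader_forces(x, z, sigma=-1.0, C_l=1.0, l_l=2.0):
    """Exponential leader force; x: (N, 2) agents, z: (M, 2) leaders."""
    diff = x[:, None, :] - z[None, :, :]                 # x_i - z_m
    r = np.linalg.norm(diff, axis=-1, keepdims=True)
    r = np.maximum(r, 1e-9)                              # avoid divide-by-zero
    return np.sum(sigma * C_l * l_l * np.exp(-l_l * r) * diff / r, axis=1)

def step(x, z, inter_agent, dt=0.02, alpha=1.0, v_max=1.0):
    """One explicit Euler step with velocity clamping and reflective walls."""
    v = inter_agent(x) + alpha * leader_forces(x, z)
    speed = np.linalg.norm(v, axis=-1, keepdims=True)
    v = np.where(speed > v_max, v * v_max / speed, v)    # clamp speed at v_max
    x = x + dt * v
    x = np.where(x > 1.0, 2.0 - x, x)                    # reflect off the walls
    x = np.where(x < -1.0, -2.0 - x, x)                  # of Omega = [-1, 1]^2
    return x
```

With `sigma = -1.0` the leaders attract; flipping the sign gives the repulsive mode.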

The leader positions $z_m(t)$ are extracted in real time from the index fingertips and thumb tips of each detected hand via [MediaPipe's hand landmark model](https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker).

## Numerical Experiments

In the first experiment we use $4$ leaders and *I can't remember* how many followers.

The following shows the effect of a repulsive interaction between leaders and followers.

![swarm](https://hackmd.io/_uploads/rJr6zpOR-x.gif)

Next, we see the effect of an attractive interaction between leaders and followers. It turns out the swarm feels much more malleable.

![swarm_att](https://hackmd.io/_uploads/H1IPXpuC-l.gif)

## Comparison

Now that we have the setup, let's set up a task and compare the attraction- and repulsion-based methods.

Let $\rho^d(x)$ be a probability distribution. Our goal is to use the finger pointers to steer the swarm as close as possible to the target distribution.

### Gaussian

In this first test, the target is a single Gaussian. We will use the [Wasserstein distance](https://en.wikipedia.org/wiki/Wasserstein_metric) between the final swarm positions and the target to measure performance.
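For two empirical point clouds of equal size, the 2-Wasserstein distance reduces to an optimal assignment problem, which SciPy solves exactly. This is a sketch of one way to compute such a metric; the post doesn't say which estimator was actually used:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def wasserstein2(X, Y):
    """Exact 2-Wasserstein distance between equal-size empirical clouds."""
    C = cdist(X, Y, metric="sqeuclidean")   # pairwise squared transport costs
    row, col = linear_sum_assignment(C)     # optimal one-to-one matching
    return np.sqrt(C[row, col].mean())
```

In practice one would sample the target $\rho^d$ to get `Y` and pass the final swarm positions as `X`.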

![swarm_rep_gauss](https://hackmd.io/_uploads/r1feIic0bl.gif)


**Terminal Wasserstein distance: 0.07**

Easy peasy!

![swarm_att_att](https://hackmd.io/_uploads/Hk1oR950-g.gif)

**Terminal Wasserstein distance: 0.19**

### Two-Gaussian Mixture

![swarm_rep_att](https://hackmd.io/_uploads/SyfMzscAWe.gif)

**Terminal Wasserstein distance: 0.11**

![swarm_rep_rep](https://hackmd.io/_uploads/H1jY-iqR-l.gif)

**Terminal Wasserstein distance: 0.47**

So, repulsive leaders have poorer controllability: duh. (A limitation of the study: the repulsive leaders are four of my fingers.)

### The Famous Two Moons

Here, I was only brave enough to use the attractive leaders.

![swarm_rep_2moon](https://hackmd.io/_uploads/Bk1ftj5A-x.gif)

This is often a low-dimensional benchmark for diffusion models. Maybe one day our palms can compete with DALL·E.

## Linear Models: Let's Take a Taylor Expansion of All Our Problems

Now, we are going to consider a class of linear models. During the heyday of multi-agent control, a big part of the control community studied consensus models of the form


$$\dot{x}_i(t) = \sum_{j \in \mathcal{N}(i)} -k \, (x_i - x_j)$$

where $\mathcal{N}(i)$ is the set of neighbors of agent $i$ in some communication graph. Each agent moves toward the average of its neighbors, and the force is linear in distance. These models are elegant and analytically tractable: you can study their convergence rates using the graph Laplacian, and their controllability properties if one adds leaders:


$$\dot{x}_i(t) = \sum_{j \in \mathcal{N}(i)} -k \, (x_i - x_j) + \sum_{m=1}^{M} g \, \mathbf{1}_{[i \in S(m)]} \, (z_m - x_i)$$

where $S(m) \subset \{1, \dots, N\}$ is the set of *informed* agents that feel the leader signal $z_m$, and $g$ is the leader gain. In our case, there are two leaders with $S(1) = \{1\}$ and $S(2) = \{N\}$: only the two endpoints of the chain are connected to the hand markers. This is a fundamentally **non-permutation-equivariant** model: agent $1$ and agent $N$ play a special role, and swapping labels changes the dynamics.
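A sketch of this leader-driven chain under explicit Euler integration, with hypothetical gains $k = g = 1$ (the post doesn't report the values used):

```python
import numpy as np

def chain_step(x, z1, z2, k=1.0, g=1.0, dt=0.01):
    """One Euler step of the leader-driven consensus chain.
    x: (N, 2) agent positions; z1, z2: positions of the two leaders."""
    v = np.zeros_like(x)
    v[1:]  += -k * (x[1:] - x[:-1])      # pull toward left neighbor
    v[:-1] += -k * (x[:-1] - x[1:])      # pull toward right neighbor
    v[0]   += g * (z1 - x[0])            # leader 1 drives agent 1
    v[-1]  += g * (z2 - x[-1])           # leader 2 drives agent N
    return x + dt * v
```

Iterating this with fixed leaders drives the chain to its equilibrium: agents evenly spaced on the straight segment between the two leaders.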

Given that the feedback to the user doesn't include agent identities, I think this lack of permutation equivariance is fundamentally problematic for human-swarm interaction.

Much has been said about the controllability properties of these systems. For instance, the chain graph is **globally controllable** through leaders attached to the two ends of the chain. Let's see how it performs in the two-Gaussian experiment.



![swarm_chain](https://hackmd.io/_uploads/S1gje0q0bx.gif)

**Terminal Wasserstein Distance: 0.5525**

Not very controllable in practice. Almost as bad as the repulsive interaction with nonlinearities. The problem is that controllability tells you whether you *can* steer the system to a state, not whether that state *persists* once you get there. What matters for swarm bending (and for control systems in general) is whether the target configuration is an **equilibrium**, *and* whether the system can be steered to that equilibrium.

For the chain graph, at equilibrium, $\dot{x}_i = 0$ for all $i$, which requires:

$$k(x_{i+1} - x_i) + k(x_{i-1} - x_i) = 0 \quad \Rightarrow \quad x_i = \frac{x_{i-1} + x_{i+1}}{2}$$

Every agent must sit at the midpoint of its neighbors. The only solutions are **linear interpolations** between the boundary values $x_1$ and $x_N$: the equilibrium of a consensus chain is always a straight line between the two leaders.
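One way to see this: the midpoint condition says the second difference of the positions vanishes, which forces an affine profile in the index $i$:

$$x_{i+1} - 2x_i + x_{i-1} = 0 \quad \Rightarrow \quad x_{i+1} - x_i = x_i - x_{i-1} \equiv d \quad \Rightarrow \quad x_i = x_1 + (i-1)\,d.$$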

A two-Gaussian target, two separated clusters with a gap in between, is not a straight line. Booooo.



## Things to Ponder Over 

The idea raises interesting questions.

* Are nonlinear models of interaction more bendable than linear models? 
* What is the role of permutation invariance? 
* Are continuum models better in some way?

This is a live experiment. If you're interested, keep an eye on my notes for updates.

--------------------------



[MediaPipe Hand Landmarker — Python Guide](https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker/python)






