7: Reachability, transitive closure, graphs

Reachability, transitive closure, graphs --- # More on directed graphs, cycles, and reachability We'll continue with the basic, directed graph model we started with last time: ```java sig Node { edges : set Node } run { #Node = 3} ``` When I run this, I get the same instance in the evaluator. Recall that `edges` is a relation that has an element for every edge. We can think of this as the set of all 1-hop paths. The instance we get first is: ``[N1] -> [N0] -> [N3], (N1 also has a self loop)`` The following results are all from the Evaluator in the Alloy Analyzer. --- If we ask for the join of the `edges` relation with itself: `edges.edges` <details> <summary>We get:</summary> The set of 2-hop paths (`{N1->N0, N1->N1, N1->N2} `). </details> --- `edges.edges.edges` <details> <summary>We get:</summary> The set of 3-hop paths (`{N1->N0, N1->N1, N1->N2} `). </details> --- `iden` (A built-in relation keyword in Alloy) <details> <summary>We get:</summary> A relation from every atom to itself (`{N0->N0, N1->N1, N2->N2}`). </details> --- `edges - iden` <details> <summary>We get:</summary> The graph with self-loops removed cycles(`{N0->N2, N1->N0} `). </details> --- ## Ruling out self-loops We could add this as a constraint. As an exercise, we wrote on the whiteboard several different, equivalent statements for ruling out self loops. Each line here is redundant/equivalent with every other line: ```java pred noSelfLoopsPredicate { -- no N1->N1 etc. all n: Node | n->n not in edges all n: Node | n not in n.edges no n: Node | n in n.edges } ``` ```java pred noSelfLoopsRelational { no (iden & edges) edges = edges - edges } ``` We can use Alloy to check that these are the same (up to some small scope): ```java assert same { noSelfLoopsPredicate iff noSelfLoopsRelational } check same for 10 Node ``` --- Let's continue on with the idea of removing loops. We can specify the set of 2-hop paths that do not include self-loops as: `(edges - iden).(edges - iden)` <details> <summary>We get:</summary> 2-hop paths not including self-loops: `{N1->N2}` </details> --- ## Joins We discussed in more detail how to think about the `.` join relational operator. We can conceptualize joins are taking two database tables, finding all the rows were the columns in the middle of the join are the same, and keeping **just those rows** with the "matched" columns removed. Said another way, a join `a.b` on two binary relations, `a` and `b`: - Looks at the second atom, `a2`, of every pair `a1->a2`in `a`. - Finds every pair `b1->b2` in `b` where the first atom `b1` matches `a2`. - Adds `a1->b2` to the output relation. In math notation, the join of binary relations $S$ and $R$ is: $$ R.S = \{ (a, c) \mid \exists b \in B, (a, b) \in R \text{ and } (b, c) \in S \}$$ ### Arity, which joins are well-defined :::info The number of columns in a relation is called its **arity** (e.g., in Alloy, a binary relation has arity 2, a set has arity 1). ::: Since joins remove the "matched" column, the join of a relation with arity `m` and a relation with arity `n` has arity `m + n - 2`. --- :::danger If we try to join a *set* with itself: `Node.Node` We get a type error. This is because it would result in a zero column table: sets have arity 1, and 1 + 1 - 2 = 0. Zero **row** tables are fine (they represent the empty set), zero **column** tables not defined. ::: Going back to path lengths: write down what expression I should write to get the relation for 4-hop paths (from start to end) within this graph. `edges.edges.edges.edges` That's getting tedious! To remove loops of length `n`, we're having to write the join of the relation with itself `n-1` many times. What we are trying to use, though, has a specific name. It's related to the property of transitivity that we've seen a few types. # Transitive closure :::info Alloy has a built-in operator that conceptually represents repeated joins of a relation and itself. The *transitive closure* of a binary relation is the smallest relation that encompasses the relation and satisfies transitivity. It's conceptually the same as repeatedly joining the relation with itself until the relation stops growing. In Alloy, the `^` operator gives the transitive closure of a relation. That is, for a binary relation `r`, `^r` is the smallest relation that satisfies transitivity and contains `r`. ::: We can think of *finding* the transitive closure by taking any 2-hop path in the graph, and drawing a new edge between the start and end if the edge was not already there. We repeat this process until no more edges can be added this way. --- This is on example of a running an algorithm to a [fixed point](https://en.wikipedia.org/wiki/Fixed_point_(mathematics)). A fixed point, in computer science, is informally the point at which applying a loop iteration within an algorithm no longer changes the output. The transitive closure if the fixed-point of taking the join of the relation with itself. --- ## Reachability of a specific node Now, we can use the transitive closure to think about **all possible paths** in a graph: `^edges` is the set of *all* paths in a graph. The transitive closure is useful to express the “reachability” within a graph. Let's imagine that nodes are servers, and `CSLinux` is one specific server. :::success How could I model this in Alloy? ::: <details> <summary>(Think, then click!)</summary> We want a special Node type to be a CSLinux server, so I will use `extends`. We want there to be just one, so we'll use `one`. ```java one sig CSLinux extends Node {} ``` </details> We can now express the set of nodes *reachable* from `CSLinux`: <details> <summary>(Think, then click!)</summary> ```java CSLinux.^edges ``` </details> And the set of nodes `CSLinux` is *reachable* from: <details> <summary>(Think, then click!)</summary> ```java ^edges.CSLinux ``` </details> --- Going back to our example *without* `CSLinux`. If we wanted to eliminate all loops from our graph in this example, we could then use: ```java no ^edges & iden ``` Which constraints the instance such that no node is “reachable” from itself. This is equivalent to: ```java no n: Node | n in n.^edges ``` # Undirected graphs --- :::success How can we model an *undirected* graph in Alloy? ::: Relations have order, but we can include a symmetric constraint to represent that the order does not matter (a pair is either in the relation `edges` or not, regardless of order). ```java fact undirectedPredicate { all x, y: Node | x->y in edges implies y->x in edges } // same as: edges = ~edge fact undirectedRelational { edges = ~edge } ``` Undirected graphs are useful for many systems programming tasks (server connections, etc). A downside to this model of undirected graphs: :::success Can we still use `^edges` to talk about reachability? ::: <details> <summary>(Think, then click!)</summary> Yes, in the sense that disconnected graph components will still be disconnected. No, in the sense that every "edge" is essentially a small loop! </details> Solving the "is there a cycle" problem is harder for undirected graphs, and this is part of what we will touch on in Workshop this week for the next assignment. Our solution will rely on some properties of graphs vs trees. :::success What is the definition of an *undirected* tree? ::: <details> <summary>(Think, then click!)</summary> Is it just a graph with no cycles? No, it also needs to be connected. An undirected tree is an acyclic and connected undirected graph. </details> ## Graphs as Trees An undirected graph `G` is a tree if and only if it satisfies any of the following 5 conditions: 1. `G` is connected, but removing any edge disconnects it. 1. `G` is acyclic, but adding any edge between nodes that is not currently in the `G` creates a cycle. 1. Exactly one path connects each pair of nodes in `G`. 1. `G` has `#Node-1` edges and no cycles (avoid in Alloy). 1. `G` has `#Node-1` edges and is connected (avoid in Alloy). We'll avoid the final two definitions in Alloy, since integer operations and counting in Alloy are much more expensive than expressing the other properties.