# George's formulation 15112019
###### tags: `Thoughts`
Let's consider a for-loop:
```
for n = 0 to N-1 {
S_1;
S_2;
...
S_K;
}
```
* Our job is to determine a **start time** $t_{n,k}$ for every instance of every statement.
* An instance $n$ of statement $k$ takes $L_{n,k}$ cycles to execute.
* There are some **dependences** between statements: $D \subseteq \mathbb{N}^4$: $(k1,k2,n1,n2) \in D \iff$ instance $n2$ of statement $k2$ depends on instance $n1$ of statement $k1$.
A feasible schedule is therefore one satisfying:
$\forall (k1,k2,n1,n2) \in D, t_{n2,k2} \geq t_{n1,k1} + L_{n1,k1}$ (***)
### A. MODULO SCHEDULING / LOOP PIPELINING
In normal loop pipelining/modulo scheduling we restrict start times to be:
$t_{n,k} = \alpha_k + Pn$ (*)
for some constants $\alpha_k$ (one per statement $S_k$), an initiation interval $P > 0$, and iteration index $n$.
We say that each statement takes $L_k$ cycles to execute (independent of $n$), which means the total execution time of the loop is $\max_{k,n} (t_{n,k} + L_k) = P(N-1) + \max_k (\alpha_k + L_k) \approx PN$.
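As a quick sanity check of that closed form (all the concrete numbers below are invented), we can enumerate the start times directly:

```python
# Toy sanity check: start times under modulo scheduling, t_{n,k} = alpha_k + P*n,
# and the resulting total execution time. N, P, alpha, L are assumptions.
N = 100                      # number of iterations
P = 2                        # initiation interval
alpha = [0, 1, 3]            # one offset per statement S_1..S_K
L = [2, 3, 1]                # per-statement latencies L_k

K = len(alpha)
t = [[alpha[k] + P * n for k in range(K)] for n in range(N)]

# Total time = max over all instances of (start + latency)...
total = max(t[n][k] + L[k] for n in range(N) for k in range(K))
# ...which equals P*(N-1) + max_k(alpha_k + L_k)
assert total == P * (N - 1) + max(a + l for a, l in zip(alpha, L))
```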
In modulo scheduling, we restrict ourselves to working with dependences $D' \subseteq \mathbb{N} \times \mathbb{N} \times \mathbb{N}$:
$S_{k2}$ in iteration $n$ depends on the output of statement $S_{k1}$ in iteration $(n-d)$ only if $(k1,k2,d) \in D'$, *i.e.* we work with $D$ of the particular structure $D = \{ (k1,k2,n1,n2) | (k1,k2,n2-n1) \in D' \}$.
We do this by solving the constraints $\forall (k1,k2,d) \in D'.\ \alpha_{k2} \geq \alpha_{k1} + L_{k1} - Pd$ (**)
We can easily see that these constraints satisfy the dependences $D'$ (as we worked out on the whiteboard):
Substituting (*) for both $t_{n,k2}$ and $t_{n-d,k1}$ into (**) gives:
$t_{n,k2} - Pn \geq t_{n-d,k1} - P(n-d) + L_{k1} - Pd$
$\Longleftrightarrow t_{n,k2} \geq t_{n-d,k1} + L_{k1}$, precisely satisfying the dependences.
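A quick numeric sketch of the argument above (the latencies, offsets, and dependence set are all made up for illustration):

```python
# Check numerically that offsets satisfying (**) give start times
# satisfying (***). All concrete numbers here are assumptions.
P = 2                        # initiation interval
L = [2, 3]                   # latencies L_k for S_1, S_2
Dprime = [(0, 1, 1)]         # (k1, k2, d): S_2 depends on S_1, one iteration back
alpha = [0, 2]               # offsets chosen to satisfy (**)

# (**): alpha_{k2} >= alpha_{k1} + L_{k1} - P*d
assert all(alpha[k2] >= alpha[k1] + L[k1] - P * d for (k1, k2, d) in Dprime)

def t(n, k):
    # (*): t_{n,k} = alpha_k + P*n
    return alpha[k] + P * n

# ...then the dependence constraint (***) holds at every iteration:
N = 50
for (k1, k2, d) in Dprime:
    for n in range(d, N):
        assert t(n, k2) >= t(n - d, k1) + L[k1]
```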
The key innovation here is to replace the latter set of constraints, whose number grows with $N$, by a single constraint per dependence in $D'$.
### B. THE DYNAMIC CASE
We still, in general, want to determine schedules satisfying $D$, but now it's the general case (***) that we care about.
This lets us
1. have latencies $L_{n,k}$ that depend on $n$;
2. have a structure of $D$ different from that given by the construction in terms of $D'$ above.
*BUT*
I think we need to carefully define the limits of the flexibility we're working with. For example, I think we are still **assuming** that
$\forall (k,n>0). t_{n,k} \geq t_{n-1,k} + 1$, *i.e.* that loop iterations still start sequentially, even in the absence of dependence.
Are there any other constraints on the solution that the dynamic case imposes?
We can still use static scheduling for the dynamic case, so long as we over-approximate $D$ and $L$ by $D'$ and $L'$:
$L'_k \geq \max_n L_{n,k}$ (****)
$D' = \{ (k1, k2, n2-n1) \mid (k1,k2,n1,n2) \in D \}$ (*****)
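A minimal sketch of this over-approximation on a made-up dynamic $D$ and per-instance $L$:

```python
# Collapse a dynamic dependence set D and per-instance latencies L_{n,k}
# into the static over-approximations D' (*****) and L' (****).
# The concrete D and L below are hypothetical.
D = [(0, 1, 0, 1), (0, 1, 1, 3), (1, 1, 2, 3)]               # (k1, k2, n1, n2)
L = {(0, 0): 1, (1, 0): 4, (0, 1): 2, (2, 1): 2, (3, 1): 2}  # (n, k) -> L_{n,k}

K = 2  # number of statements

# (****): L'_k >= max_n L_{n,k}  (take equality, the tightest choice)
Lprime = [max(l for (n, k), l in L.items() if k == kk) for kk in range(K)]

# (*****): D' keeps only the statement pair and the iteration distance
Dprime = {(k1, k2, n2 - n1) for (k1, k2, n1, n2) in D}
```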
So the question is, **how much does this over-approximation hurt us?** I think this is a useful framework to think about the problem.
In particular, static analysis - probabilistic or otherwise - can help us understand the shape and values of $D$ and $L$.
### C. A (MINOR?) OBSERVATION
$\forall (k1,k2,d) \in D'.\ \alpha_{k2} \geq \alpha_{k1} + L_{k1} - Pd$ (**)
$\forall (k1,k2,n1,n2) \in D,\ t_{n2,k2} \geq t_{n1,k1} + L_{n1,k1}$ (***)
$L'_k \geq \max_n L_{n,k}$ (****)
$D' = \{ (k1, k2, n2-n1) \mid (k1,k2,n1,n2) \in D \}$ (*****)
As I was writing this, I noticed that (\*\*\*\*) and (\*\*\*\*\*) are computed independently, *i.e.* we over-approximate $L$ and $D$ *separately* in typical HLS tools.
Just wondering aloud, but I wonder whether there's a case where analysis can directly do better than this due to correlation between $L_{n,k}$ and the structure of $D$, allowing us *still* to use the structure of (**) and normal scheduling, if we know that $\max_{n1,k1} (L_{n1,k1} - P(n2-n1)) < \max_{n1} L_{n1,k1} - P \min_{n1,n2} (n2-n1)$.
Just a thought... prob not obvious what I mean but I can explain on Monday.
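A toy instance of the kind of correlation I have in mind (the numbers are invented so that long latencies only occur at long dependence distances):

```python
# When latency and dependence distance are correlated, the joint bound
# max(L - P*d) can be strictly tighter than the separate bound
# max(L) - P*min(d). All numbers below are made up.
P = 2
# (L_{n1,k1}, n2-n1) pairs observed along dependences in D:
pairs = [(1, 1), (2, 2), (8, 5)]     # the long latency only at distance 5

joint = max(l - P * d for (l, d) in pairs)               # correlation-aware
separate = max(l for l, _ in pairs) - P * min(d for _, d in pairs)

assert joint < separate              # the over-approximation gap
```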