# Scheduling formulation
###### tags: `Thoughts`
<span style="font-size:0.8em">George's original formulation and my comments: [link](https://hackmd.io/@jcheng/SJrv8SajS)</span>
Let's consider a for-loop:
```
for n = 1 to N {
S_1;
S_2;
...
S_K;
}
```
**Definition 1:**
* The start time of a statement $S_k$ in iteration $n$ is $t_{n,k}$.
* A statement $S_k$ in iteration $n$ takes $L_{n,k}$ cycles to execute.
* A whole iteration $n$ takes $L_n$ cycles to execute.
* There are some dependences between statements: $D \subseteq N^4$: $(k1,k2,n1,n2) \in D$ <=> instance $n2$ of statement $k2$ depends on instance $n1$ of statement $k1$.
**Objective:** To determine all the $t_{n,k}$, that satisfying:
$\forall (k1,k2,n1,n2) \in D, t_{n2,k2} \geq t_{n1,k1} + L_{n1,k1}~~~~~~~(eq.1)$
### A. Modulo Scheduling
**Definition A1:**
We restrict start times to be:
$t_{n,k} = \alpha_{n,k} + Pn~~~~~~~(eq.2)$, where
$\alpha_k$ - some set of constants, one per statement,
$S_k$ - an initiation interval $P > 0$,
$n$ - loop iteration.
In $eq.1$, $n2 = f(n1)$, and $f$ can be arbitrary function. As a specialised case of the condition $eq.1$, we restrict ourselves to working with Here $f(x) = x + d$.
Assuming each statement takes constant cycles to execute ($L_n \perp \!\!\!\! \perp n$), the static scheduler allocate constant time slots of $L'$ cycles for each iteration to execute:
$L' = max\{L_n, n \in [0, N-1]\}~~~~~~~(eq.3)$
Then the total cycles become:
$T = P(N-1) + \alpha_k+L_k \approx PN$
This means that dependences $D' \subset N \times N \times N$:
$S_{k2}$ in iteration $n$ depends on the output of statement $S_{k1}$ in iteration $(n-d)$ only if $(k1,k2,d) \in D'$, *i.e.* we work with $D$ of the particular structure:
$D = \{ (k1,k2,n1,n2) | (k1,k2,n2-n1) \in D' \}$.
$t_{n,k2} \geq t_{n-d,k1} + L'~~~~~~~~~~~~~~~~(eq.4)$ by $eq.3$
$t_{n,k2} \geq t_{n-d,k1} + L'$ by $eq.2$
$\alpha_{n,k2} + Pn \geq \alpha_{n-d,k1} + P(n-d) + L'$
$\alpha_{n,k2} + Pd \geq \alpha_{n-d,k1} + L'$
$\alpha_{n,k2} \geq \alpha_{n-d,k1} + L' -Pd~~~~~~~(eq.5)$
> $eq.4$ and $eq.5$ are equivalent but I am not sure which one should be given...
### B. Dynamic Scheduling
We stick to $eq.1$:
> $\forall (k1,k2,n1,n2) \in D, t_{n2,k2} \geq t_{n1,k1} + L_{n1,k1}~~~~~~~(eq.1)$
Compared to Modulo Scheduling, it allows:
* $L_{n,k} \perp n$
* $D \neq D'$
but still assuming:
$\forall (k,n>0). t_{n,k} \geq t_{n-1,k} + 1$,
so the circuit is not unrolled but pipelined.
We can still use static scheduling for the dynamic case, so long as we over-approximate $D$ and $L$ by $D'$ and $L'$:
> $L' = max\{L_n, n \in [0, N-1]\}~~~~~~~(eq.3)$
$D' = \{ (k1, k2, n1-n2) | (k1,k2,n1,n2) \in D \}~~~~~~~(eq.6)$
**Conclusion:**
So the question is, **how much does this over-approximation hurt us?** I think this is a useful framework to think about the problem.
In particular, static analysis - probabilistic or otherwise - can help us understand the shape and values of $D$ and $L$.
### C. Remarks
> $\forall (k1,k2,n1,n2) \in D, t_{n2,k2} \geq t_{n1,k1} + L_{n1,k1}~~~~~~~(eq.1)$
>
> $L' = max\{L_n, n \in [0, N-1]\}~~~~~~~(eq.3)$
>
> $\forall (k1,k2,d) \in D', \alpha_{n,k2} \geq \alpha_{n-d,k1} + L' -Pd~~~~~~~(eq.5)$
>
> $D' = \{ (k1, k2, n1-n2) | (k1,k2,n1,n2) \in D \}~~~~~~~(eq.6)$
$eq.3$ and $eq.6$ are computed independently, *i.e.* we over-approximate $L$ and $D$ *separately* in typical HLS tools.
Just wondering aloud, but I wonder whether there's a case where analysis can directly do better than this due to correlation between $L_{n,k}$ and the structure of $D$, allowing us *still* to use the structure of $eq.5$ and normal scheduling, if we know that:
$max_{n1,k1} (L_{n1,k1} - P(n2-n1)) < max_{n1}L_{n1,k1} - Pmin_{n2,n1}(n2-n1)$.