Scheduling formulation

# Scheduling formulation ###### tags: `Thoughts` <span style="font-size:0.8em">George's original formulation and my comments: [link](https://hackmd.io/@jcheng/SJrv8SajS)</span> Let's consider a for-loop: ``` for n = 1 to N { S_1; S_2; ... S_K; } ``` **Definition 1:** * The start time of a statement $S_k$ in iteration $n$ is $t_{n,k}$. * A statement $S_k$ in iteration $n$ takes $L_{n,k}$ cycles to execute. * A whole iteration $n$ takes $L_n$ cycles to execute. * There are some dependences between statements: $D \subseteq N^4$: $(k1,k2,n1,n2) \in D$ <=> instance $n2$ of statement $k2$ depends on instance $n1$ of statement $k1$. **Objective:** To determine all the $t_{n,k}$, that satisfying: $\forall (k1,k2,n1,n2) \in D, t_{n2,k2} \geq t_{n1,k1} + L_{n1,k1}~~~~~~~(eq.1)$ ### A. Modulo Scheduling **Definition A1:** We restrict start times to be: $t_{n,k} = \alpha_{n,k} + Pn~~~~~~~(eq.2)$, where $\alpha_k$ - some set of constants, one per statement, $S_k$ - an initiation interval $P > 0$, $n$ - loop iteration. In $eq.1$, $n2 = f(n1)$, and $f$ can be arbitrary function. As a specialised case of the condition $eq.1$, we restrict ourselves to working with Here $f(x) = x + d$. Assuming each statement takes constant cycles to execute ($L_n \perp \!\!\!\! \perp n$), the static scheduler allocate constant time slots of $L'$ cycles for each iteration to execute: $L' = max\{L_n, n \in [0, N-1]\}~~~~~~~(eq.3)$ Then the total cycles become: $T = P(N-1) + \alpha_k+L_k \approx PN$ This means that dependences $D' \subset N \times N \times N$: $S_{k2}$ in iteration $n$ depends on the output of statement $S_{k1}$ in iteration $(n-d)$ only if $(k1,k2,d) \in D'$, *i.e.* we work with $D$ of the particular structure: $D = \{ (k1,k2,n1,n2) | (k1,k2,n2-n1) \in D' \}$. $t_{n,k2} \geq t_{n-d,k1} + L'~~~~~~~~~~~~~~~~(eq.4)$ by $eq.3$ $t_{n,k2} \geq t_{n-d,k1} + L'$ by $eq.2$ $\alpha_{n,k2} + Pn \geq \alpha_{n-d,k1} + P(n-d) + L'$ $\alpha_{n,k2} + Pd \geq \alpha_{n-d,k1} + L'$ $\alpha_{n,k2} \geq \alpha_{n-d,k1} + L' -Pd~~~~~~~(eq.5)$ > $eq.4$ and $eq.5$ are equivalent but I am not sure which one should be given... ### B. Dynamic Scheduling We stick to $eq.1$: > $\forall (k1,k2,n1,n2) \in D, t_{n2,k2} \geq t_{n1,k1} + L_{n1,k1}~~~~~~~(eq.1)$ Compared to Modulo Scheduling, it allows: * $L_{n,k} \perp n$ * $D \neq D'$ but still assuming: $\forall (k,n>0). t_{n,k} \geq t_{n-1,k} + 1$, so the circuit is not unrolled but pipelined. We can still use static scheduling for the dynamic case, so long as we over-approximate $D$ and $L$ by $D'$ and $L'$: > $L' = max\{L_n, n \in [0, N-1]\}~~~~~~~(eq.3)$ $D' = \{ (k1, k2, n1-n2) | (k1,k2,n1,n2) \in D \}~~~~~~~(eq.6)$ **Conclusion:** So the question is, **how much does this over-approximation hurt us?** I think this is a useful framework to think about the problem. In particular, static analysis - probabilistic or otherwise - can help us understand the shape and values of $D$ and $L$. ### C. Remarks > $\forall (k1,k2,n1,n2) \in D, t_{n2,k2} \geq t_{n1,k1} + L_{n1,k1}~~~~~~~(eq.1)$ > > $L' = max\{L_n, n \in [0, N-1]\}~~~~~~~(eq.3)$ > > $\forall (k1,k2,d) \in D', \alpha_{n,k2} \geq \alpha_{n-d,k1} + L' -Pd~~~~~~~(eq.5)$ > > $D' = \{ (k1, k2, n1-n2) | (k1,k2,n1,n2) \in D \}~~~~~~~(eq.6)$ $eq.3$ and $eq.6$ are computed independently, *i.e.* we over-approximate $L$ and $D$ *separately* in typical HLS tools. Just wondering aloud, but I wonder whether there's a case where analysis can directly do better than this due to correlation between $L_{n,k}$ and the structure of $D$, allowing us *still* to use the structure of $eq.5$ and normal scheduling, if we know that: $max_{n1,k1} (L_{n1,k1} - P(n2-n1)) < max_{n1}L_{n1,k1} - Pmin_{n2,n1}(n2-n1)$.