# JPEG Encoder Design
## 2D DCT-II
### 1D DCT-II
TODO: This needs to be updated so that we can reuse it for both the first and second-stage 1D DCT. (It will probably have to be parameterized over the size of the input.)
```verilog
module dct_1d
(input logic clk, rst, ena_in,
input logic [7:0] a_in,
output logic [11:0] S_out);
```
The input is in two's complement Q8.0 form (that is, just a regular signed integer from -128 to 127). The output is Q12.0.
The `dct_1d` module accepts rows (or columns) of 8 pixels and computes an approximation of their DCT-II using the algorithm described in Agostini et al. 2001. Clock each row in from left to right; its coefficients emerge with 48 cycles of latency. When `ena_in` is low, no new pixels are accepted and no new coefficients are produced. You must provide entire rows (8 inputs where `ena_in` is high). The module is pipelined: you can start another row immediately after the previous one, and the computed rows of coefficients are emitted in the same order, each 48 cycles after its input.
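
When checking `dct_1d` in simulation, it can help to have an exact floating-point reference to compare against. The following is a minimal, simulation-only sketch of the standard orthonormal 8-point DCT-II (the form used by JPEG). The package name `dct_ref_pkg` and the function `dct8_ref` are hypothetical and not part of this design. Fast algorithms of the kind cited above may differ from this reference by rounding error, and possibly by per-coefficient scale factors, so treat any comparison as approximate.

```verilog
// Simulation-only reference model (hypothetical helper, not synthesizable).
package dct_ref_pkg;
  localparam real PI = 3.14159265358979323846;

  // Exact 8-point DCT-II of one row:
  //   S[u] = (C(u) / 2) * sum_{x=0..7} a[x] * cos((2x + 1) * u * pi / 16)
  // with C(0) = 1/sqrt(2) and C(u) = 1 otherwise.
  function automatic void dct8_ref(input int a [8], output real S [8]);
    for (int u = 0; u < 8; u++) begin
      real acc, cu;
      acc = 0.0;
      cu  = (u == 0) ? 1.0 / $sqrt(2.0) : 1.0;
      for (int x = 0; x < 8; x++)
        acc += a[x] * $cos((2 * x + 1) * u * PI / 16.0);
      S[u] = 0.5 * cu * acc;
    end
  endfunction
endpackage
```

For a constant row of eight pixels with value 100, this reference gives `S[0]` of about 282.84 (that is, 800 / (2 * sqrt(2))) and all other coefficients equal to zero up to floating-point error.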

### Transpose buffer
The transpose buffer accepts 12-bit coefficients one at a time in row-major order. 64 cycles later (counting only cycles where `ena_in` is high), they emerge in column-major order. This is pipelined: after the first 64 coefficients are shifted in, you can continue to shift in coefficients for the next block.
```verilog
module transpose_buffer
(input logic clk, rst, ena_in,
input logic [11:0] in,
output logic [11:0] out);
```
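
One way to get this behavior is a ping-pong (double-buffered) memory: one bank is filled in row-major order while the other, already full, is read out with the row and column index bits swapped. The sketch below illustrates that idea only. It is an assumption about a possible implementation, not necessarily how this design's `transpose_buffer` is built, and the names `transpose_buffer_sketch`, `mem`, `cnt`, and `bank` are hypothetical.

```verilog
// Behavioral sketch of a ping-pong transpose buffer (hypothetical, for illustration).
module transpose_buffer_sketch
  (input  logic        clk, rst, ena_in,
   input  logic [11:0] in,
   output logic [11:0] out);

  logic [11:0] mem [0:1][0:63]; // two banks of one 8x8 block each
  logic [5:0]  cnt;             // position within the block: {row, col} on write
  logic        bank;            // bank currently being written

  always_ff @(posedge clk) begin
    if (rst) begin
      cnt  <= '0;
      bank <= 1'b0;
    end else if (ena_in) begin
      // Write the incoming coefficient in row-major order.
      mem[bank][cnt] <= in;
      // Read the other bank with row and column swapped (column-major order).
      out <= mem[~bank][{cnt[2:0], cnt[5:3]}];
      cnt <= cnt + 6'd1;
      // After 64 enabled cycles the banks trade roles.
      if (cnt == 6'd63) bank <= ~bank;
    end
  end
endmodule
```

In this sketch `out` is undefined until the first block has been fully written, and everything stalls while `ena_in` is low, so the latency is 64 enabled cycles. The memory contents are not reset, which keeps the array easy to map onto RAM.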