# SoC Lab Workload-optimized SoC

Source code: [GitHub <i class="fa fa-external-link"></i>](https://github.com/dqrengg/SoC_Laboratory/tree/main/Lab-wlos)

## Overview

* HW/SW co-design to optimize SoC performance and workload under specific task assignments
* Implement [FIR <i class="fa fa-external-link"></i>](/rkfLNCYoye) accelerator, AXI bus, DMA, arbiter, and SDRAM controller

## Block Diagram

![wlos_block](https://hackmd.io/_uploads/H13yGzymZl.png)

## SDRAM Controller

* Originally, the SDRAM controller had only a basic FSM to match the SDRAM timing.
* To improve memory bandwidth, the controller was redesigned to support **burst read** and **bank interleaving**.

### Controller Design

### Address Mapping Optimization for Interleaving

* Location of code and data in physical memory

  ![mmap](https://hackmd.io/_uploads/H1r0-L3M-l.png)

* Memory address mapping to DRAM bank, row, and column

  ![address_mapping](https://hackmd.io/_uploads/S1jgMyvzZg.png)

  * bit 0-1: `column[1:0]`. The memory is word addressable, so these bits are ignored.
  * bit 2-3: `column[3:2]`. The memory is set to burst read mode, so these 2 bits determine the read order.
  * bit 4: `bank[0]`. Bank interleaving every 4 words (= burst length).
  * bit 5-8: `column[7:4]`
  * bit 9-13: `row[4:0]`
  * bit 14: `bank[1]`. For code/data interleaving. Defined in the firmware linker script.
  * bit 15-22: `row[12:5]`. These bits are unused due to the SDRAM capacity.

* During firmware compilation, the linker script allocates the code (`mprjram`) and data (`dataram`) segments in different banks.

  ```c
  // file name: ./firmware/section.lds
  // line 11 to 21
  MEMORY {
      // ...
      mprj    : ORIGIN = 0x30000000, LENGTH = 0x00100000
      mprjram : ORIGIN = 0x38000000, LENGTH = 0x00004000
      dataram : ORIGIN = 0x38004000, LENGTH = 0x00004000
      // ...
  }
  ```

  The global variables are then assigned to the `.dataram` section.

  ```c
  // file name: ./testbench/wlos/fir.c
  // line 5 to 27
  int taps[16] __attribute__((section(".dataram"))) = { /* ... */ };
  int X[N] __attribute__((section(".dataram"))) = { /* ... */ };
  int Y[N] __attribute__((section(".dataram")));
  ```

* In addition, the `taps` array is extended to 16 entries and the unused entries are filled with 0s, so that the arrays that follow start at a word offset that is a multiple of 4 and can take advantage of burst read.

  ```c
  // file name: ./testbench/wlos/fir.c
  // line 5 to 8
  int taps[16] __attribute__((section(".dataram"))) = {
      0, -10, -9, 23, 56, 63, 56, 23, -9, -10, 0,
      /* 11-15 unused, filled with 0 */
      0, 0, 0, 0, 0
  };
  ```

## DMA

* The DMA transfers data directly between the FIR accelerator and memory, which offloads the CPU.
* Workflow without and with DMA

  ![dma_workflow](https://hackmd.io/_uploads/H1Lfzf1Xbg.png =70%x)

* Data stream without and with DMA

  ![data_stream](https://hackmd.io/_uploads/BkFwGzkXZg.png =70%x)

## Future Improvement
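
## Appendix: Address Mapping Sketch

The bit layout listed under "Address Mapping Optimization for Interleaving" can be sketched as a small software decoder. This is only an illustration, not the controller's RTL. The names `sdram_addr_t` and `decode_addr` are hypothetical, and the sketch assumes that bit 14 drives `bank[1]` and that the bank, row, and column fields are 2, 13, and 8 bits wide.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields (not from the lab source). */
typedef struct {
    unsigned bank;    /* assumed 2 bits:  {addr[14], addr[4]}       */
    unsigned row;     /* assumed 13 bits: {addr[22:15], addr[13:9]} */
    unsigned column;  /* assumed 8 bits:  {addr[8:5], addr[3:0]}    */
} sdram_addr_t;

/* Split a CPU byte address into bank/row/column using the listed bit mapping. */
static sdram_addr_t decode_addr(uint32_t addr)
{
    sdram_addr_t d;

    unsigned bank0 = (addr >> 4) & 0x1u;   /* toggles every 4 words (one burst) */
    unsigned bank1 = (addr >> 14) & 0x1u;  /* separates code and data regions   */
    d.bank = (bank1 << 1) | bank0;

    /* column[7:4] comes from bits 5-8, column[3:0] from bits 0-3 */
    d.column = (((addr >> 5) & 0xFu) << 4) | (addr & 0xFu);

    /* row[12:5] comes from bits 15-22, row[4:0] from bits 9-13 */
    d.row = (((addr >> 15) & 0xFFu) << 5) | ((addr >> 9) & 0x1Fu);

    return d;
}
```

Under these assumptions, the start of `mprjram` (`0x38000000`) decodes to bank 0 and the start of `dataram` (`0x38004000`) decodes to bank 2, so code and data never share a bank. Within either region, consecutive 16-byte groups alternate `bank[0]`, which is what allows back-to-back bursts to be interleaved.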