# AES Implementation in Chisel
> 陳侯華

## Setting Up the macOS Environment

### 1. Install Essential Tools
First, install Verilator and GTKWave using Homebrew:
```bash
brew install verilator gtkwave
```

### 2. Install sbt (Scala Build Tool)
`sbt` is a build tool for Scala. Before installing it, ensure your system has a working **JVM (Java Virtual Machine)**.

1. Clean up existing installations

   First, remove any old versions of sbt and jenv (if previously installed):
   ```bash
   brew uninstall sbt
   brew uninstall jenv
   ```

2. Install SDKMAN

   SDKMAN is a tool for managing parallel versions of multiple Software Development Kits:
   ```bash
   curl -s "https://get.sdkman.io" | bash
   source "$HOME/.sdkman/bin/sdkman-init.sh"
   ```

3. Install Java and sbt

   Install Eclipse Temurin JDK 11 and sbt using SDKMAN:
   ```bash
   sdk install java 11.0.21-tem
   sdk install sbt
   ```

### 3. Fix GTKWave Compatibility Issues
GTKWave has known compatibility issues with macOS 14 (Sonoma). To resolve them, uninstall the current version and install the HEAD version from randomplum's tap:
```bash
brew uninstall gtkwave
brew untap randomplum/gtkwave
brew install --HEAD randomplum/gtkwave/gtkwave
```
This should allow GTKWave to run properly on macOS 14.

## Principles of AES Encryption
AES (Advanced Encryption Standard) is a symmetric encryption algorithm, meaning the same key is used for both encryption and decryption. It offers different levels of security depending on the key length (128, 192, or 256 bits). Its core principles and operation are explained below.

### Basic Concepts
1. Symmetric Encryption
   - The same key is used for both encryption and decryption.
   - The security of the algorithm depends on keeping the key secret.
2. Block Cipher
   - AES processes fixed-size data blocks (128 bits = 16 bytes).
   - Input longer than one block is divided into multiple blocks. If the input length is not a multiple of the block size, **padding** is applied to fill the last block.
3. Rounds of Encryption
   - AES performs 10, 12, or 14 rounds for 128-, 192-, or 256-bit keys respectively, with each round consisting of multiple steps.

### Encryption Process
![image](https://hackmd.io/_uploads/H1_12XzvJx.png)

Each block is processed sequentially following these steps. Once all blocks are encrypted, they are concatenated to form the final ciphertext. The steps are as follows:

:::danger
Fix permissions of uploaded files.
:::

**1. Initial Operation: AddRoundKey**
- AddRoundKey: XORs the plaintext with the first round key.

![image](https://hackmd.io/_uploads/HyNYC7YUkx.png)

**2. Round Operations: 10, 12, or 14 rounds (depending on key length)**

Each round except the last consists of the following four steps:

**Step 1: SubBytes**
- Each byte in the state matrix is substituted using a fixed S-Box (substitution box).
- The S-Box is derived from the multiplicative **inverse in the finite field** GF($2^8$) followed by an affine transformation. This **non-linear** substitution increases resistance to attacks.

![image](https://hackmd.io/_uploads/ByhIGEF8yx.png)

**Step 2: ShiftRows**
- The rows of the state matrix are cyclically shifted as follows:
  - Row 0: No shift.
  - Row 1: Left shift by 1 byte.
  - Row 2: Left shift by 2 bytes.
  - Row 3: Left shift by 3 bytes.
- This step moves bytes between columns, enhancing diffusion.

![image](https://hackmd.io/_uploads/H1szQNFUyg.png)

**Step 3: MixColumns**
- Each column of the state matrix is treated as a polynomial over GF($2^8$) and multiplied by a fixed polynomial modulo $x^4 + 1$.
- This operation spreads the influence of each input byte across all bytes in the column.

![image](https://hackmd.io/_uploads/BkNifNYL1x.png)

**Step 4: AddRoundKey**
- The current state matrix is XORed with the corresponding round key.

![image](https://hackmd.io/_uploads/BkcCG4tUyg.png)

**3. Final Round**
- Perform Steps 1, 2, and 4 (**MixColumns is skipped**).
- The result is the ciphertext.

### Decryption Process
![image](https://hackmd.io/_uploads/Hk0ajXzvkg.png)

The AES decryption process reverses the encryption steps, performing the following operations in each round:
1. InvShiftRows
2. InvSubBytes
3. AddRoundKey
4. InvMixColumns

During decryption, the round keys are applied in the reverse order of encryption.

## Two Modes of Operation in AES
A block cipher encrypts or decrypts a block of plaintext as a single unit and produces a block of ciphertext of the same size. The size of an AES block is 128 bits, whereas the encryption key can be 128, 192, or 256 bits long. Note that although there are three key lengths, the block size is always 128 bits.

A block cipher must also be able to encrypt plaintext whose size differs from the block size. When the plaintext does not fill a whole block, a padding scheme such as PKCS5 or PKCS7 can be used to complete it.

### ECB mode: Electronic Code Book mode
![image](https://hackmd.io/_uploads/rJn44EMD1l.png)

In ECB mode, each plaintext block is encrypted independently, which allows for high parallelism but comes with lower security, since identical plaintext blocks result in identical ciphertext blocks. This mode takes full advantage of the pipeline efficiency of the proposed design (described in the Cluster-AES section below), as the independence between blocks enables optimal use of the four-stage pipeline. Multiple blocks can be processed simultaneously, whether within a single task or across multiple tasks, achieving maximum throughput for both individual and multi-guest operations.

### CBC mode: Cipher Block Chaining mode
![image](https://hackmd.io/_uploads/Sy8lEVfvke.png)

In CBC mode, each plaintext block is XORed with the previous ciphertext block before encryption, creating a dependency between blocks that prevents block-wise parallelism. The first block is XORed with an initialization vector (IV), ensuring uniqueness for the first encryption. In this mode, the pipeline stages must process the blocks of a task serially due to the dependency on the previous block's result.
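
The chaining dependency can be sketched in software. The following is a minimal Python model, not the Chisel design: `toy_encrypt_block` is a hypothetical stand-in for an AES block encryption (it is neither AES nor secure), and every function name here is an assumption made for illustration. It contrasts ECB, where each block is independent, with CBC, where a block cannot start until the previous ciphertext block is available.

```python
from typing import List

BLOCK_SIZE = 16  # AES block size in bytes (128 bits)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for AES (NOT AES, NOT secure):
    XOR with the key, then rotate left by one byte."""
    mixed = xor_bytes(block, key)
    return mixed[1:] + mixed[:1]


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt_block: rotate right by one byte, then XOR."""
    rotated = block[-1:] + block[:-1]
    return xor_bytes(rotated, key)


def ecb_encrypt(blocks: List[bytes], key: bytes) -> List[bytes]:
    """ECB: every block is encrypted independently of the others."""
    return [toy_encrypt_block(b, key) for b in blocks]


def cbc_encrypt(blocks: List[bytes], key: bytes, iv: bytes) -> List[bytes]:
    """CBC: each block is XORed with the previous ciphertext block
    (the IV for the first block) before encryption."""
    out = []
    prev = iv
    for b in blocks:
        prev = toy_encrypt_block(xor_bytes(b, prev), key)
        out.append(prev)
    return out


def cbc_decrypt(blocks: List[bytes], key: bytes, iv: bytes) -> List[bytes]:
    """CBC decryption: decrypt, then XOR with the previous ciphertext block."""
    out = []
    prev = iv
    for c in blocks:
        out.append(xor_bytes(toy_decrypt_block(c, key), prev))
        prev = c
    return out


# Illustrative inputs: two identical 16-byte plaintext blocks.
demo_key = bytes(range(BLOCK_SIZE))
demo_iv = bytes([0xA5] * BLOCK_SIZE)
demo_blocks = [b"sixteen byte blk", b"sixteen byte blk"]

demo_ecb = ecb_encrypt(demo_blocks, demo_key)
demo_cbc = cbc_encrypt(demo_blocks, demo_key, demo_iv)
```

For the inputs above, `demo_ecb` contains two identical ciphertext blocks while `demo_cbc` does not. The loop in `cbc_encrypt` is inherently serial within one message, because each iteration needs the ciphertext block produced by the previous one.
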
Despite this limitation, the proposed design effectively utilizes its parallel resources by handling independent tasks from multiple guests simultaneously. For example, while one set of pipeline stages processes CBC encryption for one guest, another set can handle an independent CBC encryption task for a different guest. This capability allows the design to significantly improve hardware utilization, even in scenarios where intra-task parallelism is not achievable.

### Comparison in the Architecture

| Mode | Pipeline Utilization | Latency | Parallel Task Handling |
|------|-----------------------|---------|-------------------------|
| **ECB** | Fully utilized (independent blocks) | Lower, as blocks are processed in parallel | Can handle multiple tasks and achieve peak throughput |
| **CBC** | Limited by block dependencies (serial within task) | Higher due to sequential block processing | Achieves efficiency by serving tasks from different guests |

## Parallelization: Cluster-AES
The paper "[A Pipelined AES and SM4 Hardware Implementation for Multi-Tasking Virtualized Environments](https://easychair.org/publications/preprint/BNHg)" proposes a hardware architecture designed for efficient AES encryption in multi-tasking virtualized environments. The architecture, written in Chisel, instantiates multiple cryptographic modules to handle workloads from numerous guests, and separates the encryption and decryption modules for a simpler design with greater configurability and flexibility.

![image](https://hackmd.io/_uploads/BktRPzUvyg.png)
![image](https://hackmd.io/_uploads/HymZdMUPkg.png)

The AES encryption architecture adopts a four-stage pipeline design:
1. **AR-SB1 Stage**: Handles the AddRoundKey operation and the top linear part of the S-box.
2. **SB2 and SB3 Stages**: Process the non-linear and bottom linear parts of the S-box, respectively.
3. **SR-MC Stage**: Executes ShiftRows and MixColumns operations, with a bypass register for MixColumns in the final encryption round.

To minimize critical path delays, pipeline registers are inserted before and after the non-linear transformation in the S-box and around the MixColumns module. This pipeline design supports processing four tasks in parallel.

The following diagram depicts the timing of AES-128 encryption, showing how distinct input blocks are processed through the pipeline. Each round transformation in the proposed design takes four cycles, resulting in 40 cycles for a full AES-128 encryption. Because the four-stage pipeline executes four tasks in parallel, with each task entering one cycle after the previous one, the architecture completes four AES-128 encryptions within 43 cycles.

![image](https://hackmd.io/_uploads/BySm_zLDyl.png)

## Experiment
I ran the test suite of [cluster-AES](https://github.com/bathtub-01/cluster-AES). The results are as follows:
```bash
➜ cluster-AES git:(master) ✗ sbt test
[info] welcome to sbt 1.10.7 (Eclipse Adoptium Java 11.0.21)
[info] loading project definition from /Users/bujiyao/cluster-AES/project
[info] loading settings for project root from build.sbt...
[info] set current project to %NAME% (in build file:/Users/bujiyao/cluster-AES/
[info] PolyMulTest:
[info] polymul
[info] - should pass
[info] KeyBankTest:
[info] KeyBank
[info] - should pass
[info] KeyExpansionTest:
[info] KeyExpansion
[info] - should pass
[info] AESEncModuleTest:
[info] AESEnc
[info] - should pass
[info] AESDecModuleTest:
[info] AESDec
[info] - should pass
[info] EncUnitTest:
[info] EncUnit
[info] - should pass
[info] DecUnitTest:
[info] DecUnit
[info] - should pass
[info] GroupTest:
[info] Group
[info] - should pass
[info] Run completed in 10 minutes, 36 seconds.
[info] Total number of tests run: 8
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 637 s.
```

## Performance Evaluation
The paper's experiments validate the performance improvement of the proposed design in virtualized environments using a prototype system on the Zynq UltraScale+ MPSoC. Key performance metrics, including **average task pending time** and **context switching rate**, were compared between the classic design and the proposed architecture under **CBC** and **ECB** modes.

1. **Context Switching**:
   - For guest numbers **≤32**, the proposed design reduces context switches by over **50%** compared to the classic design, as it provides **16 independent resources**, four times as many as the classic design.
   - When the number of guests exceeds the available resources, the advantage diminishes, and additional module instances are required for further improvement.
2. **Task Pending Time**:
   - In **CBC mode** (no block-wise parallelism), the proposed design reduces average task pending time by **75%**. The classic design struggles in this mode as it cannot utilize pipeline resources efficiently.
   - In **ECB mode** (block-wise parallelism supported), both designs exhibit similar performance, as independent blocks allow full utilization of resources.

![image](https://hackmd.io/_uploads/H1X7RzIwkl.png)
![image](https://hackmd.io/_uploads/HkzDCzIwJx.png)

:::danger
Not reproduced experiments.
:::

## Reference
- [AES Encryption: Secure Data with Advanced Encryption Standard](https://www.simplilearn.com/tutorials/cryptography-tutorial/aes-encryption)
- [The difference in five modes in the AES encryption algorithm](https://www.highgo.ca/2019/08/08/the-difference-in-five-modes-in-the-aes-encryption-algorithm/)
- [A Pipelined AES and SM4 Hardware Implementation for Multi-Tasking Virtualized Environments](https://easychair.org/publications/preprint/BNHg)
- [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p)
- [cluster-AES](https://github.com/bathtub-01/cluster-AES)