
AES Implementation in Chisel

陳侯華

Setting Up the macOS Environment

1. Install Essential Tools

First, install Verilator and GTKWave using Homebrew:

brew install verilator gtkwave

2. Install sbt (Scala Build Tool)

sbt is a build tool for Scala. It requires a working JVM (Java Virtual Machine), which the steps below install through SDKMAN.

  1. Clean up existing installations
    First, remove any old versions of sbt and jenv (if previously installed):

    brew uninstall sbt
    brew uninstall jenv
    
  2. Install SDKMAN
    SDKMAN is a tool for managing parallel versions of multiple Software Development Kits:

    curl -s "https://get.sdkman.io" | bash
    source "$HOME/.sdkman/bin/sdkman-init.sh"
    
  3. Install Java and sbt
    Install Eclipse Temurin JDK 11 and sbt using SDKMAN:

    sdk install java 11.0.21-tem
    sdk install sbt
    

3. Fix GTKWave Compatibility Issues

GTKWave has known compatibility issues with macOS 14 (Sonoma). To resolve this, uninstall the current version, remove any stale copy of randomplum's tap, and install the HEAD version from that tap:

brew uninstall gtkwave
brew untap randomplum/gtkwave
brew install --HEAD randomplum/gtkwave/gtkwave

This should resolve the compatibility issues and allow GTKWave to run properly on macOS 14.

Principles of AES Encryption

AES (Advanced Encryption Standard) is a symmetric encryption algorithm, meaning the same key is used for both encryption and decryption. It offers different levels of security depending on the key length (128, 192, or 256 bits). Below is an explanation of its core principles and operation process:

Basic Concepts

  1. Symmetric Encryption

    • The same key is used for both encryption and decryption.
    • The security of the algorithm depends on keeping the key secret.
  2. Block Cipher

    • AES processes fixed-size data blocks (128 bits = 16 bytes).
    • Input longer than one block is divided into multiple blocks; if the last block is shorter than 128 bits, padding is applied.
  3. Rounds of Encryption

    • AES performs 10, 12, or 14 rounds for 128-, 192-, or 256-bit keys respectively, with each round consisting of multiple steps.
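The relationship between key length and round count can be written down directly. The sketch below is a plain Python model (not Chisel) and uses the FIPS 197 relation Nr = Nk + 6, where Nk is the key length in 32-bit words; the function name is an assumption chosen for illustration.

```python
def aes_round_count(key_bits: int) -> int:
    """Return the number of AES rounds (Nr) for a given key length in bits."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits long")
    key_words = key_bits // 32  # Nk: key length in 32-bit words
    return key_words + 6        # Nr = Nk + 6


for bits in (128, 192, 256):
    print(bits, "->", aes_round_count(bits), "rounds")
```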

Encryption Process


Each block is processed sequentially following these steps. Once all blocks are successfully encrypted, they are concatenated to form the final ciphertext. The steps are as follows:


1. Initial Operation: AddRoundKey

  • AddRoundKey: XORs the plaintext with the first round key.
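AddRoundKey is a byte-wise XOR, which a few lines of Python can illustrate. This is a minimal software sketch, not the Chisel module; the function name is hypothetical, and the sample plaintext and key are the ones used in the FIPS 197 AES-128 example.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR each byte of the 16-byte state with the matching round-key byte."""
    return bytes(s ^ k for s, k in zip(state, round_key))


plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
first_round_key = bytes(range(16))  # 00 01 02 ... 0f

state = add_round_key(plaintext, first_round_key)
print(state.hex())  # 00102030405060708090a0b0c0d0e0f0
```

Because XOR is its own inverse, applying the same round key a second time restores the original state, which is why decryption can reuse this operation unchanged.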

2. Round Operations: 9, 11, or 13 full rounds (depending on key length)

Of the 10, 12, or 14 rounds in total, all but the last consist of the following four steps:

Step 1: SubBytes

  • Each byte in the state matrix is substituted using a fixed S-Box (substitution box).
  • The S-Box is derived from the multiplicative inverse in the finite field GF(2^8) followed by an affine transformation; this non-linearity increases resistance to attacks.
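To make the S-Box construction concrete, the sketch below rebuilds the table in plain Python from its two ingredients: the multiplicative inverse in GF(2^8) and the affine transformation. The function names are assumptions for illustration, and the brute-force inverse search is chosen for readability rather than speed; hardware designs such as the one discussed later decompose the S-box differently.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return product


def build_sbox() -> list:
    """Build the 256-entry S-Box: field inverse, then affine transformation."""
    sbox = []
    for x in range(256):
        # Multiplicative inverse by exhaustive search; 0 has no inverse and maps to 0.
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        # Affine step: XOR the inverse with its four left rotations, then add 0x63.
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


sbox = build_sbox()
print(hex(sbox[0x00]), hex(sbox[0x53]))  # 0x63 0xed
```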

Step 2: ShiftRows

  • The rows of the state matrix are shifted as follows:
    • Row 0: No shift.
    • Row 1: Left shift by 1 byte.
    • Row 2: Left shift by 2 bytes.
    • Row 3: Left shift by 3 bytes.
  • This step scrambles the byte positions, enhancing diffusion.
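The row shifts listed above amount to rotating row r to the left by r positions. A minimal Python sketch, assuming the state is held as four rows of four bytes (the function name and sample values are illustrative only):

```python
def shift_rows(state: list) -> list:
    """Rotate row r of a 4x4 state matrix left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


# Each byte encodes its original position as 0x<row><column>.
state = [
    [0x00, 0x01, 0x02, 0x03],
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
]

for row in shift_rows(state):
    print([f"{b:02x}" for b in row])
```

After the shift, every column contains one byte from each of the four original columns, which is what lets the following MixColumns step spread changes across the whole state.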

Step 3: MixColumns

  • Each column of the state matrix is treated as a polynomial over GF(2^8) and multiplied by a fixed polynomial modulo x^4 + 1.
  • This operation spreads the data's influence across all bytes in the column.
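In practice the polynomial multiplication reduces to multiplying each column by a fixed matrix whose entries are 1, 2, and 3, and multiplication by 2 is a shift with a conditional XOR (often called xtime). The following Python sketch shows one column; the function names are assumptions, and the sample column is a commonly used MixColumns test vector.

```python
def xtime(b: int) -> int:
    """Multiply a byte by 2 in GF(2^8): shift left, reduce by 0x11B on overflow."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b


def mix_column(col: list) -> list:
    """Apply the MixColumns matrix (rows are rotations of 2, 3, 1, 1) to one column."""
    out = []
    for i in range(4):
        a0, a1, a2, a3 = (col[(i + j) % 4] for j in range(4))
        # 2*a0 XOR 3*a1 XOR a2 XOR a3, where 3*a1 = 2*a1 XOR a1
        out.append(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3)
    return out


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
print([f"{b:02x}" for b in mixed])  # ['8e', '4d', 'a1', 'bc']
```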

Step 4: AddRoundKey

  • The current state matrix is XORed with the corresponding round key.

3. Final Round

  • Apply Steps 1, 2, and 4 one last time (MixColumns is skipped).
  • The result is the ciphertext.
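Putting the steps together, the following is a compact software model of AES-128 encryption for a single block. It is a sketch written for readability, not the Chisel design and not suitable for production use (it is neither fast nor constant-time). All function names are assumptions; the state is a flat list of 16 bytes in column-major order, and the sample key and plaintext are the AES-128 example from FIPS 197.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return product


def build_sbox():
    """S-Box: multiplicative inverse in GF(2^8), then the affine transformation."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """AES-128 key schedule: 16-byte key -> 11 round keys of 16 bytes each."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = gmul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def shift_rows(s):
    # The byte at row r, column c is stored at index r + 4*c.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        out += [gmul(col[i], 2) ^ gmul(col[(i + 1) % 4], 3)
                ^ col[(i + 2) % 4] ^ col[(i + 3) % 4] for i in range(4)]
    return out


def add_round_key(s, round_key):
    return [a ^ b for a, b in zip(s, round_key)]


def aes128_encrypt_block(plaintext, key):
    round_keys = expand_key(key)
    state = add_round_key(list(plaintext), round_keys[0])  # initial AddRoundKey
    for rnd in range(1, 10):                               # nine full rounds
        state = [SBOX[b] for b in state]                   # SubBytes
        state = shift_rows(state)
        state = mix_columns(state)
        state = add_round_key(state, round_keys[rnd])
    state = shift_rows([SBOX[b] for b in state])           # final round, no MixColumns
    return bytes(add_round_key(state, round_keys[10]))


ciphertext = aes128_encrypt_block(
    bytes.fromhex("00112233445566778899aabbccddeeff"), bytes(range(16)))
print(ciphertext.hex())  # 69c4e0d86a7b0430d8cdb78070b4c55a
```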

Decryption Process


The AES decryption process reverses the encryption steps. After an initial AddRoundKey, each round performs the following inverse operations:

  1. InvShiftRows
  2. InvSubBytes
  3. AddRoundKey
  4. InvMixColumns

The round keys are applied in reverse order: decryption starts with the last round key and ends with the first. The final decryption round omits InvMixColumns.
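The ordering of operations and round keys during decryption can be traced with a small Python helper. This is only a schedule listing, assuming the straightforward inverse cipher ordering from FIPS 197; the function name is hypothetical and no actual byte transformations are performed.

```python
def decryption_schedule(num_rounds: int) -> list:
    """List the inverse-cipher operations as (name, round_key_index) pairs."""
    ops = [("AddRoundKey", num_rounds)]  # start with the last round key
    for rnd in range(num_rounds - 1, 0, -1):
        ops += [
            ("InvShiftRows", None),
            ("InvSubBytes", None),
            ("AddRoundKey", rnd),
            ("InvMixColumns", None),
        ]
    # The final round has no InvMixColumns and uses round key 0.
    ops += [("InvShiftRows", None), ("InvSubBytes", None), ("AddRoundKey", 0)]
    return ops


schedule = decryption_schedule(10)  # AES-128
key_order = [idx for name, idx in schedule if name == "AddRoundKey"]
print(key_order)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```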

Two Modes of AES

A block cipher encrypts or decrypts a fixed-size block of plaintext into a block of ciphertext of the same size. An AES block is always 128 bits, regardless of whether the key is 128, 192, or 256 bits long. A block cipher must also be able to encrypt plaintext whose length is not a multiple of the block size, so when the final block is shorter than 128 bits it is extended with a padding scheme such as PKCS5 or PKCS7.
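As an illustration of padding, here is a minimal Python sketch of PKCS7 for 16-byte blocks: every padding byte holds the number of padding bytes, and a full extra block is added when the input is already aligned. The function names are assumptions for illustration.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    pad_len = block_size - len(data) % block_size  # always between 1 and block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes) -> bytes:
    """Strip PKCS7 padding, rejecting malformed input."""
    if not padded:
        raise ValueError("empty input")
    pad_len = padded[-1]
    if not 1 <= pad_len <= len(padded) or padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS7 padding")
    return padded[:-pad_len]


padded = pkcs7_pad(b"hello")
print(len(padded), padded[-1])  # 16 11
```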

ECB mode: Electronic Code Book mode


In ECB mode, each plaintext block is encrypted independently, which allows for high parallelism but comes with lower security since identical plaintext blocks result in identical ciphertext blocks. This mode takes full advantage of the proposed design's pipeline efficiency, as the independence between blocks enables optimal use of the four-stage pipeline. Multiple blocks can be processed simultaneously, whether within a single task or across multiple tasks, achieving maximum throughput for both individual and multi-guest operations.

CBC mode: Cipher Block Chaining mode


In CBC mode, each plaintext block is XORed with the previous ciphertext block before encryption, creating a dependency between blocks that prevents block-wise parallelism. The first block is XORed with an initialization vector (IV), ensuring uniqueness for the first encryption. In this mode, the pipeline stages must process blocks serially due to the dependency on the previous block's result. Despite this limitation, the proposed design effectively utilizes its parallel resources by handling independent tasks from multiple guests simultaneously. For example, while one set of pipeline stages processes CBC encryption for one guest, another set can handle an independent CBC encryption task for a different guest. This capability allows the design to significantly improve hardware utilization, even in scenarios where intra-task parallelism is not achievable.
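The difference between the two modes can be seen in a short Python sketch. To keep it self-contained, the block cipher is replaced by a stand-in (`toy_block_encrypt`, a truncated SHA-256 of a fixed key and the block). That stand-in is an assumption for illustration only: it is not AES and cannot be decrypted. Any real 16-byte block encryption function could be passed in its place.

```python
import hashlib


def toy_block_encrypt(block: bytes) -> bytes:
    """Stand-in for a 16-byte block cipher (NOT AES, not invertible)."""
    return hashlib.sha256(b"demo-key" + block).digest()[:16]


def split_blocks(data: bytes, size: int = 16) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


def ecb_encrypt(data: bytes, encrypt_block) -> bytes:
    """ECB: every block is encrypted independently, so blocks can run in parallel."""
    return b"".join(encrypt_block(b) for b in split_blocks(data))


def cbc_encrypt(data: bytes, iv: bytes, encrypt_block) -> bytes:
    """CBC: each block is XORed with the previous ciphertext block (or the IV) first."""
    out, prev = [], iv
    for block in split_blocks(data):
        prev = encrypt_block(bytes(x ^ y for x, y in zip(block, prev)))
        out.append(prev)  # the next block cannot start until this one is done
    return b"".join(out)


plaintext = b"A" * 32  # two identical 16-byte blocks
ecb = ecb_encrypt(plaintext, toy_block_encrypt)
cbc = cbc_encrypt(plaintext, bytes(16), toy_block_encrypt)

print(ecb[:16] == ecb[16:])  # True: identical plaintext blocks leak through ECB
print(cbc[:16] == cbc[16:])  # False: chaining hides the repetition
```

The loop in `cbc_encrypt` carries `prev` from one iteration to the next, which is the data dependency that forces serial processing within a single CBC task.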

Comparison in the Architecture:

| Mode | Pipeline Utilization | Latency | Parallel Task Handling |
| --- | --- | --- | --- |
| ECB | Fully utilized (independent blocks) | Lower, as blocks are processed in parallel | Can handle multiple tasks and achieve peak throughput |
| CBC | Limited by block dependencies (serial within task) | Higher due to sequential block processing | Achieves efficiency by serving tasks from different guests |

Parallelization: Cluster-AES

The paper "A Pipelined AES and SM4 Hardware Implementation for Multi-Tasking Virtualized Environments" proposes a hardware architecture for efficient AES encryption in multi-tasking virtualized environments. The architecture instantiates multiple cryptographic modules to handle workloads from numerous guests. Its design, written in Chisel, is simplified by separating the encryption and decryption modules, which gives greater configurability and flexibility.


The AES encryption architecture adopts a four-stage pipeline design:

  1. AR-SB1 Stage: Handles the AddRoundKey operation and the top linear part of the S-box.
  2. SB2 Stage: Processes the non-linear part of the S-box.
  3. SB3 Stage: Processes the bottom linear part of the S-box.
  4. SR-MC Stage: Executes ShiftRows and MixColumns operations, with a bypass register for MixColumns in the final encryption round.

To minimize critical path delays, pipeline registers are inserted before and after the non-linear transformation in the S-box and around the MixColumns module. This pipeline design allows four tasks to be processed in parallel. The following diagram depicts a timing representation for AES-128 encryption, showing how distinct input blocks are processed through the pipeline. Each round transformation takes four cycles, so a full AES-128 encryption (10 rounds) takes 40 cycles. Because the four-stage pipeline can interleave four tasks, the architecture completes four AES-128 encryptions within 43 cycles.

[Figure: timing diagram of AES-128 encryption through the four-stage pipeline]
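The cycle counts quoted above are consistent with a simple model in which each round occupies the four stages for one cycle each, and additional tasks enter the pipeline one cycle apart. The Python sketch below encodes that model; the function name and the one-cycle spacing between tasks are assumptions for illustration, not details taken from the paper.

```python
def pipeline_cycles(tasks: int, rounds: int = 10, stages: int = 4) -> int:
    """Cycles to finish `tasks` interleaved encryptions in a `stages`-deep pipeline."""
    if not 1 <= tasks <= stages:
        raise ValueError("this model only covers up to one task per pipeline stage")
    single_task = rounds * stages     # 10 rounds x 4 cycles = 40 cycles
    return single_task + (tasks - 1)  # each extra task finishes one cycle later


print(pipeline_cycles(1))  # 40
print(pipeline_cycles(4))  # 43
```

Under this model, four interleaved encryptions cost only three cycles more than one, compared with 160 cycles if the same four blocks were processed back to back.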

Experiment

I performed the experiments using cluster-AES, and the test results are as follows:

➜  cluster-AES git:(master) ✗ sbt test
[info] welcome to sbt 1.10.7 (Eclipse Adoptium Java 11.0.21)
[info] loading project definition from /Users/bujiyao/cluster-AES/project
[info] loading settings for project root from build.sbt...
[info] set current project to %NAME% (in build file:/Users/bujiyao/cluster-AES/
[info] PolyMulTest:
[info] polymul
[info] - should pass
[info] KeyBankTest:
[info] KeyBank
[info] - should pass
[info] KeyExpansionTest:
[info] KeyExpansion
[info] - should pass
[info] AESEncModuleTest:
[info] AESEnc
[info] - should pass
[info] AESDecModuleTest:
[info] AESDec
[info] - should pass
[info] EncUnitTest:
[info] EncUnit
[info] - should pass
[info] DecUnitTest:
[info] DecUnit
[info] - should pass
[info] GroupTest:
[info] Group
[info] - should pass
[info] Run completed in 10 minutes, 36 seconds.
[info] Total number of tests run: 8
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 637 s.

Performance Evaluation

The experiments validate the performance improvement of the proposed design in virtualized environments using a prototype system on the Zynq UltraScale+ MPSoC. Key performance metrics, including average task pending time and context switching rate, were compared between the classic design and the proposed architecture under CBC and ECB modes.

  1. Context Switching:

    • For guest numbers ≤32, the proposed design reduces context switches by over 50% compared to the classic design, as it provides 16 independent resources, four times as many as the classic design.
    • When guest numbers exceed available resources, the advantage diminishes, requiring additional module instances for further improvements.
  2. Task Pending Time:

    • In CBC mode (no block-wise parallelism), the proposed design reduces average task pending time by 75%. The classic design struggles in this mode as it cannot utilize pipeline resources efficiently.
    • In ECB mode (block-wise parallelism supported), both designs exhibit similar performance, as independent blocks allow full utilization of resources.


Note: these performance experiments were not reproduced in this work.

Reference