陳侯華
First, install Verilator using Homebrew:
sbt is a build tool for Scala. Before installing it, make sure your system has a working JVM (Java Virtual Machine).
Clean up existing installations
First, remove any old versions of sbt and jenv (if previously installed):
Install SDKMAN
SDKMAN is a tool for managing parallel versions of multiple Software Development Kits:
Install Java and sbt
Install Eclipse Temurin JDK 11 and sbt using SDKMAN:
GTKWave has known compatibility issues with macOS 14 (Sonoma). To resolve this, uninstall the current version and install the HEAD version from randomplum's tap:
This should resolve the compatibility issues and allow GTKWave to run properly on macOS 14.
AES (Advanced Encryption Standard) is a symmetric encryption algorithm, meaning the same key is used for both encryption and decryption. It offers different levels of security depending on the key length (128, 192, or 256 bits). Below is an explanation of its core principles and operation process:
Symmetric Encryption
Block Cipher
Rounds of Encryption
Each block is processed sequentially following these steps. Once all blocks are successfully encrypted, they are concatenated to form the final ciphertext. The steps are as follows:
1. Initial Operation: AddRoundKey
2. Round Operations: 10, 12, or 14 rounds (depending on key length)
Each round consists of the following four steps:
Step 1: SubBytes
Step 2: ShiftRows
Step 3: MixColumns
Step 4: AddRoundKey
3. Final Round
The AES decryption process reverses the encryption steps, performing the following operations:
Unlike encryption, the round key usage sequence is reversed during decryption.
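The round structure described above can be sketched as a short Python listing. This is only an illustration of the ordering of steps, not a working cipher: the function names `encryption_plan` and `round_key_order` are hypothetical, and the sketch assumes the standard FIPS 197 layout, in which the final round is counted among the 10, 12, or 14 rounds and omits MixColumns.

```python
# Number of rounds for each key length, as listed in the text.
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def encryption_plan(key_bits):
    """Return the ordered step names used to encrypt one 128-bit block."""
    rounds = ROUNDS_BY_KEY_BITS[key_bits]
    plan = ["AddRoundKey"]  # 1. initial operation
    for _ in range(rounds - 1):  # 2. full rounds with all four steps
        plan += ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]
    # 3. final round: same as a full round but without MixColumns
    plan += ["SubBytes", "ShiftRows", "AddRoundKey"]
    return plan


def round_key_order(key_bits, decrypt=False):
    """Return the indices of the round keys in the order they are applied."""
    rounds = ROUNDS_BY_KEY_BITS[key_bits]
    order = list(range(rounds + 1))  # one key per AddRoundKey
    return order[::-1] if decrypt else order


if __name__ == "__main__":
    for bits in (128, 192, 256):
        plan = encryption_plan(bits)
        print(bits, "bit key:", plan.count("AddRoundKey"), "round keys,",
              plan.count("MixColumns"), "MixColumns steps")
    print("decryption key order (AES-128):", round_key_order(128, decrypt=True))
```

Decryption applies the inverse of each step, and `round_key_order(..., decrypt=True)` reflects the reversed key sequence mentioned above.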
A block cipher encrypts or decrypts a fixed-size block of plaintext as a single unit and produces a ciphertext block of the same size. An AES block is always 128 bits, whereas the encryption key can be 128, 192, or 256 bits. Note that only the key length varies; the block size stays at 128 bits in all three cases. A block cipher must also be able to encrypt plaintext whose length is not a multiple of the block size. When the last block is shorter than a full block, a padding scheme such as PKCS5 or PKCS7 fills it out.
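As a rough illustration of the padding step, here is a minimal PKCS7 sketch in Python. The helper names `pkcs7_pad` and `pkcs7_unpad` are chosen for this example and are not taken from any particular library. Each padding byte holds the number of padding bytes added, and an input that already fills whole blocks receives one extra full block of padding so that the padding can always be removed unambiguously.

```python
BLOCK_SIZE = 16  # AES block size in bytes (128 bits)


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)  # always in 1..block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("input must be a non-empty multiple of the block size")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return padded[:-pad_len]


if __name__ == "__main__":
    message = b"hello AES"
    padded = pkcs7_pad(message)
    print(len(message), "->", len(padded), "bytes")
    print(pkcs7_unpad(padded) == message)
```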
In ECB mode, each plaintext block is encrypted independently, which allows for high parallelism but comes with lower security since identical plaintext blocks result in identical ciphertext blocks. This mode takes full advantage of the proposed design's pipeline efficiency, as the independence between blocks enables optimal use of the four-stage pipeline. Multiple blocks can be processed simultaneously, whether within a single task or across multiple tasks, achieving maximum throughput for both individual and multi-guest operations.
In CBC mode, each plaintext block is XORed with the previous ciphertext block before encryption, creating a dependency between blocks that prevents block-wise parallelism. The first block is XORed with an initialization vector (IV), ensuring uniqueness for the first encryption. In this mode, the pipeline stages must process blocks serially due to the dependency on the previous block's result. Despite this limitation, the proposed design effectively utilizes its parallel resources by handling independent tasks from multiple guests simultaneously. For example, while one set of pipeline stages processes CBC encryption for one guest, another set can handle an independent CBC encryption task for a different guest. This capability allows the design to significantly improve hardware utilization, even in scenarios where intra-task parallelism is not achievable.
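The difference between the two modes can be seen in a small Python sketch. To keep it self-contained, `toy_encrypt_block` is a deliberately trivial stand-in for the AES block function (XOR with the key, then a one-byte rotation); it is an assumption made for illustration and offers no security. Only the chaining logic in `ecb_encrypt` and `cbc_encrypt` is the point here.

```python
BLOCK = 16  # block size in bytes


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    # Stand-in for AES, NOT secure: XOR with the key, rotate left by one byte.
    mixed = xor_bytes(block, key)
    return mixed[1:] + mixed[:1]


def split_blocks(data: bytes) -> list:
    if len(data) % BLOCK != 0:
        raise ValueError("data length must be a multiple of the block size")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(plaintext: bytes, key: bytes) -> list:
    # Every block is independent, so this loop could run fully in parallel.
    return [toy_encrypt_block(b, key) for b in split_blocks(plaintext)]


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> list:
    # Each block needs the previous ciphertext block, so the loop is serial.
    out = []
    prev = iv
    for b in split_blocks(plaintext):
        prev = toy_encrypt_block(xor_bytes(b, prev), key)
        out.append(prev)
    return out


key = bytes(range(16))
iv = bytes(16)          # all-zero IV, for demonstration only
plaintext = b"A" * 32   # two identical plaintext blocks
ecb = ecb_encrypt(plaintext, key)
cbc = cbc_encrypt(plaintext, key, iv)

if __name__ == "__main__":
    print("ECB blocks identical:", ecb[0] == ecb[1])
    print("CBC blocks identical:", cbc[0] == cbc[1])
```

With two identical plaintext blocks, the ECB ciphertext blocks match while the CBC ones differ, and the data dependency on `prev` is what forces the pipeline to process one task's CBC blocks serially.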
Mode | Pipeline Utilization | Latency | Parallel Task Handling |
---|---|---|---|
ECB | Fully utilized (independent blocks) | Lower, as blocks are processed in parallel | Can handle multiple tasks and achieve peak throughput |
CBC | Limited by block dependencies (serial within task) | Higher due to sequential block processing | Achieves efficiency by serving tasks from different guests |
The paper "A Pipelined AES and SM4 Hardware Implementation for Multi-Tasking Virtualized Environments" proposes a hardware architecture for efficient AES encryption in multi-tasking virtualized environments. The architecture instantiates multiple cryptographic modules to handle workloads from many guests. The design is written in Chisel and keeps the encryption and decryption modules separate, which simplifies it and makes it more configurable and flexible.
The AES encryption architecture adopts a four-stage pipeline design:
To minimize critical path delays, pipeline registers are inserted before and after the non-linear transformation in the S-box and around the MixColumns module. This pipeline design supports parallel processing of four tasks simultaneously. The following diagram depicts a timing representation for AES-128 encryption, showing how distinct input blocks are processed through the pipeline. Each round transformation in the proposed design takes four cycles, resulting in 40 cycles for a full AES-128 encryption. Because the four-stage pipeline executes four tasks in parallel, with the fourth task entering three cycles after the first, the architecture completes four AES-128 encryptions within 43 cycles.
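The cycle counts quoted above follow from simple arithmetic, sketched here in Python. The function names are hypothetical, and the model assumes that one task enters the pipeline per cycle and that at most four tasks are interleaved, which is consistent with the 43-cycle figure but is not spelled out in the text.

```python
STAGES = 4  # pipeline stages per round, as described in the text


def cycles_for_one_block(rounds: int = 10, stages: int = STAGES) -> int:
    """Cycles for a single encryption: each round passes through every stage."""
    return rounds * stages


def cycles_for_interleaved_tasks(n_tasks: int, rounds: int = 10,
                                 stages: int = STAGES) -> int:
    """Cycles until the last of n_tasks interleaved encryptions completes.

    Assumption: task i enters the pipeline i cycles after task 0.
    """
    if not 1 <= n_tasks <= stages:
        raise ValueError("this simple model covers 1..stages tasks only")
    return rounds * stages + (n_tasks - 1)


if __name__ == "__main__":
    single = cycles_for_one_block()                # AES-128: 10 rounds
    interleaved = cycles_for_interleaved_tasks(4)  # four tasks in flight
    back_to_back = 4 * single                      # no interleaving
    print(single, interleaved, back_to_back)
```

Under this model, four tasks run back to back without interleaving would need 4 × 40 = 160 cycles, compared with 43 cycles when they share the pipeline.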
I performed the experiments using cluster-AES, and the test results are as follows:
The experiments validate the performance improvement of the proposed design in virtualized environments using a prototype system on the Zynq UltraScale+ MPSoC. Key performance metrics, including average task pending time and context switching rate, were compared between the classic design and the proposed architecture under CBC and ECB modes.
Context Switching:
Task Pending Time:
These experiments were not reproduced.