蕭郁霖, 徐向廷
Filepath: src/main/scala/Pipeline/UNits/RegisterFile.scala
The code snippet defines a RegisterFile module for a RISC-V pipeline with seven input and output ports dedicated to data transfer. As in the classic MIPS pipeline, the register file supports two read registers (rs1 and rs2) and a single write register. The register file is instantiated with 32 registers, all initialized to 0. The outputs rdata1 and rdata2 are continuously driven by the registers selected by rs1 and rs2, respectively, with a special check so that reading register 0 always returns 0. For write operations, if the reg_write flag is asserted and the target register (w_reg) is not zero, that register is updated with the value provided on w_data. The following image illustrates the seven ports that facilitate these operations in the RegisterFile unit.
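The read and write rules described above can be summarized with a small software model. This is a Python sketch of the behaviour only, not the repository's Chisel module; the class name RegisterFileModel and its method names are hypothetical.

```python
class RegisterFileModel:
    """Behavioural model of a 32-entry RISC-V register file (illustrative only)."""

    def __init__(self):
        self.regs = [0] * 32  # all registers start at 0

    def read(self, rs1, rs2):
        # Reading x0 always returns 0, whatever is stored internally.
        rdata1 = 0 if rs1 == 0 else self.regs[rs1]
        rdata2 = 0 if rs2 == 0 else self.regs[rs2]
        return rdata1, rdata2

    def write(self, reg_write, w_reg, w_data):
        # A write takes effect only when enabled and the target is not x0.
        if reg_write and w_reg != 0:
            self.regs[w_reg] = w_data & 0xFFFFFFFF


rf = RegisterFileModel()
rf.write(True, 5, 0x1234)
rf.write(True, 0, 0xDEAD)   # ignored: x0 is hard-wired to zero
rf.write(False, 6, 0xBEEF)  # ignored: write enable is low
print(rf.read(5, 0))  # (4660, 0)
print(rf.read(6, 5))  # (0, 4660)
```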
Filepath: src/main/scala/Pipeline/UNits/PC.scala
The code snippet implements a program counter (PC
) module that maintains the current program counter value. It uses RegInit
to initialize the register to 0 and updates the stored PC value with the input (io.in
) at every cycle, while also exposing this value via io.out
.
Filepath: src/main/scala/Pipeline/UNits/PC4.scala
The second snippet defines a PC4
module, which computes the next program counter value by simply adding 4 to the current PC input (io.pc
). This incrementation is crucial for sequential instruction execution in the pipeline.
Filepath: src/main/scala/Pipeline/UNits/JALR.scala
The code snippet above implements the address calculation for the jump-and-link-register (JALR) instruction. The module computes the target address by adding a forwarded register value (rdata1) to an immediate offset (imme). To ensure proper alignment, it then applies a bit mask (0xFFFFFFFE) that forces the least significant bit (LSB) to 0. The aligned jump address is finally provided through io.out.
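The calculation can be illustrated in a few lines of Python. This is only a model of the arithmetic described above; the function name jalr_target is hypothetical, and the 32-bit wrap-around is made explicit because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF


def jalr_target(rdata1, imme):
    """Return (rdata1 + imme) wrapped to 32 bits with bit 0 cleared."""
    return (rdata1 + imme) & MASK32 & 0xFFFFFFFE


print(hex(jalr_target(0x1000, 7)))   # 0x1006
print(hex(jalr_target(0x2000, -4)))  # 0x1ffc
```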
Filepath: src/main/scala/Pipeline/UNits/ImmGenerator.scala
The code snippet implements the generation of 32-bit immediate values from RISC-V instructions, tailored to each instruction format. For I-type instructions, it extracts bits [31:20]
from the instruction and sign-extends them to 32 bits. In the case of S-type instructions, the immediate is formed by concatenating bits [31:25]
with bits [11:7]
and then sign-extending the result. For branch (SB-type) instructions, the immediate is built by concatenating several segments—bit 31, bit 7, bits [30:25]
, and bits [11:8]
—with an additional 0 appended as the least significant bit for proper alignment, followed by sign extension. For U-type instructions, the immediate is taken from bits [31:12]
and shifted left by 12 bits. Finally, for UJ-type instructions, the immediate is generated by concatenating bit 31, bits [19:12]
, bit 20, and bits [30:21]
, appending a trailing 0, and then sign-extending the result to 32 bits.
Additionally, the module computes target addresses for control flow instructions using these immediates. The output io.SB_type
represents the branch target address for SB-type instructions, obtained by adding the sign-extended branch immediate to the current program counter (PC), thus yielding a PC-relative address for branch operations. Similarly, io.UJ_type
provides the target address for UJ-type (jump) instructions by adding the corresponding immediate value to the current PC. These computed addresses are essential for correctly directing the control flow during instruction execution in the RISC-V pipeline.
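The bit selections listed above can be checked against real encodings with a short Python model. This is a sketch of the immediate formats as defined by the RISC-V base ISA, not the repository's ImmGenerator; all function names are hypothetical.

```python
def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def bits(inst, hi, lo):
    """Extract inst[hi:lo], both ends inclusive."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def imm_i(inst):
    return sign_extend(bits(inst, 31, 20), 12)


def imm_s(inst):
    return sign_extend(bits(inst, 31, 25) << 5 | bits(inst, 11, 7), 12)


def imm_sb(inst):
    raw = (bits(inst, 31, 31) << 12 | bits(inst, 7, 7) << 11
           | bits(inst, 30, 25) << 5 | bits(inst, 11, 8) << 1)
    return sign_extend(raw, 13)  # bit 0 is the appended 0


def imm_u(inst):
    return sign_extend(bits(inst, 31, 12) << 12, 32)


def imm_uj(inst):
    raw = (bits(inst, 31, 31) << 20 | bits(inst, 19, 12) << 12
           | bits(inst, 20, 20) << 11 | bits(inst, 30, 21) << 1)
    return sign_extend(raw, 21)  # bit 0 is the appended 0


def branch_target(pc, inst):
    """PC-relative target, as described for io.SB_type."""
    return (pc + imm_sb(inst)) & 0xFFFFFFFF


print(imm_i(0xFFF00093))                     # addi x1, x0, -1
print(imm_sb(0xFE000EE3))                    # beq x0, x0, -4
print(hex(branch_target(0x100, 0xFE000EE3)))
```

The three calls decode an addi with immediate -1, a beq with offset -4, and the resulting branch target for a PC of 0x100.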
Filepath: src/main/scala/Pipeline/UNits/control.scala
The code snippet above implements the control unit for a 5-stage RISC-V pipeline. This module generates a suite of control signals—such as memory write
, branch
, memory read
, register write
, memory-to-register
, ALU operation
, operand selection
, extension type
, and next PC selection
—that steer the processor’s datapath. Using a switch-case construct keyed on the opcode, the module assigns specific values to these signals according to the instruction type (e.g., R-type, I-type, S-type, SB-type, U-type, UJ-type, etc.). The accompanying diagram and mapping table illustrate how these signals are routed to the appropriate hardware components in the pipeline.
Label | Signal Name (Code) | Signal Name (Diagram) |
---|---|---|
1 | io.mem_write | MemWrite |
2 | io.branch | Branch |
3 | io.mem_read | MemRead |
4 | io.reg_write | RegWrite |
5 | io.men_to_reg | MemtoReg |
6 | io.alu_operation | ALUSrc |
7 | io.operand_a | ALUOp1 |
8 | io.operand_b | ALUOp0 |
Filepath: src/main/scala/Pipeline/UNits/BRANCH.scala
The code snippet implements branch decision logic for RISC-V's conditional branch instructions—namely, beq
, bne
, blt
, bge
, bltu
, and bgeu
. It uses four input ports: io.fnct3
, which indicates the specific branch condition based on the instruction's function field; io.branch
, a Boolean flag identifying whether the current instruction is an SB-Type branch; and io.arg_x
and io.arg_y
, which are the operands to be compared. Based on the value of fnct3
, the module evaluates the appropriate comparison between arg_x
and arg_y
, and if the condition is satisfied, sets the output io.br_taken
to true, indicating that a branch should be taken.
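A minimal Python model of this decision logic follows. It uses the standard RISC-V funct3 encodings for the six branch instructions; the function names are hypothetical, and the model is not the repository's BRANCH module.

```python
def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def br_taken(fnct3, branch, arg_x, arg_y):
    """Return True when the instruction is a branch and its condition holds."""
    if not branch:
        return False
    ux, uy = arg_x & 0xFFFFFFFF, arg_y & 0xFFFFFFFF
    sx, sy = to_signed(ux), to_signed(uy)
    conditions = {
        0b000: ux == uy,  # beq
        0b001: ux != uy,  # bne
        0b100: sx < sy,   # blt  (signed)
        0b101: sx >= sy,  # bge  (signed)
        0b110: ux < uy,   # bltu (unsigned)
        0b111: ux >= uy,  # bgeu (unsigned)
    }
    return conditions.get(fnct3, False)


# 0xFFFFFFFF is -1 when signed but the largest value when unsigned.
print(br_taken(0b100, True, 0xFFFFFFFF, 1))  # True
print(br_taken(0b110, True, 0xFFFFFFFF, 1))  # False
```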
Filepath: src/main/scala/Pipeline/UNits/Alu_Control.scala
The code snippet above implements the ALU Control Unit for a RISC-V pipeline, as illustrated in the diagram below. This unit features three input ports—func3
, func7
, and aluOp
(a signal provided by the core control unit)—and one output port, io.out
. The 5-bit output is determined by combining values from these inputs in a way that depends on the instruction type. For example, R-type instructions derive the ALU operation by concatenating specific bits from func7
and func3
, while I-type instructions form the control signal by prepending a fixed two-bit value to func3
. Other instruction types—such as branch (SB type), jump, and load/store operations—are assigned specific constant values to control the ALU accordingly.
Filepath: src/main/scala/Pipeline/UNits/Alu.scala
The code snippet implements the ALU unit for a RISC-V pipeline, responsible for executing various arithmetic and logical operations based on the instruction type. The module accepts three input ports: two operands (io.in_A
and io.in_B
) and an operation code (io.alu_Op
) coming from the ALU Control Unit. The result of the computation is output via io.out
. For example, when io.alu_Op
is set to ALU_ADD
or ALU_ADDI
(among other similar opcodes for load/store and immediate operations), the module computes the sum of io.in_A
and io.in_B
and assigns the result to io.out
.
Since the RISC-V pipeline consists of five stages, it requires four sets of pipeline registers. These registers are encapsulated in modules labeled IF/ID
, ID/EX
, EX/MEM
, and MEM/WB
, where the slash indicates the two adjacent stages that the register bridges. These pipeline registers are painted orange in the illustration below.
Filepath: src/main/scala/Pipeline/Pipelines/IF_ID.scala
Although the illustration above shows only three register ports at IF/ID, the design also takes hazard detection into account (discussed later). In this context, the SelectedPC signal represents the program counter after hazard resolution. Consequently, the IF/ID pipeline register stores four values: io.pc_in, io.pc4_in, io.SelectedPC, and io.SelectedInstr. These registers are created with RegInit, which gives each of them a defined reset value.
Filepath: src/main/scala/Pipeline/Pipelines/ID_EX.scala
The code snippet implements the ID/EX
pipeline register, which captures and stores several critical values for the subsequent execution stage. In particular, it holds the operand data (rs1_data
and rs2_data
), the incremented program counter (IFID_pc4
), and the immediate value (imm
).
Additionally, it preserves nine control signals generated during instruction decode, ensuring proper propagation through the multi-stage pipeline. Register addresses and function fields such as rs1
, rs2
, rd
, func3
, and func7
are also stored to support data forwarding in the event of hazards.
RegNext
is used instead of RegInit
because it automatically captures and updates each value at the next clock cycle, maintaining seamless data flow between pipeline stages without the need for an explicit initial value.
Filepath: src/main/scala/Pipeline/Pipelines/EX_MEM.scala
The code snippet above implements the EX/MEM
pipeline registers, which transfer critical data and control signals from the execution stage (EX
) to the memory stage (MEM
). In this module, essential control signals—namely, memRD
, memWr
, and memToReg
—are preserved to ensure proper memory operations and data routing. Additionally, the ALU result (alu_out
) is stored along with the reg_w_out
and rd_out
signals, which are vital for hazard detection and data forwarding in later pipeline stages.
Filepath: src/main/scala/Pipeline/Pipelines/MEM_WB.scala
The code snippet above implements the MEM/WB
pipeline registers, which transfer essential data from the memory stage (MEM
) to the write-back stage (WB
). Specifically, this module preserves control signals such as memToReg
, reg_w
, and memRd
, as well as key data values including the destination register (rd
), data from memory (dataMem
), and the ALU output (alu
).
In the RISC-V pipeline, two distinct memory units are employed: instruction memory and data memory. The repository implements these as separate modules, each tailored to its specific role in the processor's operation.
Filepath: src/main/scala/Pipeline/Memory/InstMem.scala
The code snippet implements the instruction memory module for the RISC-V pipeline. This module features one 32-bit address input (io.addr
) used to fetch instructions and one 32-bit data output (io.data
) for delivering the corresponding instruction. The memory is instantiated with Mem(1024, UInt(32.W))
, which creates an array of 1024 entries, each capable of storing a 32-bit instruction. The initFile
parameter specifies the file from which the initial contents of the instruction memory are loaded, and the function loadMemoryFromFile
is used to populate the memory with these values. Finally, because the memory is indexed by word rather than by byte, the module divides the input byte address by 4 to obtain the index of the 32-bit instruction to read.
Filepath: src/main/scala/Pipeline/Memory/DataMemory.scala
The code snippet implements the data memory unit for the RISC-V pipeline. This module features four input ports—io.addr
, io.dataIn
, io.mem_read
, and io.mem_write
—and one output port, io.dataOut
. It instantiates a memory array with 1024 entries, where each entry is a 32-bit word. When the control signal io.mem_write
is asserted, the module writes the data from io.dataIn
into the memory at the address specified by io.addr
. Conversely, if io.mem_read
is activated, the module reads the data stored at io.addr
and outputs it via io.dataOut
.
Filepath: src/main/scala/Pipeline/Hazard Units/StructuralHazard.scala
The code snippet implements the structural hazard resolution mechanism for the RISC-V pipeline. This module is connected to four input ports—rs1
, rs2
, MEM_WB_regWr
, and MEM_WB_Rd
—and produces two output ports—fwd_rs1
and fwd_rs2
. The module checks whether the register destination in the MEM/WB
stage (MEM_WB_Rd
) matches either source register (rs1
or rs2
) while ensuring that write-back is enabled (i.e., MEM_WB_regWr
is asserted). If a match is detected, the corresponding forwarding signal (fwd_rs1
or fwd_rs2
) is set to true
; otherwise, it remains false
.
Filepath: src/main/scala/Pipeline/Hazard Units/HazardDetection.scala
The code snippet implements the hazard detection mechanism, which monitors the pipeline for load-use data hazards. When the instruction in the ID/EX stage is performing a memory read (i.e., io.ID_EX_memRead is true) and its destination register (io.ID_EX_rd) matches either of the source registers of the instruction being decoded (Rs1 or Rs2, extracted from io.IF_ID_inst), the module sets three signals, inst_forward, pc_forward, and ctrl_forward, to true. These signals indicate that the instruction, the program counter, and the control signals need special handling. Because the loaded value is not yet available at this point, the hazard cannot be removed by data forwarding alone; the signals are used to stall the dependent instruction, as described for the control signals of the ID_EX register in Main.scala. Otherwise, all three signals remain false. Additionally, the module passes the values of io.IF_ID_inst, io.pc_in, and io.current_pc through to io.inst_out, io.pc_out, and io.current_pc_out, respectively, ensuring that the instruction and relevant PC values continue to the next pipeline stage.
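The detection condition itself is compact. The following Python sketch models only the comparison described above, using the standard RISC-V field positions for rs1 (bits 19:15) and rs2 (bits 24:20); the function name is hypothetical and the pass-through outputs are omitted.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rd, if_id_inst):
    """True when the instruction in ID/EX is a load whose destination
    matches a source register of the instruction held in IF/ID."""
    rs1 = (if_id_inst >> 15) & 0x1F
    rs2 = (if_id_inst >> 20) & 0x1F
    return bool(id_ex_mem_read) and id_ex_rd in (rs1, rs2)


ADD_T0_X0_A0 = 0x00A002B3  # add t0, x0, a0 (rs1 = x0, rs2 = x10)

print(load_use_hazard(True, 10, ADD_T0_X0_A0))   # True: the load writes a0
print(load_use_hazard(True, 11, ADD_T0_X0_A0))   # False: different register
print(load_use_hazard(False, 10, ADD_T0_X0_A0))  # False: not a load
```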
Filepath: src/main/scala/Pipeline/Hazard Units/Forwarding.scala
This module implements the forwarding unit, which dynamically selects and routes data from later pipeline stages to resolve data hazards in the RISC-V pipeline. The unit examines the source registers from the ID/EX
stage (i.e., IDEX_rs1
and IDEX_rs2
) and compares them with the destination registers from both the EX/MEM
and MEM/WB
stages. Depending on which stage provides the most recent data, the module assigns a corresponding two-bit value to the forwarding outputs (forward_a
and forward_b
). For example, when the EX/MEM
stage is writing to a non-zero register that matches a source operand, the corresponding forward signal is set to binary 10
, indicating that data should be forwarded directly from the EX/MEM
stage.
In the MEM
hazard section, the module addresses cases where the MEM/WB
stage holds the data needed by the current instruction. Here, the module checks whether the MEM/WB
stage is writing to a non-zero register that matches the source registers of the ID/EX
stage. However, this forwarding is only enabled if the EX/MEM
stage is not already forwarding for that register (thereby prioritizing EX
hazards). If the conditions are met, the forward signal is set to binary 01
, signaling that the required data should be forwarded from the MEM/WB
stage. This mechanism ensures that even if an instruction's result has not been written back yet, the correct value is available for subsequent computations, thereby avoiding pipeline stalls.
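The priority between the two cases can be expressed as a single selection function. This is a Python sketch of the textbook forwarding conditions described above, evaluated for one source register at a time; the function and parameter names are hypothetical.

```python
def forward_select(idex_rs, exmem_reg_write, exmem_rd, memwb_reg_write, memwb_rd):
    """Return the two-bit forwarding code for one ID/EX source register."""
    # EX hazard: the newest value sits in EX/MEM, so it takes priority.
    if exmem_reg_write and exmem_rd != 0 and exmem_rd == idex_rs:
        return 0b10
    # MEM hazard: only reached when EX/MEM is not forwarding this register.
    if memwb_reg_write and memwb_rd != 0 and memwb_rd == idex_rs:
        return 0b01
    return 0b00  # no forwarding: use the value read from the register file


print(forward_select(5, True, 5, True, 5))   # 2: EX/MEM wins over MEM/WB
print(forward_select(5, False, 5, True, 5))  # 1: forwarded from MEM/WB
print(forward_select(0, True, 0, True, 0))   # 0: x0 is never forwarded
```

Calling the function once with IDEX_rs1 and once with IDEX_rs2 yields forward_a and forward_b.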
Filepath: src/main/scala/Pipeline/Hazard Units/BranchForward.scala
The BranchForward
module is a key component in the RISC-V pipeline, responsible for resolving data hazards during branch
and Jalr
instruction execution. It determines if source operands for branch evaluation need to be forwarded from later pipeline stages to avoid stalls. The module takes as inputs the destination register identifiers and memory read flags from the ID/EX
, EX/MEM
, and MEM/WB
pipeline stages, alongside the source register identifiers (rs1
and rs2
) of the branch instruction and a control signal (ctrl_branch
). The outputs, forward_rs1
and forward_rs2
, are four-bit signals indicating the source of the forwarded data. When ctrl_branch
is set to 1
, branch forwarding logic is applied by sequentially checking for hazards in the ID/EX
, EX/MEM
, and MEM/WB
stages, forwarding the most recent valid data to the source registers based on specific matching conditions.
For Jalr
instructions, indicated when ctrl_branch
is set to 0
, the module only evaluates the source register rs1
for potential forwarding. It similarly checks the ID/EX
, EX/MEM
, and MEM/WB
stages for data matches, prioritizing the most recent and valid data for forwarding. Different codes are assigned to forward_rs1
based on whether the data comes from a memory read or a non-memory read operation. This modular and hierarchical approach ensures that the correct operand is always forwarded for branch or Jalr
instruction evaluation, reducing pipeline stalls and maintaining efficient instruction execution.
Filepath: src/main/scala/Pipeline/Main.scala
This code snippet demonstrates the use of MuxLookup
to manage the Program Counter (PC) update logic in a pipeline processor. It incorporates hazard detection mechanisms to ensure the correct instruction is executed, even in the presence of potential pipeline hazards.
This code is responsible for decoding the fetched instruction by extracting its opcode to identify the instruction type. Based on the opcode and the instruction format, it determines the values of the rs1
and rs2
register fields, specifying the source registers to be used for operations. The rs1
field is selected for instruction types such as R-type, I-type, S-type, SB-type, and JALR, while the rs2
field is used for R-type, S-type, and SB-type instructions. Additionally, the reg_write
signal is configured to enable or disable write-back to the register file (RegFile
), depending on whether the current instruction requires a write operation. This ensures the proper setup of source registers and write-back control for subsequent execution stages.
Instruction | Opcode | Decimal |
---|---|---|
R-type | 011 0011 | 51 |
I-type | 001 0011 | 19 |
S-type | 010 0011 | 35 |
I-type (load instructions) | 000 0011 | 3 |
SB-type (branch) | 110 0011 | 99 |
JALR instruction | 110 0111 | 103 |
This code implements data forwarding for the rs1 and rs2 source registers to handle potential data hazards in the pipeline.
S_rs1DataIn and S_rs2DataIn: wires that hold the correct values for rs1 and rs2 after forwarding needs have been evaluated. They take the value 0.S if no valid data path is available.
This ensures that the pipeline uses the most up-to-date data for execution, maintaining correctness and avoiding unnecessary stalls.
This code snippet implements stalling logic to handle control hazards in a pipelined processor. When a hazard is detected, the pipeline stage is stalled by setting all control signals in the ID_EX
pipeline register to 0
. Otherwise, the normal control signals are passed through.
In addition to constructing a pipelined RISC-V CPU using Chisel, it is essential to verify the integrity of the structure. Therefore, we first verify the correctness of our RISC-V test code using a third-party processor simulator named Ripes. Next, we establish the expected register outputs and compare them with the results produced by our CPU.
However, since the register values are confined within the RegisterFile module, we need to expose them through the IO Bundle. The following code snippet shows the modified IO of this module, which exposes all argument registers, temporary registers, and saved registers.
After exposing these IO ports, we need to wire the register values to the corresponding output ports. The following code snippet implements the wiring logic within the module.
Similarly, we expose the register values outside the PIPELINE
module using the subsequent code snippets.
Finally, in our MainTest.scala
, we add test cases following the structure shown in the code snippet below:
The code provided in the repository initially could not properly execute our test cases. Consequently, we traced the execution process and monitored register states after each clock cycle. However, since neither Chisel nor the author of the repository offers a user-friendly debugging tool like the Ripes simulator, which displays register values, we had to implement logging using printf
statements. The following three code snippets demonstrate logging for temporary, argument, and saved registers.
Additionally, to effectively monitor and analyze the IF_ID
and ID_EXE
pipelines and ALU controls, we include the following code snippet for logging supplementary information.
While testing one of our programs, we observed an unusual discrepancy by tracing the logs and comparing the register states with those produced by the Ripes simulator.
In Ripes, at clock cycle 41, register t2
is expected to change to 0x33333000
and then to 0x33333333
due to the instruction li t2, 0x33333333
.
However, examining the ALU log reveals that at clock cycle 39 during the EXE
stage, the CPU adds 0x15555555
and 0x33333000
instead of 0x00000000
and 0x33333000
as expected from the lui t2, 0x33333
instruction. Further analysis shows that the value 0x15555555
is incorrectly forwarded from the write-back pipeline register. This issue originates from the module responsible for hazard detection.
The original implementation included a StructuralHazard
class intended to resolve structural hazards but inadvertently handled data hazards instead, as shown in the code snippet below.
Additionally, its integration in Main.scala
disrupted proper data forwarding by only addressing hazards from the MEM/WB
pipeline and ignoring those from the EX/MEM
pipeline. It also detected hazards at incorrect stages. To rectify this, we removed the flawed StructuralHazard
class and correctly implemented structural hazard resolution.
After removing the defective module, new issues emerged, observable in the logs of argument and temporary registers.
Specifically, at clock cycle 24, the instruction add t0, x0, a0
was supposed to complete execution. However, analysis of the control signal history in the decode and execute stages revealed that there was no forwarding of the latest value of a0
. Consequently, the read and the write of a0 occurred in the same clock cycle, and the CPU fetched stale data: the read is combinational, whereas the written value only becomes visible after the next clock edge.
Furthermore, the forwarding scenarios primarily included EX/MEM
→ ALU
, MEM/WB
→ ALU
, and MEM/WB
→ InstrDecode
. The root cause of the issue was neglecting the priority of writing to registers before reading from them when both operations use the same register. This oversight led to reading stale data. For example, while storing 0x7FFF0000
to a0
, the CPU simultaneously attempted to read a0
, resulting in the stale value 0xFFFFFFFF
.
To ensure that writing to registers is prioritized before reading, we revised a section of the RegisterFile module and replaced it with the following code snippet:
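The idea of write-before-read can be illustrated with a small Python model, which is a behavioural sketch rather than the repository's Chisel code. The function name read_with_bypass is hypothetical; the values are the ones from the a0 example above.

```python
def read_with_bypass(regs, rs, reg_write, w_reg, w_data):
    """Read register rs, returning the value being written in the same
    cycle when the write targets the same register."""
    if rs == 0:
        return 0  # x0 always reads as 0
    if reg_write and w_reg == rs:
        return w_data & 0xFFFFFFFF  # bypass: the pending write wins
    return regs[rs]


regs = [0] * 32
regs[10] = 0xFFFFFFFF  # stale value of a0 (x10)

# a0 is being written with 0x7FFF0000 while it is read in the same cycle.
print(hex(read_with_bypass(regs, 10, True, 10, 0x7FFF0000)))   # 0x7fff0000
print(hex(read_with_bypass(regs, 10, False, 10, 0x7FFF0000)))  # 0xffffffff
```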
After resolving these hazards, we encountered an unexpected issue with arithmetic operations. The log files below display the state history of saved registers and control signals. At clock cycle 64, the instruction 0x408a5a13
(srai s4, s4, 8
) is loaded and executed at clock cycle 66. By clock cycle 68, the instruction performs a logical right shift without the required sign extension for srai
or sra
instructions. This is evident from the ALUOp
value of 5 at clock cycle 66, which corresponds to SRL
instead of SRA
.
Through debugging and observation, we discovered that some instructions were not implemented correctly. Specifically, in the ALU module, SRA
and SRAI
should be assigned an ALUOp
value of 13 instead of 5.
The original ALU code only supported I-type instructions with operation codes less than 8, as it only considered func3
values from 000
to 111
(0 to 7).
To fix this issue, we referred to the RISC-V instruction set and extended the ALU module to include the missing instructions. The following table illustrates the I-type and R-type instructions.
To accurately calculate the aluOp
code, the ALU Control unit must consider the entire func7
field of I-type and R-type instructions. Originally, func7
was defined as a boolean in ALU Control, which was incorrect. We rectified this by defining func7
as a 7-bit unsigned integer.
Additionally, outside of ALU Control, we ensured that the correct bits for func7
are properly extracted.
The revised ALU Control code snippet below now supports SRA
, SRAI
, and SUB
instructions, which have operation codes greater than 7.
Filepath: src/main/scala/Pipeline/UNits/Alu_Control.scala
To implement the M-extension, we need to modify the AluControl
module to allow the func7
signal to be passed into the module. Below is the updated definition for the AluControl class:
In the R-type instruction logic, we need to add a condition to handle M-extension instructions. Specifically, when func7
equals b0000001
, the instruction corresponds to an M-extension operation, such as multiplication (MUL), division (DIV), or remainder (REM). Below is the updated code for supporting M-extension:
The condition func7 === "b0000001" checks whether func7 equals b0000001, which indicates an M-extension instruction. In that case the unit outputs one of the M-extension operation codes (14.U to 21.U), which correspond to the predefined codes in the ALU.
Extending the ALU to Support M-extension Instructions
To fully implement the M-extension, we need to modify both the AluOpCode object and the ALU module. Below are the detailed steps with the modifications.
Modifying AluOpCode to Include M-extension Instruction Types
We add operation codes for the M-extension instructions (MUL, DIV, REM, etc.) to the AluOpCode object. Each code identifies one M-extension operation and serves as the alu_Op value for the ALU.
Implementing M-extension Operations in the ALU Module
The ALU module is extended to perform the M-extension operations based on the alu_Op
value provided.
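For reference, the arithmetic that these operations must produce is defined by the RISC-V M extension, including the results for division by zero and signed overflow. The Python sketch below models that arithmetic for a few representative operations; it is not the repository's ALU, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a, b):
    """Low 32 bits of the product."""
    return (to_signed(a) * to_signed(b)) & MASK32


def mulh(a, b):
    """High 32 bits of the signed x signed product."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


def mulhu(a, b):
    """High 32 bits of the unsigned x unsigned product."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def div(a, b):
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return MASK32  # division by zero yields -1 (all ones)
    q = abs(sa) // abs(sb)  # quotient rounds toward zero
    # Masking also covers the overflow case -2**31 / -1 -> 0x80000000.
    return (-q if (sa < 0) != (sb < 0) else q) & MASK32


def rem(a, b):
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return a & MASK32  # remainder by zero yields the dividend
    r = abs(sa) % abs(sb)
    return (-r if sa < 0 else r) & MASK32  # sign follows the dividend


print(hex(div(-7 & MASK32, 2)))  # 0xfffffffd (-3)
print(hex(rem(-7 & MASK32, 2)))  # 0xffffffff (-1)
print(hex(div(5, 0)))            # 0xffffffff
```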
Since branching or jumping occurs during the MEM
stage, we need to flush both the IF/ID
and ID/EX
pipelines with NOP
instructions (addi x0, x0, 0
) and clear all control signals. The corrected code is shown below:
This RISC-V assembly program finds the index of the maximum value in a predefined integer array. It initializes the array with three elements (0, 2, 1
) and iterates through it to compare each element with the current maximum value. The program uses registers to track the current maximum value (t0
), its index (t1
), and the current index (t2
). If a larger value is found, both the maximum value and its index are updated. Once the loop completes, the index of the maximum value is stored in register a0
, and the program exits using a system call. This implementation demonstrates basic array traversal and conditional updates in assembly.
This RISC-V assembly program calculates the number of leading zeros in a 32-bit integer. The program starts by loading a value (0x70000002
) into register a0
and calls the my_clz
function. In my_clz
, the input value is processed using a bitmask (t3
) initialized to 0x80000000
(representing the most significant bit). A loop checks each bit from left to right by performing a bitwise AND operation between the input value and the bitmask. If the current bit is 1, the loop exits; otherwise, the bitmask is right-shifted, and a counter (t1
) is incremented. Once the loop completes, the count of leading zeros is returned in a0
, and the program exits.
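A Python reference model of the same loop is useful for establishing the expected register value before running the test. It follows the description above; returning 32 for an input of 0 is an assumption of this sketch, since the text does not say how the assembly handles that case.

```python
def my_clz(x):
    """Count leading zeros of a 32-bit value by scanning from the MSB."""
    x &= 0xFFFFFFFF
    mask = 0x80000000  # starts at the most significant bit
    count = 0
    while mask and not (x & mask):
        mask >>= 1   # move to the next lower bit
        count += 1
    return count


print(my_clz(0x70000002))  # 1
```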
This RISC-V assembly program calculates the absolute value of a 32-bit floating-point number. The program begins by loading the value 0xFFFFFFFF
into register a0
, representing the input, and then calls the fabsf
function. Inside fabsf
, a bitmask (0x7FFFFFFF
) is loaded into t0
, which clears the sign bit of the input number when applied using a bitwise AND operation. The result, stored back in a0
, represents the absolute value of the input. Finally, the program exits the function and terminates using a system call.
This RISC-V assembly program converts a 16-bit floating-point number (FP16) to a 32-bit floating-point number (FP32). The main function loads the FP16 value (0xFFFFFFFF
) into register a0
and calls the fp16_to_fp32
function. Within fp16_to_fp32
, the program handles sign extraction, normalization, and exponent adjustment. The my_clz
function is used to calculate the number of leading zeros for normalization. The program adjusts the FP16 format to FP32 by aligning the mantissa, adding a bias to the exponent, and managing special cases like zeros, infinities, and NaNs. Finally, the result is constructed by combining the sign, exponent, and mantissa and is returned in a0
. The program uses a stack for register saving and restoring during function calls to maintain execution context.
This RISC-V assembly program performs multiplication using the shift-and-add method, which is a bitwise algorithm. It takes two numbers (multiplier and multiplicand) and calculates their product without using the mul
instruction. The program handles negative values by converting them to positive before computation and uses a 32-bit loop counter to iterate through each bit of the multiplier. For each bit, it conditionally adds the multiplicand to an accumulator if the bit is 1. The multiplier is shifted right, and the multiplicand is shifted left after each iteration. The result is stored in a0
at the end, and the program exits.
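The algorithm can be modelled in Python to obtain expected results for the test. This sketch follows the steps described above; restoring the sign at the end and returning the result as a 32-bit pattern are assumptions of the model, and the function name is hypothetical.

```python
def shift_add_mul(multiplicand, multiplier):
    """Multiply two integers with the shift-and-add method over 32 bits."""
    negative = (multiplicand < 0) != (multiplier < 0)
    a, b = abs(multiplicand), abs(multiplier)
    acc = 0
    for _ in range(32):  # one iteration per multiplier bit
        if b & 1:
            acc += a     # add the multiplicand when the current bit is 1
        a <<= 1          # multiplicand shifts left
        b >>= 1          # multiplier shifts right
    acc &= 0xFFFFFFFF    # keep the low 32 bits, as a register would
    return (-acc if negative else acc) & 0xFFFFFFFF


print(shift_add_mul(6, 7))        # 42
print(hex(shift_add_mul(-3, 5)))  # 0xfffffff1 (-15)
```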
sbt test
To save the execution history as a file, use sbt test > <filename.txt>.
git clone https://github.com/riscv/riscv-gnu-toolchain
cd riscv-gnu-toolchain
sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build
make linux
Compile *.s to *.elf:
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -o <out_name>.elf <in_name>.s
For RISC-V programs utilizing the M-extension, change to -march=rv32im.
Convert *.elf to *.bin:
riscv64-unknown-elf-objcopy -O binary <in_name>.elf <out_name>.bin
Convert *.elf to *.hex:
riscv64-unknown-elf-objcopy -O verilog <in_name>.elf <out_name>.hex
The resulting hex file must be post-processed before it can be used, because it is encoded in little-endian byte order and contains special characters and whitespace.
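One possible post-processing step is sketched below in Python. It assumes that the hex file consists of @address markers followed by space-separated bytes (the default layout of objcopy's verilog output), that the program forms one contiguous image starting at address 0, and that the memory loader expects one 32-bit word per line; the function name is hypothetical.

```python
def verilog_hex_to_words(text):
    """Turn byte-oriented hex text into a list of 32-bit words (as hex strings)."""
    data = []
    for token in text.split():
        if token.startswith("@"):
            continue  # address marker: skipped under the contiguous-image assumption
        data.append(int(token, 16))
    while len(data) % 4:
        data.append(0)  # pad the last word with zero bytes
    words = []
    for i in range(0, len(data), 4):
        b0, b1, b2, b3 = data[i:i + 4]
        # Little-endian: the first byte in the file is the least significant.
        words.append(f"{b3:02x}{b2:02x}{b1:02x}{b0:02x}")
    return words


sample = "@00000000\n13 05 00 00 B7 33 33 33\n"
print(verilog_hex_to_words(sample))  # ['00000513', '333333b7']
```

Writing the returned list to a file with one entry per line gives a word-oriented image; whether this matches the format expected by the instruction memory's initFile should be checked against the loader actually used.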