Solutions
A
Consider an algorithm for transposing a square matrix in-place by swapping rows and columns. Below, you will find the C code for this operation, and it is important to note that the matrix elements are 32-bit integers.
Next, we utilize vector-style instructions to implement the transposition. Below, you will find an abbreviated listing of potentially relevant vector load/store instructions.
Vector Load/Store Instructions
instruction | definition |
---|---|
vsetvli rd, rs, eN, mX, tP, mP |
This instruction updates rd with the vector length computed; rs is an input register operand that contains the application vector length (AVL) which represents the vector length the program wants to use.N in eN is the sew (8, 16, 32, 64, …); X in mX is the lmul (spelled as fY for 1/Y cases); P is the policy for tail (t) and mask (m): u for undisturbed, a for agnostic |
vle32.v vd, (rs1), vm |
vd[i] = mem[(rs1) + i*4] |
vse32.v vs3, (rs1), vm |
mem[(rs1) + i*4] = vs3[i] |
vlse32.v vd, (rs1), rs2, vm |
vd[i] = mem[(rs1) + i*rs2] |
vsse32.v vs3, (rs1), rs2, vm |
mem[(rs1) + i*rs2] = vs3[i] |
vluxei32.v vd, (rs1), vs2, vm |
vd[i] = mem[(rs1) + vs2[i]] (unordered) |
vsuxei32.v vs3, (rs1), vs2, vm |
mem[(rs1) + vs2[i]] = vs3[i] (unordered) |
vloxei32.v vd, (rs1), vs2, vm |
vd[i] = mem[(rs1) + vs2[i]] (ordered) |
vsoxei32.v vs3, (rs1), vs2, vm |
mem[(rs1) + vs2[i]] = vs3[i] (ordered) |
Complete the vector code provided below to vectorize the matrix transpose operation.
- A01 = ?
2- A02 = ?
4- A03 = ?
4- A04 = ?
add t4, t4, t6
- A05 = ?
bnez a0, loop_i
B
Consider REV
as a macro-operation fusion instruction that reverses a linked list in memory. The rs1 operand for this instruction is the memory address of a pointer to the first node in the linked list, which is essentially a pointer to a pointer.
Each node in the linked list has the following structure. In this architecture, assume that pointers are 32 bits wide. The 'next' pointer can either hold the memory address of the next node in the list or be set to 0 (NULL
) to indicate the end of the linked list.
To provide a point of reference, we have included both the C and assembly code equivalents for this instruction below.
Let's analyze the execution of the assembly code for linked list reversal on an unpipelined RISC-V core. In this core, each instruction has a CPI (Cycles Per Instruction) of 1, except for load and store instructions, which require 2 cycles each. How many instructions are needed to reverse a linked list with a length of 4 using this program? (Fill in the numbers)
- B01 = ?
1- B02 = ?
0- B03 = ?
2- B04 = ?
1- B05 = ?
1- B06 = ?
3
A garage door will initiate its opening sequence when it detects the password '011' in a transmission. To put it formally, this Finite State Machine (FSM) accepts a bitstring composed of 0's and 1's as input and produces a continuous stream of 0's until it encounters the substring '011,' at which point it begins generating a continuous stream of 1's. Below are examples of how this FSM operates:
Mark the correct FSM state transition for each numbered arrow.
__ C01 __
__ C02 __
__ C03 __
__ C04 __
- C01 = ?
0/0- C02 = ?
1/1- C03 = ?
1/1- C04 = ?
0/1
Take a look at the circuit diagram below along with the delays of its components:
What is the smallest combinational delay of all paths in this circuit, in picoseconds? __ D01 __
- D01 = ?
25ps
The shortest CL path is between the right register and the top left register, consisting of a NOT gate and an OR gate, for a total delay of 25ps.
What is the maximum hold time the registers can have so that there are no hold time violations in the circuit above?
- D02 = ?
30ps
The shortest CL path is 25ps (see Q6.1), and the maximum hold time is the shortest CL path + clk-to-q, which gives us 30ps.
What is the minimum allowable clock period for this circuit to function properly?
- D03 = ?
117ps
The longest path between any two registers is between the top left register and the register, consisting of a multiplier, a NOT gate, a subtractor, and an AND gate, for a total of 110ps.
Additionally, we need to account for clk-to-q and setup, which gives us 117ps.
Decode the following RISC-V instruction. You can use online RISC-V Instruction Encoder/Decoder.
which is mapped to
E01 should start with x
.
- E01 = ?
x19- E02 = ?
586
Examine the circuit diagram presented below. SEL is a single-bit control signal that undergoes instantaneous updates at the rising edge of each clock cycle, maintaining stability throughout each clock cycle. You can make the assumption that Input will not result in any hold time violations.
The combinational logic block on the left performs a left shift operation on the top input by the number of bits specified by the bottom input. In the diagram, the shifter connected to the register on the left shifts its output to the left by 1 bit.
Suppose we want to maximize the circuit's frequency without altering its behavior, and we are not allowed to modify the delays of any component.
The most significantly slow component in this context is the multiplier, and addressing this issue is necessary. Specifically, if our sole multiplication requirement is doubling numbers, we do not require a versatile multiplier capable of multiplying any two inputs together.
What is the minimum clock period after implementing the above change? __ F01 __
- F01 = ?
77 ps
With the multiplier replaced by a shifter, the new longest path starts from either left-side register, goes through the shifter and the mux, and ends at the rightmost register.
From the rising edge, we wait 30 ps for the output to appear out of the left registers (clk-to-q).
Then we wait another 2 ps for the signal to travel through the shifter, and another 25 ps for the multiplexer. Finally, we have to get to the rightmost register 20 ps before the next rising edge to account for setup time. In total, this is 30 + 2 + 25 + 20 = 77 ps.
The function add_even
is specified as follows:
Input:
Example: Let's consider an input where a0 points to , and a1 holds the value 4
. In this case, the output in a0 will be 10
(4 + 6).
Complete the missing sections in the RISC-V code provided below. Each line should consist of exactly one instruction or pseudo-instruction.
- G01 = ?
slli t2, t2, 1
- G02 = ?
addi a1, a1, -1
A Hamming Error Correcting Code is capable of rectifying single-bit errors. In scenarios where data is susceptible to errors, such as in satellite communication or L1 caches, it's advantageous to employ Hamming codes to store data blocks. These codes can internally correct errors during memory retrieval.
In this context, we will examine a (7,4) Hamming Code with even parity. This code utilizes 7 bits in total to store 4 data bits. Please refer to the following bit pattern for the (7,4) Hamming Code:
We follow the convention where the first bit is considered the most significant in the data, and the seventh bit is the least significant. Then, we create a circuit that, when provided with a (7,4) Hamming Code, produces a corrected nibble of data. What components should be placed in the labeled boxes? Assume that multiple-input gates are created by connecting two-input gates.
It is known that both Box I and Box II belong to this category. Please provide answers accordingly.
- H01 = ?
XOR gate
Cascaded XOR gates can be used to determine the parity of the input bits; XOR returns 1 if the number of input “1” bits is odd, and 0 if the number of input1
bits is even. This matches our definition of parity.
- H02 = ?
Multiplexer (with 0 input on top)
We need to output the corrected data based on the parity bits; this can only be done using a multiplexer.
- H03 = ?
6 ns.
The values of A-H are computed at 1 ns. Then after the 5 ns delay of Box 2, the output changes.- H04 = ?
8 ns.
The inputs change at 0 ns. Then after the 3 ns delay of Box I, the bottom input to Box II changes. Then after the 5 ns delay of Box II, the output changes.
The function verify_password
is defined as follows:
You have access to the following externally defined labels:
password
: This is a pointer to a statically stored string "secretpass."getchars
: This function is defined as follows:
Effect: It reads characters from stdin and populates the buffer pointed to by a0 with the read data, null-terminating the string. Your code can assume that the input consists of a maximum of 19 characters, not including the null-terminator.
In your current role as a hacker, your objective is to target the implementation of verify_password
.
During your testing, you have uncovered an intriguing revelation: the function getchars
does not function as originally intended. Instead of truncating at the 20th character, getchar
continues writing data until it encounters the first null terminator within its input. Just as before, verify_password
resides at memory address 0x1000
, and getchars
is situated at address 0x0F00. Additionally, assume that the stack pointer is initially positioned at 0xBFFF F800 when verify_password
begins execution. Our page size is 4 KiB, and we are currently operating on a little-endian system.
- I01 = ?
5
The buffer is 20 bytes long. Each RISC-V instruction is 4 bytes long. In total, we can fit 20 / 4 = 5 instructions in the buffer.
- I02 = ?
-280.
We allocate 256 bytes of buffer, plus 24 bytes to avoid overlap with the injected instructions and the saved ra register from part 1, for a total of 280 bytes. (This does mean that we rely on data after our stack pointer to stay constant, which is technically undefined behavior. In reality, the data will stay consistent for long enough that
we move the stack pointer back down; since our sp starts in the middle of a page, the data will not be deallocated.- I03 = ?
a0
Thegetchars
function expects the address of a buffer as an argument in the a0 register.- I04 = ?
1
This puts the value 0x1000 in t1, which then causes the next line to jump to 0x1000 - 256 = 0x0F00, the address of Get20Chars. Note that we needed to use lui here, since addi can only increase by up to 0x7FF. Line 4 could also not have been a jal operation, since our jump distance would have exceeded the size of a 20-bit immediate.- I05 = ?
sp
We want to jump to the address in the sp register, and we do not care about saving a return address anywhere.
getchars
; our goal is to have a valid RISC-V jump instruction. Additionally, please take note that +3840 is equivalent to 0x00000F00. Select all possible choices. __ I06 __ret
jalr x0, t1, -256
jalr s0, t1, -256
jalr ra, t2, -256
jalr ra, t0, 16
jalr s0, x0, 3840
- I06 = ?
d, e
ret
=jr ra
=jalr x0 ra 0
andjalr x0, t1 -256
contain a null byte when translated to machine code: 0x00008067 and 0xF0030067.
jalr s0 x0 3840 is an invalid instruction because the immediate 3840 is greater than 2047.
I-type immediates must be between -2048 and +2047.
jalr s0 t1 -256
andjalr ra t0 16
are valid, and do not have a null byte; note that jalr s0, t1, -256 = 0xF0030467 does not have a null byte, because it gets split up into bytes as 0x67 0x04 0x03 0xF0.
Examine the provided RISC-V code snippet:
bltu
instruction?
- J01 = ?
-8
Two instructions away =-8
bytes
Each instruction is byte, so to go back to Loop, it needs to minus bytes.
jalr s0, s1, MAX_POS_IMM
, where MAX_POS_IMM
represents the maximum possible positive immediate value for jalr
. After this instruction executes, what will be the values in the following registers? (Provide your answers in hexadecimal.)
s0
= J02s1
= J03PC
= J04
- J02 = ?
0xA000 0004- J03 = ?
0x4000 0000- J04 = ?
0x4000 0FFF
We know that rd and rs1 fields are now 6 bits.jalr
is an I-type instruction, so we take out thefunct3
bits but we give each of rd and rs1 fields 1 bit, meaning we have 1 bit leftover to give to the immediate field. Thus, we now have a 13-bit immediate. Thus, the maximum possible immediate ajalr
instruction can hold is halfwords away, which is represented as 0b0 1111 1111 1111, which is0x0FFF
.
s0
is the linking register – itss value is PC + 4
s1
does not get written into so it stays the same.
PC = R[s1] + 0x0FFF
K
Let's examine the program that recursively calculates the Fibonacci sequence. On the left, you will find the C code, and on the right, you can see its corresponding translation into RISC-V assembly. It's important to note that the execution has been paused just before executing the 'ret' instruction. The 'SP' label on the stack frame (part 3) indicates the location where the stack pointer is pointing when the execution was halted.
Complete the missing portion of the ble
instruction to make the assembly implementation match the C code.
- K01 = ?
a0, a7, done
a0 is n in C code.
a7 is1
in C code.
done is ready to return in C code.
How many distinct words will be allocated and pushed into the stack each time the function fib
is called?
- K02 = ?
3
Please fill in the values for the blank locations in the stack trace below. Please express the values in HEX.
Notation | address |
---|---|
Smaller address | 0x280 |
0x1 | |
K03 | |
SP | K04 |
K05 | |
0x0 | |
0x280 | |
0x3 | |
0x0 | |
0x2108 | |
0x4 | |
0x6 | |
Larger address | 0x1 |
- K03 = ?
0x0- K04 = ?
0x280- K05 = ?
0x2
Initialn
is 0x3, and the after the inst.L2: addi a0, a0, -1
the k05 becomes 0x3-1=0x2.
K04 = ra = 0x280
s0 is to store the temporary value to addfib(n - 1) + fib(n - 2)
together equal to instadd a0, s0, t0
.
What is the hex address of the done
label? (Answer in HEX)
- K06 = ?
0x298
There have 6 instructions between the firstcall fib
and done label. Therefore the address is 0x280+6*4=0x298
What was the address of the original function call to fib
? (Answer in HEX)
- K07 = ?
0x2104
Since call would store the next instruction's address so the first ra minus 4 is the answer, which is 0x2108.