A
Consider a cache in certain RISC-V processor with an AMAT (average memory access time) of 1.5
cycles. Accessing our cache takes 1
cycle, and on a miss, it takes an additional 10
cycles to retrieve the data from main memory and update the cache. What does our hit ratio need to be in order to achieve the target AMAT?
Hit ratio = __ A01 __
- A01 = ?
The timing for a particular cache is as follows: checking the cache takes 1 cycle. If there is a hit the data is returned to the CPU at the end of the first cycle. If there is a miss, it takes 10
additional cycles to retrieve the word from main memory, store it in the cache, and return it to the CPU. If we want an average memory access time of 1.4
cycles, what is the minimum possible value for the cache’s hit ratio?
Minimum possible value of hit ratio: __ A02 __
- A02 = ?
B
Consider that P1
is a five-stage pipelined RISC-V processor (IF, DEC, EXE, MEM, WB), where:
Assume that the loop shown to the right has been running for a while and will be re-executed after this iteration.
Given the program below. How many cycles does it take to execute one iteration of the loop on this processor?
Cycles per iteration on this processor P1
: __ B01 __
B01 = ?
Now consider that P2
is a variant of the previous five-stage pipelined RISC-V processor (IF, DEC, EXE, MEM, WB), where:
Assume that our example loop (same as the previous program) has been running for a while and will be re-executed after this iteration. How many cycles does it take to execute one iteration of the loop on this processor?
Cycles per iteration on processor P2
: __ B02 __
- B02 = ?
C
You are designing a four stage RISC-V processor (IF, DEC, EX, WB) that:
This processor has a tight area constraint, and you only have enough area for one bypass path for read-after-write hazards. You are trying to decide whether to put the bypass path from EX to DEC or from WB to DEC. As part of this evaluation, you construct two processors:
P1
: Bypassing from WB to DEC.P2
: Bypassing from EX to DEC.These processors ensure that if the instruction in the DEC stage does not have all of its operands ready to read from the register file or read from a bypass path, the instruction in the decode stage is stalled.
You are using the following loop of a program to evaluate the performance of the processor:
Assume this loop has been running for a long time, and that the processor assumes that the branch will be taken.
How many cycles does this loop take to execute in each of the processors?
P1
cycles per iteration: __ C01 __P2
cycles per iteration: __ C02 __
- C01 = ?
- C02 = ?
D
Consider the execution of the following code sequence on a 5-stage pipelined RISC-V processor, which is fully bypassed. The code below iterates over the array shown below to find the index of the element whose value is 0x104. Thus, on the second iteration of the loop, it branches to label done. The instruction unimp signals the end of the program. This processor:
The program:
Are any instructions stalled in the decode (DEC) stage during execution of this loop? If so, list the opcode(s) here. If not, write N/A
.
Instructions stalled in DEC stage: __ D01 __
- D01 = ?
How many NOPs, if any, are inserted into the pipeline to handle the stalls required for execution of one iteration of the loop?
Number of NOPs added to handle stalls: __ D02 __
- D02 = ?
How many cycles does one iteration of the loop (first iteration, i.e. lw -> beq -> addi -> j -> sub -> srli) take?
Number of cycles for one iteration of the loop: __ D03 __
- D03 = ?
How many NOPs, if any, are inserted into the pipeline to handle the final iteration of the loop through the end of the program?
Number of NOPs added to handle stalls: __ D04 __
- D04 = ?
Which additional instructions, if any, would need to be stalled if the processor did not have a bypass path from the EXE stage but still had one from the MEM and WB stages?
Instructions stalled due to removal of EXE bypass path: __ D05 __
- D05 = ?
How many additional NOPs, if any, are required to handle the removal of the EXE bypass path?
Number of NOPs required to handle removal of EXE bypass path: __ D06 __
- D06 = ?
E
Consider a direct-mapped cache with 64 total data words with 1 word/cache line.
The program shown below repeatedly executes an inner loop that sums the 16 elements of an array that is stored starting in location 0x310
.
The program is executed for many iterations, then a measurement of the cache statistics is made during one iteration through all the code, i.e., starting with the execution of the instruction labeled outer_loop:
until just before the next time that instruction is executed.
In total, how many instruction fetches occur during one complete iteration of the outer loop? How many data reads?
Number of instruction fetches: __ E01 __
Number of data reads: __ E02 __
- E01 = ?
- E02 = ?
The memory address has 32 bits total which will be divided up into bits for the tag, index, block offset and byte offset.
For each memory access, we look at bits [7:2]
of the address to determine which index in the cache that particular memory address maps to.
lw
instructions.Thus, many slots in the cache are already populated. We need to figure out which ones will result in cache misses. As we walk through the program for the first time, we will not count initializing the cache contents as cache misses. How many instruction fetch misses occur during one complete iteration of the outer loop? How many data read misses? Hint: remember that the array starts at address 0x310.
Number of instruction fetch misses: __ E03 __
Number of data read misses: __ E04 __
- E03 = ?
- E04 = ?
What is the hit ratio measured after one complete iteration of the outer loop?
Hit ratio: __ E05 __
- E05 = ?
F
Assume, the program shown below is being run on a RISC-V processor with a cache with the following parameters:
The cache will divide the 32-bit address supplied by the processor into four fields: 2 bits of byte offset, B bits of block offset, L bits of cache line index, and T bits of tag field. Based on the cache parameters given above, what are the appropriate values for B, L, and T?
value for B: __ F01 __
value for L: __ F02 __
value for T: __ F03 __
- F01 = ?
- F02 = ?
- F03 = ?
If the SLLI
instruction is resident in a cache line, what will be its cache line index? the value of the tag field for the cache?
Cache line index for SLLI when resident in cache: __ F04 __
Tag field for SLLI when resident in cache: __ F05 __
- F04 = ?
- F05 = ?
Given that the code begins at address 0x240
and the array begins at address 0x420
, and that there are 16 elements in the array as shown in the code above, list all the values j () where the location holding the value A[j]
will map to the same cache line index as the SLLI
instruction in the program.
List all j
where A[j]
have the same cache line index as SLLI
: __ F06 ___
- F06 = ?
G
Consider a 2-way set-associative cache where each way has 4 cache lines with a block size of 2 words. Each cache line includes a valid bit (V) and a dirty bit (D), which is used to implement a write-back strategy. The replacement policy is least-recently-used (LRU). The cache is used for both instruction fetch and data (LD,ST) accesses. Please use this cache when answering questions (A) through (D).
Using this cache, a particular benchmark program experiences an average memory access time (AMAT) of 1.3 cycles. The access time on a cache hit is 1 cycle; the miss penalty (i.e., additional access time) is 10 cycles. What is the hit ratio when running the benchmark program?
Hit ratio for benchmark program: __ G01 __
- G01 = ?
This cache is used to run the following benchmark program. The code starts at memory address 0
; the array referenced by the code has its first element at memory address 0x200
. First determine the number of memory accesses (both instruction and data) made during each iteration through the loop. Then estimate the steady-state average hit ratio for the program, i.e., the average hit ratio after many iterations through the loop.
Number of memory accesses made during each iteration of the loop: __ G02 __
Estimated steady-state average hit ratio: __ G03 __
- G02 = ?
- G03 = ?
H
Consider the diagram below for a 2-way set associative cache to be used with our RISC-V processor. Each cache line holds a single 32-bit word of data along with its associated tag and valid bit (0
when the cache line is invalid, 1
when the cache line is valid).
The RISC-V produces 32-bit byte addresses, A[31:0]. To ensure the best cache performance, which address bits should be used for the cache index? For the tag field?
address bits used for cache index: A[ __ H01 __]
address bits used for tag field: A[ __ H02 __]
- H01 = ?
- H02 = ?
Suppose the processor does a read of location 0x5678
. Identify which cache location(s) would be checked to see if that location is in the cache. For each location specify the cache section (P
or Q
) and row number (0 through 7). E.g., 3P
for row 3
, section P
. If there is a cache hit on this access what would be the contents of the tag data for the cache line that holds the data for this location?
cache location(s) checked on access to 0x5678
: __ H03 __
cache tag data on hit for location 0x5678
(hex): __ H04 __
- H03 = ?
- H04 = ?
Estimate the approximate cache hit ratio for the following program. Assume the cache is empty before execution begins (all the valid bits are 0) and that an LRU replacement strategy is used. Remember the cache is used for both instruction and data (LD) accesses.
approximate hit ratio: __ H05 __
- H05 = ?