Quiz5 of Computer Architecture (2024 Fall)

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

General Information

You are allowed to read lecture materials.
- That is, an open book exam.
We are using the honor system during this quiz, and would like you to accept the following:
1. You will not share the quiz with anyone.
2. You will not discuss the material on the quiz with anyone until after solutions are released.
Each answer has 1 point.
You must answer in valid numeric representation and/or English alphabet except your formal name.
Always provide your answer in the shortest form. For example, use ptr->member instead of (*ptr).member.
Assume that each C program already includes the headers <stdint.h>, <stddef.h>, <stdlib.h>, <stdio.h>, and <string.h>.
The C standard referenced in this quiz is C99, officially known as ISO/IEC 9899:2018.
Bonus Points: After the class resumes at 10:30 AM, if you voluntarily participate in the class discussions, please email the instructor afterward, including your responses to the instructor's questions and any follow-up contributions. You will receive an additional 20 points for this quiz.
Message the instructor via Facebook if you found something wrong.
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
09:10 ~ 10:20AM on Dec 10, 2024

Problem `A`

Average Memory Access Time (AMAT) represents the average time required to access memory. The main formula is:

A M A T = Hit Time + (Miss Rate \times Miss Penalty)

In a multi-level cache system, two types of miss rates are considered for each cache level:

Global miss rate: Calculated as the number of accesses that missed at a specific cache level divided by the total number of accesses to the entire cache system.
Local miss rate: Calculated as the number of accesses that missed at a specific cache level divided by the total number of accesses made to that cache level.

In a two-level cache system, suppose there were 100 total accesses to the entire cache system, and the L2 cache experienced 20 misses. If the L1 cache has a local miss rate of 50%, what is the local miss rate of the L2 cache? __ A01 __
A01 = ?

Suppose the system has the following characteristics:

An L1 cache with a hit time of 2 cycles and a local miss rate of 20%.
An L2 cache with a hit time of 15 cycles and a global miss rate of 5%.
Main memory with an access time of 100 cycles.

What is the local miss rate of the L2 cache? __ A02 __
A02 = ?

Suppose we aim to reduce the system's AMAT to 8 cycles or lower by introducing an L3 cache. If the L3 cache has a local miss rate of 30%, what is the maximum allowable hit time for the L3 cache? __ A03 __
cycles.
A03 = ?

Given the following block of C code, determine the cache hit rate on a byte-addressed system with a total memory size of 1 MiB. The system has a 16 KiB direct-mapped cache with 1 KiB blocks. Assume the cache starts cold.











#define N_INTS 8192
int A[N_INTS]; /* A lives at 0x10000 */
int i, total = 0;

/* Initialize array */
for (i = 0; i < N_INTS; i += 128)
    A[i] = i;

/* Process array */
for (i = 0; i < N_INTS; i += 128)
    total += A[i];

What is the Tag:Index:Offset (TIO) breakdown? Tag: __ A04 __ bits; Index: __ A05 __ bits; Offset: __ A06 __ bits
A04 = ? A05 = ? A06 = ?

Determine the cache hit rate for the line labeled as Line 7. __ A07 __
A07 = ?

Determine the cache hit rate for the line labeled as Line 11. __ A08 __
A08 = ?

Associativity refers to the number of cache slots a memory block can map to. A fully associative cache has the highest associativity, meaning any block can be placed in any cache slot. In contrast, a direct-mapped cache has the lowest associativity, being only 1-way set associative, where each block has only one specific slot.

For an

N

-way set-associative cache, the following properties hold:

N \times # of sets = # of blocks

Index bits = \log_{2} (# of sets)

Consider a system with 2-way Set-Associative Cache

Address space: 8 bits
Block size: 8 bytes
Cache size: 32 bytes

Classify the following memory accesses as either a cache hit (H), cache miss (M), or cache miss with replacement (R). If a miss occurs, specify its type (Compulsory, Capacity, or Conflict). Assume the Least Recently Used (LRU) replacement policy. Note that while LRU is used here, other policies may apply in different scenarios.

Address (Binary)	Tag / Index / Offset	Hit, Miss, Replace
0b0000 0100	_	_
0b0000 0101	_	_
0b0110 1000	_	_
0b1100 1000	Tag: __ A09 __ / Index: __ A10 __ / Offset: __ A11 __	__ A12 __
0b0110 1000	_	_
0b1101 1101	Tag: __ A13 __ / Index: __ A14 __ / Offset: __ A15 __	__ A16 __
0b0100 0101	_	_
0b0000 0100	_	_
0b0011 0000	Tag: __ A17 __ / Index: __ A18 __ /Offset: __ A19 __	__ A20 __
0b1100 1011	_	_
0b0100 0010	_	_

You should fill in the "Tag / Index / Offset" and "Hit, Miss, Replace" columns based on the cache configuration and memory access behavior.
A09 = ? A10 = ? … A20 = ?

What is the cache hit rate for the accesses listed above? __ A21 __
A21 = ?

Problem `B`

Considering the RISC-V code provided and assuming a pipelined CPU without forwarding, determine the number and types of hazards present. Evaluate all potential hazards between the instructions.

Here is the requested table in Markdown format:

Instruction	C1	C2	C3	C4	C5	C6	C7	C8	C9
1. `sub t1, s0, s1`	IF	ID	EX	MEM	WB
2. `or s0, t0, t1`		IF	ID	EX	MEM	WB
3. `sw s1, 100(s0)`			IF	ID	EX	MEM	WB
4. `bgeu s0, s2, loop`				IF	ID	EX	MEM	WB
5. `add t2, x0, x0`					IF	ID	EX	MEM	WB

How many stalls are required to resolve data hazards if the register file supports double-pumping (write-then-read)? __ B01 __ hazard(s)
B01 = ?

How many stalls are required to resolve the data hazards if the register file supports double-pumping (write-then-read)? What about the control hazards if branch prediction is used? Assuming the ability to read and write to the register file in the same cycle:

Data Hazards:
- __ B02 __ stalls are needed between instructions __ B03 __ and __ B04 __ due to the WB→ID dependency.
- __ B05 __ stalls are required between instructions __ B06 __ and __ B07 __ for the same WB→ID dependency.
- No additional stalls are needed between instructions __ B08 __ and __ B09 __, as the stalls added for instruction 3 already resolve this dependency.
Control Hazards:
- If the branch prediction is correct, __ B10 __ stalls are required.
- If the branch prediction is incorrect, __ B11 __ stalls are needed to flush the pipeline, covering cycles from MEM to one cycle before IF.

B02 = ? B03 = ? … B11 = ?

Problem `C`

We will use the single-cycle CPU datapath shown below.

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

The following control signals may take on for the single cycle datapath.

PCSel:
- 0: The next PC is OldPC + 4.
- 1: The next PC is the ALU result (used for branches, jumps, etc.).
RegWEn:
- 0: The write-back value is not written to the register file.
- 1: The write-back value is written to the register file.
ImmSel:
- 0-4: Used for I, B, S, J, and U type immediates.
- 5-7: Reserved or unused.
BrEq:
- 0: Inputs are not equal.
- 1: Inputs are equal.
BrLt:
- 0: Register rs1 is not less than rs2.
- 1: Register rs1 is less than rs2.
ALUSel:
- Follows the same design convention as referenced in the CS61C course.

Complete the table by specifying the control signals for each instruction based on the datapath.

If a signal's value does not impact the execution of an instruction, use the * (don't care) symbol.
If a signal's value affects the instruction and can vary depending on the program, list all possible values. For example, if a signal can be either 0 or 1, write 0/1.

Instruction	BrEq	BrLT	PCSel	ImmSel	BrUn	ASel	BSel	ALUSel	MemRW	RegWEn	WBSel
`add`	*	*	0 (PC+4)	*	*	0 (Reg)	0 (Reg)	add	0	1	1 (ALU)
`ori`	*	*	0	I	*	0 (Reg)	1 (Imm)	or	0	1	1 (ALU)
`lw`	*	*	C04	I	*	0 (Reg)	1 (Imm)	add	0	C11	0 (MEM)
`sw`	*	*	C05	S	*	0 (Reg)	1 (Imm)	add	1	C12	*
`beq`	C01	*	C06	B	C09	1 (PC)	1 (Imm)	add	0	C13	*
`jal`	*	C02	C07	J	*	1 (PC)	1 (Imm)	add	0	C14	C16
`blt`	*	C03	C08	B	C10	1 (PC)	1 (Imm)	add	0	C15	*

C01 = ? C02 = ? … C16 = ?

Problem `D`

We will implement the Fast Fourier Transform (FFT) and its Inverse (IFFT) using RISC-V instructions (RVV included). The task involves translating C code into RISC-V assembly while adhering to the specified constraints and guidelines.

Part 1: bitwise reverse

/* Calculates the log2 of number */
int logint(int N)
{
    int k = N, i = 0;
    while (k) {
        k >>= 1;
        i++;
    }
    return i - 1;
}

/* Bitwise reverses the number */
int reverse(int N, int n)
{
    int j, p = 0;
    for (j = 1; j <= logint(N); j++) {
        if (n & (1 << (logint(N) - j)))
            p |= 1 << (j - 1);
    }
    return p;
}

Convert to RISC-V assembly:

# Takes input N (a0), returns its log base 2 in a0
logint:
    addi sp, sp, -4
    sw t0, 0(sp)

    add t0, a0, zero # k = N
    add a0, zero, zero # i = 0

logloop:
    beq t0, zero, logloop_end # Exit if k == 0
    srai t0, t0, 1 # k >>= 1
    addi a0, a0, 1 # i++
    j logloop

logloop_end:
    addi a0, a0, -1 # Return i - 1
    lw t0, 0(sp)
    addi sp, sp, 4
    jr ra

# Takes inputs N(a0) and n(a1), reverses the number in binary
reverse:
    addi sp, sp, -28
    sw ra, 0(sp)
    sw s0, 4(sp)
    sw s1, 8(sp)
    sw s2, 12(sp)
    sw s3, 16(sp)
    sw s4, 20(sp)
    sw s5, 24(sp)

    call logint # Now a0 has log2(N)
    addi s0, zero, 1 # j = 1
    add s1, zero, zero # p = 0

forloop_reverse:
    bgt s0, a0, forloop_reverse_end

    sub s2, a0, s0 # s2 = a0 - s0
    addis3, zero, 1
    __ D01 __ # Fill your answer!
    __ D02 __ # Fill your answer! Check if bit is set
    beq s3, zero, elses3 # If not, skip

ifs3:
    addi s4, s0, -1 # s4 = j - 1
    addi s5, zero, 1
    __ D03 __ # Fill your answer!
    __ D04 __ # Fill your answer! Set bit in p

elses3:
    __ D05 __ # Fill your answer!
    j forloop_reverse

forloop_reverse_end:
    add a0, s1, zero # Return p

    lw ra, 0(sp)
    lw s0, 4(sp)
    lw s1, 8(sp)
    lw s2, 12(sp)
    lw s3, 16(sp)
    lw s4, 20(sp)
    lw s5, 24(sp)
    addi sp, sp, 28
    jr ra

Complete the above RISC-V assembly code to ensure functionality without using any pseudo-instructions. All registers should start with s.

D01 = ? D02 = ? … D05 = ?

You are now allowed to use the RISC-V BitManip extension. One of its supported instructions is clz rd, rs, which counts the number of leading zero bits in the most significant bit (MSB) portion of the source register rs and stores the result in rd.

Reference: RISC-V BitManip

Consider the following RISC-V function set_log2n: It computes log2(N) for a 32-bit unsigned integer stored in a0 and writes the result to the memory location labeled logsize.

Inputs:

a0: 32-bit unsigned integer N

Outputs:

None (result stored in memory at logsize).

Registers Modified:

t0, t1

set_log2n:
    clz t0, a0 # Count leading zeros in a0
    li t1, 31 # Maximum index for a 32-bit register
    __ D06 __ # Fill your answer! Compute log2(N)
    la t0, logsize # Load address of 'logsize'
    __ D07 __ # Fill your answer!

    jr ra # Return to caller

logsize: .word 0

The set_log2n function can be used to calculate and store log2(size) for shared access by other functions.

Complete the above RISC-V assembly code to ensure functionality without using any pseudo-instructions. All registers should start with t.

D06 = ?
D07 = ?

Let's implement the previously described reverse routine as follows without branches:

# Function: reverse
# Reverses the binary digits of a 32-bit integer.
# Inputs:
# - a0: Input number to reverse.
# - a1: Number of significant bits to reverse (optional; default 32).
# Outputs:
# - a0: The reversed binary number.
# Clobbers:
# - t0, t1, t2

reverse:
    # Swap odd and even bits
    lui t0, 0x55555 # Load upper 20 bits of 0x55555555
    ori t0, t0, 0x555 # Complete loading 0x55555555
    srli t1, a0, 1 # v >> 1
    and t1, t1, t0 # (v >> 1) & 0x55555555
    __ D08 __ # Fill your answer!
    __ D09 __ # Fill your answer!
    or a0, t1, t2 # Store result in a0

    # Swap consecutive pairs
    lui t0, 0x33333 # Load upper 20 bits of 0x33333333
    ori t0, t0, 0x333 # Complete loading 0x33333333
    srli t1, a0, 2 # v >> 2
    and t1, t1, t0 # (v >> 2) & 0x33333333
    __ D10 __ # Fill your answer!
    __ D11 __ # Fill your answer!
    or a0, t1, t2 # Store result in a0

    # Swap nibbles
    lui t0, 0x0F0F0 # Load upper 20 bits of 0x0F0F0F0F
    ori t0, t0, 0x0F0 # Complete loading 0x0F0F0F0F
    __ D12 __ # Fill your answer!
    and t1, t1, t0
    __ D13 __ # Fill your answer!
    __ D14 __ # Fill your answer!
    or a0, t1, t2 # Store result in a0

    # Swap bytes
    lui t0, 0x00FF0 # Load upper 20 bits of 0x00FF00FF
    ori t0, t0, 0x0FF # Complete loading 0x00FF00FF
    __ D15 __ # Fill your answer!
    and t1, t1, t0
    __ D16 __ # Fill your answer!
    __ D17 __ # Fill your answer!
    or a0, t1, t2 # Store result in a0

    # Swap 2-byte pairs
    srli t1, a0, __ D18 __
    slli t2, a0, __ D19 __
    or a0, t1, t2 # Final result in a0

    # Adjust number of significant bits
    addi t1, zero, 32 # Load immediate value 32
    sub t1, t1, a1 # Calculate shift amount
    srl a0, a0, t1 # Shift right to fit bit count

    jr ra # Return with result in a0

Complete the above RISC-V assembly code to ensure functionality without using any pseudo-instructions.

D08 = ? D09 = ? … D19 = ?

Next, we will utilize the RISC-V Vector Extension (RVV). Consider the following code:

# Initialize the helper vector with sequential integers (0,1,2,...)
init_helper_vector:
    la t0, helperVector # Load the base address of helperVector
    lw t1, size # Load the size of the vector (N)
    vsetv li t2, t1, e32, m1 # Set the vector length configuration
    vid.v v0 # Generate a vector of indices
    vse32.v v0, (t0) # Store the generated vector in helperVector
    ret

The above should be built with --march=rv32i_zbb_zve32x using GNU Toolchain.

Extension	Minimum VLEN	Supported EEW	FP32	FP64
`zve32x`	32	8, 16, 32	N	N

Reference: RISC-V "V" Vector Extension

We can implement a vectorized version of the reverse routine, vreverse, using the RISC-V Vector Extension as follows.

# Calculate log base 2 of input
# Input:a0 = N
# Output: a0 = log2(N)
logint:
    clz a0, a0 # Count leading zeros of a0
    addi t0, zero, 31 # Load 31 (32-bit word size - 1)
    __ D20 __ # Fill your answer!
    jr ra # Return

# Bit-reverse elements of a vector
# Input:v29 = input vector, a0 = N
# Output: v29 = bit-reversed vector
vreverse:
    addi sp, sp, -4 # Save return address
    sw ra, 0(sp)

    call logint # Compute log2(N) in a0

    lw ra, 0(sp) # Restore return address
    addi sp, sp, 4

    li t1, 1 # Constant 1 for bit manipulation
    li t0, 1 # Initialize bit position counter j
    vmv.v.x v28, zero # Clear result vector v28

vreverse_loop: # Loop for j <= log2(N)
    bgt t0, a0, vreverse_end # Break if j > log2(N)

    sub t2, a0, t0 # t2 = log2(N) - j
    sll t3, t1, t2 # t3 = (1 << (log2(N) - j))

    __ D21 __ # Fill your answer! v0 = v29 & t3 (check bit)
    vmsne.vx v0, v0, zero # v0 = 1 if set, else 0

    addi t4, t0, -1 # t4 = j - 1
    sll t3, t1, t4# t3 = (1 << (j - 1))

    __ D22 __ # Fill your answer! Set bit in v28 where mask is true

    addi t0, t0, 1 # Increment bit position
j vreverse_loop

vreverse_end:
    __ D23 __ # Fill your answer! Store result in v29
    jr ra# Return

The usage of vreverse would look something like this:

    mv a0, a2 # a0 = N, input for reverse function. Overwriting a0 is fine
    vmv.v.v v29, v26 # v29 = v26 = <i>, used as input for vreverse

    call vreverse # v29 now contains rev(N, <i>), keep it for later use

Complete the above RISC-V assembly code to ensure functionality.

D20 = ? D21 = ? .. D23 = ?

Problem `E`

A processor uses 16-bit addresses, 256-byte pages, and an 8-entry fully associative TLB with LRU replacement. The LRU field is represented by 3 bits, indicating the order of page access, where 0 means the most recently accessed page. At a given point in time, the TLB for the current process is in the initial state, as described in the table below. Additionally, three free physical pages are available.

Assumptions:

All current page table entries are already present in the initial TLB.
All pages are readable and writable.

Your task:

Update the TLB's final state based on the given access pattern, adjusting entries as needed.
Write the corresponding physical addresses for each accessed memory location after the TLB update.

Available free physical pages:

0x17
0x18
0x19

Memory access pattern:

Read from virtual address 0x11F0
Write to virtual address 0x1301
Write to virtual address 0x20AE
Write to virtual address 0x2332
Read from virtual address 0x20FF
Write to virtual address 0x3415

Initial TLB:

VPN	PPN	Valid	Dirty	LRU
0x01	0x11	1	1	0
0x00	0x00	0	0	7
0x10	0x13	1	1	1
0x20	0x12	1	0	5
0x00	0x00	0	0	7
0x11	0x14	1	0	4
0xac	0x15	1	1	2
0xff	0xff	1	0	3

Final TLB: (Use hexadecimal literal for address, 0/1 for dirty)

VPN	PPN	Valid	Dirty
0x01	0x11	1	1
E01	E02	1	E03
E04	E05	1	E06
E07	E08	1	E09
E10	E11	1	E12
E13	E14	1	E15
E16	E17	1	E18
E19	E20	1	E21

E01 = ? E02 = ? … E21 = ?

Problem `F`

Consider a memory address space divided into 64-byte segments. For example, address 1024 is 64 bytes away from address 1025 but only 1 byte away from the address

1024 + \frac{1}{64}

. In this system, standard integer binary addressing is ineffective. Instead, consider using floating-point representation based on the IEEE-754 standard, which includes 1 sign bit. You need to determine the appropriate number of exponent and mantissa bits.

Given a memory size of 4KiB and chunk addresses labeled as 0, 1, 2, …, determine the minimum number of exponent bits required in the floating-point memory address representation to access every byte, assuming a standard bias.

F01 = ?

Determine the minimum number of mantissa bits required in the floating-point memory address system to accurately address each byte.

F02 = ?

Problem `G`

In many applications, it is essential not only to identify the maximum value in an array but also to determine its corresponding index, commonly referred to as the argmax. This task can be efficiently accomplished using Data Level Parallelism. The argmax function below accepts an array arr and its length n, returning the index of the maximum element. If multiple elements have the same maximum value, the function returns the earliest index.

To implement this efficiently, use pseudo-SIMD (Single Instruction, Multiple Data) intrinsics. These intrinsics operate on vec structs, similar to Intel's __m128i structs, which contain four packed integers. Complete the sections labeled G01, G02, and G03 in the function to ensure it works correctly.

Proposed SIMD Intrinsics reference:

vec sum_epi32(vec a, vec b) : Returns a + b.
vec and_epi32(vec a, vec b) : Returns a & b.
vec set_epi32(int a) : Returns a SIMD vector with all entries set to a.
vec load_epi32(int *a) : Returns a SIMD vector with entries a[0], a[1], a[2], and a[3].
int reducemax_epi32(vec a) : Returns the maximum value in vector a.
vec maskeq_epi32(vec a, int b) : Returns a mask vector with 0xFFFFFFFF where a equals b, and 0 otherwise.
int firstv_epi32(vec a) : Returns the index of the first entry with its lowest bit set to 1.

int argmax(int *arr, int n)
{
    int curr, index = 0, running_max = -2147483648; /* -2^31 */

    for (int i = 0; i < n / 4 * 4; i += 4) {
        vec tmp = load_epi32(arr + i);
        curr = reducemax_epi32(tmp);
        if (curr > running_max) {
            tmp = G01;/* Fill in your code here */
            index = i + G02; /* Fill in your code here */
            running_max = curr;
        }
    }

    for (int i = n / 4 * 4; i < n; i += 1) {
        if (G03) {/* Fill in your code here */
            running_max = arr[i];
            index = i;
        }
    }

    return index;
}

Complete the above C code to ensure functionality.

G01 = ?
G02 = ?
G03 = ?

Problem `H`

Suppose we are designing a new microprocessor with a custom ISA. To simplify the architecture, we decide to use a single, universal instruction type called the Z-type instruction.

We aim to support the following instructions:

add rd1, rs1, rs2
and rd1, rs1, rs2
lw rd1, offset1 (rs1)
sw rs2, offset1 (rs1)
addi rd1, rs1, imm1
beq rs1, rs2, offset1
lui rd1, offset1
jal rd1, imm
stw rs3, offset1, offset2 (rs1)

The new stw instruction stores the contents of rs3 into both memory locations rs1 + offset1 and rs1 + offset2. Its Register Transfer Level (RTL) is:

Mem (R [r s 1] + offset1) \leftarrow R [r s 3] AND Mem (R [r s 1] + offset2) \leftarrow R [r s 3]

How many bits must the opcode field have if we want to support the instructions listed above? __ H01 __ bits
H01 = ?

With only one instruction format, we want to enable jumps of up to 64 KiB in either direction. How many bits are required for the immediate field to support this? Assume that, like RV32, the instruction's least significant bit is an implicit 0 and not saved.
H02 = ?

Consider the following instruction format:

imm2 imm1 rs3 rs2 rs1 rd2 rd1 opcode

The instruction format is 64 bits long, where each immediate field (imm1 and imm2) is 11 bits, and the opcode is 7 bits. What is the maximum number of registers the ISA can support?
H03 = ?

Problem `I`

You are tasked with designing a memory hierarchy using an 8-word (32-byte) cache line and 32-bit virtual and physical addresses. Consider the following loops:

Loop A:

int sum = 0;
for (int i = 0; i < (1 << 8); i++)
for (int j = 0; j < (1 << 6); j++)
sum += X[i][j] * Y[i][j];

Loop B:

int sum = 0;
for (int j = 0; j < (1 << 6); j++)
for (int i = 0; i < (1 << 8); i++)
sum += X[i][j] * Y[i][j];

Assumptions:

X[0][0] is stored at address 0x0.
Both matrices X and Y are stored contiguously in row-major order, meaning X[i][j] is adjacent to X[i][j + 1] in memory, and X[256][64] is next to Y[0][0].
sum is not stored in the cache.
Caches are initially empty, and all matrix elements are 32-bit integers.

Assume a 32 KiB 4-way set-associative data cache (D$) with a FIFO replacement policy. Complete the following table:

Loop	Compulsory Misses	Conflict and Capacity Misses	Total Misses
Loop A	I01	I02	I03
Loop B	I04	I05	I06

I01 = ? I02 = ? … I06 = ?

Assume a 256 KiB 8-way set-associative L2 cache with a FIFO replacement policy, including the L1 (D$). How many L2 cache misses occur when Loops A and B are run? Loop A: __ I07 __ / Loop B: __ I08 __

I07 = ?
I08 = ?

Given the following hit times:

L1 hit time: 4 cycles
L2 hit time: 12 cycles
DRAM access time: 50 cycles

Calculate the AMAT for Loops A and B: Loop A: __ I09 __ cycles / Loop B: __ I10 __ cycles

I09 = ?

I10 = ?

Assume a Virtually-Indexed Physically-Tagged (VIPT) cache with a 4 KiB page size and a 32 KiB cache size. Is it possible to have virtual address aliasing? __ I11 __ (answer in Yes or No) What is the minimum set-associativity required to avoid aliasing? __ I12 __

I11 = ?
I12 = ?

Quiz5 of Computer Architecture (2024 Fall)

Problem A

Problem B

Problem C

Problem D

Problem E

Problem F

Problem G

Problem H

Problem I