Quiz3 of Computer Architecture (2022 Fall)

Solutions

Problem `A`

You may use the strlen function to count the length of a string in the C programming language. For this function to be implemented in RISC-V, please fill out the following.

Pseduoinstructions MUST NOT be used.

    addi s1, x0, 1
    jal strlen
    add a0, x0, s1
    ecall
strlen:
    A01 # Fill your answer
    sw s1, 0(sp)
    mv a1, x0
    la s1, str
loop:
    A02 # Fill your answer
    A03 # Fill your answer
    addi a1, a1, 1
    addi s1, s1, 1
    j loop
epilogue:
    lw s1, 0(sp)
    A04 # Fill your answer
    jr ra

A01 = ?
addi sp, sp, -4

A02 = ?
lbu t0, 0(s1)

A03 = ?
beq t0, x0, epilogue

A04 = ?
addi sp, sp, 4

Problem `B`

Write a RISC-V function that returns 0 if the 32-bit float input is an infinite value and a non-zero value otherwise. As usual, a0 will be used to hold the input and output. Pseduoinstructions MUST NOT be used.

is_not_inf:
    B01 # Fill your answer
    addi a1, a1, -1
    and a0, a0, a1
    B02 # Fill your answer
    B03 # Fill your answer
    ret # Return instruction

B01 = ?
lui a1, 0x80000

B02 = ?
lui a1, 0x7F800

B03 = ?
xor a0, a0, a1

Problem `C`

The following circuit has a delay of 3 ns for NOT gates, 7 ns for AND and OR gates, and 8 ns for the "Black Box" logic component. The registers have setup times of 5ns and a clk-to-q latency of 6 ns.

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

What is the maximum allowable hold time of the registers? __ C01 __ ns
- C01 = ?
  27
  The shortest path through the circuit to a register clearly follows the path from A to O and includes: clk-to-q delay, two NOT gates, one OR gate, and the "Black Box."" Maximum hold time =
  $6 + 2 \times 3 + 7 + 8$ = 27 ns
What is the minimum acceptable clock period for this circuit? __ C02 __ ns
- C02 = ?
  46
  The period is determined by the longest path and includes: clk-to-q delay, two NOT gates, three OR gates (or two OR gates and one AND gate), the "Black Box," and the setup time. Minimum period =
  $6 + 2 * 3 + 3 \times 7 + 8 + 5$ = 46ns

Problem `D`

The circuit seen below can be made simpler. Please write the circuit's origin boolean expression and then simplify it using the fewest possible two-input logic gates. __ D01 __

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

D01 = ?
A + B + C

Image Not Showing Possible Reasons
The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted
Learn More →

One possible solution for the circuit:

Image Not Showing Possible Reasons
The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted
Learn More →

Problem `E`

All logic gates in the circuit below have a delay of 5 ns, RegC has a setup time of 6 ns, and RegA and RegB have setup, hold, and clk-to-q durations of 4 ns.

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

What is the maximum allowable hold time for RegC? __ E01 __ ns
- E01 = ?
  14
  The maximum allowable hold time for RegC is how long it takes for RegC’s input to change, so (clk-to-q of A or B) + shortest CL time = 4 + (5 + 5) = 14ns.
What is the minimum acceptable clock cycle time for this circuit (E02 ns), and clock frequency does it correspond to (E03 MHz)?
- E02 = ?
  25
  The minimum acceptable clock cycle time is clk-to-q + longest CL time + setup time = 4 + (5 + 5 + 5) + 6 = 25ns.
- E03 = ?
  40
  25ns corresponds to a clock frequency of
  $\frac{1}{25 \times 10^{- 9} s}$ = 40MHz.

Problem `F`

Look at the RISC-V function below, whose start address is in a0 and length is in a1, which sorts a word array in-place.

sort:
    # prologue
    mv s0, a0
    0x00259493 # p1
    addi s1, s1, -4
    add s1, s1, s0
    mv t0, s0
outer_loop:
    mv t1, t0
    0x00032383 # p2
    mv t3, t1
inner_loop:
    addi t1, t1, 4
    lw t4, 0(t1)
    ble t4, t2, mystery_label
    mv t2, t4
    mv t3, t1
mystery_label:
    blt t1, s1, inner_loop
    lw t5, 0(t0)
    sw t2, 0(t0)
    sw t5, 0(t3)
    addi t0, t0, 4
    blt t0, s1, outer_loop
    # epilogue
    ret

Please disassemble machine code marked #p1 and #p2 above to RISC-V instructions. (Please use register names, e.g., s0, s1, etc., NOT x8, x9, etc.)
- 0x00259493 # p1: __ F01 __
- F01 = ?
  slli s1, a1, 2
- 0x00032383 # p2: __ F02 __
- F02 = ?
  lw t2, 0(t1)
li x1, 0xDCBAABCD is also a pseudo instruction. Below it is expanded to 2 normal instructions. Please fill in the blanks and translate each of them to machine code.

Assembly Machine code

lui x1, 0x __ F03 __ 0x __ F04 __

addi x1, __ F05 __ , __ F06 __ 0x __ F07
- F03 = ?
  DCBAB
- F04 = ?
  DCBAB0B7
- F05 = ?
  x1 OR ra
- F06 = ?
  -1075
- F07 = ?
  BCD08093

Assembly	Machine code
lui x1, 0x __ F03 __	0x __ F04 __
addi x1, __ F05 __ , __ F06 __	0x __ F07

Problem `G`

Assume that all input originates from a register and that no hold time laws have been broken. What is the highest frequency at which you can run this circuit's clock such that it operates properly? __ G01 __

Using these variables, formulate your response as a mathematical expression (you may also use min(), max(), abs(), and other simple operations if necessary):

X = XOR delay
N = NOT delay
C = t_clk−to−Q
S = t_setup
H = t_hold

G01 = ?

$\frac{1}{C + S + m a x (X, N)}$
We want to check the longest path in the circuit.
There are three paths to consider:

Input→XOR→Register

Register→XOR→Register

Register→NOT→Register

Since input comes from a register, so (1) and (2) are equivalent.
The time it will take (2) to execute properly is t_clk−to−Q + XOR delay + t_setup.
The time it will take (3) to execute properly is t_clk−to−Q + NOT delay + t_setup.
We do not know which number is bigger between X and N, so we take the maximum.

$\frac{1}{m a x (C + X + S, C + N + S)} = \frac{1}{C + S + m a x (X, N)}$

Problem `H`

The fabs instruction in the x86 architecture returns a floating-point number's absolute value. Compared to utilizing branches, this command is quicker.

Complete the following C code that mimics this instruction without ternary operators or if-else statements. Assume that int and float are of equal size. (The x86 architecture makes use of the IEEE 754).
```
float fabs(float x) {
    int tmp = *(int *) &x;
    tmp &= H01 /* Your code here */;
    return H02 /* Your code here */;
}
```
- H01 = ?
  0x7fffffff
- H02 = ?
  *(float *) &tmp
Can we use the same method to get the absolute value of a single-precision 32-bit HFP number? __ H03 __ (Yes or No) Why? __ H04 __
- H03 = ?
  Yes
- H04 = ?
  Because both of the formats use the most significant bit as sign bit.

Problem `J`

We wish to develop a new ISA while we work on a new processor for an embedded application. We choose to include only one, the X-type instruction, because we are tired of the several RISC-V instruction types. Let's say we want to include the instructions below:

add rd1, rs1, rs2
and rd1, rs1, rs2
lw rd1, offset1 (rs1)
sw rs2, offset1 (rs1)
addi rd1, rs1, imm1
beq rs1, rs2, offset1
lui rd1, offset1
jal rd1, imm
stw rs3, offset1, offset2 (rs1)

The new stw instruction stores the contents of rs3 into both rs1 + offset1 and rs1 + offset2.

We want to do away with the funct3 and funct7 fields and only use an opcode. If we only wish to support the instructions listed above, what is the minimum number of bits the opcode field can be? __ J01 __ bit(s)
- J01 = ?
  4
  We have 9 instructions, so we need 4 bits to represent them.
We want to be able to jump up to 64 KiB in either direction with a single instruction. How many bits are necessary to encode an immediate that would allow us to do this? __ J02 __ bit(s) Assume that, just like RV32, the least significant bit is an implicit 0 and is not stored in the instruction.
- J02 = ?
  16
  64 KiB = 2¹⁶ B. jal is the only jump instruction given, so it will determine the size of the immediate field. Recall that, in RISC-V, the immediates encoded in instructions work in units of half-instructions, so jumping up to 64 KiB in either direction is the same as saying we want to jump up to 2¹⁵ half instructions in both directions. Since immediates are signed, we need 16 bits to represent this range of values.
Then, we switch to a 32-bit machine, and finalize our instruction format to have 4 bits for each of the immediate fields, 4 bits for each register, and 4 bits for the opcode. Convert the instruction stw x8, 0, 4 (x5) into machine code. Leave your answer in binary (don’t forget the prefix!). If a field is not used, fill in the field with xs. __ J03 __
- J03 = ?
  010000001000xxxx0101xxxxxxxx1000
  rs3 is x8 = 0b1000. rs1 is x5 = 0b0101. Imm1 = 0b0000. Imm2 = 0b0100. So, the final instruction is
```
0100 | 0000 | 1000 | xxxx | 0101 | xxxx | xxxx | 1000.
```

Problem `K`

A bloom filter is a very clever data structure for effectively and probabilistically storing a set. It does two functions: insert and check. The fundamental principle of checking is to hash your search term many times. Each hash identifies a specific bit that has to be set or checked. So, to verify, you check to see if the bit is set. You do this repeatedly, with each iteration's count of repetitions being included in the hash (so each hash is different). The element does not exist in the bloom filter if not all bits are set. The element PROBABLY occurs in the bloom filter if all bits are set. Similar to that, you just set an element as present when using a bloom filter.

Similar to that, you just set all those bits to 1 to indicate that an element is present in a bloom filter.
A flexible and portable bloom filter design is what we are aiming for. Therefore, we create the structure below.

struct BloomFilter {
    uint32_t size; /* Size is # of bits, NOT BYTES, in the bloom filter */
    uint16_t itercount;
    uint64_t (*)(void *data, uint16t iter) hash;
    uint8_t *data;
};

On a 32-bit architecture that requires word alignment for 32-bit integers and pointers, what is sizeof(struct BloomFilter)? __ K01 __ bytes
- K01 = ?
  16
  uint32_t size takes up 32 bits = 4 bytes. Call these bytes 0-3 of the struct.
  uint16_t itercount takes up 16 bits = 2 bytes. Call these bytes 4-5 of the struct.
  The next field in the struct (hash) is a function pointer, and the question says we need word alignment for pointers. Word alignment means that the address of the struct field must be a multiple of the word size, which is 4 bytes on a 32-bit system.
  Because we need the next field to be word-aligned, we add 2 bytes of padding. Call these bytes 6-7 of the struct. Now the next field will start at byte 8, which is a multiple of 4.
  hash is a function pointer, which takes up 4 bytes. Call these bytes 8-11 of the struct.
  uint8 t *data is a pointer, which takes up 4 bytes. Call these bytes 12-15 of the struct.
  In total, we have bytes 0-15 of the struct, which is 16 bytes.
And now we have the insert function. For this we need to set the appropriate bit for each iteration.
```
void insert(struct BloomFilter *b, void *element)
{
    uint64_t bitnum; /* which bit we need to set */
    for (int i = 0; i < (K02 /* Your code here */); ++i) {
        bitnum = b->hash(element, (uint16 t) i) % b->size;
        b->data[bitnum >> 3] |= 1 << (K03 /* Your code here */);
    }
}
```
- K02 = ?
  b->itercount
- K03 = ?
  b->size
  we need to hash the element we’re trying to insert. The question also says that the hash needs to include the iteration count. The struct has a hash field, which we can infer is a pointer to the hash function we need to compute. To access the function pointer, we need to use the arrow syntax, i.e. b->hash.
  Recall that in C, when you have an expression for a function pointer, you can directly call the function pointer. For example, if x is a function pointer, then x(3) will call the corresponding function with argument 3.
  Recall that in C, the syntax for function pointers lists the return value(s) and then the argument(s). In this case, we have uint64_t (*)(void *data, uint16 t iter) hash which says that the function takes in arguments (void *data, uint16_t iter) and returns a uint64_t. In this case, we want to call the hash function with the element (the argument to insert) and the iteration count (we found this in the previous subpart).
  The call to the hash looks like this: b->hash(element, i). Casting i from an integer to a uint16_t is good practice (since the argument to the hash function is a uint16_t) but probably not strictly necessary.
  Finally, the question says that "each hash tells you a particular bit you need to set or check,"" but there is no guarantee that the hash output is lower than the number of bits in the Bloom filter. If the hash output is greater than the number of bits in the Bloom filter, then bitnum will be too large, and the next line will index out-of-bounds.
  Therefore, we should use the mod operator on the hash output to ensure that the value in bitnum is not greater than the size of the Bloom filter (b->size).
- K04 = ? (You MUST use a bitwise AND/OR/XOR operator)
  bitnum & 0x7
  The left side of this expression is b->data[bitnum >> 3]. Note that bitnum >> 3 divides bitnum by 8. bitnum/8 tells us which byte of the data array contains the bit we want to set. We divide by 8 because each byte is 8 bits, and we want to convert from the index measured in bits to the index measured in bytes. Then, we get this byte of the data array by indexing into the data array, which is an array of bytes (each uint8_t is 1 byte).
  bitnum is the index of the bit we want to set to 1, measured in bits. bitnum >> 3 identifies which byte of the data array contains the bit we want to set.
  b->data[bitnum >> 3] contains 8 bits from the data array. We want to set one of these bits to 1.

Quiz3 of Computer Architecture (2022 Fall)

Problem A

Problem B

Problem C

Problem D

Problem E

Problem F

Problem G

Problem H

Problem J

Problem K

Read more

你所不知道的 C 語言：數值系統

建構 User-Mode Linux 的實驗環境

Linux 核心設計: Scheduler(8): Energy Aware Scheduling

你所不知道的 C 語言: linked list 和非連續記憶體

Problem `A`

Problem `B`

Problem `C`

Problem `D`

Problem `E`

Problem `F`

Problem `G`

Problem `H`

Problem `J`

Problem `K`