--- tags: computer-arch --- # Quiz3 of Computer Architecture (2022 Fall) > Solutions ## Problem `A` You may use the [strlen](https://man7.org/linux/man-pages/man3/strlen.3.html) function to count the length of a string in the C programming language. For this function to be implemented in RISC-V, please fill out the following. Pseduoinstructions MUST NOT be used. ```c addi s1, x0, 1 jal strlen add a0, x0, s1 ecall strlen: A01 # Fill your answer sw s1, 0(sp) mv a1, x0 la s1, str loop: A02 # Fill your answer A03 # Fill your answer addi a1, a1, 1 addi s1, s1, 1 j loop epilogue: lw s1, 0(sp) A04 # Fill your answer jr ra ``` > * A01 = ? > ==`addi sp, sp, -4`== > * A02 = ? > ==`lbu t0, 0(s1)`== > * A03 = ? > ==`beq t0, x0, epilogue`== > * A04 = ? > ==`addi sp, sp, 4`== --- ## Problem `B` Write a RISC-V function that returns `0` if the 32-bit float input is an infinite value and a non-zero value otherwise. As usual, `a0` will be used to hold the input and output. Pseduoinstructions MUST NOT be used. ```c is_not_inf: B01 # Fill your answer addi a1, a1, -1 and a0, a0, a1 B02 # Fill your answer B03 # Fill your answer ret # Return instruction ``` > * B01 = ? > ==`lui a1, 0x80000`== > * B02 = ? > ==`lui a1, 0x7F800`== > * B03 = ? > ==`xor a0, a0, a1`== --- ## Problem `C` The following circuit has a delay of 3 ns for NOT gates, 7 ns for AND and OR gates, and 8 ns for the "Black Box" logic component. The registers have setup times of 5ns and a clk-to-q latency of 6 ns. ![](https://hackmd.io/_uploads/SkNkEDV4j.png) 1. What is the maximum allowable hold time of the registers? __ C01 __ ns > * C01 = ? > ==27== > The shortest path through the circuit to a register clearly follows the path from A to O and includes: clk-to-q delay, two NOT gates, one OR gate, and the "Black Box."" Maximum hold time = $6 + 2 \times 3 + 7 + 8$ = 27 ns 2. What is the minimum acceptable clock period for this circuit? __ C02 __ ns > * C02 = ? > ==46== > The period is determined by the longest path and includes: clk-to-q delay, two NOT gates, three OR gates (or two OR gates and one AND gate), the "Black Box," and the setup time. Minimum period = $6 + 2*3 + 3 \times 7 + 8 + 5$ = 46ns --- ## Problem `D` The circuit seen below can be made simpler. Please write the circuit's origin boolean expression and then simplify it using the fewest possible two-input logic gates. __ D01 __ ![](https://hackmd.io/_uploads/S1M7BPN4s.png) > * D01 = ? > ==`A + B + C`== > ![](https://hackmd.io/_uploads/r1mwHwE4i.png) > One possible solution for the circuit: > ![](https://hackmd.io/_uploads/BkwqBwVVs.png) --- ## Problem `E` All logic gates in the circuit below have a delay of 5 ns, RegC has a setup time of 6 ns, and RegA and RegB have setup, hold, and clk-to-q durations of 4 ns. ![](https://hackmd.io/_uploads/rkHtnvV4i.png) 1. What is the maximum allowable hold time for RegC? __ E01 __ ns > * E01 = ? > ==14== > The maximum allowable hold time for RegC is how long it takes for RegC’s input to change, so (clk-to-q of A or B) + shortest CL time = 4 + (5 + 5) = 14ns. 2. What is the minimum acceptable clock cycle time for this circuit (E02 ns), and clock frequency does it correspond to (E03 MHz)? > * E02 = ? > ==25== > The minimum acceptable clock cycle time is clk-to-q + longest CL time + setup time = 4 + (5 + 5 + 5) + 6 = 25ns. > * E03 = ? > ==40== > 25ns corresponds to a clock frequency of $\frac{1}{25 \times 10^{−9}s}$ = 40MHz. --- ## Problem `F` Look at the RISC-V function below, whose start address is in `a0` and length is in a1, which sorts a word array in-place. ```c sort: # prologue mv s0, a0 0x00259493 # p1 addi s1, s1, -4 add s1, s1, s0 mv t0, s0 outer_loop: mv t1, t0 0x00032383 # p2 mv t3, t1 inner_loop: addi t1, t1, 4 lw t4, 0(t1) ble t4, t2, mystery_label mv t2, t4 mv t3, t1 mystery_label: blt t1, s1, inner_loop lw t5, 0(t0) sw t2, 0(t0) sw t5, 0(t3) addi t0, t0, 4 blt t0, s1, outer_loop # epilogue ret ``` 1. Please disassemble machine code marked `#p1` and `#p2` above to RISC-V instructions. (Please use register names, e.g., `s0`, `s1`, etc., NOT `x8`, `x9`, etc.) * `0x00259493 # p1`: __ F01 __ > * F01 = ? > ==`slli s1, a1, 2`== * `0x00032383 # p2`: __ F02 __ > * F02 = ? > ==`lw t2, 0(t1)`== 2. `li x1, 0xDCBAABCD` is also a pseudo instruction. Below it is expanded to 2 normal instructions. Please fill in the blanks and translate each of them to machine code. | Assembly | Machine code | | -------- | -------- | | lui x1, 0x __ F03 __ | 0x __ F04 __ | | addi x1, __ F05 __ , __ F06 __ | 0x __ F07 | > * F03 = ? > ==DCBAB== > * F04 = ? > ==DCBAB0B7== > * F05 = ? > ==x1== OR ==ra== > * F06 = ? > ==`-1075`== > * F07 = ? > ==BCD08093== --- ## Problem `G` Assume that all input originates from a register and that no hold time laws have been broken. What is the highest frequency at which you can run this circuit's clock such that it operates properly? __ G01 __ Using these variables, formulate your response as a mathematical expression (you may also use `min()`, `max()`, `abs()`, and other simple operations if necessary): * X = XOR delay * N = NOT delay * C = t~clk−to−Q~ * S = t~setup~ * H = t~hold~ ![](https://hackmd.io/_uploads/Skq1oO4Vi.png) > * G01 = ? > ==$\frac{1}{C + S + max(X, N)}$== > We want to check the longest path in the circuit. > There are three paths to consider: > 1. Input→XOR→Register > 2. Register→XOR→Register > 3. Register→NOT→Register > > Since input comes from a register, so (1) and (2) are equivalent. > The time it will take (2) to execute properly is t~clk−to−Q~ + XOR delay + t~setup~. > The time it will take (3) to execute properly is t~clk−to−Q~ + NOT delay + t~setup~. > We do not know which number is bigger between X and N, so we take the maximum. > $\frac{1}{max(C + X + S, C + N + S)} = \frac{1}{C + S + max(X, N)}$ --- ## Problem `H` The `fabs` instruction in the x86 architecture returns a floating-point number's absolute value. Compared to utilizing branches, this command is quicker. 1. Complete the following C code that mimics this instruction without ternary operators or `if-else` statements. Assume that int and float are of equal size. (The x86 architecture makes use of the IEEE 754). ```c float fabs(float x) { int tmp = *(int *) &x; tmp &= H01 /* Your code here */; return H02 /* Your code here */; } ``` > * H01 = ? > ==`0x7fffffff`== > * H02 = ? > ==`*(float *) &tmp`== 2. Can we use the same method to get the absolute value of a single-precision 32-bit [HFP number](https://en.wikipedia.org/wiki/IBM_hexadecimal_floating-point)? __ H03 __ (Yes or No) Why? __ H04 __ > * H03 = ? > ==Yes== > * H04 = ? > ==Because both of the formats use the most significant bit as sign bit.== --- ## Problem `J` We wish to develop a new ISA while we work on a new processor for an embedded application. We choose to include only one, the X-type instruction, because we are tired of the several RISC-V instruction types. Let's say we want to include the instructions below: ```c add rd1, rs1, rs2 and rd1, rs1, rs2 lw rd1, offset1 (rs1) sw rs2, offset1 (rs1) addi rd1, rs1, imm1 beq rs1, rs2, offset1 lui rd1, offset1 jal rd1, imm stw rs3, offset1, offset2 (rs1) ``` The new stw instruction stores the contents of rs3 into both `rs1 + offset1` and `rs1 + offset2`. 1. We want to do away with the `funct3` and `funct7` fields and only use an opcode. If we only wish to support the instructions listed above, what is the minimum number of bits the opcode field can be? __ J01 __ bit(s) > * J01 = ? > ==4== > We have 9 instructions, so we need 4 bits to represent them. 2. We want to be able to jump up to 64 KiB in either direction with a single instruction. How many bits are necessary to encode an immediate that would allow us to do this? __ J02 __ bit(s) Assume that, just like RV32, the least significant bit is an implicit 0 and is not stored in the instruction. > * J02 = ? > ==16== > 64 KiB = 2^16^ B. `jal` is the only jump instruction given, so it will determine the size of the immediate field. Recall that, in RISC-V, the immediates encoded in instructions work in units of half-instructions, so jumping up to 64 KiB in either direction is the same as saying we want to jump up to 2^15^ half instructions in both directions. Since immediates are signed, we need 16 bits to represent this range of values. 3. Then, we switch to a 32-bit machine, and finalize our instruction format to have 4 bits for each of the immediate fields, 4 bits for each register, and 4 bits for the opcode. Convert the instruction `stw x8, 0, 4 (x5)` into machine code. Leave your answer in binary (don’t forget the prefix!). If a field is not used, fill in the field with `x`s. __ J03 __ > * J03 = ? > ==`010000001000xxxx0101xxxxxxxx1000`== > rs3 is x8 = 0b1000. rs1 is x5 = 0b0101. Imm1 = 0b0000. Imm2 = 0b0100. So, the final instruction is > ``` > 0100 | 0000 | 1000 | xxxx | 0101 | xxxx | xxxx | 1000. > ``` --- ## Problem `K` A [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) is a very clever data structure for effectively and probabilistically storing a set. It does two functions: **insert** and **check**. The fundamental principle of checking is to hash your search term many times. Each hash identifies a specific bit that has to be set or checked. So, to verify, you check to see if the bit is set. You do this repeatedly, with each iteration's count of repetitions being included in the hash (so each hash is different). The element does not exist in the bloom filter if not all bits are set. The element PROBABLY occurs in the bloom filter if all bits are set. Similar to that, you just set an element as present when using a bloom filter. Similar to that, you just set all those bits to 1 to indicate that an element is present in a bloom filter. A flexible and portable bloom filter design is what we are aiming for. Therefore, we create the structure below. ```c struct BloomFilter { uint32_t size; /* Size is # of bits, NOT BYTES, in the bloom filter */ uint16_t itercount; uint64_t (*)(void *data, uint16t iter) hash; uint8_t *data; }; ``` 1. On a 32-bit architecture that requires word alignment for 32-bit integers and pointers, what is `sizeof(struct BloomFilter)`? __ K01 __ bytes > * K01 = ? > ==16== > uint32_t size takes up 32 bits = 4 bytes. Call these bytes 0-3 of the struct. > uint16_t itercount takes up 16 bits = 2 bytes. Call these bytes 4-5 of the struct. > The next field in the struct (hash) is a function pointer, and the question says we need word alignment for pointers. Word alignment means that the address of the struct field must be a multiple of the word size, which is 4 bytes on a 32-bit system. > Because we need the next field to be word-aligned, we add 2 bytes of padding. Call these bytes 6-7 of the struct. Now the next field will start at byte 8, which is a multiple of 4. > hash is a function pointer, which takes up 4 bytes. Call these bytes 8-11 of the struct. > uint8 t *data is a pointer, which takes up 4 bytes. Call these bytes 12-15 of the struct. > In total, we have bytes 0-15 of the struct, which is 16 bytes. 2. And now we have the `insert` function. For this we need to set the appropriate bit for each iteration. ```c void insert(struct BloomFilter *b, void *element) { uint64_t bitnum; /* which bit we need to set */ for (int i = 0; i < (K02 /* Your code here */); ++i) { bitnum = b->hash(element, (uint16 t) i) % b->size; b->data[bitnum >> 3] |= 1 << (K03 /* Your code here */); } } ``` > * K02 = ? > ==`b->itercount`== > * K03 = ? > ==`b->size`== > we need to hash the element we’re trying to insert. The question also says that the hash needs to include the iteration count. The struct has a hash field, which we can infer is a pointer to the hash function we need to compute. To access the function pointer, we need to use the arrow syntax, i.e. `b->hash`. > Recall that in C, when you have an expression for a function pointer, you can directly call the function pointer. For example, if x is a function pointer, then x(3) will call the corresponding function with argument 3. > Recall that in C, the syntax for function pointers lists the return value(s) and then the argument(s). In this case, we have `uint64_t (*)(void *data, uint16 t iter) hash` which says that the function takes in arguments (`void *data, uint16_t iter`) and returns a uint64_t. In this case, we want to call the hash function with the element (the argument to insert) and the iteration count (we found this in the previous subpart). > The call to the hash looks like this: `b->hash(element, i)`. Casting i from an integer to a uint16_t is good practice (since the argument to the hash function is a uint16_t) but probably not strictly necessary. > Finally, the question says that "each hash tells you a particular bit you need to set or check,"" but there is no guarantee that the hash output is lower than the number of bits in the Bloom filter. If the hash output is greater than the number of bits in the Bloom filter, then bitnum will be too large, and the next line will index out-of-bounds. > Therefore, we should use the mod operator on the hash output to ensure that the value in bitnum is not greater than the size of the Bloom filter (`b->size`). > * K04 = ? (You MUST use a bitwise AND/OR/XOR operator) > ==`bitnum & 0x7`== > The left side of this expression is `b->data[bitnum >> 3]`. Note that `bitnum >> 3` divides bitnum by 8. bitnum/8 tells us which byte of the data array contains the bit we want to set. We divide by 8 because each byte is 8 bits, and we want to convert from the index measured in bits to the index measured in bytes. Then, we get this byte of the data array by indexing into the data array, which is an array of bytes (each uint8_t is 1 byte). > bitnum is the index of the bit we want to set to 1, measured in bits. `bitnum >> 3` identifies which byte of the data array contains the bit we want to set. > `b->data[bitnum >> 3]` contains 8 bits from the data array. We want to set one of these bits to 1.