---
tags: homework
---
# Quiz3 Annotate and Explain
My annotations are placed in info blocks like the one below.
:::info
my annotation
:::
## Problem `A`
We are given an array of $n$ unique `uint32_t` values that represent nodes in a directed graph. We say there is an edge from A to B if `A < B` and the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between A and B is exactly `1`. A [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) of `1` means that the bits differ in `1` (and only 1) place. As an example, if the array were `{0b0000, 0b0001, 0b0010, 0b0011, 0b1000, 0b1010}`, we would have the edges shown in the following figure:

> See also: LeetCode [461. Hamming Distance](https://leetcode.com/problems/hamming-distance/)

Construct an `edgelist_t` (specified below) that contains all of the edges in this graph.
```c
typedef struct { uint32_t A, B; } edge_t;
typedef struct {
    edge_t *edges;
    int len;
} edgelist_t;
```
Our solution used every line provided, but if you need more lines, just write them to the right of the line they are supposed to go after and put semicolons between them. All of the necessary `#include` statements are omitted for brevity; do not worry about checking for `malloc`, `calloc`, or `realloc` returning `NULL`. Make sure `L->edges` has no unused space when `L` is eventually returned.
```c
edgelist_t *build_edgelist(uint32_t *nodes, int n)
{
    edgelist_t *L = malloc(sizeof(edgelist_t));
    L->len = 0;
    L->edges = malloc(n * n * sizeof(edge_t));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            uint32_t tmp = A01;
            if ((nodes[i] < nodes[j]) && !(A02)) {
                A03;
                A04;
                L->len++;
            }
        }
    }
    L->edges = realloc(L->edges, sizeof(edge_t) * L->len);
    return L;
}
```
> * A01 = ?
==nodes[i] ^ nodes[j]==
:::info
XOR isolates the bits that differ between two numbers. Take `111` and `110` for example: `111 ^ 110 = 001`.
:::
> * A02 = ?
==tmp & (tmp - 1)==
:::warning
or an equivalent form
:::
:::info
For the Hamming distance to be `1`, the XOR result must contain exactly one `1` bit.
If `tmp & (tmp - 1)` is 0, then `tmp` has at most one `1` bit; if the result is not 0, `tmp` has multiple `1` bits. `tmp` cannot be 0 here, because `nodes[i] < nodes[j]` already guarantees the two values differ.
Another approach is `__builtin_popcount(tmp) == 1`.
:::
> * A03 = ?
==L->edges[L->len].A = nodes[i]==
> * A04 = ?
==L->edges[L->len].B = nodes[j]==
:::warning
A03 and A04 are interchangeable
:::
:::info
Since each directed edge goes from the smaller value to the larger one, and the condition `nodes[i] < nodes[j]` holds inside the `if`, field `A` is `nodes[i]` and field `B` is `nodes[j]`.
:::
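
Putting the four answers together, the completed function could look like the sketch below. It is a filled-in copy of the skeleton rather than the official reference solution: the name `build_edgelist_filled`, the `const` qualifier, and the `assert` precondition are additions made here for illustration, and allocation failures are not checked, as the problem allows. For the example array (written in hex as `{0x0, 0x1, 0x2, 0x3, 0x8, 0xA}`) it should produce seven edges, starting with `(0x0, 0x1)` and ending with `(0x8, 0xA)`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t A, B; } edge_t;
typedef struct {
    edge_t *edges;
    int len;
} edgelist_t;

/* Filled-in version of the skeleton (hypothetical name). */
edgelist_t *build_edgelist_filled(const uint32_t *nodes, int n)
{
    assert(n >= 0);
    edgelist_t *L = malloc(sizeof(edgelist_t));
    L->len = 0;
    L->edges = malloc((size_t) n * n * sizeof(edge_t));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            uint32_t tmp = nodes[i] ^ nodes[j];   /* A01: differing bits */
            /* A02: tmp & (tmp - 1) is 0 only if tmp has at most one set bit;
             * tmp == 0 is excluded by nodes[i] < nodes[j]. */
            if ((nodes[i] < nodes[j]) && !(tmp & (tmp - 1))) {
                L->edges[L->len].A = nodes[i];    /* A03 */
                L->edges[L->len].B = nodes[j];    /* A04 */
                L->len++;
            }
        }
    }
    /* Shrink the buffer to exactly the number of edges found. */
    L->edges = realloc(L->edges, sizeof(edge_t) * L->len);
    return L;
}
```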
---
## Problem `B`
Consider the following circuit:

You are given the following information:
- `Clk` has a frequency of 50 MHz
- AND gates have a propagation delay of 2 ns
- NOT gates have a propagation delay of 4 ns
- OR gates have a propagation delay of 10 ns
- `X` changes 10ns after the rising edge of `Clk`
- `Reg1` and `Reg2` have a clock-to-Q delay of 2 ns
:::success
The clock period is $\frac{1}{50 \times 10^6} s = 20 ns$. This means that if `X` changes, it changes 10 ns after the clock positive edge.
:::
1. What is the longest possible setup time such that there are no setup time violations? (Please include ns in your answer.)
> B01 = ?
==4 ns==
> Reg1 longest possible setup time: the path is output of Reg1 -> NOT -> OR, with a delay of 2 ns + 4 ns + 10 ns = 16 ns. So 20 - 16 = 4 ns.
> Reg2 longest possible setup time: the path is X changes -> AND, with a delay of 10 ns + 2 ns = 12 ns. So 20 - 12 = 8 ns.
> So longest setup time: min(4ns, 8ns) = 4ns
:::info
While a signal is still propagating, the register input may hold a stale or transient value, so it does not count as stable. Only the given clock-to-Q and gate delays are counted, and the input must settle at least one setup time before the next rising edge: `Tclk - propagation delay >= setup time`.

- For Reg1: clock period = $\frac{1}{50 \times 10^6}\,s = 20\,ns$, propagation delay = 2 + 4 + 10 = 16 ns, so the setup time is at most 20 - 16 = 4 ns.
- For Reg2: clock period = 20 ns, propagation delay = 10 + 2 = 12 ns, so the setup time is at most 20 - 12 = 8 ns.

Both registers must satisfy `Tclk - propagation delay >= setup time`, so we choose the minimum, 4 ns.
:::
2. What is the longest possible hold time such that there are no hold time violations? (Please include ns in your answer.)
> B02 = ?
==8 ns==
> Reg 1 longest possible hold time: the path is output of Reg2 -> OR, with a delay of 2 ns + 10 ns = 12 ns.
> Reg2 longest possible hold time: the path is output of Reg2 -> NOT -> AND, with a delay of 2 ns + 4 ns + 2 ns = 8 ns.
> So longest hold time: min(12ns, 8ns) = 8ns
:::info
According to the [reference](https://0xeefromhardware.blogspot.com/2019/05/how-to-calculate-setup-time-and-hold.html), the hold time must not exceed the shortest delay from the clock edge to a change at the register input: `Thold <= Tprop`.

- For Reg1: a change at the output of Reg2 reaches the input of Reg1 through the OR gate, so the shortest delay is 2 + 10 = 12 ns.
- For Reg2: the output of Reg2 feeds back through NOT -> AND, so the shortest delay is 2 + 4 + 2 = 8 ns.

Both registers must satisfy `Thold <= Tprop`, so the longest hold time is min(8, 12) = 8 ns.
:::
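
The two bounds used in parts 1 and 2 can be summarized as a pair of inequalities. The symbols $t_{setup}$, $t_{hold}$, $t_{path,max}$ and $t_{path,min}$ are notation introduced here for illustration only; each path delay is measured from the rising clock edge, so it includes either the clock-to-Q delay or the 10 ns arrival of `X`.

$$
\begin{align}
t_{setup} &\le T_{clk} - t_{path,max} = 20\,ns - \max(16\,ns,\ 12\,ns) = 4\,ns \\
t_{hold} &\le t_{path,min} = \min(12\,ns,\ 8\,ns) = 8\,ns
\end{align}
$$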
3. Represent the circuit above using an equivalent FSM, shown in the following, where X is the input and Q is the output, with the state labels encoding Reg1Reg2 (e.g., `01` means `Reg1 = 0` and `Reg2 = 1`). We did one transition already.

> * B03 = ?
==10==
:::info
For state ==00== to B03

For Reg1: 0|1=1
For Reg2: 0&1=0
Hence the answer=10
:::
> * B04 = ?
==01==
:::info
For state ==10== to B04

For Reg1: 0|0=0
For Reg2: 1&1=1
Hence the answer is ==01==
:::
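
The two transitions can be reproduced with a small next-state function. The logic equations in it are an assumption inferred from the timing paths listed in parts 1 and 2 (`Reg1 -> NOT -> OR`, `Reg2 -> OR`, `X -> AND`, `Reg2 -> NOT -> AND`), since the circuit figure is not reproduced here. The function name `fsm_next_state` is hypothetical, and the inputs `X = 0` for B03 and `X = 1` for B04 are read off the arithmetic in the annotations above. Under those assumptions, `fsm_next_state(0, 0)` gives state `10` and `fsm_next_state(2, 1)` gives state `01`.

```c
#include <assert.h>

/* State is encoded as Reg1Reg2: bit 1 = Reg1, bit 0 = Reg2. */
unsigned fsm_next_state(unsigned state, unsigned x)
{
    assert(state < 4 && x < 2);
    unsigned reg1 = (state >> 1) & 1u;
    unsigned reg2 = state & 1u;
    unsigned next_reg1 = (reg1 ^ 1u) | reg2;  /* assumed: OR(NOT Reg1, Reg2) */
    unsigned next_reg2 = x & (reg2 ^ 1u);     /* assumed: AND(X, NOT Reg2)   */
    return (next_reg1 << 1) | next_reg2;
}
```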
---
## Problem `C`
What is the FULLY SIMPLIFIED (fewest primitive gates) circuit for the equation below? You may use the following primitive gates: AND, NAND, OR, NOR, XOR, XNOR, and NOT.
$$
\begin{align}
&\phantom{=}\overline{(C + AB \overline C + \overline B \overline C D)} + \overline{(C + \overline{B + D})} & \\
&= C01 \\
\end{align}
$$
> * C01 = ?
==$\overline C$== or ==$\text{NOT}\ C$==
$$
\begin{align}
&\phantom{=}\overline{(C + AB \overline C + \overline B \overline C D)} + \overline{(C + \overline{B + D})} & \\
&= \overline{C}\overline{(AB\overline{C})}\overline{(\overline{B}\overline{C}D)}+\overline{C}(B+D) \qquad\text{(Demorgan's)} \\
&= \overline{C}(\overline{A}+\overline{B}+C)(B+C+\overline{D})+\overline{C}(B+D) \qquad\text{(Demorgan's)} \\
&=(\overline{A}\overline{C}+\overline{B}\overline{C}+\overline{C}C)(B+C+\overline{D})+B\overline{C}+\overline{C}D \qquad\text{(Distributive)} \\
&= (\overline{A}\overline{C}+\overline{B}\overline{C})(B+C+\overline{D})+B\overline{C}+\overline{C}D \qquad\text{(Inverse)} \\
&= \overline{A}B\overline{C}+\overline{A}C\overline{C}+\overline{A}\overline{C}\overline{D}+B\overline{B}\overline{C}+BC\overline{C}+\overline{B}\overline{C}\overline{D}+B\overline{C}+\overline{C}D \qquad\text{(Distributive)} \\
&= \overline{A}B\overline{C}+\overline{A}\overline{C}\overline{D}+\overline{B}\overline{C}\overline{D}+B\overline{C}+\overline{C}D \qquad\text{(Inverse)} \\
&= B\overline{C}(\overline{A}+1)+\overline{A}\overline{C}\overline{D}+\overline{B}\overline{C}\overline{D}+\overline{C}D\qquad\text{(Distributive)} \\
&= \overline{C}(B+\overline{A}\overline{D}+\overline{B}\overline{D}+D)\qquad\text{(Distributive)} \\
&= \overline{C}((B+\overline{B}\overline{D})+(\overline{A}\overline{D}+D))\qquad\text{(Associative)} \\
&= \overline{C}(B+\overline{D}+\overline{A}+D) \qquad\text{(Absorption)} \\
&= \overline{C}(\overline{A}+B+(D+\overline{D})) \qquad\text{(Associative)} \\
&= \overline{C}(\overline{A}+B+1) \qquad\text{(Inverse)} \\
&= \overline{C} \qquad\text{(Identity)} \\
\end{align}
$$
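
As a sanity check on the algebra, the original expression can be compared against $\overline C$ over all 16 input combinations. This is a brute-force sketch, not part of the quiz solution; the function names `c_original` and `c_count_mismatches` are made up for this example. A result of 0 mismatches agrees with the simplification above.

```c
#include <stdbool.h>

/* Left-hand side of the equation, evaluated literally. */
bool c_original(bool a, bool b, bool c, bool d)
{
    bool first  = !(c || (a && b && !c) || (!b && !c && d));
    bool second = !(c || !(b || d));
    return first || second;
}

/* Counts the input combinations (out of 16) for which the original
 * expression differs from NOT C. */
int c_count_mismatches(void)
{
    int mismatches = 0;
    for (int v = 0; v < 16; v++) {
        bool a = v & 8, b = v & 4, c = v & 2, d = v & 1;
        if (c_original(a, b, c, d) != !c)
            mismatches++;
    }
    return mismatches;
}
```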
---
## Problem `D`
Consider the following RISC-V assembly code.
```=
.text
mv s1, a0
addi s2, s2, 4
Start: beq s1, x0, End
lw a0, 0(s1)
jal ra, printf
add s1, s2, s1
lw s1, 0(s1)
jal x0, Start
End: jalr x0, ra, 0
```
Recall that immediate values are generated from instructions with the following table:

We will refer to the number produced after this process is completed as the "immediate value."
What are the fields for the machine code generated for `beq s1, x0, End` (line 4)?
Immediate value
> * D01 = ?
==24==
:::info
Each instruction is 4 bytes, and `End` is 6 instructions after the `beq`, so the immediate value is $6 \times 4 = 24$.
:::
funct3
> * D02 = ?
==0x0==
:::info
According to the [RISC-V manual](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf), the `funct3` field of `beq` is `000`.
:::
opcode
> * D03 = ?
==0x63==
rs1
> * D04 = ?
==9==
rs2
> * D05 = ?
==0==
:::info
`x0` is register number 0, so the `rs2` field is 0.
:::
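
To see how the five fields combine into one machine word, the sketch below packs them using the standard RV32I B-type layout (`imm[12|10:5]`, `rs2`, `rs1`, `funct3`, `imm[4:1|11]`, `opcode`). The helper name `encode_btype` is hypothetical and not part of the quiz. With the answers above, `encode_btype(24, 9, 0, 0x0, 0x63)` evaluates to `0x00048C63`.

```c
#include <stdint.h>

/* Pack a B-type (branch) instruction from its fields.
 * imm is the byte offset; its bit 0 is not stored in the encoding. */
uint32_t encode_btype(int32_t imm, uint32_t rs1, uint32_t rs2,
                      uint32_t funct3, uint32_t opcode)
{
    uint32_t u = (uint32_t) imm;
    return (((u >> 12) & 0x1)  << 31)   /* imm[12]   */
         | (((u >> 5)  & 0x3F) << 25)   /* imm[10:5] */
         | ((rs2 & 0x1F) << 20)
         | ((rs1 & 0x1F) << 15)
         | ((funct3 & 0x7) << 12)
         | (((u >> 1)  & 0xF)  << 8)    /* imm[4:1]  */
         | (((u >> 11) & 0x1)  << 7)    /* imm[11]   */
         | (opcode & 0x7F);
}
```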
---
## Problem `E`
Consider the following pipelined circuit. Assume all registers have their clock inputs correctly connected to a global clock signal and that logic gates have the following parameters:
* XOR gate delay: 80 ps
* AND gate delay: 60 ps
* OR gate delay: 40 ps

When shopping for registers, we find two different models and want to determine which would be best for our circuit.
Register Type $\lambda$
* Setup Time: 40 ps
* Hold Time: 20 ps
* Clock-to-Q Delay: 30 ps
Register Type $\tau$
* Setup Time: 10 ps
* Hold Time: 10 ps
* Clock-to-Q Delay: 80 ps
1. What is the minimum latency for the circuit from A to B if we use register type $\lambda$? (Please include ps in your answer.)
> * E01 = ?
==420 ps==
> 2 * (30ps + 80ps + 60ps + 40ps) = 420ps
> Critical Path = CLK_Q + XOR + AND + SETUP
> Because this passes through 2 registers, our latency is 2 clock cycles.
> Only one critical path is counted, because the latency is measured along the top path that A takes to B.
:::info
Tclk = Tclk_q + Tprop + Tsetup, and Tlatency = (number of clock cycles) × Tclk.
Since the registers separate the circuit into two stages, a value needs two clock cycles to travel from A to B.
:::
2. What is the minimum latency for the circuit from A to B if we use register type $\tau$? (Please include ps in your answer.)
> * E02 = ?
==460 ps==
> 2 * (80ps + 80ps + 60ps + 10ps) = 460ps
> It also counts an extra clock-to-Q to give A its value or to propagate through the last register to B.
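
Both answers follow the same arithmetic, which can be sketched as a small helper. The name `pipeline_latency_ps` is hypothetical, and the 140 ps logic delay is the XOR + AND critical path named in the solution above. With register type $\lambda$ the call `pipeline_latency_ps(2, 30, 140, 40)` gives 420, and with type $\tau$ the call `pipeline_latency_ps(2, 80, 140, 10)` gives 460.

```c
/* Latency of a path that takes `cycles` clock cycles, where each clock
 * period must cover clk-to-Q + combinational delay + setup time. */
int pipeline_latency_ps(int cycles, int clk_to_q, int logic_delay, int setup)
{
    int period = clk_to_q + logic_delay + setup;
    return cycles * period;
}
```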
---
## Problem `F`
Consider the following RISC-V code:
```c
Loop: andi t2, t1, 1
srli t3, t1, 1
bltu t1, a0, Loop
jalr s0, s1, MAX_POS_IMM
...
```
1. What is the value of the byte offset that would be stored in the immediate field of the `bltu` instruction?
> * F01 = ?
==-8==
> Two instructions away = `-8` bytes
:::info
Each instruction is $4$ bytes, and `Loop` is two instructions before the `bltu`, so the offset is $-(4 \times 2) = -8$ bytes.
:::
2. We would like to propose a revision to the standard 32-bit RISC-V instruction formats where each instruction has a unique opcode (which still is `7` bits). This justifies taking out the `funct3` field from the R, I, S, and SB instructions, allowing you to allocate bits to other instruction fields except the opcode field. Assume register `s0 = 0x1000 0000`, `s1 = 0x4000 0000`, `PC = 0xA000 0000`. Let's analyze the instruction: `jalr s0, s1, MAX_POS_IMM` where `MAX_POS_IMM` is the maximum possible positive immediate for `jalr`. After the instruction executes, what are the values in the following registers? (Answer in HEX)
* `s0` = F02
* `s1` = F03
* `PC` = F04
> * F02 = ?
==0xA000 0004==
> * F03 = ?
==0x4000 0000==
> * F04 = ?
==0x4000 0FFF==
> We know that rd and rs1 fields are now 6 bits. `jalr` is an I-type instruction, so we take out the `funct3` bits but we give each of rd and rs1 fields 1 bit, meaning we have 1 bit leftover to give to the immediate field. Thus, we now have a 13-bit immediate. Thus, the maximum possible immediate a `jalr` instruction can hold is $+2^{12} - 1$ halfwords away, which is represented as 0b0 1111 1111 1111, which is `0x0FFF`.
>
> `s0` is the link register: its value is PC + 4
> `s1` does not get written into so it stays the same.
> `PC = R[s1] + 0x0FFF`
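
The register values can be reproduced with the short sketch below. It models `jalr` the way the quiz solution does (`rd = PC + 4`, `PC = rs1 + imm`); the names `max_pos_imm`, `jalr_result_t`, and `jalr_quiz_model` are hypothetical. Note that the real RV32I `jalr` also clears bit 0 of the target address, which the quiz answer does not apply. With a 13-bit immediate, `max_pos_imm(13)` is `0x0FFF`, giving `s0 = 0xA0000004` and `PC = 0x40000FFF`.

```c
#include <stdint.h>

/* Largest positive value of a signed two's-complement field of `bits`
 * bits (expects 1 <= bits <= 32). */
uint32_t max_pos_imm(unsigned bits)
{
    return (UINT32_C(1) << (bits - 1)) - 1;
}

typedef struct { uint32_t rd, pc; } jalr_result_t;

/* jalr as modeled in the quiz solution: rd = PC + 4, PC = rs1 + imm. */
jalr_result_t jalr_quiz_model(uint32_t pc, uint32_t rs1, uint32_t imm)
{
    jalr_result_t r = { pc + 4, rs1 + imm };
    return r;
}
```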
---
## Problem `G`
Consider the following circuit:

Assume input A and input B come from registers. Assume all 2-input logical gates have a 10 ns propagation delay. The `NOT` gate has a 5 ns delay. All registers have a clk-to-q of 15 ns and setup time of 20 ns.
1. Find the minimum clock period to ensure the validity of the circuit. (Please include ns in your answer)
> * G01 = ?
==75 ns==
> We have the following paths:
> * Input A (clock-to-q) -> NOT -> Register (setup) = 15 ns + 5 ns + 20 ns = 40 ns
> * Input A (clock-to-q) -> NOT -> AND -> NOR -> AND -> Register (setup) = 15 ns + 5 ns + 10 ns + 10 ns + 10 ns + 20 ns = 70 ns
> * Input B (clock-to-q) -> NAND -> AND -> NOR -> AND -> Register (setup) = 15 ns + 10 ns + 10 ns + 10 ns + 10 ns + 20 ns = 75 ns
> * Register (clock-to-q) -> NOT -> NOR -> AND ->Register (setup) = 15 ns + 5 ns + 10 ns + 10 ns + 20 ns = 60 ns
>
> So we need the max of them which would be 75 ns.
:::info
The minimum clock period is set by the longest path delay, counting clock-to-Q and setup time.
:::
2. Find the maximum hold time such that there are no hold time violations. (Please include ns in your answer)
> * G02 = ?
==20 ns==
> For the maximum hold time, we need to look at the same paths to see what would be the shortest path to get to the register:
> * Input A (clock-to-q) -> NOT -> Register (NO setup) = 15 ns + 5 ns = 20 ns
> * Input A (clock-to-q) -> NOT -> AND -> NOR -> AND -> Register (NO setup) = 15 ns + 5 ns + 10 ns + 10 ns + 10 ns = 50 ns
> * Input B (clock-to-q) -> NAND -> AND -> NOR -> AND -> Register (NO setup) = 15 ns + 10 ns + 10 ns + 10 ns + 10 ns = 55 ns
> * Register (clock-to-q) -> NOT -> NOR -> AND -> Register (NO setup) = 15 ns + 5 ns + 10 ns + 10 ns = 40 ns
>
> For this one, when we get to the register, we do NOT want to include the setup time as we want to see what is the shortest time to get to a register. This means we take a min of the above paths (which does NOT include the setup) which would be 20 ns.
:::info
The maximum hold time is set by the shortest path delay, counting clock-to-Q but not setup time.
:::
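
Both parts reduce to taking an extreme over the four path delays listed in the solutions. The sketch below uses two hypothetical helpers, `max_delay_ns` and `min_delay_ns`; the path delays themselves are the ones computed above. Passing `{40, 70, 75, 60}` (with setup) to `max_delay_ns` gives 75, and passing `{20, 50, 55, 40}` (without setup) to `min_delay_ns` gives 20.

```c
#include <stddef.h>

/* Largest path delay: with setup included, this is the minimum clock period.
 * Expects n >= 1. */
int max_delay_ns(const int *delays, size_t n)
{
    int best = delays[0];
    for (size_t i = 1; i < n; i++)
        if (delays[i] > best)
            best = delays[i];
    return best;
}

/* Smallest path delay: without setup, this bounds the hold time.
 * Expects n >= 1. */
int min_delay_ns(const int *delays, size_t n)
{
    int best = delays[0];
    for (size_t i = 1; i < n; i++)
        if (delays[i] < best)
            best = delays[i];
    return best;
}
```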
---
## Problem `H`
We wish to implement a function, `reverse`, that will take in a pointer to a string, its length, and reverse it. Assume that the argument registers, `a0` and `a1`, hold the pointer to and length of the string, respectively. Complete the following code skeleton to implement this function.
```cpp
reverse:
# This part saves all the required registers you will use.
mv s0, a0 # memory address
mv s1, a1 # strlen
addi t0, x0, 0 # iteration
Loop:
# retrieve left and right letters
add t1, s0, t0 # t1 is moving pointer from left (base + offset/iteration)
lb t2, 0(t1) # t2 contains char from left
sub t3, s1, t0 # imm needs to be s1 - t0
H01 # since strlen indexes out of string
add t4, s0, t3 # t4 is moving pointer from right (base + strlen - offset/iteration - 1)
lb t5, 0(t4) # t5 contains char from right
# switch chars
sb t2, 0(t4)
H02
# iterate if necessary
addi t0, t0, 1 # update iter
H03
H04
mv a0, s0 # not necessary
# This part restores all of the registers which were used.
ret
```
:::info
`s1` is the string length.
`t2` is the character read from the left side of the string, and `t5` is the character read from the right side.
`t1` is the address that `t2` was loaded from, and `t4` is the address that `t5` was loaded from.
`t0` is the loop iterator, which is also the offset of the left character.
`t3` is the offset of the right character, `s1 - t0 - 1`.
The diagrams below use a 6-character string.
```
First iteration (t0 = 0, t3 = 5).
     s1
<--------->
t2        t5
- - - - - -
^         ^
|         |
t1        t4
```
```
Second iteration (t0 = 1, t3 = 4).
     s1
<--------->
  t2    t5
- - - - - -
  ^     ^
  |     |
  t1    t4
```
```
Third iteration (t0 = 2, t3 = 3).
     s1
<--------->
    t2t5
- - - - - -
    ^ ^
    | |
    t1t4
```
The loop stops once `t0` reaches the middle of the string. In this case it ends after the third iteration.
:::
> * H01 = ?
==addi t3, t3, -1==
:::info
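With `t0` as the index of the left character and `t3` as the index of the right character, `t0 + t3 + 1 = s1`, so `t3 = s1 - t0 - 1`.
The preceding `sub` leaves `t3 = s1 - t0`, so H01 must subtract 1.
> Indices are counted from the start of the string.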
:::
> * H02 = ?
==sb t5, 0(t1)==
:::info
Store the right character $t5$ at the left address `t1`, which completes the swap started by `sb t2, 0(t4)`.
:::
> * H03 = ?
==srli s8, s1, 1==
:::info
Since reversing only needs one swap per pair of characters, $t0$ only has to iterate up to $\frac{strlen}{2}$, which `srli` by 1 computes into `s8`.
:::
> * H04 = ?
==bne t0, s8, Loop==
:::info
Check whether the iterator $t0$ equals $\frac{strlen}{2}$: if it does, the loop ends; otherwise execution branches back to the label `Loop`.
:::
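
For comparison, the same loop can be written in C. This is only a rough equivalent with a hypothetical name, `reverse_in_place`; the comments map each statement to the assembly line it mirrors. One difference is deliberate: the C loop tests the bound before the body, while the assembly tests it after the body, so the assembly version appears to assume a string of at least two characters. Reversing `"abcdef"` this way should give `"fedcba"`.

```c
#include <stddef.h>

/* i plays the role of t0, and right plays the role of t3. */
void reverse_in_place(char *s, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        size_t right = len - i - 1;  /* sub t3, s1, t0 ; addi t3, t3, -1 */
        char left_ch = s[i];         /* lb t2, 0(t1) */
        char right_ch = s[right];    /* lb t5, 0(t4) */
        s[right] = left_ch;          /* sb t2, 0(t4) */
        s[i] = right_ch;             /* sb t5, 0(t1) */
    }
}
```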
---
## Problem `J`
Take a look at the following circuit:

We have a register clk-to-Q time of 5ps, a hold time of 2ps, and a setup time of 3ps. AND and NAND gates have a delay of 5ps, OR and XOR gates have a delay of 6ps, and NOT gates have a delay of 1ps. Assume that our inputs A, B, C, and D arrive on the rising edge of the clock.
1. Which gates make up the critical path in the circuit above? Your answer should be correctly ordered from left to right, e.g. NOT $\to$ OR $\to$ NAND.
> * J01 = ?
==XOR $\to$ AND $\to$ OR $\to$ OR==
2. What is the critical path delay in the circuit?
> * J02 = ?
==31 ps==
3. Let us now consider only the portion of the circuit between `Reg2` and `Reg3`. Assume that the clock period (rising edge to rising edge) is 100 ps, registers have a clk-to-Q delay of 25ps and a setup and hold time of 20ps, and all gates have a delay of 5ps. Choose the waveform with the correct outputs for `Reg2` and `Reg3`.
- [ ] Option A

- [ ] Option B

- [ ] Option C

- [ ] Option D

Notation: For reference, in the diagram below, the first region indicates an "undefined" signal, the second region indicates a signal of "high" or 1, and the third region indicates a signal of "low" or 0.

> J03 = ?
==B==
---
## Problem `K`
Consider the following program that computes the [Fibonacci sequence](https://en.wikipedia.org/wiki/Fibonacci_number) recursively. The C code is shown on the left, and its translation to RISC-V assembly is provided on the right. You are told that the execution has been halted just prior to executing the ret instruction. The SP label on the stack frame (part 3) shows where the stack pointer is pointing to when execution halted.
- [ ] C code
```c
int fib(int n)
{
if (n <= 1) return n;
return fib(n - 1) + fib(n - 2);
}
```
- [ ] RISC-V Assembly (incomplete)
```c
fib: addi sp, sp, -12
sw ra, 0(sp)
sw a0, 4(sp)
sw s0, 8(sp)
li s0, 0
li a7, 1
if: ble __K01__
sum: addi a0, a0, -1
call fib
add s0, s0, a0
lw a0, 4(sp)
addi a0, a0, -2
call fib
mv t0, a0
add a0, s0, t0
done: lw ra, 0(sp)
lw s0, 8(sp)
L1: addi sp, sp, 12
ret
```
1. Complete the missing portion of the `ble` instruction to make the assembly implementation match the C code.
>* K01 = ?
==a0, a7, done==
:::info
`a0` holds `n` from the C code.
`a7` holds the constant `1` from the C code.
Branching to `done` corresponds to `return n` in the C code, since `a0` still holds `n`.
:::
2. How many distinct words will be allocated and pushed into the stack each time the function `fib` is called?
>* K02 = ?
==3==
:::info
The prologue `addi sp, sp, -12` reserves 12 bytes, and each word is 4 bytes: $12/4=3$
:::
3. Please fill in the values for the blank locations in the stack trace below. Please express the values in HEX.
| Notation | Value |
| :------: | ------- |
| Smaller address | 0x280 |
| | 0x1 |
| | K03 |
| SP $\to$ | K04 |
| | K05 |
| | 0x0 |
| | 0x280 |
| | 0x3 |
| | 0x0 |
| | 0x2108 |
| | 0x4 |
| | 0x6 |
| Larger address | 0x1 |
> * K03 = ?
==0x0==
> * K04 = ?
==0x280==
> * K05 = ?
==0x2==
:::info

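K05 is the saved `a0` of the current frame. The caller's frame holds `n` = 0x3, and the caller executed `sum: addi a0, a0, -1` before `call fib`, so K05 = 0x3-1 = 0x2.
K04 is the saved `ra` = 0x280, the return address of the first `call fib`.
K03 is the saved `s0` of the frame that was just popped; `s0` was still 0x0 when that call was made. `s0` holds the temporary `fib(n-1)` so that `add a0, s0, t0` can add `fib(n-1)` and `fib(n-2)` together.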
:::
4. What is the hex address of the `done` label? (Answer in HEX)
> * K06 = ?
==0x298==
:::info
The saved `ra` = 0x280 is the address of the instruction right after the first `call fib`. The `done` label is 6 instructions after that address, so it is at 0x280+6*4=0x298.
:::
5. What was the address of the original function call to `fib`? (Answer in HEX)
> * K07 = ?
==0x2104==
:::info
Since `call` stores the address of the next instruction in `ra`, the original call is 4 bytes before the outermost saved `ra`, which is 0x2108. The answer is 0x2108-4=0x2104.
:::
---
## Problem `L`
Suppose we want to create a system that decides if the concatenation of its previous 2 single-bit inputs is a power of 2 (where the MSB is the input from 2 cycles ago and the LSB is from 1 cycle ago). If the previous 2 bits (prior to the current input) are a power-of-two the system outputs a 1, otherwise it outputs 0. Before any input is sent, assume the initial previous 2 bits are 2'b00.
A partial finite state machine diagram of this circuit is shown below:

> Before receiving any inputs the FSM is in state A.
1. For this FSM to provide the correct answer, to what existing states must D transition to (A, B, C, or D), and what output does D give (0 or 1)?
* Current State = D, Input = 0, Next State = __ L01 __
* Current State = D, Input = 1, Next State = __ L02 __
* Current State = D, Output = __ L03 __
:::info
In ==A== the previous two bits are 00, in ==B== they are 01, in ==C== they are 11, and in ==D== they are 10.
Hence the states that represent a power of two are ==B== and ==D==. Both of these states output `1`, and the other states output `0`.
:::
> * L01 = ?
==A==
:::info
Consider the route A->B->D: the previous two bits are 10. When the input is 0, the previous two bits become 00, so the next state is ==A==.
:::
> * L02 = ?
==B==
:::info
Consider the route A->B->D: the previous two bits are 10. When the input is 1, the previous two bits become 01, so the next state is ==B==.
:::
> * L03 = ?
==1==
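
The whole FSM can be sketched as a 2-bit shift register. The state values below follow the bit assignments given in the annotation above (A = 00, B = 01, C = 11, D = 10); the enum and function names are hypothetical. Under this encoding, state D moves to A on input 0 and to B on input 1, and D outputs 1.

```c
#include <assert.h>

/* Each state's value is the previous two input bits (older bit is the MSB). */
enum {
    STATE_A = 0,  /* 00 */
    STATE_B = 1,  /* 01 */
    STATE_D = 2,  /* 10 */
    STATE_C = 3   /* 11 */
};

/* Shift the new input bit in and drop the oldest bit. */
unsigned pow2_fsm_next(unsigned state, unsigned input)
{
    assert(state < 4 && input < 2);
    return ((state << 1) | input) & 3u;
}

/* Output 1 when the previous two bits are 01 or 10 (the values 1 and 2). */
unsigned pow2_fsm_output(unsigned state)
{
    return state == 1u || state == 2u;
}
```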
## Reference
- [2021 Term Project](https://hackmd.io/-HXfnGzdT6iNIA5UzV4yLw)
- [2021 Quiz3 problem](https://hackmd.io/@sysprog/arch2021-quiz3)
- [2021 Quiz3 solution](https://hackmd.io/@sysprog/arch2021-quiz3-sol)
- [GCC built-in bit functions](http://sunmoon-template.blogspot.com/2017/04/gcc-built-in-functions-for-binary-gcc.html)
- [UC combinational logic](https://www.youtube.com/playlist?list=PLDoI-XvXO0ap04XFOxcFT2_Bflnz_RHWU)
- [UC sequential logic](https://www.youtube.com/playlist?list=PLDoI-XvXO0arH8SCGZilk8Wn_2B45SOpW)