Quiz3 of Computer Architecture (2020 Fall)

Solutions

Question `A`

Simplify the following Boolean expressions by finding a minimal sum-of-products expression for each one. (Note: These expressions can be reduced into a minimal SOP by repeatedly applying the Boolean algebra properties we saw in lecture.)

A01: $\overline{a + b \cdot \overline{c}} \cdot d + c$

A02: $a \cdot \overline{(b + c)} \cdot (c + a)$

A list of Boolean algebra laws can be found on Wikipedia. In the following simplifications, the law applied in each step is noted on the right. Applications of the associative (and commutative) laws are omitted for clarity.
1. A01 = $\overline{a} \cdot \overline{b} \cdot d + c$

$\begin{aligned} & \overline{a + b \cdot \overline{c}} \cdot d + c \\ &= (\overline{a} \cdot \overline{b \cdot \overline{c}}) \cdot d + c && \text{(De Morgan 1)} \\ &= (\overline{a} \cdot (\overline{b} + \overline{\overline{c}})) \cdot d + c && \text{(De Morgan 2)} \\ &= (\overline{a} \cdot (\overline{b} + c)) \cdot d + c && \text{(Double negation)} \\ &= \overline{a} \cdot \overline{b} \cdot d + \overline{a} \cdot c \cdot d + c && \text{(Distributivity of } \cdot \text{ over } +\text{)} \\ &= \overline{a} \cdot \overline{b} \cdot d + c && \text{(Absorption 2)} \end{aligned}$

2. A02 = $a \cdot \overline{b} \cdot \overline{c}$

$\begin{aligned} & a \cdot \overline{(b + c)} \cdot (c + a) \\ &= a \cdot (\overline{b} \cdot \overline{c}) \cdot (c + a) && \text{(De Morgan 1)} \\ &= a \cdot (\overline{b} \cdot \overline{c} \cdot c + \overline{b} \cdot \overline{c} \cdot a) && \text{(Distributivity of } \cdot \text{ over } +\text{)} \\ &= a \cdot (\overline{b} \cdot 0 + \overline{b} \cdot \overline{c} \cdot a) && \text{(Complementation 1)} \\ &= a \cdot (0 + \overline{b} \cdot \overline{c} \cdot a) && \text{(Annihilator for } \cdot\text{)} \\ &= a \cdot \overline{b} \cdot \overline{c} \cdot a && \text{(Identity for } +\text{)} \\ &= a \cdot \overline{b} \cdot \overline{c} && \text{(Idempotence of } \cdot\text{)} \end{aligned}$
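Both simplifications can also be double-checked by brute force: with at most four variables there are only 16 input rows, so comparing the original and simplified expressions on every row is cheap. The sketch below does this in C; the function names `a01_equivalent` and `a02_equivalent` are illustrative, not part of the quiz.

```c
#include <stdbool.h>

/* A01: NOT(a + b*NOT c)*d + c  versus  NOT a*NOT b*d + c, on all 16 rows. */
bool a01_equivalent(void)
{
    for (int row = 0; row < 16; row++) {
        bool a = row & 8, b = row & 4, c = row & 2, d = row & 1;
        bool original   = (!(a || (b && !c)) && d) || c;
        bool simplified = (!a && !b && d) || c;
        if (original != simplified)
            return false;
    }
    return true;
}

/* A02: a*NOT(b + c)*(c + a)  versus  a*NOT b*NOT c, on all 8 rows. */
bool a02_equivalent(void)
{
    for (int row = 0; row < 8; row++) {
        bool a = row & 4, b = row & 2, c = row & 1;
        bool original   = a && !(b || c) && (c || a);
        bool simplified = a && !b && !c;
        if (original != simplified)
            return false;
    }
    return true;
}
```

An exhaustive check like this confirms equivalence but not minimality; minimality still comes from the algebra (or a Karnaugh map).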

Question `B`

Muxes are used often, so it is important to optimize them. In this problem you will design several variants of a 1-bit, 2-to-1 mux (shown below) using CMOS gates, and compare their costs in number of transistors.
Note: a CMOS gate consists of an output node connected to a single pFET-based pull-up circuit and a single nFET-based pull-down circuit. A circuit obtained by combining multiple CMOS gates is not itself a CMOS gate.


Consider the implementation shown below, which uses two AND gates and an OR gate. Because a single CMOS gate cannot implement AND or OR, each AND gate is implemented with a CMOS NAND gate followed by a CMOS inverter, and the OR gate is implemented with a CMOS NOR gate followed by a CMOS inverter. How many transistors does this implementation have?
Number of transistors in mux: __ B01 __
- B01 =
  20
  A CMOS logic gate consists of a pull-up network and a pull-down network. When the gate's function evaluates to 1 (TRUE), the pull-up network conducts (low resistance) and the pull-down network is cut off (high resistance); when it evaluates to 0, the roles are reversed. The pull-up network is built from pFETs (PMOS transistors) and the pull-down network from nFETs (NMOS transistors).
  Some basic transistor counts:

  | Gate | Transistor count |
  | --- | --- |
  | Inverter | 2 |
  | NAND | 4 |
  | NOR | 4 |
  | AND (NAND + INV) | 6 |
  | OR (NOR + INV) | 6 |

  This implementation uses one inverter (for the complemented select), two AND gates, and one OR gate, for a total of $2 + 6 + 6 + 6 = 20$ transistors.
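The transistor count is simple bookkeeping over the gate table. The sketch below encodes the table as constants and sums them, assuming (as the leading 2 in the sum suggests) that one inverter produces the complemented select; the constant and function names are hypothetical.

```c
/* Transistor counts of static CMOS gates, taken from the table above. */
enum {
    INV_T  = 2,
    NAND_T = 4,
    NOR_T  = 4,
    AND_T  = NAND_T + INV_T,   /* NAND followed by an inverter */
    OR_T   = NOR_T + INV_T     /* NOR followed by an inverter  */
};

/* Assumed composition: one inverter for the select, two ANDs, one OR. */
int and_or_mux_transistors(void)
{
    return INV_T + 2 * AND_T + OR_T;
}
```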
Consider the implementation shown below, which uses three instances of gate F. Find the Boolean expression for F. If F can be built using a single CMOS gate, say "Yes." Otherwise, give a convincing explanation for why F cannot be implemented as a CMOS gate. How many transistors does this implementation have?
Number of transistors in mux (if F can be built as a CMOS gate): __ B02 __
- B02 =
  Yes, 14
  By bubble pushing, F turns out to be a 2-input NAND gate, $F(x, y) = \overline{x \cdot y}$, which is a single CMOS gate with 4 transistors. Counting the inverter as well, the total is
  $2 + 4 + 4 + 4 = 14$ transistors.
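The bubble-pushing result can be confirmed exhaustively. This sketch assumes the mux selects a when s = 1 (out = a·s + b·s̄, the convention matching the $\overline{a s + b t}$ form used for H) and assumes the wiring F(F(a, s), F(b, s̄)), which is not read from the figure.

```c
#include <stdbool.h>

/* Gate F as derived above: a 2-input NAND. */
static bool nand2(bool x, bool y) { return !(x && y); }

/* Reference 2:1 mux (assumed convention: s = 1 selects a). */
static bool mux(bool a, bool b, bool s) { return s ? a : b; }

/* Assumed wiring: one inverter for s, two first-level F gates, one output F gate. */
bool nand_mux_matches(void)
{
    for (int row = 0; row < 8; row++) {
        bool a = row & 4, b = row & 2, s = row & 1;
        bool s_bar = !s;                                 /* inverter: 2 transistors  */
        bool out = nand2(nand2(a, s), nand2(b, s_bar));  /* three F gates: 3 x 4     */
        if (out != mux(a, b, s))
            return false;
    }
    return true;
}
```

By De Morgan, $\overline{\overline{a s} \cdot \overline{b \overline{s}}} = a s + b \overline{s}$, which is exactly what the loop verifies row by row.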
Consider the implementation shown below, which uses gate G. Find the Boolean expression for G. If G can be built using a single CMOS gate, say "Yes." Otherwise, give a convincing explanation for why G cannot be implemented as a CMOS gate. How many transistors does this implementation have?
Number of transistors in mux (if G can be built as a CMOS gate): __ B03 __
- B03 =
  G cannot be built as a single CMOS gate because it is not inverting: G(1,0,t) = t, so a rising input (G(1,0,0) $\to$ G(1,0,1)) produces a rising output.
  A single CMOS gate is always inverting. When an input rises from 0 to 1, the pFETs it drives can only turn off and the nFETs it drives can only turn on, so the pull-up network can only lose conducting paths while the pull-down network can only gain them; the output therefore cannot rise. Symmetrically, a falling input can never make the output fall. Now consider the case $(\overline{A}, \overline{B}, \overline{S}) = (1, 0, 0)$. When $\overline{S}$ changes to 1, the output rises from 0 to 1, which a single CMOS gate cannot do.
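The "inverting" property can be tested mechanically: raising any single input from 0 to 1 must never raise the output. The sketch below is a hypothetical brute-force checker for 3-input functions. It uses a plain 2:1 mux (`s ? a : b`) as the non-inverting example because, like G, it satisfies mux(1, 0, t) = t; a 3-input NAND serves as the inverting example.

```c
#include <stdbool.h>

typedef bool (*fn3)(bool, bool, bool);

/* True if no single 0 -> 1 input change ever raises the output, i.e. the
   function passes the inverting test every single CMOS gate must pass. */
bool is_inverting(fn3 f)
{
    for (int row = 0; row < 8; row++) {
        for (int bit = 1; bit < 8; bit <<= 1) {
            if (row & bit)
                continue;                     /* only examine 0 -> 1 transitions */
            int up = row | bit;
            bool before = f(row & 4, row & 2, row & 1);
            bool after  = f(up & 4, up & 2, up & 1);
            if (!before && after)
                return false;                 /* a rising input made the output rise */
        }
    }
    return true;
}

bool nand3(bool a, bool b, bool c) { return !(a && b && c); }
bool mux21(bool a, bool b, bool s) { return s ? a : b; }
```

For example, `is_inverting(nand3)` holds, while `is_inverting(mux21)` fails at the transition (1, 0, 0) to (1, 0, 1), the same counterexample used for G.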
Consider the implementation shown below, which uses gate H. Find the Boolean expression for H. If H can be built using a single CMOS gate, say "Yes." Otherwise, give a convincing explanation for why H cannot be implemented as a CMOS gate. How many transistors does this implementation have?
Number of transistors in mux (if H can be built as a CMOS gate): __ B04 __
- B04 =
  Yes, 12
  H is $\overline{a s + b t}$, a standard AND-OR-INVERT (AOI22) CMOS gate that uses 8 transistors (four nFETs in the pull-down network and four pFETs in the pull-up network). With two additional inverters, a total of
  $2 + 2 + 8 = 12$ transistors are used.
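As a sanity check, the H-based design can be compared with a mux on all eight input rows. This sketch assumes one plausible wiring consistent with the 2 + 2 + 8 count: one inverter supplies t = s̄ and the second inverter restores the output polarity. The function names are hypothetical.

```c
#include <stdbool.h>

/* H as an AOI22 gate: NOT(a*s + b*t), 8 transistors. */
bool aoi22(bool a, bool s, bool b, bool t)
{
    return !((a && s) || (b && t));
}

/* Assumed wiring: inverter #1 makes t = NOT s, inverter #2 restores polarity. */
bool h_mux(bool a, bool b, bool s)
{
    bool t = !s;
    return !aoi22(a, s, b, t);
}

/* Compare against a 2:1 mux that selects a when s = 1, on all 8 rows. */
bool h_mux_matches(void)
{
    for (int row = 0; row < 8; row++) {
        bool a = row & 4, b = row & 2, s = row & 1;
        if (h_mux(a, b, s) != (s ? a : b))
            return false;
    }
    return true;
}
```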


Question `C`

Consider the C procedure below and its translation to RISC-V assembly code.

C procedure

```c
int f(int a, int b) {
    int c = b - a;
    if ((c & C01) == 0) /* c is a multiple of 4 */
        return 1;
    int d = f(a - 1, b + 2);
    return 3 * (d + a);
}
```

The translated RISC-V assembly code

```
f:    sub a2, a1, a0
      andi a2, a2, __C01__
      bnez a2, ELSE
      li a0, 1
      jr ra
ELSE: addi sp, sp, -8
      sw a0, 0(sp)
      sw ra, 4(sp)
      addi a0, a0, -1
      addi a1, a1, 2
      jal ra, f
A4:   lw a1, 0(sp)
      lw ra, 4(sp)
L1:   add a0, a0, a1
      slli a1, a0, 1
      add a0, a0, a1
      addi sp, sp, 8
      jr ra
```

What value should the C01 term in the C code and the assembly be replaced with to make the if statement correctly check if the variable c is a multiple of 4?
- C01 =
  3
  ANDing with 0b11 (0x3, or simply 3) keeps only the two least-significant bits, and a number is a multiple of 4 if and only if those two bits are 0.
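With C01 = 3 filled in, the procedure can be compiled and run directly. Note the parentheses in `(c & 3) == 0`: in C, `==` binds tighter than `&`, so `c & 3 == 0` would parse as `c & (3 == 0)`, which is always 0. The sketch assumes two's-complement `int` (as on RISC-V), so negative multiples of 4 also have their low two bits clear.

```c
/* f from the quiz with C01 = 3 (assumes two's-complement int, as on RISC-V). */
int f(int a, int b)
{
    int c = b - a;
    if ((c & 3) == 0)            /* low two bits clear: c is a multiple of 4 */
        return 1;
    int d = f(a - 1, b + 2);     /* c = b - a grows by 3 per call, so c mod 4 hits 0 */
    return 3 * (d + a);
}
```

For example, f(2, 1) recurses through f(1, 3), f(0, 5), and f(-1, 7); the last call has c = 8 and returns 1, and unwinding gives 3, 12, and finally 42.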
How many words will be written to the stack before the program makes each recursive call to the function f?
- C02 =
  2
  The return address (ra) and the original a (a0). b (a1) is not saved because it is not used after the recursive call.
The program's initial call to function f occurs outside the function definition, via the instruction jal ra, f. The program is interrupted during an execution of f (not necessarily the first), just before the instruction add a0, a0, a1 at label L1 executes. The diagram below shows the contents of a region of memory; all addresses and data values are shown in hex. The current value in the SP register is 0xEB0, and it points to the location shown in the diagram.
What were the values of arguments a and b to the initial call to f? Write "UNKNOWN" if the argument does not show up in the stack.
Initial arguments to f: a = __ C03 __ ; b = __ C04 __
- C03 : 4
  a is passed in a0, and each call saves it at 0(sp), directly below the saved ra at 4(sp). Every recursive call saves ra = 0xA4 (the address of the A4 label), so the frame whose saved ra is 0x3B8 belongs to the initial call. Its saved a0 is 0x4, so a = 4.
- C04 : UNKNOWN
  b is passed in a1 and is never written to the stack, so its initial value cannot be recovered.
What are the values in the following registers right when the execution of f is interrupted? Write "UNKNOWN" if you cannot tell.
Current value (in hex) of a1: __ C05 __
Current value (in hex) of ra: __ C06 __
- C05 : 0x2
- C06 : 0xA4
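  Both registers were loaded from the current stack frame (at SP = 0xEB0) by lw a1, 0(sp) and lw ra, 4(sp) just before the interruption, so they hold that frame's saved a and saved return address.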
What is the hex address of the jal ra, f instruction that made the initial call to f?
Address (in hex) of instruction that made initial call to f: __ C07 __
- C07 : 0x3B4
  The saved ra of the initial call is 0x3B8, the address of the instruction following the call. Since each instruction is 4 bytes, the jal is at 0x3B8 - 0x4 = 0x3B4.
What is the hex address of the instruction at label ELSE?
Address of instruction at label ELSE: __ C08 __
- C08 : 0x8C
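  The saved ra of every recursive call points to lw a1, 0(sp) at label A4, so that instruction is at 0xA4. ELSE is six instructions (0x18 bytes) earlier, at 0xA4 - 0x18 = 0x8C.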
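Both address answers are the same small computation: step back a whole number of 4-byte instructions from a known address. A minimal sketch, assuming plain RV32I with no compressed (2-byte) instructions; the helper names are hypothetical.

```c
#include <stdint.h>

#define INSN_BYTES 4u   /* every RV32I instruction is 4 bytes (no compressed ISA assumed) */

/* The jal that produced a saved return address sits one instruction earlier. */
uint32_t call_site(uint32_t saved_ra)
{
    return saved_ra - INSN_BYTES;
}

/* Address of a label that is n_insns instructions before a known address. */
uint32_t label_before(uint32_t known_addr, uint32_t n_insns)
{
    return known_addr - n_insns * INSN_BYTES;
}
```

Here `call_site(0x3B8)` gives C07, and `label_before(0xA4, 6)` gives C08, counting the six instructions from ELSE through jal ra, f.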

Question `D`

Consider the logic diagram below, which includes XNOR2, OR2, NAND2, AND2, and INV gates. Using the $t_{PD}$ (propagation delay) of each gate given in the table below, compute the $t_{PD}$ of the circuit.

| Gate | $t_{PD}$ |
| --- | --- |
| XNOR2 | 7.0 ns |
| OR2 | 5.5 ns |
| NAND2 | 3.0 ns |
| AND2 | 5.0 ns |
| INV | 2.0 ns |

$t_{PD}$ of the circuit (the delay along its longest path), in ns: __ D01 __
- D01: 17.5
  Longest path: a/b $\to$ XNOR2 $\to$ NAND2 $\to$ OR2 $\to$ INV $\to$ x, so $t_{PD} = 7.0 + 3.0 + 5.5 + 2.0 = 17.5$ ns.
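The longest-path rule can be sketched as a single pass over the gates in topological order: each gate's output becomes valid at the latest arrival time among its inputs plus its own $t_{PD}$. The netlist encoding (`struct gate`, `circuit_tpd`) is hypothetical, and since the diagram is not reproduced here, the usage below encodes only the critical path named above (XNOR2, NAND2, OR2, INV).

```c
/* One gate in a combinational netlist. Inputs name earlier gates by index;
   -1 means a primary input, which arrives at time 0. Gates must be listed
   in topological order (every driver before the gates it feeds). */
struct gate {
    double tpd;      /* propagation delay in ns */
    int in[2];       /* driver gate indices, or -1 */
};

/* Fills arrival[i] with gate i's output arrival time and returns the
   latest one, i.e. the circuit's tPD, assuming every gate lies on some
   path to an output. */
double circuit_tpd(const struct gate *g, int n, double *arrival)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        double latest_in = 0.0;
        for (int k = 0; k < 2; k++)
            if (g[i].in[k] >= 0 && arrival[g[i].in[k]] > latest_in)
                latest_in = arrival[g[i].in[k]];
        arrival[i] = latest_in + g[i].tpd;
        if (arrival[i] > worst)
            worst = arrival[i];
    }
    return worst;
}
```

For the chain `{7.0, {-1, -1}}, {3.0, {0, -1}}, {5.5, {1, -1}}, {2.0, {2, -1}}` the arrival times are 7.0, 10.0, 15.5, and 17.5 ns.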
Find a minimal sum-of-products expression for output X of the circuit described by the truth table shown below.

Minimal sum of products for X = __ D02 __
- D02 = $a b + \overline{a} c \overline{d} + \overline{a}\,\overline{b}\,\overline{c} d$
  The expression can be found with a Karnaugh map, either by hand or with an online solver.
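To compare the answer against the truth table row by row, it can be expanded back into minterms. The sketch below assumes the row index is 8a + 4b + 2c + d; with that ordering the expression is 1 exactly on rows 1, 2, 6, 12, 13, 14, and 15. The function names are hypothetical.

```c
#include <stdint.h>

/* X = a*b + NOT a*c*NOT d + NOT a*NOT b*NOT c*d for one input row. */
int x_sop(int a, int b, int c, int d)
{
    return (a && b) || (!a && c && !d) || (!a && !b && !c && d);
}

/* Bit r of the result is X on truth-table row r = 8a + 4b + 2c + d. */
uint16_t x_minterms(void)
{
    uint16_t mask = 0;
    for (int r = 0; r < 16; r++)
        if (x_sop((r >> 3) & 1, (r >> 2) & 1, (r >> 1) & 1, r & 1))
            mask |= (uint16_t)(1u << r);
    return mask;
}
```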


Question `E`

What is the hexadecimal encoding of the RISC-V instruction sw t1, -4(t1)? You can use the table below to help you with the encoding.

| [31:25] | [24:20] | [19:15] | [14:12] | [11:7] | [6:0] |
| --- | --- | --- | --- | --- | --- |
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |

sw t1, -4(t1) instruction encoding in hex: __ E01 __
- E01 = 0xFE632E23
  opcode = 0100011, funct3 = 010
  imm[11:0] = -4 = 0xFFC = 0b1111_1111_1100, so imm[11:5] = 1111111 and imm[4:0] = 11100
  t1 = x6, so rs1 = rs2 = 00110
  encoding: 1111111_00110_00110_010_11100_0100011 = 1111_1110_0110_0011_0010_1110_0010_0011 = 0xFE632E23
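The field packing can be written as a few shifts and masks. A minimal sketch of an S-type encoder (the function name `encode_s_type` is hypothetical); the split of the 12-bit immediate into imm[11:5] and imm[4:0] is the only S-type-specific step.

```c
#include <stdint.h>

/* Pack the fields of an S-type RISC-V instruction (sb, sh, sw). */
uint32_t encode_s_type(uint32_t opcode, uint32_t funct3,
                       uint32_t rs1, uint32_t rs2, int32_t imm)
{
    uint32_t uimm = (uint32_t)imm & 0xFFFu;  /* 12-bit two's-complement immediate */
    return ((uimm >> 5) << 25)               /* imm[11:5] -> bits 31:25 */
         | ((rs2 & 0x1Fu) << 20)             /* rs2       -> bits 24:20 */
         | ((rs1 & 0x1Fu) << 15)             /* rs1       -> bits 19:15 */
         | ((funct3 & 0x7u) << 12)           /* funct3    -> bits 14:12 */
         | ((uimm & 0x1Fu) << 7)             /* imm[4:0]  -> bits 11:7  */
         | (opcode & 0x7Fu);                 /* opcode    -> bits 6:0   */
}
```

For sw t1, -4(t1), the call is `encode_s_type(0x23, 0x2, 6, 6, -4)`, where 0x23 is opcode 0100011 and 6 is the register number of t1.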
For the following code snippet, provide the value left in each register after executing the entire code snippet (i.e., when the processor reaches the instruction at the end label), or specify "UNKNOWN" if it is impossible to tell the value of a particular register.
```
. = 0x100
     li x4, 0x6
     addi x5, zero, 0xC00
     slli x4, x4, 8
     or x6, x4, x5
end:
```
- x4 = __ E02 __
- x5 = __ E03 __
- x6 = __ E04 __
- pc = __ E05 __
- E02 = 0x600
- E03 = 0xFFFFFC00
- E04 = 0xFFFFFE00
- E05 = 0x110
  In RISC-V, immediates are always sign-extended, except for the 5-bit immediates of the CSR instructions. The 12-bit immediate 0xC00 has its top bit (bit 11) set, so it is extended into the full word 0xFFFFFC00.
> Except for the 5-bit immediates used in CSR instructions (Section 2.8), immediates are always sign-extended, and are generally packed towards the leftmost available bits in the instruction and have been allocated to reduce hardware complexity.
>
> (The RISC-V Instruction Set Manual, Volume I: Base User-Level ISA, p.11)
After execution,
```
x4 = 0x6 << 8 = 0x600
x5 = 0xfffffc00
x6 = x4 | x5 = 0xfffffe00
```
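The end label follows 4 instructions of 4 bytes each (li x4, 0x6 assembles to a single addi because 0x6 fits in a 12-bit immediate), starting at the address set by the . = 0x100 directive. Thus end has address 0x100 +
$4 \times 4$ = 0x110.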
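The whole snippet can be traced by hand in C. The sketch below models only what these four instructions need: 32-bit registers, 12-bit immediate sign extension, and a pc that advances by 4. The names `sext12`, `struct snippet_state`, and `run_snippet` are hypothetical.

```c
#include <stdint.h>

/* Sign-extend a 12-bit I-type immediate to 32 bits (bit 11 is the sign bit). */
uint32_t sext12(uint32_t imm12)
{
    imm12 &= 0xFFFu;
    return (imm12 & 0x800u) ? (imm12 | 0xFFFFF000u) : imm12;
}

/* Hypothetical container for the final register values and pc. */
struct snippet_state {
    uint32_t x4, x5, x6, pc;
};

/* Step through the four instructions of the snippet. */
struct snippet_state run_snippet(void)
{
    struct snippet_state s;
    uint32_t pc = 0x100;                  /* ". = 0x100" places the code at 0x100 */
    s.x4 = 0u + sext12(0x6);   pc += 4;   /* li   x4, 0x6 (a single addi)         */
    s.x5 = 0u + sext12(0xC00); pc += 4;   /* addi x5, zero, 0xC00                 */
    s.x4 = s.x4 << 8;          pc += 4;   /* slli x4, x4, 8                       */
    s.x6 = s.x4 | s.x5;        pc += 4;   /* or   x6, x4, x5                      */
    s.pc = pc;                            /* address of the end label             */
    return s;
}
```

The key step is `sext12(0xC00)`: bit 11 of 0xC00 is 1, so the upper 20 bits of x5 are filled with ones.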


Question `F`

Consider the following program that computes the Fibonacci sequence recursively. The C code is shown below, followed by its translation to RISC-V assembly. You are told that execution has been halted just before the ret instruction executes.

C code

```c
int fib(int n) {
    if (n <= 1) return n;
    return fib(n - 1) + fib(n - 2);
}
```

The translated RISC-V assembly

```
fib:  addi sp, sp, -12
      sw ra, 0(sp)
      sw a0, 4(sp)
      sw s0, 8(sp)
      li s0, 0
      li a7, 1
if:   ble __F01__
sum:  addi a0, a0, -1
      call fib
      add s0, s0, a0
      lw a0, 4(sp)
      addi a0, a0, -2
      call fib
      mv t0, a0
      add a0, s0, t0
done: lw ra, 0(sp)
      lw s0, 8(sp)
L1:   addi sp, sp, 12
      ret
```

Complete the missing portion of the ble instruction to make the assembly implementation match the C code.

F01 = a0, a7, done
At the if label, the argument n (in a0) is compared with the constant 1 loaded into a7. If n is less than or equal to 1, the function branches to done, restores the saved registers from the stack, and returns with n still in a0. Therefore __F01__ is a0, a7, done.
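To see how the branch target lines up with the C code, here is a C rendering of the assembly's control flow, written for illustration only (the name `fib_asm_style` is hypothetical). Each comment names the instruction or instructions the line stands for; in the assembly, n must be reloaded from 4(sp) before the second call, while in C the compiler keeps it for us.

```c
/* Mirrors the assembly: s0 accumulates fib(n-1), t0 holds fib(n-2). */
int fib_asm_style(int n)
{
    int s0 = 0;                        /* li s0, 0 (caller's s0 is saved at 8(sp)) */
    const int a7 = 1;                  /* li a7, 1                                  */
    if (n <= a7)                       /* if: ble a0, a7, done                      */
        return n;                      /* done: a0 still holds n                    */
    s0 += fib_asm_style(n - 1);        /* addi a0, a0, -1; call fib; add s0, s0, a0 */
    int t0 = fib_asm_style(n - 2);     /* lw a0, 4(sp); addi a0, a0, -2; call fib; mv t0, a0 */
    return s0 + t0;                    /* add a0, s0, t0                            */
}
```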

How many distinct words will be allocated and pushed onto the stack each time the function fib is called?

Number of words pushed onto the stack per call to fib: __ F02 __

F02 = 3
The three sw instructions at the top of the function store 3 words in total (ra, a0, and s0), filling the 12-byte frame allocated by addi sp, sp, -12.
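Since each activation pushes 3 words, the total stack traffic and the peak stack usage follow from two recurrences: the number of calls, and the deepest chain of frames that are live at once (which always follows the fib(n - 1) branch). A sketch with hypothetical helper names:

```c
/* Number of activations of fib(n); each one pushes 3 words (12 bytes). */
long fib_calls(int n)
{
    return n <= 1 ? 1 : 1 + fib_calls(n - 1) + fib_calls(n - 2);
}

/* Deepest chain of simultaneously live fib frames. */
int fib_max_depth(int n)
{
    return n <= 1 ? 1 : 1 + fib_max_depth(n - 1);  /* fib(n-1) branch is the deeper one */
}
```

For fib(5), 15 activations push 45 words in total, but at most 5 frames (15 words, 60 bytes) are on the stack at the same time.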
