Quiz6 of Computer Architecture (2020 Fall)

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

General Information

You are allowed to read lecture materials.
- That is, an open book exam.
You shall not disclose your answer during the quiz.
Each answer has 3 points.
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
09:10 ~ 10:20AM on Dec 22, 2020

Question `A`

Someone has designed 4 separate concurrent processes each of which prints a single character "A", "T", "C" or "G". His/Her customers place orders for sequences that satisfy certain constraints and he/she adds semaphores as appropriate to ensure the printed sequence meets the specified criteria. For each of customer orders below, add the appropriate semaphores so the running processes will produce sequences that make the customers happy – do not forget to specify the semaphores' initial values! To receive full credit, do not impose any unnecessary constraints. Assume the processes start running immediately and that there are no constraints on the order in which statements in different processes are executed except those imposed by your semaphores.

Processes will run indefinitely although they may, of course, end up stuck in a WAIT().

(1) Request: "the sequence CAT"

semA = __ A01 __
semT = __ A02 __
semC = __ A03 __
semG = __ A04 __

Process #1    Process #2    Process #3    Process #4
----------    ----------    ----------    ----------
A:            C:            G:            T:
  A05           A07           A09           A10
  puts("A")     puts("C")     puts("G")     puts("T")
  A06           A08
  goto A        goto C        goto G        goto T

A01 to A04 are the values such as 0 and 1.
A05-A10 may be the operation of wait and signal such as wait(semT).

A01 = ?

A02 = ?

A03 = ?

A04 = ?

A05 = ?

A06 = ?

A07 = ?

A08 = ?

A09 = ?

A10 = ?

(2) Request "sequences have to be exactly 4 characters long"

semC = __ A11 __

Process #1    Process #2    Process #3    Process #4
----------    ----------    ----------    ----------
A:            C:            G:            T:
  A12           A13           A14           A15
  puts("A")     puts("C")     puts("G")     puts("T")
  goto A        goto C        goto G        goto T

A11 is the value such as 0, 1, 2 and etc.
A12-A15 may be the operation of wait and signal such as wait(semT).

A11 = ?

A12 = ?

A13 = ?

A14 = ?

A15 = ?

Question `B`

We assume the existence of three asynchronous processes:

a Producer process, producing a stream of characters;
a Consumer process, which consumes a stream of characters; and
a Filter process spliced between the Consumer and Producer processes.

The Filter process takes characters from the producer, processes them (via a translate function), and passes the result to the Consumer process. The Producer and Consumer processes each communicate directly only with the Filter process.

The following is in Shared Memory (shared among Producer, Filter, and Consumer processes):

Semaphore charsA = ??, spaceA = ??, charsB = ??, spaceB = ??;
char buf[100];
char in_data;
int in = 0, out = 0;

and the following code runs in the Filter Process:

while (1) {
   char tmp;
   wait(charsA);
   tmp = in_data;
   signal(spaceA);
   tmp = translate(tmp); /* do the actual translation */
   wait(spaceB);
   buf[in] = tmp;
   in = (in + 1) % 100; /* increment ‘in’ modulo 100 */
   signal(charsB);
}

(1) What is the maximum number of characters that can be produced by the Producer process but not yet processed by the Filter process?

Maximum unprocessed characters produced: __ B01 __

B01 = ?

(2) What are appropriate initial values for each of the semaphores?

initial value for charsA: __ B02 __
initial value for spaceA: __ B03 __
initial value for charsB: __ B04 __
initial value for spaceB: __ B05 __
- B02 = ?
- B03 = ?
- B04 = ?
- B05 = ?

Question `C`

For the following questions, assume a processor with 64-bit virtual addresses, 40-bit physical addresses and page size of 4096 (

2^{12}

) bytes per page. The Page Table of this processor uses an LRU replacement strategy, and handles missing pages using a page fault handler.

(1) What is the size of the page table? Assume that each page table entry includes a dirty bit and a resident bit. Specify the number of page table entries and the width of each entry.

Number of entries in the page table: __ C01 __
Width of each page table entry (bits): __ C02 __
- C01 = ?
- C02 = ?

(2) What is the maximum fraction of virtual memory that can be resident in physical
memory at any given time (assuming the page table is not in physical memory)?

Max fraction of virtual memory that can be resident in physical memory: __ C03 __
- C03 = ?

(3) The following program fragment is executed and a record is made of the inputs and outputs of the Memory Management Unit. The record is shown in the table below.

  sw x11, 0(x10)
  lw x11, 4(x13)
  lui x12, 4

Access type	Virtual address	Physical address
IF	0x60FF8	0x10FF8
DW	0x04600	0x74600
IF	0x60FFC	0x10FFC
DR	0x18410	0x169410
IF	0x61000	0x09000

IF: instruction fetch, DW: data write, DR: data read

Using information from the program and the table above, please deduce the contents of as many entries as possible in the page table. Assume the original address sizes of 64-bit virtual addresses, 40-bit physical addresses, and page size of 4096 (2^{12}) bytes per page. Assume that pages holding instructions are read-only.

Please make an entry in the table below for each page table entry we learn about after the execution of the program fragment. Note that the table below is not an actual page table, it is just a list of entries from the page table that you can infer from this problem. For each entry provide the VPN, the dirty (D) and resident (R) bits, and the PPN if they are known. If you can not deduce the value of a field, enter a ? for that field.

VPN	D	R	PPN
`0x60`	C04	C05	C06
`0x04`	C07	C08	C09
`0x18`	C10	C11	C12
`0x61`	C13	C14	C15

C04 =?

C05 = ?

C06 = ?

C07 = ?

C08 = ?

C09 = ?

C10 = ?

C11 = ?

C12 = ?

C13 = ?

C14 = ?

C15 = ?

(4) At some later point in time, suppose that the contents of the page table and its corresponding fully associative TLB are as shown to the below. As previously mentioned, the page table uses an LRU replacement policy, and the LRU page (shown below) will be chosen for replacement if necessary. For each of these four accesses, compute its corresponding physical address and indicate whether the access causes a TLB miss and/or a page fault. Assume that each access below is independent of the others and starts with the TLB and page table state shown to the below.

VPN (tag)	V	D	PPN	Note
`0x0`	1	0	`0x1337`	-
`0x1`	0	0	`0x7`	-
`0x6`	1	1	`0x33`	$\leftarrow$ LRU
`0x3`	1	0	`0x534`	-

TLB

VPN	R	D	PPN	Note
0	1	0	`0x1337`	-
1	0	0	-	-
2	1	0	`0x450`	-
3	1	0	`0x534`	$\leftarrow$ LRU page
4	0	0	-	-
5	1	1	`0xAB5`	-
6	1	1	`0x33`	-

Page Table

Fill in table below (possible answers contains HEX and Yes/No)

-	Virtual address	PPN (in hex)	Physical Address (in hex)	TLB Miss?	Page Fault?
1	`0x6004`	C16	C17	C18	C19
2	`0x6030`	C20	C21	C22	C23
3	`0x1234`	C24	C25	C26	C27
4	`0x2008`	C28	C29	C30	C31

C16 = ?

C17 = ?

C18 = ?

C19 = ?

C20 = ?

C21 = ?

C22 = ?

C23 = ?

C24 = ?

C25 = ?

C26 = ?

C27 = ?

C28 = ?

C29 = ?

C30 = ?

C31 = ?

Question `D`

Consider a five-stage pipelined RISC-V processor P1 (IF, DEC, EXE, MEM, WB), where:

All branches are predicted not-taken.
Branch decisions are made in the EXE stage.
The pipeline has full bypassing.

Assume that the loop shown to the beblow has been running for a while and will be re-executed after this iteration.

loop: add  x3, x2, x1
      slli x2, x2, 1
      xor  x4, x3, x4
      bnez x2, loop
      and  x11, x2, x1
      or   x12, x22, x7
      sra  x13, x23, x8
      ...

(1) How many cycles does it take to execute one iteration of the loop on this processor P1?

Cycles per iteration on this processor: __ D01 __
- D01 = ?

(2) Now consider a variant of the previous five-stage pipelined RISC-V processor P2 (IF, DEC, EXE, MEM, WB), where:

All branches are predicted not-taken.
Branch decisions are made in the EXE stage.
The pipeline has no bypass paths.

Assume that our example loop (same as the previous one) has been running for a while and will be re-executed after this iteration. How many cycles does it take to execute one iteration of the loop on this processor P2?

Cycles per iteration on P2: __ D02 __
- D02 = ?

Question `E`

Consider the execution of the following code sequence on a 5-stage pipelined RISC-V processor, which is fully bypassed, predicts all branches not-taken, and annuls instructions following taken branches. Assume that branch taken/not-taken decisions are made in the EXE stage. Also, assume that the results of load operations are not available until the WB stage. The loop sums the first 100 elements of the integer array at address 0x1000 and stores the result at address
0x2000. Assume execution halts at instruction UNIMPL.

   lui x11, 1 // x11 = 0x1000 (array)
   lui x15, 2 // x15 = 0x2000 (result)
   addi x12, x0, 0 // x12 holds sum
   addi x14, x11, 400 // address of array element 101
A: lw x13, 0(x11) // load next array element
   addi x11, x11, 4 // addr of next array element
   add x12, x12, x13 // add element value to sum
   bne x11, x14, A // loop until element 101
   sw x12, 0(x15) // store result
   xor x2, x3, x4
   UNIMPL

(1) Are there points in the execution of the sequence when data is bypassed from the EXE stage back to the DEC stage? If so, give the instruction(s) in the DEC stage at each such point; otherwise, enter N/A.

Instruction(s) in DEC, or N/A: __ E01 __
- E01 = ?

(2) Are there points in the execution of the sequence when data is bypassed from the WB stage back to the DEC stage? If so, give the instruction(s) in the DEC stage at each such point; otherwise, enter N/A.

Instruction(s) in DEC, or N/A: __ E02 __
- E02 = ?

(3) Are there points during the execution of the sequence when the pipeline is stalled? If so, give the instruction(s) in the DEC stage at each such point; otherwise, enter N/A.

Instruction(s) in DEC, or N/A: __ E03 __
- E03 = ?

Quiz6 of Computer Architecture (2020 Fall)

Question A

Question B

Question C

Question D

Question E

Question `A`

Question `B`

Question `C`

Question `D`

Question `E`