Try   HackMD

Quiz6 of Computer Architecture (2020 Fall)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
General Information

  • You are allowed to read lecture materials.
    • That is, an open book exam.
  • You shall not disclose your answer during the quiz.
  • Each answer has 3 points.
  • Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    09:10 ~ 10:20AM on Dec 22, 2020

Question A

Someone has designed 4 separate concurrent processes each of which prints a single character "A", "T", "C" or "G". His/Her customers place orders for sequences that satisfy certain constraints and he/she adds semaphores as appropriate to ensure the printed sequence meets the specified criteria. For each of customer orders below, add the appropriate semaphores so the running processes will produce sequences that make the customers happy – do not forget to specify the semaphores' initial values! To receive full credit, do not impose any unnecessary constraints. Assume the processes start running immediately and that there are no constraints on the order in which statements in different processes are executed except those imposed by your semaphores.

Processes will run indefinitely although they may, of course, end up stuck in a WAIT().

(1) Request: "the sequence CAT"

semA = __ A01 __
semT = __ A02 __
semC = __ A03 __
semG = __ A04 __

Process #1    Process #2    Process #3    Process #4
----------    ----------    ----------    ----------
A:            C:            G:            T:
  A05           A07           A09           A10
  puts("A")     puts("C")     puts("G")     puts("T")
  A06           A08
  goto A        goto C        goto G        goto T

A01 to A04 are the values such as 0 and 1.
A05-A10 may be the operation of wait and signal such as wait(semT).

  • A01 = ?
  • A02 = ?
  • A03 = ?
  • A04 = ?
  • A05 = ?
  • A06 = ?
  • A07 = ?
  • A08 = ?
  • A09 = ?
  • A10 = ?

(2) Request "sequences have to be exactly 4 characters long"

semC = __ A11 __

Process #1    Process #2    Process #3    Process #4
----------    ----------    ----------    ----------
A:            C:            G:            T:
  A12           A13           A14           A15
  puts("A")     puts("C")     puts("G")     puts("T")
  goto A        goto C        goto G        goto T

A11 is the value such as 0, 1, 2 and etc.
A12-A15 may be the operation of wait and signal such as wait(semT).

  • A11 = ?
  • A12 = ?
  • A13 = ?
  • A14 = ?
  • A15 = ?

Question B

We assume the existence of three asynchronous processes:

  • a Producer process, producing a stream of characters;
  • a Consumer process, which consumes a stream of characters; and
  • a Filter process spliced between the Consumer and Producer processes.

The Filter process takes characters from the producer, processes them (via a translate function), and passes the result to the Consumer process. The Producer and Consumer processes each communicate directly only with the Filter process.

The following is in Shared Memory (shared among Producer, Filter, and Consumer processes):

Semaphore charsA = ??, spaceA = ??, charsB = ??, spaceB = ??;
char buf[100];
char in_data;
int in = 0, out = 0;

and the following code runs in the Filter Process:

while (1) {
   char tmp;
   wait(charsA);
   tmp = in_data;
   signal(spaceA);
   tmp = translate(tmp); /* do the actual translation */
   wait(spaceB);
   buf[in] = tmp;
   in = (in + 1) % 100; /* increment ‘in’ modulo 100 */
   signal(charsB);
}

(1) What is the maximum number of characters that can be produced by the Producer process but not yet processed by the Filter process?

  • Maximum unprocessed characters produced: __ B01 __
  • B01 = ?

(2) What are appropriate initial values for each of the semaphores?

  • initial value for charsA: __ B02 __
  • initial value for spaceA: __ B03 __
  • initial value for charsB: __ B04 __
  • initial value for spaceB: __ B05 __
    • B02 = ?
    • B03 = ?
    • B04 = ?
    • B05 = ?

Question C

For the following questions, assume a processor with 64-bit virtual addresses, 40-bit physical addresses and page size of 4096 (

212) bytes per page. The Page Table of this processor uses an LRU replacement strategy, and handles missing pages using a page fault handler.

(1) What is the size of the page table? Assume that each page table entry includes a dirty bit and a resident bit. Specify the number of page table entries and the width of each entry.

  • Number of entries in the page table: __ C01 __
  • Width of each page table entry (bits): __ C02 __
    • C01 = ?
    • C02 = ?

(2) What is the maximum fraction of virtual memory that can be resident in physical
memory at any given time (assuming the page table is not in physical memory)?

  • Max fraction of virtual memory that can be resident in physical memory: __ C03 __
    • C03 = ?

(3) The following program fragment is executed and a record is made of the inputs and outputs of the Memory Management Unit. The record is shown in the table below.

  sw x11, 0(x10)
  lw x11, 4(x13)
  lui x12, 4
Access type Virtual address Physical address
IF 0x60FF8 0x10FF8
DW 0x04600 0x74600
IF 0x60FFC 0x10FFC
DR 0x18410 0x169410
IF 0x61000 0x09000

IF: instruction fetch, DW: data write, DR: data read

Using information from the program and the table above, please deduce the contents of as many entries as possible in the page table. Assume the original address sizes of 64-bit virtual addresses, 40-bit physical addresses, and page size of 4096 (2^{12}) bytes per page. Assume that pages holding instructions are read-only.

Please make an entry in the table below for each page table entry we learn about after the execution of the program fragment. Note that the table below is not an actual page table, it is just a list of entries from the page table that you can infer from this problem. For each entry provide the VPN, the dirty (D) and resident (R) bits, and the PPN if they are known. If you can not deduce the value of a field, enter a ? for that field.

VPN D R PPN
0x60 C04 C05 C06
0x04 C07 C08 C09
0x18 C10 C11 C12
0x61 C13 C14 C15
  • C04 =?
  • C05 = ?
  • C06 = ?
  • C07 = ?
  • C08 = ?
  • C09 = ?
  • C10 = ?
  • C11 = ?
  • C12 = ?
  • C13 = ?
  • C14 = ?
  • C15 = ?

(4) At some later point in time, suppose that the contents of the page table and its corresponding fully associative TLB are as shown to the below. As previously mentioned, the page table uses an LRU replacement policy, and the LRU page (shown below) will be chosen for replacement if necessary. For each of these four accesses, compute its corresponding physical address and indicate whether the access causes a TLB miss and/or a page fault. Assume that each access below is independent of the others and starts with the TLB and page table state shown to the below.

VPN (tag) V D PPN Note
0x0 1 0 0x1337 -
0x1 0 0 0x7 -
0x6 1 1 0x33
LRU
0x3 1 0 0x534 -

TLB

VPN R D PPN Note
0 1 0 0x1337 -
1 0 0 - -
2 1 0 0x450 -
3 1 0 0x534
LRU page
4 0 0 - -
5 1 1 0xAB5 -
6 1 1 0x33 -

Page Table

Fill in table below (possible answers contains HEX and Yes/No)

- Virtual address PPN (in hex) Physical Address (in hex) TLB Miss? Page Fault?
1 0x6004 C16 C17 C18 C19
2 0x6030 C20 C21 C22 C23
3 0x1234 C24 C25 C26 C27
4 0x2008 C28 C29 C30 C31
  • C16 = ?
  • C17 = ?
  • C18 = ?
  • C19 = ?
  • C20 = ?
  • C21 = ?
  • C22 = ?
  • C23 = ?
  • C24 = ?
  • C25 = ?
  • C26 = ?
  • C27 = ?
  • C28 = ?
  • C29 = ?
  • C30 = ?
  • C31 = ?

Question D

Consider a five-stage pipelined RISC-V processor P1 (IF, DEC, EXE, MEM, WB), where:

  • All branches are predicted not-taken.
  • Branch decisions are made in the EXE stage.
  • The pipeline has full bypassing.

Assume that the loop shown to the beblow has been running for a while and will be re-executed after this iteration.

loop: add  x3, x2, x1
      slli x2, x2, 1
      xor  x4, x3, x4
      bnez x2, loop
      and  x11, x2, x1
      or   x12, x22, x7
      sra  x13, x23, x8
      ...

(1) How many cycles does it take to execute one iteration of the loop on this processor P1?

  • Cycles per iteration on this processor: __ D01 __
    • D01 = ?

(2) Now consider a variant of the previous five-stage pipelined RISC-V processor P2 (IF, DEC, EXE, MEM, WB), where:

  • All branches are predicted not-taken.
  • Branch decisions are made in the EXE stage.
  • The pipeline has no bypass paths.

Assume that our example loop (same as the previous one) has been running for a while and will be re-executed after this iteration. How many cycles does it take to execute one iteration of the loop on this processor P2?

  • Cycles per iteration on P2: __ D02 __
    • D02 = ?

Question E

Consider the execution of the following code sequence on a 5-stage pipelined RISC-V processor, which is fully bypassed, predicts all branches not-taken, and annuls instructions following taken branches. Assume that branch taken/not-taken decisions are made in the EXE stage. Also, assume that the results of load operations are not available until the WB stage. The loop sums the first 100 elements of the integer array at address 0x1000 and stores the result at address
0x2000. Assume execution halts at instruction UNIMPL.

   lui x11, 1 // x11 = 0x1000 (array)
   lui x15, 2 // x15 = 0x2000 (result)
   addi x12, x0, 0 // x12 holds sum
   addi x14, x11, 400 // address of array element 101
A: lw x13, 0(x11) // load next array element
   addi x11, x11, 4 // addr of next array element
   add x12, x12, x13 // add element value to sum
   bne x11, x14, A // loop until element 101
   sw x12, 0(x15) // store result
   xor x2, x3, x4
   UNIMPL

(1) Are there points in the execution of the sequence when data is bypassed from the EXE stage back to the DEC stage? If so, give the instruction(s) in the DEC stage at each such point; otherwise, enter N/A.

  • Instruction(s) in DEC, or N/A: __ E01 __
    • E01 = ?

(2) Are there points in the execution of the sequence when data is bypassed from the WB stage back to the DEC stage? If so, give the instruction(s) in the DEC stage at each such point; otherwise, enter N/A.

  • Instruction(s) in DEC, or N/A: __ E02 __
    • E02 = ?

(3) Are there points during the execution of the sequence when the pipeline is stalled? If so, give the instruction(s) in the DEC stage at each such point; otherwise, enter N/A.

  • Instruction(s) in DEC, or N/A: __ E03 __
    • E03 = ?