# Quiz6 of Computer Architecture (2020 Fall) > Solutions ## Question `A` Someone has designed 4 separate concurrent processes each of which prints a single character "A", "T", "C" or "G". His/Her customers place orders for sequences that satisfy certain constraints and he/she adds semaphores as appropriate to ensure the printed sequence meets the specified criteria. For each of customer orders below, add the appropriate semaphores so the running processes will produce sequences that make the customers happy – do not forget to specify the semaphores' initial values! To receive full credit, do not impose any unnecessary constraints. Assume the processes start running immediately and that there are no constraints on the order in which statements in different processes are executed except those imposed by your semaphores. Processes will run indefinitely although they may, of course, end up stuck in a `WAIT()`. (1) Request: "the sequence CAT" ```cpp semA = __ A01 __ semT = __ A02 __ semC = __ A03 __ semG = __ A04 __ Process #1 Process #2 Process #3 Process #4 ---------- ---------- ---------- ---------- A: C: G: T: A05 A07 A09 A10 puts("A") puts("C") puts("G") puts("T") A06 A08 goto A goto C goto G goto T ``` `A01` to `A04` are the values such as `0` and `1`. `A05`-`A10` may be the operation of `wait` and `signal` such as `wait(semT)`. A01 : ==0==, A02 : ==0==, A03 : ==1==, A04 : ==0== A05 : ==wait(semA)==, A06 : ==signal(semT)== A07 : ==wait(semC)==, A08 : ==signal(semA)== A09 : ==wait(semG)== A10 : ==wait(semT)== (2) Request "sequences have to be exactly 4 characters long" ``` semC = __ A11 __ Process #1 Process #2 Process #3 Process #4 ---------- ---------- ---------- ---------- A: C: G: T: A12 A13 A14 A15 puts("A") puts("C") puts("G") puts("T") goto A goto C goto G goto T ``` `A11` is the value such as `0`, `1`, `2` and etc. `A12`-`A15` may be the operation of `wait` and `signal` such as `wait(semT)`. A11 : ==4== A12 : ==wait(semC)== A13 : ==wait(semC)== A14 : ==wait(semC)== A15 : ==wait(semC)== --- ## Question `B` We assume the existence of three asynchronous processes: * a Producer process, producing a stream of characters; * a Consumer process, which consumes a stream of characters; and * a Filter process spliced between the Consumer and Producer processes. The Filter process takes characters from the producer, processes them (via a translate function), and passes the result to the Consumer process. The Producer and Consumer processes each communicate directly only with the Filter process. The following is in Shared Memory (shared among Producer, Filter, and Consumer processes): ```cpp Semaphore charsA = ??, spaceA = ??, charsB = ??, spaceB = ??; char buf[100]; char in_data; int in = 0, out = 0; ``` and the following code runs in the Filter Process: ```cpp while (1) { char tmp; wait(charsA); tmp = in_data; signal(spaceA); tmp = translate(tmp); /* do the actual translation */ wait(spaceB); buf[in] = tmp; in = (in + 1) % 100; /* increment β€˜in’ modulo 100 */ signal(charsB); } ``` (1) What is the maximum number of characters that can be produced by the Producer process but not yet processed by the Filter process? * Maximum unprocessed characters produced: __ B01 __ > B01 : ==1== (2) What are appropriate initial values for each of the semaphores? * initial value for charsA: __ B02 __ * initial value for spaceA: __ B03 __ * initial value for charsB: __ B04 __ * initial value for spaceB: __ B05 __ > B02 : ==0==, B03 : ==1==, B04 : ==0==, B05: ==100== --- ## Question `C` For the following questions, assume a processor with 64-bit virtual addresses, 40-bit physical addresses and page size of 4096 ($2^{12}$) bytes per page. The Page Table of this processor uses an LRU replacement strategy, and handles missing pages using a page fault handler. (1) What is the size of the page table? Assume that each page table entry includes a dirty bit and a resident bit. Specify the number of page table entries and the width of each entry. * Number of entries in the page table: __ C01 __ * Width of each page table entry (bits): __ C02 __ > C01: ==$2^{52}$== > C02 : ==30== (2) What is the maximum fraction of virtual memory that can be resident in physical memory at any given time (assuming the page table is not in physical memory)? * Max fraction of virtual memory that can be resident in physical memory: __ C03 __ > C03 : ==$\frac{1}{2^{24}}$== (3) The following program fragment is executed and a record is made of the inputs and outputs of the Memory Management Unit. The record is shown in the table below. ```c sw x11, 0(x10) lw x11, 4(x13) lui x12, 4 ``` | Access type | Virtual address | Physical address | |-------------|-----------------|------------------| | IF | 0x60FF8 | 0x10FF8 | | DW | 0x04600 | 0x74600 | | IF | 0x60FFC | 0x10FFC | | DR | 0x18410 | 0x169410 | | IF | 0x61000 | 0x09000 | > IF: instruction fetch, DW: data write, DR: data read Using information from the program and the table above, please deduce the contents of as many entries as possible in the page table. Assume the original address sizes of 64-bit virtual addresses, 40-bit physical addresses, and page size of 4096 (2^{12}) bytes per page. Assume that **pages holding instructions are read-only**. Please make an entry in the table below for each page table entry we learn about after the execution of the program fragment. Note that the table below is not an actual page table, it is just a list of entries from the page table that you can infer from this problem. For each entry provide the VPN, the dirty (`D`) and resident (`R`) bits, and the PPN if they are known. If you can not deduce the value of a field, enter a `?` for that field. | VPN | D | R | PPN | |-----|---|---|-----| | 0x60 | C04 | C05 | C06 | | 0x04 | C07 | C08 | C09 | | 0x18 | C10 | C11 | C12 | | 0x61 | C13 | C14 | C15 | > C04 : ==0==, C05 : ==1==, C06 : ==0x61== > C07 : ==1==, C08 : ==1==, C09 : ==0x74== > C10 : ==?==, C11 : ==1==, C12 : ==0x169== > C13 : ==0==, C14 : ==1==, C15 : ==0x9== (4) At some later point in time, suppose that the contents of the page table and its corresponding fully associative TLB are as shown to the below. As previously mentioned, the page table uses an LRU replacement policy, and the LRU page (shown below) will be chosen for replacement if necessary. For each of these four accesses, compute its corresponding physical address and indicate whether the access causes a TLB miss and/or a page fault. Assume that **each access below is independent of the others and starts with the TLB and page table state shown to the below**. | VPN (tag) | V | D | PPN | Note | |-----------|---|---|-----|------| | `0x0` | 1 | 0 | `0x1337` | - | | `0x1` | 0 | 0 | `0x7` | - | | `0x6` | 1 | 1 | `0x33` | $\leftarrow$ LRU | | `0x3` | 1 | 0 | `0x534` | - | > TLB | VPN | R | D | PPN | Note | |-----|---|---|-----|------| | 0 | 1 | 0 | `0x1337` | - | | 1 | 0 | 0 | - | - | | 2 | 1 | 0 | `0x450` | - | | 3 | 1 | 0 | `0x534` | $\leftarrow$ LRU page | | 4 | 0 | 0 | - | - | | 5 | 1 | 1 | `0xAB5` | - | | 6 | 1 | 1 | `0x33` | - | > Page Table Fill in table below (possible answers contains HEX and Yes/No) | - | Virtual address | PPN (in hex) | Physical Address (in hex) | TLB Miss? | Page Fault? | |---|-----------------|--------------|---------------------------|-----------|-------------| | 1 | `0x6004` | C16 | C17 | C18 | C19 | | 2 | `0x6030` | C20 | C21 | C22 | C23 | | 3 | `0x1234` | C24 | C25 | C26 | C27 | | 4 | `0x2008` | C28 | C29 | C30 | C31 | > C16 : `0x33`, C17 : `0x33004`, C18 : ==No==, C19 : ==No== > C20 : `0x33`, C21 : `0x33030`, C22 : ==No==, C23 : ==No== > C24 : `0x534`, C25 : `0x534234`, C26 : ==Yes==, C27 : ==Yes== > C28 : `0x450`, C29 : `0x450008`, C30 : ==Yes==, C31 : ==No== --- ## Question `D` Consider a five-stage pipelined RISC-V processor `P1` (IF, DEC, EXE, MEM, WB), where: * All branches are predicted not-taken. * Branch decisions are made in the EXE stage. * The pipeline has full bypassing. Assume that the loop shown to the beblow has been running for a while and will be re-executed after this iteration. ```cpp loop: add x3, x2, x1 slli x2, x2, 1 xor x4, x3, x4 bnez x2, loop and x11, x2, x1 or x12, x22, x7 sra x13, x23, x8 ... ``` (1) How many cycles does it take to execute one iteration of the loop on this processor `P1`? * Cycles per iteration on this processor: __ D01 __ > D01 : ==6== > :::spoiler > ![](https://i.imgur.com/wtFSQxC.png) > ::: (2) Now consider a variant of the previous five-stage pipelined RISC-V processor `P2` (IF, DEC, EXE, MEM, WB), where: * All branches are predicted not-taken. * Branch decisions are made in the EXE stage. * The pipeline has no bypass paths. Assume that our example loop (same as the previous one) has been running for a while and will be re-executed after this iteration. How many cycles does it take to execute one iteration of the loop on this processor? * Cycles per iteration on P2: __ D02 __ > D02 : ==8== :::spoiler ![](https://i.imgur.com/fjK2B3d.png) ::: --- ## Question `E` Consider the execution of the following code sequence on a 5-stage pipelined RISC-V processor, which is fully bypassed, predicts all branches not-taken, and annuls instructions following taken branches. Assume that branch taken/not-taken decisions are made in the EXE stage. Also, assume that the results of load operations are not available until the WB stage. The loop sums the first `100` elements of the integer array at address 0x1000 and stores the result at address `0x2000`. Assume execution halts at instruction `UNIMPL`. ```cpp lui x11, 1 // x11 = 0x1000 (array) lui x15, 2 // x15 = 0x2000 (result) addi x12, x0, 0 // x12 holds sum addi x14, x11, 400 // address of array element 101 A: lw x13, 0(x11) // load next array element addi x11, x11, 4 // addr of next array element add x12, x12, x13 // add element value to sum bne x11, x14, A // loop until element 101 sw x12, 0(x15) // store result xor x2, x3, x4 UNIMPL ``` (1) Are there points in the execution of the sequence when data is bypassed from the EXE stage back to the DEC stage? If so, give the instruction(s) in the DEC stage at each such point; otherwise, enter `N/A`. * Instruction(s) in DEC, or `N/A`: __ E01 __ > E01 : ==`N/A`== (2) Are there points in the execution of the sequence when data is bypassed from the WB stage back to the DEC stage? If so, give the instruction(s) in the DEC stage at each such point; otherwise, enter `N/A`. * Instruction(s) in DEC, or `N/A`: __ E02 __ > E02 : ==`add, bne`== (3) Are there points during the execution of the sequence when the pipeline is stalled? If so, give the instruction(s) in the DEC stage at each such point; otherwise, enter `N/A`. * Instruction(s) in DEC, or `N/A`: __ E03 __ > E03 : ==`add`== > :::spoiler > ![](https://i.imgur.com/pyHrOHm.png) > the contents of the WB stage > ::: ---