# CA Fin Past (6/6)

<style>
.markdown-body li + li {
    padding-top: 0 !important;
}

.markdown-body table th,
.markdown-body table td {
    padding: 2px 10px;
}
</style>

---

[TOC]

---

## Past Exam: 2021 Fin

#### 1

> load interlock

- ==(B)==

#### 2

> CPI

- ALU: latency 5, frequency 0.4 => 2.0
- BR: latency 6, frequency 0.3 => 1.8
- MEM: latency 6, frequency 0.3 => 1.8
- CPI = 2.0 + 1.8 + 1.8 = 5.6
- ==(B): 5.6==

#### 3

> single-cycle, multi-cycle

- ==(B), (C\)==

#### 4

> write invalidate protocol

- ==(A), (D), (E)==

#### 5

> cache, miss rate, miss penalty

- ==(C\), (E)==

#### 6

> cache, write strategy, WAW, RAW, register renaming, distributed shared-memory scheme

- ==(B), (D), (E)==

#### 7

> direct-mapped cache, tag

- 2^byte_offset = block_size => byte_offset = 2
- 2^index * block_size = 4K bytes => index = 10
- address = tag + index + byte_offset => 32 = tag + 10 + 2 => tag = 20
- ==(D)==

#### 8

==(SAME)==

#### 9

==(SIMILAR)==

> cache, write strategy, WAW, RAW, register renaming, distributed shared-memory scheme

- ==(D), (E)==

#### 10

> dynamic scheduling, instruction cache, data cache, harvard architecture, growth of uniprocessor

- ==(D)==

## Past Exam: 2020 Fin

#### 1

> virtual memory, fully associative

- (a)
    - Fully associative
    - Because the miss penalty is exorbitant
    - Reduces the miss rate
- (b)
    - Write-back
    - Because the miss penalty is exorbitant
    - Reduces the miss penalty

#### 2

> translation look-aside buffer, tlb, non-blocking cache

- (a)
    - A small fully-associative cache of mappings from virtual to physical addresses
    - Reduces address translation time
    - The TLB also contains protection bits for each virtual address
    - Fast common case: the virtual address is in the TLB and the process has permission to read/write it
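The tag/index/offset arithmetic that recurs in the cache problems above (e.g. 2021 #7) can be checked with a small helper. A minimal sketch; the function name and default parameters are illustrative, not from the exams:

```python
def cache_fields(addr_bits=32, cache_bytes=4 * 1024, block_bytes=4, assoc=1):
    """Return (tag, index, byte_offset) widths in bits for a cache.

    num_sets = cache_bytes / (block_bytes * assoc); index = log2(num_sets).
    """
    byte_offset = (block_bytes - 1).bit_length()      # log2(block size)
    num_sets = cache_bytes // (block_bytes * assoc)
    index = (num_sets - 1).bit_length()               # log2(number of sets)
    tag = addr_bits - index - byte_offset
    return tag, index, byte_offset

# 2021 #7: 4 KB direct-mapped cache, 4-byte blocks
print(cache_fields())                                              # (20, 10, 2)
# 2020 #4: 2^19-byte cache, 4-way set associative, 128-byte blocks
print(cache_fields(cache_bytes=2**19, block_bytes=128, assoc=4))   # (15, 10, 7)
```

Setting `assoc=1` models a direct-mapped cache; raising it shrinks the index field and grows the tag, matching the set-associative items later in these notes.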
- (b)
    - Reduces the miss penalty (stalls on misses): allows the data cache to continue to supply cache hits during a miss
    - Fits out-of-order execution
    - Requires multi-bank memories
    - Hit under miss:
        - Reduces the effective miss penalty by working during a miss
    - Hit under multiple misses (miss under miss):
        - May further lower the effective miss penalty by overlapping multiple misses
        - Raises complexity (multiple outstanding memory accesses)
        - Requires multiple memory banks
    - Better for FP programs than for integer programs
    - One potential advantage of hit-under-miss is that it does not increase hit time
- (c\)
    - No. A non-blocking cache just allows the data cache to continue to supply cache hits during a miss, making miss handling non-blocking; this reduces the miss penalty, not the miss rate.

#### 3

> raid 4, raid 6

- (a)
    - RAID 4 is one of the standard levels of RAID (Redundant Arrays of Inexpensive Disks); it consists of block-level striping with a dedicated parity disk.
    - The main advantage of RAID 4 over RAID 2 and 3 is I/O parallelism: in RAID 2 and 3, a single read I/O operation requires reading the whole group of data drives, while in RAID 4 one read I/O operation does not have to spread across all data drives. Moreover, because the parity disk holds the old parity, the new parity can be computed by comparing just the old data to the new data. As a result, more I/O operations can be executed in parallel, improving the performance of small writes.
- (b)
    - RAID 6 is one of the standard levels of RAID (Redundant Arrays of Inexpensive Disks), which consists of block-level striping with double distributed parity.
    - By applying Network Appliance's row-diagonal parity (RAID-DP), which combines row parity and diagonal parity, double parity provides fault tolerance for up to two failed drives.
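The RAID 4 small-write optimization (new parity from old parity, old data, and new data, without re-reading the whole stripe) is just XOR algebra. A minimal sketch, with byte strings standing in for disk blocks:

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# Full-stripe parity over three data disks
d0, d1, d2 = b"\x0f\x0f", b"\xf0\xf0", b"\x33\x33"
parity = xor_blocks(xor_blocks(d0, d1), d2)

# Small write to d1: read only the old data and old parity,
# instead of re-reading every data disk in the stripe.
new_d1 = b"\xaa\xaa"
new_parity = xor_blocks(xor_blocks(parity, d1), new_d1)

# Same result as recomputing the parity from scratch
assert new_parity == xor_blocks(xor_blocks(d0, new_d1), d2)
```

This is why RAID 4/5 small writes cost two reads plus two writes regardless of stripe width: the XOR with old data cancels its contribution to the parity, and the XOR with new data adds the new contribution.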
#### 4

> cache, set associative

- cache_size = assoc * 2^index * 2^byte_offset => 2^19 = 4 * 2^index * 2^7 => index = 10
- address = tag + index + byte_offset => 32 = tag + 10 + 7 => tag = 15

#### 5

==(SAME)==

#### 6

> multiprocessor

- ft/80 + (1-f)t = t/8 => f ~= 0.89
- fraction of sequential = 1 - f ~= 0.11
- ==(B)==

#### 7

==(SAME)==

#### 8

==(SAME)==

## Past Exam: 2018 Fin

#### 1

> amdahl's law, speed up

- Alternative A
    - $Frac_{FPSQR} = \frac{0.05 \cdot 10}{0.2 \cdot 5 + 0.8 \cdot 1.25} = 0.25$
    - $SpeedUp_A = \frac{ExTime_o}{ExTime_n} = ((1-Frac_e) + \frac{Frac_e}{SpeedUp_e})^{-1} = (0.75 + \frac{0.25}{10})^{-1} \approx 1.29$
- Alternative B
    - $Frac_{FP} = \frac{0.2 \cdot 5}{0.2 \cdot 5 + 0.8 \cdot 1.25} = 0.5$
    - $SpeedUp_B = \frac{ExTime_o}{ExTime_n} = ((1-Frac_e) + \frac{Frac_e}{SpeedUp_e})^{-1} = (0.5 + \frac{0.5}{5})^{-1} \approx 1.67$

#### 2

==(SAME)==

#### 3

==(SIMILAR)==

> write invalidate protocol

| Time | CPU request | Bus activity | P1 | P2 |
|:----:|:-----------:|:----------------:|:---------------------------------------------:|:-----------------------------------------------------------:|
| T1 | CPU Write | Invalidate for X | Invalidate this block, Shared -> Invalid | Place invalidate on bus, Shared -> Exclusive |
| T2 | CPU Write | Write miss for X | Place write miss on bus, Invalid -> Exclusive | Write-back block; abort memory access, Exclusive -> Invalid |
| T3 | CPU Read | Read miss for X | Write-back block; abort memory access, Exclusive -> Shared | Place read miss on bus, Invalid -> Shared |

#### 4

==(SAME)==

#### 5

==(SAME)==

#### 6

==(SIMILAR)==

> cache, set associative

- cache_size = assoc * 2^index * 2^byte_offset => 2^19 = 2 * 2^index * 2^7 => index = 11
- address = tag + index + byte_offset => 32 = tag + 11 + 7 => tag = 14

## Past Exam: 2016 Fin

#### 1

==(SAME)==

#### 2

==(SIMILAR)==

> write invalidate protocol

| Time | CPU request | Bus activity | P1 | P2 |
|:----:|:-----------:|:----------------:|:---------------------------------------------:|:-----------------------------------------------------------:|
| T1 (P1 writes X) | CPU Write | Invalidate for X | Place invalidate on bus, Shared -> Exclusive | Invalidate this block, Shared -> Invalid |
| T2 (P2 reads X) | CPU Read | Read miss for X | Write-back block; abort memory access, Exclusive -> Shared | Place read miss on bus, Invalid -> Shared |
| T3 (P2 writes X) | CPU Write | Invalidate for X | Invalidate this block, Shared -> Invalid | Place invalidate on bus, Shared -> Exclusive |

#### 3

> raid 1, raid 3, raid 5, raid 6

- RAID 1
    - RAID 1 is one of the standard levels of RAID (Redundant Arrays of Inexpensive Disks); it consists of data mirroring, without parity or striping.
    - Data is written identically to two or more drives, producing a "mirrored set" of drives.
- RAID 3
    - RAID 3 is one of the standard levels of RAID, which consists of byte-level striping with a dedicated parity disk.
    - All disk spindle rotation is synchronized, and data is striped so that each sequential byte is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive.
- RAID 5
    - RAID 5 is one of the standard levels of RAID, which consists of block-level striping with distributed parity.
    - Unlike RAID 4, parity information is distributed among the drives, and all but one drive must be present to operate. Therefore, no specific disk (such as a dedicated parity disk) carries a significantly heavier load than the others due to parity accesses.
- RAID 6
    - RAID 6 is one of the standard levels of RAID, which consists of block-level striping with double distributed parity.
    - By applying Network Appliance's row-diagonal parity (RAID-DP), which combines row parity and diagonal parity, double parity provides fault tolerance for up to two failed drives.

#### 4

> virtual memory

- Translation:
    - A program can be given a consistent view of memory, even though physical memory is scrambled
    - Makes multithreading reasonable (now used a lot!)
    - Only the most important part of a program (its "working set") must be in physical memory
    - Contiguous structures (like stacks) use only as much physical memory as necessary, yet can still grow later
- Protection (Base <= Address <= Bound):
    - Different threads (or processes) are protected from each other
    - Different pages can be given special behavior (read-only, invisible to user programs, etc.)
    - Kernel data is protected from user programs
    - Very important for protection against malicious programs
- Sharing:
    - The same physical page can be mapped to multiple users (shared memory)

#### 5

==(SIMILAR)==

> multiprocessor

- ft/50 + (1-f)t = t/30 => f ~= 0.99
- fraction of sequential = 1 - f ~= 0.01
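The multiprocessor items above solve Amdahl's law for the parallelizable fraction f, given the processor count and the measured speedup: from ft/n + (1-f)t = t/S it follows that f = (1 - 1/S)/(1 - 1/n). A quick check (function name illustrative):

```python
def parallel_fraction(n_proc, speedup):
    """Solve f/n_proc + (1 - f) = 1/speedup for the parallelizable fraction f."""
    return (1 - 1 / speedup) / (1 - 1 / n_proc)

# 2016 #5: 50 processors, overall speedup 30
f = parallel_fraction(50, 30)
print(round(f, 2), round(1 - f, 2))   # 0.99 0.01
```

Note the prerequisite S < n: no parallel fraction can push the speedup past the processor count, which is a handy sanity check on these exam numbers.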