---
tags: Computer Organization, CS Department Required Course
---

# Exam 3, 109 Summer, Computer Organization, 蔡文錦

#### Q1.
The Average Memory Access Time (AMAT) has three components: hit time, miss rate, and miss penalty. To improve the three components, which of the following statements are correct?

a. Using a physically-addressed cache can improve hit time (compared to virtually-addressed).
b. Using a direct-mapped cache can improve hit time (compared to set-associative).
c. Using a write-through mechanism can improve miss penalty (compared to write-back).
d. Increasing the block size can improve the compulsory miss rate.

**Ans: b. d.**

#### Q2.
The IEEE 754 single-precision representation is defined below. Given an IEEE 754 single-precision representation as $0100\ 0000\ 1100\ 0000\ 0000\ 0000\ 0000\ 0000_2$, what is its value?

![](https://i.imgur.com/J1ctAQD.png)

a. $110_2$
b. 6
c. $1.1_2 \times 2^{129}$
d. $1.1_2 \times 2^{2}$

**Ans: a. b. d.**

#### Q3.
Consider a two-way set-associative cache that has a total data size of 32 words with four-word (16-byte) blocks and a 20-bit byte address. We assume that the cache is initially empty and LRU replacement is used. Given a series of byte-address references as 68, 176, 76, 56, 116, 202, 180, 192, 128, 72, which of the following statements are correct?

a. The size of the tag field is 14 bits.
b. The access to the address 116 in the reference list is a conflict miss.
c. The cache miss rate is 4/5.
d. There are a total of 8 compulsory misses.
e. The access to the address 180 in the reference list is a compulsory miss.
f. The total size of the cache (including valid bit) is $(15 + 2 \times 4 \times 32) \times 4$ bits.
g. The access to the address 192 in the reference list is a hit.

**Ans: a. c. g.**

#### Q4.
Given a processor with two caches, an I-cache and a D-cache, where the miss penalty is 200 clock cycles. Assume the I-cache miss rate is 2%, the D-cache miss rate is 4%, and the percentage of lw and sw instructions is 50%. The CPI for this workload without any memory stalls was measured to be 2.0. Which of the following statements are true when memory stalls are considered?

a. If we double the clock rate (but the memory access time to handle a cache miss does not change), the CPI will become 18.
b. Suppose an L2 cache is added only for the I-cache, with a miss penalty of 20 clock cycles, and the global instruction miss rate to main memory (MM) becomes 0.5% after the L2 cache is added. The CPI will become 5.4.
c. The CPI will become 10 if memory stalls are considered.
d. If we double the clock rate (but the memory access time to handle a cache miss does not change), the machine will have a speedup of 9/10.

**Ans: a. c.**

#### Q5.
Compared to conventional architectures, the benefits of using a vector architecture include:

a. Vector architecture is a MIMD structure.
b. It may reduce data hazard detection.
c. It may reduce control hazards.
d. It may reduce memory access time because it can reduce the cache miss rate.
e. It may reduce the number of instruction fetches and decodes.

**Ans: b. c. e.**

#### Q6.
The IEEE 754 single-precision format is defined below. Given the number $110.101_2 \times 2^{-131}$, please convert it to IEEE 754 single-precision format and write it in hexadecimal format.

![](https://i.imgur.com/UDQl9II.png)

a. The exponent part is 00000000
b. The fraction part is 110 1010 0000 0000 0000 0000
c. The fraction part is 001 1010 1000 0000 0000 0000
d. The exponent part is 00110101

**Ans: a. c.**

#### Q7.
Consider the following computer with a CPU, data cache, and memory.

- The system has a 12-bit virtual address size, and the size of a virtual memory page is 256 (=$2^8$) bytes. The computer has 65536 (=$2^{16}$) bytes of physical memory. The system uses the page table below (VPN: Virtual Page Number; PPN: Physical Page Number).
- The cache has a total data size of 256 bytes and a block size of 16 bytes. It is a direct-mapped cache and is physically indexed. Assume the cache is initially empty.

Given a series of virtual address references (byte addresses) as 0xce2, 0x258, 0xc5a, 0x254, 0x2e8, 0xc52, which of the following statements are correct for the data cache?

| VPN | Valid | PPN  | VPN | Valid | PPN  |
|:---:|:-----:|:----:|:---:|:-----:|:----:|
|  0  |   0   | ---  |  8  |   0   | ---  |
|  1  |   1   | 0xc4 |  9  |   1   | 0xfe |
|  2  |   1   | 0xab | 10  |   1   | 0xfc |
|  3  |   0   | ---  | 11  |   0   | ---  |
|  4  |   1   | 0x55 | 12  |   1   | 0x7a |
|  5  |   1   | 0x32 | 13  |   1   | 0x01 |
|  6  |   1   | 0x54 | 14  |   0   | ---  |
|  7  |   0   | ---  | 15  |   0   | ---  |

a. There are 4 cache hits.
b. There is no capacity miss.
c. The physical address corresponding to the virtual address 0xce2 is 0x7ae2.
d. The cache miss rate is 2/3.
e. There are 2 compulsory misses for the cache.
f. There are 2 conflict cache misses.

**Ans: b. c. f.**

#### Q8.
Given four threads A, B, C, and D. Among the hardware multithreading results shown in X, Y, and Z, which of the following statements are correct?

![](https://i.imgur.com/53KmFov.png)

a. Y uses coarse-grained MT.
b. X uses fine-grained MT.
c. Z uses simultaneous MT.
d. X can hide both short and long stall cycles.
e. Z can only hide long stall cycles.

**Ans: b. d. e.**

#### Q9.
Consider a TLB with the following parameters and diagram.

- The TLB has 512 entries and is 2-way set associative.
- Virtual addresses are 64 bits and physical addresses are 32 bits.
- 8 KB page size

For the TLB diagram, which of the following statements are correct?

![](https://i.imgur.com/7vJByWb.png)

a. B is 8 bits
b. A is 43 bits
c. C is 13 bits
d. D is 32 bits

**Ans: a. b. c.**

#### Q10.
Consider a virtual memory system with the following properties:

- 40-bit virtual byte address
- 16 KB pages
- 32-bit physical byte address
- The valid, protection, dirty, and use bits take a total of 4 bits, and all the virtual pages are in use.

Which of the following statements are correct?

a. In a virtual memory system, the size of a program can be larger than the total size of the physical memory in the system.
b. The number of entries in the page table is $2^{26}$.
c. The total size of the page table is $36\times2^{26}$ bits.
d. The entry size of the page table is 36 bits.

**Ans: a. b.**

#### Q11.
The following table shows three processors and their operations on X (initially X = 0). After the processors have completed their operations on X, which of the following values are possible for X if the system ensures cache coherence?

| Processor 1 | Processor 2 | Processor 3 |
|:-----------:|:-----------:|:-----------:|
|    X++;     |  X = X + 3  |    X = 5    |

a. 6
b. 1
c. 8
d. 4
e. 9

**Ans: a. c. e.**

#### Q12.
Which of the following cases are impossible?

a. Page table miss, cache hit, TLB miss
b. Page table hit, cache hit, TLB miss
c. Page table hit, cache miss, TLB hit
d. Page table miss, cache miss, TLB hit

**Ans: a. d.**

#### Q13.
Which item(s) below can be used to reduce miss penalty?

a. hardware prefetch
b. increase associativity
c. non-blocking miss processing
d. return requested word first
e. increase block size

**Ans: a. c. d.**

#### Q14.
Assume that the system has a direct-mapped cache with a total data size of 8192 bytes and a block size of 16 bytes. Consider the following piece of code, assuming the compiler has put x, y, and i in registers and that array A is in memory at address 0x10000 (int is 32-bit). Assuming that the cache starts out empty, which of the following statements are correct?

![](https://i.imgur.com/bYwNxmN.png)

a. Hit ratio: 3/4.
b. If a 2-way set-associative cache with LRU replacement is used, the hit ratio is 3/4.
c. If we still use a direct-mapped cache but change the block size to 32 bytes, the hit ratio is 1/8.
d. The number of hits is 768.
e. The number of misses is 1024.

**Ans: a. b. d.**

#### Q15.
Given the two IEEE 754 single-precision floating-point numbers, A = 0xBD80 0000 and B = 0x3EE0 0000, for addition. Which of the following statements are correct?

![](https://i.imgur.com/lIzoq7Y.png)

a. The output of Mux1 is the exponent part of number A.
b. The output of Mux2 is the fraction part of number B.
c. The output of Mux1 is the exponent part of number B.
d. The output of Mux2 will be shifted right by 1 bit.

**Ans: c.**

#### Q16.
The following table shows three processors and their operations on X (initially X = 0). After the processors have completed their operations, which of the following values are possible for X if the system does not ensure cache coherence?

| Processor 1 | Processor 2 | Processor 3 |
|:-----------:|:-----------:|:-----------:|
|    X++;     |  X = X + 3  |    X = 5    |

a. 0
b. 7
c. 4
d. 3
e. 9

**Ans: c. d. e.**
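The possible values in Q11 and Q16 can be checked by brute force. The following is a minimal sketch, not an official solution: the function names `coherent_values` and `non_coherent_values` are hypothetical, and the models are assumptions. For the coherent case, each processor's operation is assumed to run atomically in some serial order. For the non-coherent case, each processor is assumed to read either the initial value or a value written by a processor earlier in the order, and any processor's write may be the one that finally remains.

```python
from itertools import permutations

# Each processor is modelled as a function from the value it reads
# to the value it writes (assumed atomic read-modify-write).
OPS = {
    "P1": lambda x: x + 1,  # X++
    "P2": lambda x: x + 3,  # X = X + 3
    "P3": lambda x: 5,      # X = 5
}


def coherent_values(initial=0):
    """Final values when the three operations run in some serial order."""
    results = set()
    for order in permutations(OPS):
        x = initial
        for name in order:
            x = OPS[name](x)
        results.add(x)
    return results


def non_coherent_values(initial=0):
    """Final values when a processor may operate on a stale copy.

    Assumed model: a processor reads the initial value or any value
    written by a processor earlier in the order; any write may survive.
    """
    results = set()
    for order in permutations(OPS):
        written = []  # written[i]: values the i-th processor could write
        for name in order:
            visible = {initial}.union(*written)
            written.append({OPS[name](v) for v in visible})
        results.update(*written)
    return results


if __name__ == "__main__":
    print("coherent:    ", sorted(coherent_values()))
    print("non-coherent:", sorted(non_coherent_values()))
```

Under these assumptions, 6, 8, and 9 are reachable in the coherent case, and 3, 4, and 9 are reachable in the non-coherent case, while 0 and 7 are not reachable in either, which agrees with the answer keys for Q11 and Q16.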