---
title: Computer Organization
author: Yan-Tong Lin
tags: NCTU, 2020
---
# Computer Organization
[TOC]
## Labs
* [Computer Organization lab 0](/91w4a3v2TtqfwihAk-1ccQ)
* [Computer Organization lab 1](/ZLYhJj6BRgSIQz9daM9OZA)
* [Computer Organization lab 2](/8HEfOozyQau8GKq4SUOxQw)
* [Computer Organization lab 3]()
* [Computer Organization lab 4]()
## Notes
## HDL (2020/3/12)
[HDL gitbook](https://hom-wang.gitbooks.io/verilog-hdl/content/index.html)
[HDL note](/ZRHSG6oUQ7-xzc4YO3irkg)
## Ref - Verilog
[Ref - Verilog](https://hackmd.io/@dppa1008/Logic_Design_Mak?type=view)
## Week 2-2 (2020/3/12)
### Instruction Sets - Chapter 2
#### Definition and General
* An agreement between architects and machine language programmers
* types of instructions
* data movement (load/store)
* data processing(ALU)
* branch
#### Different IS
* RISC-V
::: spoiler

* format
* Coding
* R(register)
* I(immediate)
* S(store)
* SB(branch)
* U(upper immediate)
* function code (funct7/funct3)
* Register r1, r2
* Immediate Value
:::
* ARM
* Android; most popular 32-bit IS
* conditional execution on almost every instruction => straight-line execution without branches
* format : DP(data process)/DT(data transfer)/BR(branch)
* x86 - old IS
#### Concepts
##### Memory Layout
* branch, jump and link
* stack
* array
:::spoiler


:::
* rd - destination register
* PC - Program Counter
* sp - stack pointer
### Performance - Chapter 1
####
* Execution Time
* Elapsed time vs. CPU time
* CPU Time = System CPU + User CPU
* $Performance = \frac{1}{Execution\ Time}$
* $Execution\ Time = Instruction\ Count \times CPI \times Cycle\ Time$
* 
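
The two formulas above can be checked with a short calculation. This is a minimal sketch; the function names and the two machines' numbers are made up for illustration, not taken from the course.

```python
def cpu_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """Execution time = instruction count * CPI * cycle time (in seconds)."""
    return instruction_count * cpi * cycle_time_s


def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


# Hypothetical machines running the same 1M-instruction program.
time_a = cpu_time(1_000_000, cpi=2.0, cycle_time_s=1e-9)  # lower cycle time
time_b = cpu_time(1_000_000, cpi=1.5, cycle_time_s=2e-9)  # lower CPI

# How many times faster A is than B.
speedup = performance(time_a) / performance(time_b)
print(f"A: {time_a:.6f} s, B: {time_b:.6f} s, A is {speedup:.2f}x faster")
```

Machine B has the better CPI but still loses, because the product of all three factors decides the execution time.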
## Week 3-2 Computation
### 2's Complement
### ALU
#### full adder
#### comparisons
#### carry lookahead
## Due to the coronavirus outbreak, future classes will be online
## Week 5-1 Data Path(Cont.) (3/30 online)
- 3/30 Quiz 4
- When a mux select doesn't matter, it cannot affect the result, and we emphasize that by giving it x (don't care).
- 3 cases:
- RegWrite/MemWrite => if a write enable is asserted when it should not be, an undesired write happens and causes errors;
- MemRead => a memory read is costly and doesn't always happen; leaving it on would not change the result, but we don't want it always enabled;
- RegRead is not controlled because it almost always happens.
- Q: why is the immediate shifted left by 1?
- A: to allow a shorter (16-bit) instruction encoding, where instructions sit on 2-byte boundaries (PC+2)

## Week 6
- 4/9 quiz
- A1: ld and sd (ex: `ld x0, 8(x1)`) use x1+8 as the memory address => the ALU is needed for the addition; the add instruction needs add as well.
- A2: beq uses ALU subtraction to see whether it produces a zero output; the sub instruction needs subtract as well.
- I wonder whether zero is a bitwise NOR of the result, as I guess
- [data path reference](https://cseweb.ucsd.edu/classes/su06/cse141/slides/s06-1cyc_control-1up.pdf)
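
The two quiz answers can be sketched as a tiny ALU model. This is only an illustration under assumptions: a 64-bit ALU with just `add` and `sub`, and a hypothetical `alu` function. The zero output is computed as "no result bit is set", which is what a NOR over all result bits would give.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, like a 64-bit register


def alu(op: str, a: int, b: int) -> tuple[int, bool]:
    """Return (result, zero) for a toy ALU that only knows add and sub."""
    if op == "add":
        result = (a + b) & MASK64
    elif op == "sub":
        result = (a - b) & MASK64
    else:
        raise ValueError(f"unsupported op: {op}")
    # zero is true only when no result bit is 1 (a NOR over all 64 bits)
    zero = not any((result >> bit) & 1 for bit in range(64))
    return result, zero


# ld x0, 8(x1): the memory address is x1 + 8 (x1 = 0x1000 is a made-up value)
address, _ = alu("add", 0x1000, 8)

# beq: the branch is taken when rs1 - rs2 produces a zero result
_, branch_taken = alu("sub", 42, 42)
```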
### Chap 4-1 - Datapath
- MISC vs RISC-V
- Control vs Mux
- Don't care?
### Chap 4-2 - Pipeline
## Week 7 - 4/13 class

### RISC-V pipeline
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
### Pipeline registers are required
### Design
- structural hazards: resource conflicts, e.g. only one memory
- sol: add more resources
- control hazards: need to worry about branch instructions
- sol: resolve earlier, predict branch
- data hazards: an instruction depends on a previous instruction
- sol: forwarding, bypassing
### Write register changes before it is applied

### Corrected Datapath

### 2020/4/13 quiz

1. The write register and the write data are inconsistent.
2. Fix by passing the write register information along the datapath and applying it when it is actually needed (Write Back stage).
### Pipeline Control

### Data Hazard and its solutions
#### Stall detection
#### Data Forwarding

Note: above by 2020/4/13, including peeking
## Week 7-2(Online Class)
### Online Class Q: Why immediate gen at that pos
- pros - no critical path(more important)
- cons - more use of pipeline register
### beq - zero-flag is critical path
- so beq address is passed back to PC in 4th cycle
### Data Dependency => Data Hazard
### Solutions to Data Hazard
- Stall
- ex: software: nop(no op)
- Data forwarding/bypassing
- pros: no stall
- cons: the datapath gains extra muxes and a forwarding unit to check dependencies.
### Quiz 4/16

A:
1. To solve data hazard caused by data dependency.
2. It needs an additional input option for the ALU and a forwarding unit for control, which adds cost to the critical path.
## Week 8-1
### Quiz 4/20
## Week 9
### Quiz 4/30
1. original + 7+4, load data hazard => +2
2. first load ×2, then add ×2 (`... ld x2, ld x4, add x3 x1 x2, add x5 x1 x4 ...`)
---
## Week 11
- Midterm score out : 103.5(mean 73.61)
### Quiz
1. reduce the time of loop control
2. avoid data dependency
3. 14/8, for solving hazards
## Week 12
- Memory Hierarchy
### Quiz
## Week 13
CO 5/25 quiz 17
1. has a lower miss rate and can use more cache strategies
2. LRU replacement, as the name says, replaces the least recently used block in the set with the new one
3. No, there are trade-offs; we should choose the appropriate one depending on the situation
4. disadvantages: (a) larger tag storage (b) larger decoder/mux to load larger tags (c) bigger hardware/power consumption (d) increased access time
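
The LRU replacement in answer 2 can be sketched for a single cache set. This is a minimal model, assuming tags are plain integers; `LRUSet` and its fields are hypothetical names, and real hardware tracks recency with a few bits per way rather than an ordered map.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set holding up to `ways` blocks, with LRU replacement."""

    def __init__(self, ways: int):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, insert the block and return False."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)   # evict the least recently used block
        self.blocks[tag] = None
        return False


# 2-way set: tag 2 is evicted by tag 3 because tag 1 was touched more recently.
cache_set = LRUSet(ways=2)
hits = [cache_set.access(tag) for tag in [1, 2, 1, 3, 2]]
```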
---
## Week 15 - Before Final
### Important - CPU, MEM(DMA), IO
### IO interrupt / polling / DMA


### DMA (exam)
- between IO and CPU
- Delegating I/O Responsibility from the CPU

#### Problems
- Cache may change
- VM is not spatially contiguous (because of hash)
### RAID
- correction
-
---
## Week 16 - Before final exam, self study
- https://hackmd.io/@sysprog/H1sZHv4R#%E7%AC%AC22%EF%BD%9E26%E8%AC%9B-Memory-78
- https://sites.google.com/site/nutncsie10412/ge-ren-jian-jie/ji-yi-ti
## Pipeline
### EX hazard / Mem hazard

- EX > Mem (more direct)
- Mem needs `!Ex` added to its condition
- check register write
- Ex:
- Ex/Mem.RegWrite
- Mem:
- Mem/Wb.RegWrite
- check not 0 register(const)
- check register out = register in
- Ex:
- ID/Ex.RegRs1 == Ex/Mem.RegRd
- Mem:
- ID/Ex.RegRs1 == Mem/Wb.RegRd
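
The checks listed above can be combined into one forwarding decision per ALU operand. This is a sketch under assumptions: `PipelineState` and `forward_source` are hypothetical names, and the returned strings stand in for the mux select values of a real forwarding unit.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Hypothetical snapshot of the pipeline-register fields the note lists."""
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int


def forward_source(id_ex_rs: int, state: PipelineState) -> str:
    """Pick where one ALU operand (ID/EX Rs1 or Rs2) should come from."""
    ex_hazard = (
        state.ex_mem_reg_write               # check register write
        and state.ex_mem_rd != 0             # x0 is the constant zero register
        and state.ex_mem_rd == id_ex_rs      # register out == register in
    )
    mem_hazard = (
        state.mem_wb_reg_write
        and state.mem_wb_rd != 0
        and state.mem_wb_rd == id_ex_rs
        and not ex_hazard                    # EX > Mem: the newer value wins
    )
    if ex_hazard:
        return "EX/MEM"
    if mem_hazard:
        return "MEM/WB"
    return "REGISTER_FILE"


# Both older instructions write x5, so an operand reading x5 takes the EX/MEM value.
state = PipelineState(ex_mem_reg_write=True, ex_mem_rd=5,
                      mem_wb_reg_write=True, mem_wb_rd=5)
choice_rs1 = forward_source(5, state)
choice_rs2 = forward_source(6, state)
```

The `not ex_hazard` term is the `!Ex` part of the Mem condition: without it, a stale MEM/WB value could override the more recent EX/MEM result.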
### load-use hazard



- use load result
- deal at "IF/ID"
- things to do:
- No OP (set control = 0)
- No IF write
- No PC write
- condition
- is memory read
- ID/EX.MemRead
- load use
- IF/ID.RegisterRs1/Rs2 == ID/EX.RegisterRd
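
The condition and the three actions above can be sketched as a hazard detection function. The function names and the dictionary keys are illustrative assumptions, not signal names from the course slides.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rd: int,
                    if_id_rs1: int, if_id_rs2: int) -> bool:
    """True when the instruction in ID needs the result of a load still in EX."""
    return id_ex_mem_read and id_ex_rd in (if_id_rs1, if_id_rs2)


def stall_signals(hazard: bool) -> dict:
    """Control outputs for a one-cycle stall (bubble) when a hazard is detected."""
    return {
        "pc_write": not hazard,      # No PC write: fetch the same address again
        "if_id_write": not hazard,   # No IF write: keep the instruction in IF/ID
        "zero_control": hazard,      # No OP: control signals set to 0
    }


# ld x2, 0(x1) followed by add x3, x1, x2 -> the add must wait one cycle
signals = stall_signals(load_use_hazard(True, 2, 1, 2))
```

After the one-cycle bubble, the loaded value can reach the dependent instruction through forwarding from MEM/WB.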
### Quick summary


---
### branch hazard
- Naive - move to 1st/2nd cycle
- Delayed Branch
- delay slot and software instruction scheduling
- Dynamic Branch Prediction
- 1 bit/2 bit Branch-Prediction Buffer
- Correlating bpb
- BPB + BTB
----
### General BPB(branch prediction buffer)(a.k.a BHT)
- History Table of T/NT(not taken)
- in IF
- guess by history
- may reach 90% accuracy if well designed
- if wrong, flush pipeline(costly) + flip prediction
### 1-bit BHT

- branch target address k bits(middle close to end) hash to 1 bit
- can add beq/bne op-code to hash table?
- NO

- can store additional "last" PC, no need
- weakness:
- example: double loop
### 2-bit
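
A minimal sketch comparing the 1-bit scheme with a 2-bit one on the double-loop weakness. It assumes the usual 2-bit saturating counter (states 0 to 3, predict taken when the counter is 2 or 3). The function names and the loop shape (inner branch taken 3 times, then not taken, repeated 4 times by the outer loop) are illustrative assumptions.

```python
def one_bit_mispredictions(outcomes, initial=True):
    """1-bit predictor: always predict whatever the branch did last time."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # flip immediately after a wrong guess
    return misses


def two_bit_mispredictions(outcomes, initial=3):
    """2-bit saturating counter: the prediction flips only after two misses in a row."""
    counter = initial  # 0, 1 predict not taken; 2, 3 predict taken
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Inner-loop branch of a double loop: T, T, T, NT repeated for 4 outer iterations.
inner_branch = ([True] * 3 + [False]) * 4

one_bit = one_bit_mispredictions(inner_branch)
two_bit = two_bit_mispredictions(inner_branch)
print(f"1-bit misses: {one_bit}, 2-bit misses: {two_bit}")
```

The 1-bit predictor misses twice per inner loop after the first (once on exit, once on re-entry), while the 2-bit counter misses only on the exit.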
### BTB(branch target buffer) (5/7 on 6/17)
- to avoid the IF that gets the target of the branch
- store current PC + target PC
- since current PC has to be a branch!!!
- (we cannot wait for IF now)
- teacher's error? (in the BHT + IF case, isn't don't care 1/0 an error, but can see decode)
- can bypass IF if the instruction is saved instead of PC'
### Note BHT and BTB work together
- BHT for T/NT
- Q: is branch target address PC'?
- PC' => T/NT
- BTB for address
- PC => PC'
---
## Exception Handling
- flush pipeline
- jump to the handler routine
- save the address of the offending instruction
- !!! **precise interrupts**
- complete prior instructions
- stop the offending instruction
- flush all following instructions
## Instruction Level Parallelism (ILP)
- EXE time = IS per program × Clock per IS (CPI) × clock time
- Deeper Pipeline
- modern CPU ~20
- IPC * stage
- Multiple Issue Processor
- PC = PC+8
- IPC * 2
- static/dynamic: decided beforehand or at run time
- static (VLIW, very long instruction word)
- compiler
- decides beforehand which instructions to issue
- dynamic (superscalar)
- newer work
- hardware
- decides at run time which instructions to issue
- what will be duplicated?
- ALU, immediate-gen
- Register 2R1W => 4Read2Write
- Multiple Issue Cont.

---
## Memory

----
### Hardware
- SRAM
- sparse, fast, higher power, expensive
- power on => work
- logic gate
- DRAM
- dense, slow, lower power, inexpensive
- need refresh
- 2d matrix
- Disk
----
### Hierarchy
- Block
- the minimal fixed size for moving
- also named "page" in VM
- Why work
- 90%-10% rule
- Locality
- temporal
- spatial
- settings
- Terms
- block
- hit
- miss
----
### Direct Map Scheme
- trivial
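- index / tag (32 - 2 - index bits) / data
- Q: why -2 when data is 4 B?
- A: the low 2 bits of the address are the byte offset inside the 4 B block, so they are part of neither the index nor the tag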
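
The "-2" can be seen by splitting a 32-bit address into fields. This is a sketch assuming power-of-two block sizes; the function names and the 10-bit index are chosen for illustration only.

```python
def split_address(address: int, index_bits: int, block_bytes: int = 4):
    """Split an address into (tag, index, byte offset) for a direct-mapped cache."""
    offset_bits = block_bytes.bit_length() - 1   # log2(block size): 2 for 4 B
    offset = address & (block_bytes - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


def tag_bits(index_bits: int, block_bytes: int = 4, address_bits: int = 32) -> int:
    """Tag width = address width - index bits - byte-offset bits."""
    return address_bits - index_bits - (block_bytes.bit_length() - 1)


# Hypothetical cache with 1024 blocks of 4 B: 10 index bits, 2 offset bits.
tag, index, offset = split_address(0x12345678, index_bits=10)
width = tag_bits(index_bits=10)
```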
----
### Set Associativity
- set size * set num = cache
- one address can save to more places(in the set) than direct map(1)
- search within the set (set size comparisons)
----
### Calc

- K is 1024

----
### AMAT(5/25)




- associativity too high => cycle time up
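
A sketch of the trade-off, assuming the standard definition AMAT = hit time + miss rate × miss penalty. All numbers are hypothetical (times in ns) and chosen so that the higher-associativity cache loses despite its lower miss rate.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical caches: more ways lowers the miss rate a little,
# but the extra compares and mux lengthen the hit time.
low_assoc = amat(hit_time=1.0, miss_rate=0.050, miss_penalty=100.0)
high_assoc = amat(hit_time=1.4, miss_rate=0.048, miss_penalty=100.0)

print(f"low associativity: {low_assoc:.2f} ns, high associativity: {high_assoc:.2f} ns")
```

With these made-up numbers the small miss-rate gain does not pay for the slower hit, which is the point of the bullet above.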
#### Block Size (5/21)

- spatial => bigger better
- temporal => can't be too big (the number of blocks goes down)
- ~64B
#### Associativity
- if up
- adv:
- reduce conflict(more flexible placement)
- inclusion property (teacher says it follows from LRU??)
- $1\text{-way} \subset 2\text{-way} \subset \dots$
- disadv:
- num of set down
- cycle time(for compares and mux => critical path)
- ~4 way


#### replacement policy
- LRU
### Conclusion(Miss Rate Improvements)

### miss penalty improvement(5/28)
- multi level
- L1, L2
- calc
- can optimize separately with different sizes of L1, L2
- larger block/associativity in L2
- the concept can be used for split/unified mem too
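
The multi-level calculation can be sketched by making the L1 miss penalty itself an access to L2. This assumes the L2 miss rate is a local miss rate (misses per L2 access); the function name and all numbers (in cycles) are hypothetical.

```python
def two_level_amat(l1_hit: float, l1_miss_rate: float,
                   l2_hit: float, l2_local_miss_rate: float,
                   memory_penalty: float) -> float:
    """AMAT where an L1 miss costs an L2 access, and an L2 miss costs main memory."""
    l1_miss_penalty = l2_hit + l2_local_miss_rate * memory_penalty
    return l1_hit + l1_miss_rate * l1_miss_penalty


# Hypothetical numbers, in cycles.
with_l2 = two_level_amat(l1_hit=1.0, l1_miss_rate=0.05,
                         l2_hit=10.0, l2_local_miss_rate=0.2,
                         memory_penalty=100.0)
without_l2 = 1.0 + 0.05 * 100.0  # same L1, every miss goes straight to memory

print(f"with L2: {with_l2:.2f} cycles, without L2: {without_l2:.2f} cycles")
```

Because L1 only has to be fast and L2 only has to catch most L1 misses, the two levels can be tuned separately, as noted above.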
### Bandwidth (Bus size / Interleave)


----
### Write Back (6/1)
- dirty bit
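
A minimal sketch of how the dirty bit defers memory writes, assuming a toy cache with a single line and write-allocate on a write miss. `OneLineWriteBackCache` and its fields are hypothetical names for illustration.

```python
class OneLineWriteBackCache:
    """Toy write-back cache with one line; memory is a dict of address -> value."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.address = None      # address currently cached (None = empty line)
        self.value = None
        self.dirty = False       # set when the cached copy differs from memory
        self.memory_writes = 0

    def _load(self, address: int) -> None:
        if self.address == address:
            return               # hit: nothing to do
        if self.dirty:
            # Evicting a dirty line: only now is memory updated.
            self.memory[self.address] = self.value
            self.memory_writes += 1
        self.address = address
        self.value = self.memory.get(address, 0)
        self.dirty = False

    def write(self, address: int, value: int) -> None:
        self._load(address)      # write-allocate (an assumption of this sketch)
        self.value = value
        self.dirty = True        # memory is now stale

    def read(self, address: int) -> int:
        self._load(address)
        return self.value


memory = {0: 10, 4: 20}
cache = OneLineWriteBackCache(memory)
cache.write(0, 11)
cache.write(0, 12)           # second write hits the cache, no memory traffic
stale = memory[0]            # memory still holds the old value
value = cache.read(4)        # eviction writes the dirty line back once
```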
----
### VM
- page
---
## Chapter 6 - Parallel Processors, Client to Cloud / Storage Bus IO
### multi CPU and speed up
### Instruction and DATA stream

#### Vector Processor / GPU (SIMD)
#### Multithreading (MIMD)
#### SMT(6/8)
- simultaneous multithreading
- shared thread
- different from issue by switching

#### SMP
- shared data(mem)