---
title: CMPD223 Computer Organization
description: Compiled from Madam Low's lecture notes and included with some explanation. Convenient for mobile view and has navigation
tags: notes
---

# Chapter 1: Introduction

## Computer Architecture
- Refers to those attributes of a system visible to a programmer
- Have a direct impact on the logical execution of a program
- e.g. Instruction Set Architecture (ISA) defines:
    - instruction format, instruction opcodes, registers, instruction and data memory
    - the effect of executed instructions on the registers and memory
    - an algorithm for controlling instruction execution
- Architectural attributes include:
    - the instruction set
    - the number of bits used to represent various data types
    - I/O mechanisms
    - techniques for addressing memory

## Computer Organization
- Refers to the operational units and their interconnections that realize the architectural specifications
- Organizational attributes include:
    - hardware details transparent to the programmer, such as control signals
    - interfaces between the computer and peripherals
    - the memory technology used
- For example, it is an **architectural** issue whether a computer will have a multiply instruction.
- It is an **organizational** issue whether that instruction will be implemented by a special multiply unit or by a mechanism that makes repeated use of the add unit of the system
- The organizational decision may be based on the anticipated frequency of use of the multiply instruction, the relative speed of the two approaches, and the cost and physical size of a special multiply unit

## Structure and Function
- A hierarchical system is a set of interrelated subsystems
- The designer deals with only one particular level of the system at a time
- At each level, the system consists of a set of components and their interrelationships
- The behavior at each level depends only on a simplified, abstracted characterization of the system at the next lower level
- At each level, the designer is concerned with structure and function:
    - **Structure**: The way in which components are interrelated
    - **Function**: The operation of each individual component as part of the structure

### Function
There are four basic functions that a computer can perform:
- **Data processing**: Data may take a wide variety of forms, and the range of processing requirements is broad. However, there are only a few fundamental methods or types of data processing
- **Data storage**: Even if the computer is processing data on the fly, the computer must temporarily store at least those pieces of data that are being worked on at any given moment
- **Data movement**: The computer's operating environment consists of devices that serve as either sources or destinations of data
- **Control**: Within the computer, a control unit manages the computer's resources and orchestrates the performance of its functional parts in response to instructions

### Structure (Simple Single-Processor Computer)
![The Computer: Top-Level Structure](https://i.imgur.com/ficsmrj.png)

#### Computer Main Structural Components
- **Central processing unit (CPU)**: Controls the operation of the computer and performs its data processing functions
- **Main memory**: Stores data
- **I/O**: Moves data between the computer and its external environment
- **System interconnection**: Some mechanism that provides for communication among CPU, main memory, and I/O (e.g. the system bus)
#### CPU Main Structural Components
- **Control unit**: Controls the operation of the CPU and hence the computer
- **Arithmetic and logic unit (ALU)**: Performs the computer's data processing functions
- **Registers**: Provide storage internal to the CPU
- **CPU interconnection**: Some mechanism that provides for communication among the control unit, ALU, and registers

### Structure (Multicore Computer)
Contemporary computers generally have multiple processors. The term *multicore computer* is used when these processors all reside on a single chip. Each processing unit is called a *core*.

![Simplified View of Major Elements of a Multicore Computer](https://i.imgur.com/PRsISdi.png)

- **Central processing unit (CPU)**: That portion of a computer that fetches and executes instructions. It consists of an **ALU**, a **control unit**, and **registers**.
- **Core**: An individual processing unit on a processor chip. A core may be equivalent in functionality to a CPU on a single-CPU system. Other specialized processing units (e.g. for vector and matrix operations) are also referred to as cores
- **Processor**: A physical piece of silicon containing one or more cores. The processor is the computer component that interprets and executes instructions. If a processor contains multiple cores, it is referred to as a **multicore processor**.

#### Cache memory
- Multiple layers of memory between the processor and main memory
- Is smaller and faster than main memory
- Used to speed up memory access by placing in the cache data from main memory that is likely to be used in the near future
- Greater performance is obtained by using multiple levels of cache, with level 1 (L1) closest to the core and additional levels (L2, L3, ...) progressively farther from the core.
- Level *n* is smaller and faster than level *n*+1
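
The payoff from such a hierarchy can be seen with a rough average-access-time estimate. The sketch below is a simplified model, and the hit rates and latencies are illustrative assumptions, not figures from the notes.

```python
# Rough average-access-time estimate for a cache hierarchy.
# All hit rates and latencies are illustrative assumptions.
levels = [                 # (name, hit rate, access time in ns)
    ("L1 cache",    0.90,   1),
    ("L2 cache",    0.95,  10),
    ("main memory", 1.00, 100),
]

average = 0.0
p_reach = 1.0                       # probability a reference gets this far down
for name, hit_rate, latency in levels:
    average += p_reach * hit_rate * latency
    p_reach *= (1 - hit_rate)       # misses fall through to the next, slower level

print(f"average access time ~ {average:.2f} ns")   # ~2.35 ns vs 100 ns with no caches
```

Because most references hit in the small, fast L1, the average stays close to L1 speed even though main memory is two orders of magnitude slower.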
<br>

# Chapter 2A: Computer Evolution
- **1^st^ generation**: Vacuum tube
- **2^nd^ generation**: Transistor
- **3^rd^ generation**: Integrated circuit
- **Later generations**: Semiconductor & microprocessor

## First generation: Vacuum tube
1. [ENIAC](#ENIAC)
2. [Von Neumann Machine](#von-Neumann-Machine)
    - EDVAC
    - IAS computer
3. [Commercial computers](#Commercial-computers)
    - UNIVAC
    - IBM

### ENIAC
- **E**lectronic **N**umerical **I**ntegrator **A**nd **C**omputer
- Designed and constructed at the University of Pennsylvania
- Started in 1943 -- completed in 1946
- By John Mauchly and John Eckert
- World's first general-purpose electronic digital computer
- Occupied 1500 square feet of floor space
- Contained more than 18,000 vacuum tubes
- 140 kW power consumption
- Capable of 5000 additions per second
- Decimal rather than binary machine
- Needed manual programming by setting switches and plugging/unplugging cables

### von Neumann Machine
- Stored program concept
- Attributed to ENIAC designers, most notably John von Neumann
- Program represented in a form suitable for storing in memory alongside data
- 1^st^ version: **EDVAC**
    - Electronic Discrete Variable Automatic Computer
    - 1945
- 2^nd^ version: **IAS computer**
    - Prototype of all subsequent general-purpose computers
    - 1952

![Structure of the IAS computer](https://i.imgur.com/rLtXc15.png)

#### IAS Memory Formats
- Consists of 4096 storage locations (called words) of 40 bits each
- Both **data** and **instructions** are stored there
- Data is represented in binary
- An instruction is represented in binary, consisting of an <u>opcode</u> and an <u>address</u>

![](https://i.imgur.com/8LQ7USB.png)

#### IAS Registers
:::spoiler ***Memory buffer register (MBR)***
- contains a word to be stored in memory or sent to the I/O unit
- used to receive a word from memory or from the I/O unit
:::
:::spoiler ***Memory address register (MAR)***
- specifies the address in memory of the word to be written from or read into the MBR
:::
:::spoiler ***Instruction register (IR)***
- contains the 8-bit opcode of the instruction being executed
:::
:::spoiler ***Instruction buffer register (IBR)***
- temporarily holds the right-hand instruction from a word in memory
:::
:::spoiler ***Program counter (PC)***
- contains the address of the next instruction pair to be fetched from memory
:::
:::spoiler ***Accumulator (AC) and multiplier quotient (MQ)***
- temporarily hold operands and results of ALU operations
:::

### Commercial computers
- **UNIVAC** (Universal Automatic Computer):
    - 1947 -- Eckert and Mauchly
    - UNIVAC I
        - First successful commercial computer
        - intended for both scientific and commercial applications
        - commissioned by the US Bureau of the Census for the 1950 census calculations
    - UNIVAC II
        - greater memory capacity and higher performance
        - backward compatible
- **IBM**
    - most popular commercial computer company
    - major manufacturer of punched-card processing equipment
    - first electronic stored-program computer (701) in 1953
    - 702 in 1955
        - suitable for business applications

## Second generation: Transistors
- Smaller
- Cheaper
- Dissipates less heat
- Made from silicon
- Invented at Bell Labs in 1947
- Fully transistorized computers were not commercially available until the late 1950s

### Transistor-based computers
- More complex arithmetic and logic units and control units
- Use high-level programming languages
- Provision of system software which provided the ability to:
    - load programs
    - move data to peripherals
    - use libraries of common computations

**IBM** <i class="fa fa-arrow-right"></i> 7000 series
**DEC** <i class="fa fa-arrow-right"></i> PDP-1 in 1957

## Third generation: Integrated Circuits
- 1958
- Discrete component
    - single, self-contained transistor
    - manufactured separately, packaged in their own containers and soldered or wired together onto masonite-like circuit boards
    - manufacturing process was expensive and cumbersome
- Two most important members of the third generation:
    - **IBM System/360**
        - 1964
        - Replaced the 7000 series (not compatible)
        - First planned "family" of computers
            - similar or identical instruction sets
            - similar or identical OS
    - **DEC PDP-8**
        - 1964
        - First minicomputer
        - Did not need an air-conditioned room
        - Small enough to sit on a lab bench
        - $16k vs IBM's $100k
        - OEMs <i class="fa fa-arrow-right"></i> embedded the PDP-8 into their own applications
        - Introduced the <u>bus structure</u>
- Two fundamental types of components are required in a digital computer:
    - **gates**
    ![](https://i.imgur.com/QKo60PN.png)
    <br>
    - **memory cells**
    ![](https://i.imgur.com/sxDpvme.png)
    <br>
- A computer consists of gates, memory cells, and interconnections among these elements
- The gates and memory cells are manufactured on a semiconductor, such as a silicon wafer
- Many transistors can be produced at the same time on a single wafer of silicon
- Transistors can be connected through a process of metallization to form circuits

### Four main functions of a computer

Data storage
: provided by memory cells

Data processing
: provided by gates

Data movement
: the paths among components are used to move data from memory to memory and from memory through gates to memory

Control
: the paths among components can carry control signals (e.g. WRITE and READ control signals)

## Later generations
- Semiconductor memory
- Microprocessor

### Semiconductor memory
- The first application of integrated circuit technology to computers was the construction of the processor (the control unit and the ALU).
- The same technology is used to construct memories
- In the 1950s and 1960s, most computer memory was constructed from tiny rings of ferromagnetic material
- It was fast, but expensive, bulky, and used destructive readout: reading a core erased the data stored in it

#### Fairchild
- In 1970, Fairchild produced a chip about the size of a single core
- It could hold 256 bits of memory, was nondestructive on read, and was much faster than core
- Capacity approximately doubles each year

### Microprocessor
- The density of elements on processor chips continued to rise
    - more elements placed on each chip, so fewer chips are needed to construct a single computer processor
- 1971, Intel developed the 4004
    - first chip to contain all the components of a CPU on a single chip
    - birth of the microprocessor
- 1972, Intel developed the 8008
    - first 8-bit microprocessor
- 1974, Intel developed the 8080
    - first general-purpose microprocessor
    - faster, richer instruction set, larger addressing capability

#### CPU Architecture
- **CISC**: Complex Instruction Set Computers (e.g. Intel x86)
    - Large number of instructions in each instruction set. Each newly designed processor can use its predecessor's instruction set
    - Each single instruction can perform a few low-level operations
        - e.g. load from memory, arithmetic operation, memory store
        - accomplished with a large number of addressing <u>modes</u> (how to specify operands and operations of instructions)
    - Complex instruction format, therefore it takes multiple clock cycles to execute each single instruction
    - Requires more complex hardware circuitry
    - Causes a semantic gap (the difference between operations in a high-level language and those provided by the computer architecture)
- **RISC**: Reduced Instruction Set Computers (e.g. ARM)
    - The instruction set consists of relatively few, simple instructions
    - Complex operations are built up from sequences of these simple instructions, each of which performs a low-level operation
    - Simple instruction format, therefore one instruction can be executed per clock cycle
    - Simpler hardware circuitry and addressing modes
    - Register references

##### ARM
Designed to meet the needs of three system categories:
- **Embedded real-time systems**: Systems for storage, automotive body, industrial use, etc.
- **Application platforms**: Devices running an open operating system
- **Secure applications**: Smart cards, SIM cards, and payment terminals

<br>

# Chapter 2B: Computer Performance Issues

## Microprocessor speed
Techniques built into contemporary processors to keep a steady stream of instructions flowing:
- **Pipelining**: Processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously.
    - e.g. while one instruction is being executed, the computer is decoding the next instruction
- **Branch prediction**: Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next
- **Superscalar execution**: Ability to issue more than one instruction in every processor clock cycle
- **Data flow analysis**: Processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions
- **Speculative execution**: Using branch analysis and data flow analysis, the processor speculatively executes instructions ==ahead== of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible

## Performance balance
Processor speed and memory capacity have increased, but **memory speed** lags behind processor speed.

A number of ways have been developed to attack this problem:
- Increase the number of bits retrieved at one time by making DRAMs **wider** rather than "deeper" and by using wide bus data paths
- Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
- Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between processor and main memory
- Increase the interconnect bandwidth between processors and memory by using higher-speed buses and a hierarchy of buses to buffer and structure data flow

### I/O Devices
Another area of design focus is the handling of I/O devices.
As computers become faster and more capable:
- more sophisticated applications are developed that support the use of peripherals with intensive I/O demands
- these devices create tremendous data throughput demands
- while the processor can handle the data, there is a problem getting the data moved between processor and peripheral

#### Solutions
- Caching
- Buffering
- Higher-speed interconnection buses
- More elaborate bus structures
- Multiple-processor configurations

## Improvement in Chip Organization & Architecture
- Increase the hardware speed of the processor
    - due to the shrinking size of the logic gates
        - more gates can be packed together tightly, increasing clock speed
        - propagation time for signals is reduced
- Increase the size and speed of caches
    - by dedicating a portion of the processor chip itself to the cache, cache access times drop significantly
- Change processor organization and architecture to increase the effective speed of instruction execution
    - typically involves using parallelism in one form or another

### Problems with high clock speed & logic density
- **Power**
    - power density increases, making heat difficult to dissipate
- **RC delay** (resistive-capacitive delay)
    - chip size decreases <i class="fa fa-arrow-right"></i> thinner wires, wires closer together
    - thinner wires <i class="fa fa-arrow-right"></i> higher resistance
    - closer wires <i class="fa fa-arrow-right"></i> higher capacitance
    - delay = resistance x capacitance
    - thus, delay increases
- **Memory latency**
    - Memory access speed (latency) and transfer speed (throughput) lag processor speeds

### Multicore
Because simply increasing the clock rate leads to power and heat problems, and some fundamental physical limits are being reached, development has moved toward multicore computer chips.
- Use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
- The strategy is to use two simpler processors on the chip rather than one increasingly complex processor
- With two processors, larger caches are justified
- As caches became larger, it made performance sense to create two and then three levels of cache on a chip

<br>

# Chapter 3: Top-level View of Computer Function & Interconnection

:::spoiler **What is a program?**
- A sequence of steps
- For each step, an arithmetic or logical operation is done
- For each operation, a different set of <u>control signals</u> is needed
:::

<br>

:::spoiler **Function of control unit**
- For each operation, a unique code is provided
    - e.g. ADD, MOVE
- A hardware segment accepts the code and issues the control signals
:::

## Components in Computer
- The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit
- Data and instructions need to get into the system, and results out
    - Input/output
- Temporary storage of code and results is needed
    - Main memory

![](https://i.imgur.com/gXLH4nX.png)

### Types of Registers
:::spoiler **Program counter (PC)**
Holds the address of the instruction to be fetched next
:::
:::spoiler **Instruction Register (IR)**
Holds the fetched instruction
:::
:::spoiler **Memory address register (MAR)**
Specifies the address in memory for the next read or write
:::
:::spoiler **Memory buffer register**
Contains the data to be written into memory or receives the data read from memory
:::
:::spoiler **I/O address register**
Specifies a particular I/O device
:::
:::spoiler **I/O buffer register**
Used for the exchange of data between an I/O module and the CPU
:::

## Instruction Fetch and Execute Cycle

### Fetch cycle
![](https://i.imgur.com/9LzXs2Y.png)
- **Program counter** (PC) holds the address of the next instruction to fetch
- Processor fetches the instruction from the memory location pointed to by the PC
- Increment PC, unless told otherwise
- Instruction is loaded into the **Instruction Register** (IR)
- Processor interprets the instruction and performs the required actions

### Execute cycle
Four categories of actions can be performed by the processor:
- **Processor <i class="fa fa-arrows-h"></i> memory**: Data transfer between CPU and main memory
- **Processor <i class="fa fa-arrows-h"></i> I/O**: Data transfer between CPU and I/O module
- **Data processing**: Some arithmetic or logical operation on data
- **Control**: Specify that the sequence of execution be altered (PC increment altered)

An instruction's execution may involve a combination of these actions.

#### Example of program execution (add two numbers)
![](https://i.imgur.com/16NjDmD.png)
> ***0001 (1)*** = Load AC from memory
> ***0010 (2)*** = Store AC to memory
> ***0101 (5)*** = Add to AC from memory [color=yellow]

Three instructions, which can be described as three fetch cycles and three execute cycles:
1. The PC contains 300, the address of the first instruction. This instruction (1940) is loaded into the instruction register IR, and the PC is incremented
2. The first hexadecimal digit in the IR indicates that the AC is to be loaded. The remaining three hexadecimal digits specify the address (940) from which data are to be loaded
3. The next instruction (5941) is fetched from location 301, and the PC is incremented
4. The old contents of the AC and the contents of location 941 are added, and the result is stored in the AC
5. The next instruction (2941) is fetched from location 302, and the PC is incremented
6. The contents of the AC are stored in location 941
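
A minimal sketch of this worked example in Python, assuming the hypothetical machine above (4-hex-digit words, top digit = opcode, remaining three digits = address); the data values 3 and 2 at locations 940 and 941 are assumed for illustration, not taken from the notes.

```python
# Toy simulation of the add-two-numbers example: opcode 1 = load AC,
# 5 = add to AC, 2 = store AC. Addresses and words are hexadecimal.
memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,   # program
          0x940: 0x0003, 0x941: 0x0002}                  # data (assumed values)

pc, ac = 0x300, 0
for _ in range(3):                        # three fetch-execute cycles
    ir = memory[pc]                       # fetch: instruction into IR
    pc += 1                               # increment PC
    opcode, addr = ir >> 12, ir & 0xFFF   # decode opcode and address fields
    if opcode == 0x1:                     # 0001: load AC from memory
        ac = memory[addr]
    elif opcode == 0x5:                   # 0101: add to AC from memory
        ac += memory[addr]
    elif opcode == 0x2:                   # 0010: store AC to memory
        memory[addr] = ac

print(memory[0x941])   # 5 -- the sum of the two operands, stored at location 941
```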
The figure below provides a more detailed look at the basic instruction cycle of Figure 3.3.

![Instruction Cycle State Diagram](https://i.imgur.com/CL78Wu7.png)

- **Instruction address calculation (iac)**: Determine the address of the next instruction to be executed. Usually involves adding a fixed number to the address of the previous instruction.
- **Instruction fetch (if)**: Read the instruction from its memory location into the processor.
- **Instruction operation decoding (iod)**: Analyze the instruction to determine the type of operation to be performed and the operand(s) to be used.
- **Operand address calculation (oac)**: If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand.
- **Operand fetch (of)**: Fetch the operand from memory or read it in from I/O
- **Data operation (do)**: Perform the operation indicated in the instruction
- **Operand store (os)**: Write the result into memory or out to I/O

:::info
:bulb: **Info**: States in the upper part involve an exchange between the processor and either memory or an I/O module. States in the lower part involve only internal processor operations
:::

<br>

## Interrupts
Mechanism by which other modules (e.g. I/O, memory) may **interrupt** the normal processing of the processor

:::success
:book: **Classes of Interrupts**

**Program**: Generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user's allowed memory space

**Timer**: Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis

**I/O**: Generated by an I/O controller, to signal normal completion of an operation, request service from the processor, or to signal a variety of error conditions

**Hardware failure**: Generated by a failure such as a power failure or memory parity error
:::

<br>

![Flow](https://i.imgur.com/qrBAVMW.png)

**==Figure 3.7a==**

:one: -- :three:: Sequences of instructions that do not involve I/O

:four:: Sequence of instructions to prepare the actual I/O operation. May include copying data to a special buffer

*Actual I/O command*: Without interrupts, once this command is issued, the program must wait for the I/O device to perform the required function

:five:: Sequence of instructions to complete the operation. May include setting a flag indicating success or failure

### Interrupts and the instruction cycle
**==Figure 3.7b==**
- With interrupts, the processor can engage in executing other instructions while an I/O operation is in progress.
- The I/O program invoked consists of only the preparation code and the actual I/O command.
- After these few instructions have been executed, control returns to the user program.
- Meanwhile, the external device is busy accepting data from computer memory and printing it. This I/O operation is conducted **concurrently** with the execution of instructions in the user program (:one:--:three:)
- When the external device becomes ready to be serviced -- ready to accept more data from the processor -- the I/O module for that external device sends an **interrupt request** signal to the processor
- The processor responds by suspending operation of the current program and branching off to an **interrupt handler**. The points where these interrupts occur are marked with $\times$ in Figure 3.7b

<br>

![TOC](https://i.imgur.com/W2FVI2h.png)

**==Figure 3.9==**
- Processor checks for any interrupts, indicated by the presence of an interrupt signal
- If no interrupts are pending, the processor proceeds to the fetch cycle and fetches the next instruction
- If an interrupt is pending, the processor will do the following:
    - suspend execution of the current program
    - save its context
    - set the PC to the start address of the interrupt handler routine
    - process the interrupt
    - restore the context and continue the interrupted program
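
The Figure 3.9 flow can be sketched as a loop. The snippet below only illustrates the control flow; the "device" raising an interrupt is simulated with a random signal and the handler address is an assumed placeholder.

```python
# Sketch of the instruction cycle with an interrupt check after each
# execute cycle, following the Figure 3.9 flow. The interrupt signal is
# simulated randomly for illustration only.
import random

def instruction_cycle(num_instructions=8, interrupts_enabled=True):
    pc = 0x300
    for _ in range(num_instructions):
        print(f"fetch and execute instruction at {pc:#x}")     # fetch + execute cycles
        pc += 1
        if interrupts_enabled and random.random() < 0.25:      # interrupt signal pending?
            saved_pc = pc                                      # save context
            pc = 0x1000                                        # PC <- handler start address
            print("  interrupt: running handler, then resuming")
            pc = saved_pc                                      # restore context and continue

instruction_cycle()
```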
### Program Timing: Short and Long I/O Wait
There is some overhead involved in the interrupt process: extra instructions must be executed in the interrupt handler to determine the nature of the interrupt and to decide on the appropriate action.

![reference link](https://i.imgur.com/0vz57Ei.png)

**==Figure 3.10==**
- <span style="color:teal">Green</span> shading marks user program code segments; <span style="color:silver">gray</span> shading marks I/O program code segments.
- Figure 3.10a shows the case in which interrupts are not used. The processor must wait while an I/O operation is performed
- Figure 3.10b assumes the time required for the I/O operation is short:
    - less than the time to complete the execution of instructions between operations in the user program
    - Segment 2 is interrupted
    - A portion of the code (2a) executes while the I/O operation is performed, and then the interrupt occurs upon the completion of the I/O operation

<br>

![](https://i.imgur.com/k3kr9eZ.png)

**==Figure 3.11==**
- Typically, especially for a slow device such as a printer, the I/O operation will take much more time than executing a sequence of user instructions
- In this case, the user program reaches the second WRITE call before the first I/O operation completes
- The user program is hung up at that point
- After the first I/O operation is completed, this new WRITE may proceed and a new I/O operation may be started
- There is still an efficiency gain compared to operating without interrupts
- The figure below shows a revised instruction cycle state diagram (from Figure 3.6) that includes interrupt cycle processing

![reference link](https://i.imgur.com/erjNCqX.png)

### Multiple Interrupts
Two approaches can be taken to deal with multiple interrupts:
1. **Disable interrupts** (Figure 3.13a)
    - Processor will ignore further interrupts whilst processing one interrupt
    - Interrupts remain pending and are checked after the first interrupt has been processed
    - Interrupts are handled **in sequence** as they occur
2. **Define priorities** (Figure 3.13b)
    - Low-priority interrupts can be interrupted by higher-priority interrupts
    - When the higher-priority interrupt has been processed, the processor returns to the previous interrupt

![reference link](https://i.imgur.com/mf8MWqK.png)

<br>

## Interconnection Structure
A computer consists of modules of 3 basic types:
1. Processor
2. Memory
3. I/O

The collection of paths connecting the various modules is called the **interconnection structure**

### Memory connection
![reference](https://i.imgur.com/fBQStCN.png)
- A memory module will consist of *N* words of equal length
- Each word is assigned a unique numerical address (0, 1, ..., *N*-1)
- Memory receives and sends data
- Receives addresses
- Receives control signals
    - Read
    - Write
    - Timing

### I/O module connection
![reference link](https://i.imgur.com/W940LeQ.png)
- Similar to memory from the computer's viewpoint
- **Output**
    - Receive data from computer
    - Send data to peripheral
- **Input**
    - Receive data from peripheral
    - Send data to computer
- Receive control signals from computer
- Send control signals to peripherals
- Receive addresses from computer
- Send interrupt signals to processor (control)

### CPU connection
![reference link](https://i.imgur.com/ZJWnpJ1.png)
- Reads instructions and data
- Writes out data after processing
- Sends control signals to other units
- Receives and acts on interrupts

### Interconnection: Types of Transfers
The interconnection structure must support the following types of transfers:
- Memory to processor
- Processor to memory
- I/O to processor
- Processor to I/O
- I/O to or from memory

## Bus Interconnection
- A bus is a communication pathway connecting two or more devices
- **Key characteristic**: It is a shared transmission medium.
    - A signal transmitted by any one device is available for reception by all other devices attached to the bus
    - If two devices transmit during the same time period, their signals will overlap and become garbled
- Typically consists of multiple communication lines
    - Each line is capable of transmitting signals representing binary 0 and 1
- Computer systems contain a number of different buses that provide pathways between components at various levels of the computer system hierarchy
- The most common computer interconnection structures are based on the use of one or more system buses

### System Bus
- A bus that connects major computer components (processor, memory, I/O)
- Consists of more than 50 separate lines
- Each line is assigned a particular function and can be classified into one of three functional groups:
    - Data lines
    - Address lines
    - Control lines

![](https://i.imgur.com/q6RALL6.png)

#### Data bus
- Data lines that provide a path for moving data among system modules
- May consist of 32, 64, 128, or more separate lines
- The number of lines:
    - is referred to as the <u>width</u> of the data bus
    - determines how many bits can be transferred at a time
    - is a key factor in determining overall system performance

#### Address bus
- Used to designate the source or destination of the data on the data bus
- If the processor wishes to read a word of data from memory, it puts the address of the desired word on the address lines
- Width determines the maximum possible memory capacity of the system (e.g. a 32-bit address bus can address $2^{32}$ locations)
- Also used to address I/O ports
- The higher-order bits are used to select a particular module on the bus, and the lower-order bits select a memory location or I/O port within the module
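
To illustrate the last point, here is a small decoding sketch: high-order address bits select a module, low-order bits select a location within it. The 16-bit address width and the 4-bit module field are assumed values for illustration only.

```python
# Address decoding sketch: high-order bits pick a module on the bus,
# low-order bits pick a location or port within that module.
# The field widths (16-bit address, 4-bit module field) are assumptions.
ADDRESS_BITS = 16
MODULE_BITS = 4
OFFSET_BITS = ADDRESS_BITS - MODULE_BITS

def decode(address):
    module = address >> OFFSET_BITS               # high-order bits: which module
    offset = address & ((1 << OFFSET_BITS) - 1)   # low-order bits: location within it
    return module, offset

print(decode(0x3A7F))   # (3, 2687): module 3, location 0xA7F within that module
```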
#### Control bus
- Used to control the access to and the use of the data and address lines
- Because the data and address lines are shared by all components, there must be a means of controlling their use
- Control signals transmit both command and timing information among system modules
    - Timing signals indicate the validity of data and address information
    - Command signals specify operations to be performed

### Bus operation
- If one module wishes to send data to another, it must:
    - obtain the use of the bus
    - transfer the data via the bus
- If one module wishes to request data from another module, it must:
    - obtain the use of the bus
    - transfer a request to the other module over the appropriate control and address lines
    - then wait for that second module to send the data

### Buses appearance
Buses come in many form factors:
- Parallel lines on circuit boards
- Ribbon cables
- Strip connectors on motherboards
    - e.g. PCI
- Sets of wires

### Single Bus Problems
- In general, the more devices attached to the bus, the greater the bus length <i class="fa fa-arrow-right"></i> greater propagation delay
- Besides that, the data bus may become a bottleneck as the aggregate data transfer demand approaches the capacity of the bus
- This problem can be countered to some extent by increasing the data rate that the bus can carry and by using wider buses
- However, data rates generated by attached devices are growing rapidly
- Most systems use <u>multiple buses</u> to overcome these problems
    - system bus
    - high-speed bus
    - expansion bus
    - local bus

<br>

# Chapter 4: Memory
- Memory access time has been the **limiting factor** of a computer's performance in the past
- Memory speed is slow compared to the processor
- The processor could be bottlenecked by the memory system's inability to keep up with it

#### Characteristics of memory systems
- Location
- Capacity
- Unit of transfer
- Access method
- Performance
- Physical type
- Physical characteristics
- Hierarchy

## Location
- Whether memory is **internal** or **external** to the computer
- Internal memory is equated with main memory
- The processor requires its own memory (registers)
- Cache is another form of internal memory
- External memory consists of peripheral storage devices that are accessible to the processor via I/O controllers

## Capacity
- Internal memory capacity is expressed in terms of bytes (1 byte = 8 bits) or **words**
- External memory capacity is expressed in terms of bytes

## Unit of transfer

### Internal memory
The number of electrical lines into and out of the memory module, governed by the data bus width.
### Main memory (RAM)
The number of bits read out of or written into memory at a time

### External memory
Data are transferred in units much larger than a word, referred to as **blocks**

## Access method

### Sequential access
- Memory is organized into units of data, called **records**
- Access must be made in a specific linear sequence -- start at the beginning and read through in order
- Access time depends on the location of the data and the previous location
- **Example**: Tape

### Direct access
- Individual blocks or records have a unique address based on physical location
- Access is by jumping to the vicinity, plus sequential searching, counting, or waiting to reach the final location
- Access time depends on the location and the previous location
- **Example**: Disk

### Random access
- Each addressable location in memory has a unique, physically wired-in addressing mechanism
- Any location can be selected at random and directly addressed and accessed
- The access time is independent of the sequence of prior accesses and is constant
- **Example**: Main memory (RAM)

### Associative
- A word is retrieved based on a portion of its contents rather than its address
- Each location has its own addressing mechanism
- Retrieval time is constant, independent of location or prior access patterns
- **Example**: Cache

## Performance
Three performance parameters:
- Access time (latency)
- Memory cycle time
- Transfer rate

### Access time
- Time between presenting the address and getting the valid data
- **Random access memory**: Time taken to perform a read or write operation
- **Non-random access memory**: Time taken to position the read-write mechanism at the desired location

### Memory cycle time
- Time may be required for the memory to recover before the next access
- Access time plus any additional time required before the next access can commence

### Transfer rate
- The rate at which data can be transferred into or out of a memory unit
- For random access memory, it is equal to $1/(\text{memory cycle time})$
    - e.g. a 10 ns cycle time gives a transfer rate of $10^8$ words per second

## Physical types
- Semiconductor
    - RAM & ROM
- Magnetic
    - Disk & tape
- Optical
    - CD & DVD
- Others
    - Bubble
    - Hologram

## Physical characteristics

### Volatility
- **Volatile**:
    - Information decays naturally when electrical power is switched off
    - e.g. RAM
- **Non-volatile**:
    - information once recorded remains without deterioration until deliberately changed
    - no electricity is needed to retain information

### Erasability
- **Erasable**:
    - content can be deleted and rewritten
    - e.g. Flash memory
- **Non-erasable**:
    - content cannot be altered, except by destroying the storage unit
    - must be non-volatile
    - e.g. ROM
## Memory Hierarchy
- Computer memory design constraints can be summed up by three questions:
    - how much
    - how fast
    - how expensive
- There is a **trade-off** between capacity, access time, and cost
    - Faster access time, greater cost per bit
    - Greater capacity, smaller cost per bit
    - Greater capacity, slower access time
- Because of that, a computer cannot rely on a single memory component or technology, but must employ a memory hierarchy

![](https://i.imgur.com/Wj2zyOf.png)

- Going down:
    - decreasing cost per bit
    - increasing capacity
    - increasing access time
    - decreasing frequency of access to the memory by the processor

### Hierarchy list
- Registers
- L1 cache
- L2 cache
- Main memory
- Disk cache
- Disk
- Optical
- Tape

<br>

# Chapter 4A: Cache Memory
- Small amount of fast memory
- Sits between normal main memory and the CPU
- May be located on the CPU chip or module

![](https://i.imgur.com/hGAsmae.png)

## Cache & Main Memory Structure
![](https://i.imgur.com/2FDC1i4.png)
<br>
![](https://i.imgur.com/xISA5WG.png)

## Cache Operation
![](https://i.imgur.com/0Paio7P.png)
1. CPU requests the contents of a memory location
2. Check the cache for this data
3. If present, get it from the cache (fast)
4. If not present, read the required block from main memory into the cache
5. Then deliver it from the cache to the CPU
6. The cache includes tags to identify which block of main memory is in each cache slot

<br>

![](https://i.imgur.com/8hZRmkl.png)

## Elements of Cache Design
- Cache address
    - Logical
    - Physical
- Cache size
- Mapping function
    - Direct
    - Associative
    - Set associative
- Replacement algorithm
    - Least recently used (LRU)
    - First in first out (FIFO)
    - Least frequently used (LFU)
    - Random
- Write policy
    - Write through
    - Write back
- Line size
- Number of caches
    - Single or two level
    - Unified or split

### Cache addressing
- Where does the cache sit?
    - Between the processor and the virtual memory management unit (MMU)
    - Between the MMU and main memory
- 2 types of cache addressing:
    - **Logical cache** (virtual cache) stores data using virtual addresses
    - **Physical cache** stores data using main memory physical addresses
- For reads and writes from main memory, the MMU translates each virtual address into a physical address in main memory

![](https://i.imgur.com/mb3Gu4A.png)

### Cache Size

#### Cost
- More cache is expensive

#### Speed
- More cache is faster (up to a point)
- Checking the cache for data takes time

### Mapping function
- The mapping function determines **how** a block from main memory is placed in the cache
- 3 mapping functions:
    - Direct
    - Associative
    - Set associative

#### Direct mapping
- Each block of main memory maps into only one cache line
    - i.e. if a block is in the cache, it must be in one specific place
- Address is in two parts
    - **Least significant** w bits identify a unique word
    - **Most significant** s bits specify one memory block
    - The MSBs are split into a cache line field of r bits and a tag of s-r bits (most significant); a small address-split sketch follows the set associative mapping notes below

![](https://i.imgur.com/AwJlYuw.png)

- **Advantages**:
    - Simplest technique
    - Inexpensive
- **Disadvantages**:
    - Fixed location for a given block
        - if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high.

#### Associative mapping
- A main memory block can load into any line of the cache
- The memory address is interpreted as tag and word
- The tag uniquely identifies a block of memory
- Every line's tag is examined for a match
- Cache searching gets expensive

#### Set Associative Mapping
- Cache is divided into a number of sets
- Each set contains a number of lines
- A given block maps to any line in a given set
    - e.g. Block B can be in any line of set i
- e.g. 2 lines per set
    - 2-way associative mapping
    - A given block can be in one of 2 lines in only one set
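
To make the direct-mapping address split concrete, here is the small sketch referenced above; the field widths (w = 2 word bits and r = 8 line bits within a 16-bit address) are assumed values rather than figures from the notes.

```python
# Direct-mapping address split: least significant w bits = word within block,
# next r bits = cache line, most significant s-r bits = tag.
# Field widths below (w=2, r=8, 16-bit address) are assumed for illustration.
W, R = 2, 8

def split_address(addr):
    word = addr & ((1 << W) - 1)             # w least significant bits
    line = (addr >> W) & ((1 << R) - 1)      # r bits: which cache line the block maps to
    tag  = addr >> (W + R)                   # remaining s-r bits: tag stored with the line
    return tag, line, word

print(split_address(0xABCD))   # (42, 243, 1): tag 42, cache line 243, word 1
```

Two blocks whose addresses share the same line field but differ in the tag contend for the same cache line, which is exactly the direct-mapping disadvantage noted above.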
### Replacement algorithms
- For direct mapping, there is only one possible line for any particular block, and no choice is possible
- For associative and set-associative techniques, a replacement algorithm is needed
- To achieve high speed, the algorithm must be implemented in hardware
- The four most common replacement algorithms are:
    - Least recently used (LRU)
    - First-in-first-out (FIFO)
    - Least frequently used (LFU)
    - Random

#### Least recently used (LRU)
- Most effective
- Replace the block in the set that has been in the cache longest with no reference to it
- Most popular algorithm due to its simplicity

#### First-in-first-out (FIFO)
- Replace the block in the set that has been in the cache longest

#### Least frequently used (LFU)
- Replace the block in the set that has experienced the fewest references
- Could be implemented by associating a counter with each line

### Write Policy
- Must not overwrite a cache block unless main memory is up to date
- Problems to be faced:
    - Multiple CPUs may have individual caches
    - I/O may address main memory directly
- 2 write techniques:
    - Write through
    - Write back

#### Write through
- Simplest technique
- All write operations are made to main memory as well as to the cache
- Multiple CPUs can monitor main memory traffic to keep local (to CPU) caches up to date
- The main disadvantage of this technique is that it generates substantial memory traffic and may create a bottleneck

#### Write back
- Minimizes memory writes
- Updates are made only in the cache
- When an update occurs, a dirty bit (or use bit) associated with the line is set
- Then, when a block is replaced, it is written back to main memory if the dirty bit is set
- Portions of main memory become invalid, hence accesses by I/O modules can be allowed only through the cache
- This makes for complex circuitry and a potential bottleneck

### Line Size
- When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved
- As the block size increases, the hit ratio will at first increase because of the principle of locality
    - As the block size increases, more useful data are brought into the cache
- The hit ratio will begin to decrease as the block becomes bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced
- Two specific effects come into play:
    - Larger blocks reduce the number of blocks that fit into a cache
    - As a block becomes larger, each additional word is farther from the requested word

### Number of caches
- The design of a multiple-cache system focuses on two main aspects:
    - The number of levels of caches -- multilevel cache
    - Unified or split caches

#### Multilevel caches
- As logic density has increased, it has become possible to have a cache on the same chip as the processor
- The on-chip cache reduces the processor's external bus activity, speeds up execution time, and increases overall system performance
- Two-level cache:
    - Internal cache designated as level 1 (L1)
    - External cache designated as level 2 (L2)
- Potential savings due to the use of an L2 cache depend on the hit rates in both the L1 and L2 caches

![](https://i.imgur.com/JFtZhy1.png)

- L2 has little effect on the total number of cache hits until it is at least double the L1 cache size

#### Unified vs Split Caches
- It has become common to split the cache:
    - One dedicated to instructions
    - One dedicated to data
    - Both exist at the same level, typically as two L1 caches
- Advantages of a unified cache
    - **Higher hit rate**
        - Balances the load of instruction and data fetches automatically
    - Only one cache needs to be designed and implemented
- Advantages of a split cache
    - **Eliminates cache contention between the instruction fetch/decode unit and the execution unit**
        - Important in pipelining

<br>

# Chapter 4B (1): Internal Memory
- A type of semiconductor memory
- Categories of internal memory:
    - Cache
    - RAM
        - DRAM
        - SRAM
    - ROM
        - PROM
        - EPROM
        - EEPROM
    - Flash memory

#### Semiconductor Memory
- Cell -- the basic element of semiconductor memory
- All semiconductor memory cells share these properties:
    - Have 2 stable states -- used to represent binary 1 and 0
    - capable of being written to set the state (condition)
    - capable of being read to sense the state

## Memory cell operation
![](https://i.imgur.com/ow9cbYe.png)

## Functional terminals in a memory cell
A memory cell has 3 functional terminals capable of carrying an electrical signal:
- **Select terminal** -- selects a memory cell for a read or write operation
- **Control terminal** -- indicates read or write
- **Data in / Sense terminal**
    - For writing -- the data in terminal provides an electrical signal that sets the state of the cell to 1 or 0
    - For reading -- the sense terminal is used for output of the cell's state

## Semiconductor Memory Types
![](https://i.imgur.com/SZI0R74.png)

### RAM
- Random access memory
- All semiconductor memory is random access
- Characteristics of RAM
    - Read/Write -- possible both to read data from the memory and to write new data into the memory
    - Reading/writing via electrical signals
    - Volatile -- must have a constant power supply or else data is lost
    - Temporary storage
- 2 major RAM technologies:
    - Dynamic RAM (DRAM)
    - Static RAM (SRAM)

#### Dynamic RAM
- Analog device
- Made with cells that store data as charge on capacitors
- The presence or absence of charge in a capacitor is interpreted as a binary 1 or 0
- Requires periodic charge refreshing to maintain data storage
- The term *dynamic* refers to the tendency of the stored charge to leak away, even with power continuously applied

##### Dynamic RAM Structure
![](https://i.imgur.com/qHfPaTa.png)

##### Dynamic RAM Operation
- The address line is activated when the bit value from this cell is to be read or written
- **Write**
    - Voltage to bit line -- high for 1, low for 0
    - Then signal the address line -- transfers charge to the capacitor
- **Read**
    - Address line selected -- transistor turns on
    - Charge from the capacitor is fed via the bit line to a sense amplifier -- the amplifier compares it with a reference value to determine 0 or 1
    - The capacitor charge must then be restored

#### Static RAM (SRAM)
- Digital device that uses the same logic elements used in the processor
- Binary values are stored using traditional flip-flop logic configurations
- Will hold data as long as power is supplied to it

##### SRAM structure
![](https://i.imgur.com/A0BcngP.png)

##### SRAM operation
- Transistor arrangement gives stable logic states:
    - State 1:
        - $C_1$ high, $C_2$ low
        - $T_1T_4$ off, $T_2T_3$ on
    - State 2:
        - $C_2$ high, $C_1$ low
        - $T_2T_3$ off, $T_1T_4$ on
- The address line controls two transistors ($T_5T_6$)
- When a signal is applied to this line, $T_5T_6$ are switched on, allowing a read or write operation
- **Write** -- the bit value is applied to line $B$, its complement to line $\bar B$
- **Read** -- the bit value is read from line $B$

#### SRAM vs DRAM
- Both volatile
- Dynamic RAM
    - Simpler to build, smaller
    - More dense (smaller cells = more cells per unit area)
    - Less expensive
    - Requires the supporting refresh circuitry
    - Tends to be favored for large memory requirements
    - Used for main memory
- Static RAM
    - Faster
    - Used for cache memory (both on and off chip)

### Types of ROM
- Written during manufacture
    - Example: ROM
    - Very expensive for small runs
- Programmable (once)
    - e.g. PROM
    - Needs special equipment to program
- Read "mostly"
    - e.g. Erasable Programmable (EPROM)
        - Erased by UV
    - e.g. Electrically Erasable (EEPROM)
        - Takes much longer to write than read
    - e.g. Flash memory
        - Erase whole memory electrically

#### Read only memory (ROM)
- Contains a permanent pattern of data that cannot be changed or added to
- No power source is required to maintain the bit values in memory
- Data or program is permanently in main memory and never needs to be loaded from a secondary storage device
- Data is actually wired into the chip as part of the fabrication process
- Disadvantages:
    - No room for error; if one bit is wrong, the whole batch of ROMs must be thrown out
    - The data insertion step includes a relatively large fixed cost

#### PROM
- Programmable ROM
- Less expensive alternative
- Nonvolatile and may be written into only once
- The writing process is performed electrically and may be performed by the supplier or customer at a time later than the original chip fabrication
- Special equipment is required for the writing process
- Provides flexibility and convenience
- Attractive for high-volume production runs

#### EPROM
- Erasable programmable read-only memory
- The erasure process can be performed repeatedly
- More expensive than PROM, but it has the advantage of the multiple update capability

#### EEPROM
- Electrically erasable programmable read-only memory
- Can be written into at any time without erasing prior contents
- Combines the advantage of non-volatility with the flexibility of being updatable in place
- More expensive than EPROM

#### Flash memory
- Intermediate between EPROM and EEPROM in both cost and functionality
- Uses an electrical erasing technology, but does not provide byte-level erasure
- The microchip is organized so that a section of memory cells is erased in a single action or "flash"

<br>

# Chapter 4B (2): Memory Error
- A semiconductor memory system is subject to errors:
    - hard failures
    - soft errors

## Hard Failure
- Permanent physical defect
- The memory cell or cells affected cannot reliably store data but become stuck at 0 or 1, or switch erratically between 0 and 1
- Can be caused by:
    - Harsh environmental abuse
    - Manufacturing defects
    - Wear

## Soft Error
- Random, non-destructive event that alters the contents of one or more memory cells
- No permanent damage to memory
- Can be caused by:
    - Power supply problems
    - Alpha particles

## Error correction
- Most modern main memory systems include logic for both detecting and correcting errors

![](https://i.imgur.com/UDcB4My.png)

- When data are to be written into memory, a function (f) is performed on the data to produce a code
- If an M-bit word of data is to be stored and the code is of length K bits, then the actual size of the stored word is M+K bits
- When the previously stored word is read out, the code is used to detect and possibly correct errors
- A new set of K code bits is generated from the M data bits and compared with the fetched code bits

The comparison yields one of three results:
- No errors are detected; the fetched data bits are sent out
- An error is detected, and it is possible to correct the error. The data bits plus error-correction bits are fed into a corrector, which produces a corrected set of M bits to be sent out
- An error is detected, but it is not possible to correct it. This condition is reported

### Hamming Code

#### Step 1:
- Determine the length of the code bits, Z
    - Z = M + K
    - M = length of data bits
    - K = length of check bits
    - $2^K-1\ge M+K$

<u>Example</u>
- M = 1001 (4-bit)
- if K = 2, $2^2-1\ge 4+2$ :x:
- if K = 3, $2^3-1\ge 4+3$ :heavy_check_mark:

#### Step 2:
- Arrange data bits and check bits according to their bit positions
    - **check bits** are placed at bit positions whose position numbers are powers of 2
    - **data bits** are placed at the remaining bit positions

<u>Example</u>

|Bit position|7|6|5|4|3|2|1|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|Position number|111|110|101|100|011|010|001|
|Data bit|D4|D3|D2||D1|||
|Check bit||||C3||C2|C1|

#### Step 3:
- Calculate the check bits
    - $C1=D1\oplus D2\oplus D4$
    - $C2=D1\oplus D3\oplus D4$
    - $C3=D2\oplus D3\oplus D4$

#### Step 4:
- Combine the data bits and check bits to form the code word, Z

|Bit position|7|6|5|4|3|2|1|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|Position number|111|110|101|100|011|010|001|
|Data bit|D4|D3|D2||D1|||
|Check bit||||C3||C2|C1|
|Code bit, Z|1|0|0|1|1|0|0|
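
A short sketch that reproduces this worked example: it evaluates the Step 3 equations for the data bits D4..D1 = 1001 and lays the result out in the Step 4 bit-position order.

```python
# Hamming check-bit calculation from Step 3, for data bits D4 D3 D2 D1.
def hamming_code(d4, d3, d2, d1):
    c1 = d1 ^ d2 ^ d4                       # C1 = D1 xor D2 xor D4
    c2 = d1 ^ d3 ^ d4                       # C2 = D1 xor D3 xor D4
    c3 = d2 ^ d3 ^ d4                       # C3 = D2 xor D3 xor D4
    return [d4, d3, d2, c3, d1, c2, c1]     # bit positions 7, 6, 5, 4, 3, 2, 1

print(hamming_code(1, 0, 0, 1))   # [1, 0, 0, 1, 1, 0, 0] -- matches the Step 4 table
```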