--- tags: uni --- # SysProg Genral Exam Information SysProg: Exam **starts at 8:30** Answers in English or German. Not interested in memory for small details, not intrested in ability to perform complex arithmetics. Techniques, ideas, concepts from lectures and assignments. X86 Cheatsheet is provided. Take a look at your assignment exercises. Any detailed data will be supplied in the exam. More exam questions and shorter makes it easier to corret. Exam tips: - Read the instructions. - Then read the whole exam paper through - Look at the number of points for each question - This shows how long we think it will take to answer - Find a question you know you can answer and andswer it. - This will make you feel better early on. - Watch the clock - If you are taking too lonng on a question, consider dropping it and moving on to another one. - **Always show your working.** ~~01 Introduction~~ ~~02 Introduction to C~~ ~~03 Integers in C~~ 04 Pointers 05 Dynamic memory allocation 06 Wrapping up C 07 Implementing dynamic memory allocation ~~08 Basic x86 architecture~~ ~~09 Compiling C control flow~~ ~~10 Compiling C data structures~~ ~~11 Linking~~ ~~12 Code vulnerabilities~~ ~~13 Floating Point~~ ~~14 Optimizing Compilers~~ ~~15 Architecture and Optimization~~ ~~16 Caches~~ ~~17 Exceptions~~ ~~18 Virtual Memory~~ 19 Multicore 20 Devices 21 Summary ## General Information https://www.systems.ethz.ch/courses/fall2019/SPCA Tue: HG E7 Wed: NO C 60 Tutorial Session Wednesday 13:00 - 15:00 Encompasses OS, DB, Network protocols and routing, compilers, distributed systems, cloud, big data. Course is what is **on and above the hardware/software** boundary. To get a feeling about all layers of a computer. Textbook is: Randal E. Bryant and David R. Computer Systems: A programmers perspective, thrid edition. and The c programming language second edition. ## Introduction C ### History Developed in early 1970. CPL -> BCPL -> B -> C. Early days: Standard was the compiler source (want to know what a function does?). C is **very fast** (because easy to compile to machine code, is a simple language, people who write compilers invest alot of effort into the compiler since you want to show of your chip), java 50% performance of C, powerful macro **pre-processor (cpp)**, is close to hardware. Designed to write an operating system on. Stuff you don't get in C. **No objects, classes, features, methods or interfaces**, **no fancy built in types** (e.g. no string) but type constructor. No exceptions (convention to return integer codes). Most important difference: **does not have automatic memory management**. Has **pointers**. C is about directly building and manipulating structures in memory. gcc is not compiler but runs a bunch of programs. cpp (macro substituion) , cc1 (compile each c file into assembly), as (assemble each file into object code), Id (link). ### C macro pre processor ?! C is run trough a macro preprocessor. Goes trough code before compilation and substitutes string and files. Does conditional compilation. ### Control flow in C in for loop, if test is false it exits immediately. Do always executes the statement, while does not (not sure check!!). continue stops and starts on the top again. break breaks out only of the current loop. ### Basic Types in C static means it has the same scope but when the block is left it will keep the value. They persist between different calls. Outside a block, the socpe is the entry program. Static defined in a file hides it from other files. In C anything can be casted into any type. ### Operators | Operator | Associativity | what it do | | -------- | -------- | --- | | () [] -> . | Left-to-right|who knows lol |! ~ ++ -- + - * & (type) sizeof|Right-to-left| |* / %| Left-to-right| |+ - |Left-to-right| |<< >> |Left-to-right| |< <= > >= |Left-to-right| |== != |Left-to-right| |& (bit and)|Left-to-right| |^ (bit **x**or)|Left-to-right| |\| (bit or)|Left-to-right| |&& |Left-to-right| |\|\| |Left-to-right| |?: |Right-to-left| |= += -= *= /= %= &= ^= \|= <<= >>= |Right-to-left| |, |Left-to-right| () means struct pointer indirection. i++ = post increment, but afterwards it is +1 ++i = pre increment, the value is already increased with +1 ### Arrays in C Compiler does not check array bounds. String always end in a \0 (zero byte). ## General (2) ### What to do next: Learn and write about: pipelines, MIPS or LC-3b assembly (registers, addressing modes, instruction formats), concurrency and parallelism (locks, mutexes, condition variables, parallel programming constructs) ?! ### Self digression - Memory Mountain Two dimensional function of read bandwidth versus temporal and spatial locality. Characterizes the memory capabilites of each computer. ### Self digression - MIPS Two examples of the von Neumann model LC-3 and MIPS. Instruction **Pipeline** is a technique for implementing instruction-level parallelism within a single processor. ### Requirements refurbishment **big-endian**, biggest value at the very left (second/minute/hour) **strongly typed** language has stricter typing rules -> errors and exceptions are more likely to happen during compilation **Design-by-contract**, Preconditions, Postconditions, Invariants (assumptions that apply across classes) ## Representing integers in C ~ means bitwise not, flips 010001 to 101110 & means AND, | means OR bitwise ^ XOR && logical AND (is interpreted as boolean value) || logical OR (is interpreted as boolean value) ! logical not operator on shift Logic shift fills with 0 on left, in arithmetic shift (if INT) replicate most significant bit on right. Undefined behavior, dont do shift amount < 0 or >= word size. Zweierkomplement, signed means 1 in front and that the twos complement will work. Signed values implicitly cast to unsigned when mixed. Closed under addition, commutative (u,v)=(v,u) bitwise addition is signed and unsigned exactly the same ## Integer multiplication in C Ranges, mostly become twice as big. Rounds wrong direction when u < 0. When -7606.5 gets shiftet it will become -7607. Signed dividion, greater than 0 is nice, smaller than 0 we have to add something (missed what) Do Integer C Puzzle. Show why it is always true or find counter example. ## Pointers OS Loads a program it creastes an address space, inspects the executable files, copies regions of the files &x returns the value of the adress where x is stored. %p is used in print to show it. ## Implementing dynamic memory allocation This is how malloc actually works. Malloc returns a pointer to a xy bytes big memory block. Explicit memory allocation, application allocates and frees spaces. Implicit, application allocates but does not free. Freeing is done by garbage collector. Can't control number or size of allocated blocks. Must respond immediately to malloc() requests. It can't reorder anything or buffer anything. The blocks can't be moved, they are stuck. Malloc and freeing reduces throughput. Aggregated payload: Everything that has been allocated minus everything htat has been freed. How to keep track of free blocks: Internal vs external fragmentation. Means you need to allocate more memory than acutally requested memory. Internal fragmentation if payload < blocksize. Only depends on the pattern of previous pattern. External fragmentation, means there is enough memory in the memory but it is in the wrong place. ## Malloc - **Handling arbitrary request sequences.** The allocator cannot make any assumptions about the ordering of allocate and free requests. - Making immediate responses to requests. The allocator is not allowed to reorder or buffer requests in order to improve performance. - Using only the heap. In order for the allocator to be scalable, any non-scalar data structures used by the allocator must be stored in the heap itself. - Aligning blocks (alignment requirement). The allocator must align blocks in such a way that they can hold any type of data object. On most systems, this means that the block returned by the allocator is aligned on an 8-byte (doubleword) boundary. - Not modifying allocated blocks. Allocators can only manipulate or change free blocks. In particular, **they are not allowed to modify or move blocks once they are allocated.** Thus, **techniques such as compaction of allocated blocks are not permitted.** ## Basic x86 **Architecture: ISA (instruction set architecture)**, is needed to understand assembly, instruction set specification, registers **Microarchitecture:** Implementation of the architecture. cache sizes and core frequency CISC: Complex Instruction Set, •Use stack to pass arguments, save program counter •Explicit push and pop instructions. **Condition codes**•Set as side effect of arithmetic and logical instructions. RISC: Reduced Instruction Set, No condition codes An **Object** file is the compiled file itself. There is no difference between the two. An executable file is formed by linking the Object files. Object file contains low level instructions which can be understood by the CPU. That is why it is also called machine code.A linker takes all these object files and combines them to form one executable. **Linker** Resolves references between files, Combines with static run-time libraries ## Condition codes CF set if carry out from most significant bit (unsigned overflow) ZF set if t == 0 SF set if t < 0 (as signed) OF set if two’s complement (signed) overflow (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0) (wenn beidi parameter positiv/beid negativ sind, und s resultat umgekehrt) the carry flag is set when there is a carry out of the most-significant bit. The overflow flag is set when there is a carry into the most significant bit. ## Stack Grows towards lower adresses Register **%rsp** contains lowest stack address (top element) ## Linkers Useful because they save time (no need to recompile everything) and space (executable files and running memory images contain only the code for the function they are actually using). Program reference and define **Symbols** (variables and functions). Linker associates each symbol reference with exactly one symbol definition. Object files types: **Relocatable object file (.o file)** **Executable object file** **Shared object file (.so file)** (on windows DLLs) relocatable object file that can be loaded into memory and linked dynamically. Program symbols: • **Strong:** procedures and initialized globals (multiple strong symbols are not allowed, each item can be defined only once. Given a strong symbol and multiple weak symbol, choose the strong symbol) • **Weak:** uninitialized globals **Functions** are strong symbols. **scanf** expects a pointer to the location where the input should be stored **memory leak** when allocated memory is not freed. **Malloc** will return NULL if unsuccessfull, **check return value of malloc!** **.data section** Initialized global variables **.text section** Code **.bss section** Uninitialized global variables ## Code vulnerabilities **Worm**, can run by itself, propagate a fully working version of itself to other computers. **Virus**, adds itself to other programs, cannot run independent. **System-level protections**, - Compiler-inserted checks on functions - Randomized stack offsets (difficult to predictbeginning of inserted code) - Nonexecutable code segments ## Multicore Single physical address space shared by all processors. Communication between processors happens through shared variables in memory. Hardware typically provides cache coherence With several processors, memory can change under a cache 1.**Coherency:** Values in caches all match each other. Processors all see a coherent view of memory Most CPU cores on a modern machine are cache coherent, they behave as if all accessing a single memory array. Complex to implement and hence slower. 2.**Consistency:** The order in which changes to memory are seen by different processors Program order, order in wich a programm issues read/write on a processor. Visibility order, order in which all read and write operations are seen by one or more processors. **Sequential consistency**, Dirty means, data in the cache is different than data in the memory. **MESIF** **Modified:** No other cached copies exist, local copy dirty **Exclusive:** No other cached copies exist, local copy clean **Shared:** Other cached copies exist, all copies are clean At most one other (clean) copy might be in Forward state. **Invalid:** Not in cache. **Forward:** As Shared, but this is the designated responder for requests Always the most recent cache to request line O in AMD is Owner: Multiple dirty copies exist (all consistent). This copy has sole right to modify line. With MOESI there can be cache-cache transfers, so one cache can read from another instead of making a memory access. In MOESI caches can direcly ask for actualized values from each other instead of having to request them every time from main memory. **Advantages of MESI over MSI** In case a processor needs to read a block which none of the other processors have and then write to it, here two bus transactions will take place in the case of MSI. First will be a BusRd request to read the block followed by a BusRdX request before writing to the block. The BusRdX request in this scenario is useless as none of the other caches have the same block, but there is no way for one cache to know about this. Thus, MESI protocol overcomes this limitation by adding an Exclusive state, which results in saving a bus request. This makes a huge difference when a sequential application is running. As only one processor will be working on it, all the accesses will be exclusive. The MSI would have performed very badly here. Even in the case of a highly parallel application where there is minimal sharing of data, MESI would be far faster. **Barriers and Fences**: **Compiler barriers:** prevent compiler reordering statements **Memory barriers:** prevent CPU reordering instructions ## Set_jump / long_jump **setjmp()** saves the current **stack state** in a jump buffer and returns 0. On x86_64, the **stack state constists of** callee saved registers, the stack pointer, the return address of setjmp() and optionally the program counter (must be saved last!). **longjmp()** restores the stack state to the state saved in the jumpbuffer. The function longjmp() never returns, but the setjmp() which saved the jumpbuffer will seem to return the integer valued passed to longjmp(), or 1 if this value was 0. ## Floats Can only exactly represent numbers of the form $x/2^k$. Floating point representation. (-1)^S * M * 2^E sign bit, exp bits (127 + Punktverschiebung) Normalized: Good for bigger values, not equi-spaced, Denormalized: Good for very small values, equi-spaced, and zero -* 1 sign 3 exp 4 fraction |Description|Binary|M|E|Value| |-----------|------|---|---|-----| |Minus Zero|1 000 0000| 0 | -2 |-0.0| |Smallest pos. denorm| 0 000 00001 | 1/16 | -2 | 1/64 | |pos. inf. | 0 111 0000 | |NaN | x 111 xxxx | ## Caches **Placement Policy**, when block not in cache, where fetched block should be located at. **Replacment Policy**, which block gets evicted. Miss rate, 3-10% in L1, <1% L2. Hit Time, time to deliver a line in the cache to the processor, 1-2 cycles for L1, 5-20 cycles for L2. Miss penalty, additional time because of a miss (50 - 200 cycles from main memory). **Types of miss**: Cold miss (first access to block), capacity miss (kein Platz mehr auch wenn perfekte Reihenfolge beim verwerfen), conflict miss (hätte noch Platz gehabt wenn etwas anderes verworfen worden wäre), coherency miss. coherence miss”, caused by a cache line (i.e. block) having been previously evicted by the cache coherency protocol. Virtual Adress VPN | VPO TLBT | TLBI | Offset Tag | Index | Offset PPN | PPO | |CI|CO| Temporal locality vs Spatial locality Recently referenced items are likely to be referenced again and Items with nearby addresses tend to be referenced close together in time - **Write-through** Write immediately to memory, Memory is always consistent with the cache copy, Slow: what if the same value (or line!) is written several times, - **Write-back** Defer write to memory until replacement of line, Need a dirty bit, indicates line is different from memory, Higher performance (but more complex) - **Write allocate** data at the missed-write location is loaded to cache, followed by a write-hit operation. In this approach, write misses are similar to read misses char 1 short 2 int 4 size_t 8 long 8 long long 8 float 4 double 8 **Coherency Miss — ** Miss due to invalid data in cache ### Blocking Splitting a large problems into blocks to reduce the working set of the data to fit into the cache to get up speed. ## SysProg - Assembly Programming There are 16 genral purpose registers in x86. ESP, stack pointer. EBP, base pinter. **Special registers** **EFLAGS**: Contains flags, so we can check the conditions from recent instructions. **Architecture / ISA:** Describe the high level attributes for a system, is needed to write assembly. **Example ISAs:** x86, MIPS, ia64, VAX, Alpha, ARM ... **Microarchitecture:** Implementation of the architecture. This describes more details on fundamental topics like pipelining, instruction level parallelism, inorder vs out of order execution, branch prediction, caches and coherency, prefetch and many more. **RISC** **CISC** Arithmetic instructions can access memory The portion of the stack allocated for a single procedure call is called a stack frame. The callee should not overwrite some register value that the caller planned to use later. By convention, registers %eax, %edx, and %ecx are classified as **caller-save** registers (caller's responsibility to push these registers onto the stack or copy them somewhere else if it wants to restore this value after a procedure call, used to hold temporary quantities). On the other hand, registers %ebx, %esi, and %edi are classified as **callee-save registers**. This means that Q must save the values of any of these registers on the stack before overwriting them, and restore them before returning. registers will hold the same value after the callee returns. ## Exception An exception is a transfer of control to the OS in response to some event. Kernel = OS that runs in kernel mode, think of kernel as a set of trap handling functions). **Synchronous exceptions**, caused by events that occur as a result of executing an instruction. Traps (intentional, system calls, breakpoint traps, special instruction), faults (unintentional but possible recoverable, page-faults, protection faults (unrecoverable), floating point exceptions), aborts (unintentional and unrecoverable, machine check) **Asynchronous exceptions**, Caused by events external to the processor. Indicated by setting the processor’s interrupt pin. I/O interrupts, hitting Ctrl-C at the keyboard, arrival of a packet from a network, arrival of data from a disk, Hard reset interrupt, hitting the reset button, Soft reset interrupt, hitting Ctrl-Alt-Delete on a PC. NMI (non maskable interruption), always executed after current instruction, can not be disabled by processor. Reserved for major hardware faults, parity error, watchdog timer. INTR **Interrupt request**, Can be disabled by IE status flag. CPU acknowledges using INTA pin. Interrupt vector is then supplied to the CPU via the data bus. CPU issues # from vector. **PIC** Programmable Interrupt Controllers, map physical interrupt pins to interrupt vector. Buffer simultaneous interrupts. Prioritize interrupts, ## Virtual Memory **TLB: Translation Lookaside Buffer** - A cache used by the MMU to cache recently used VPN to PPN translations from the page table Benefits: - Efficient use of limited main memory (RAM) - keep only active areas of virtual address space in memory - Simplifies memory management for programmers. (Each process gets the same full, private linear address space) - Isolates address spaces Virtual Memory is an important tool for: 1. Caching disk contents in RAM 2. Performing memory management 3. Protecting memory 4. Simplifying linking and loading Write-through • Write immediately to memory • Memory is always consistent with the cache copy ## setjmp / longjmp The **setjmp** function **saves** the current calling environment in the env buffer, for later use by longjmp, and returns a 0. The **calling environment includes the program counter, stack pointer, and general purpose registers.**