程式語言 Programming Language (CSIE-2A) (Ch.1, Ch.5)

# 程式語言 Programming Language (CSIE-2A) (Ch.1, Ch.5) :::success :::spoiler click to open TOC [TOC] ::: ###### tags: `courses meow` `CSIE-2A` `Programming Language` `2021 Spring` :::info ### :link: Links - [資訊網頁link](https://staff.csie.ncu.edu.tw/hsufh/COURSES/SPRING2021/ppl.html) - [線上發問系統(臨時)](https://tlk.io/ncu_ppl2021) - [購書連結](https://youtu.be/072tU1tamd0) ### :book: Books - Concepts of programming language (Robert W. Sebesta) ### :clock1: Office hour - 週一下午 16:00 ~ 17:00 - 週三下午 16:00 ~ 17:00 ::: :::success ### :information_source: Info 上課16~18周左右 (可能變成 16 week) ### 分數計算 > 基本上不點名 (只要前兩排坐滿) ::: --- # Chapter 1 > 很多OS 的 kernel 是 C/C++ 寫的所以 C/C++ 一直還是很前面 - [toibe，一個統計language熱門度的網站](https://www.tiobe.com/tiobe-index/) > 不同的 language 對於某些 statment 的解釋不同，所以如果不了解背後的機制，很容易導致效率和安全問題 > 這堂課會稍微提到一些關於高階語言轉換到機器碼的過程 > 我們會跳過ch3, 有時間再來講 > 之後會有很多的圖、動畫、甚至直接coding, compile 看 machine code :::warning #### Side Effect Example ``` result = a + fun(a) ``` 基於 compiler 的不同，結果可能不同(如果fun會改變a，運算順序就會影響) ``` (a > b) || (b++ / 3) ``` b 只會在 a<=b 的時候加1 :::  ## Reason for studying concepts of programming Language > 寫程式就是把「想法」用電腦的語言表達出來 - Increased ability to express idea - 模擬其他語言的 feature - Increased ability to learn new language - Improve background for choosing language - Better understanding of significance of implementation - Program bugs - > 有些 bug 是因為程式在轉為 assembly 時，跟預期不一樣 - Performance ## Programming Domains - Scientific applications - Large floating number computations - Fortran (IBM, 1950s) - Business applications - COBOL (1960) - Artificial intelligence - **Symbols**, **consisting of names** rather than numbers, are manipulated - LISP (1959) - System programming - Fast execution - Low-level features - C - The `Unix` OS is written almost entirely in c - > 我很不欣賞各大公司用 typedef 把名稱改成自己的名稱。導致學習成本的增加 [name=教授] > level 代表離硬體多近 - Web software - Markup (XHTML) - Scripting (PHP) - General-purpose (Java) ## Language Evaluation Criteria - **Readability**: (好不好讀) - Maintenance - **Writability**: (好不好寫) - **Reliability**: (可靠度) - **Cost**: (成本) ### Characteristics contributing to readability - Overall Simplicity - A manageable set of features and constructs - Few **feature multiplicity** (methods of doing the same operation) ```java= count = count + 1 count += 1 count++ ++count ``` - Minimal **operator overloading** (a single operator symbol has more than one meaning) > 適度的簡化可以幫助可讀性，但是過度的簡化可能反而降低可讀性 - Control Statements - well-known control statements (e.g. `while` statement) - Data Types and Structures - some language does not contain `Boolean`, which can be replaced by `Int` - Syntax Considertations - Identifier forms - short variable name will cause readability issues - Special words - `if`, `while`, ... - Methods of forming compound statements - C, Java: curly braces - Pascal: `begin`, `end` - Python: tab ### Evaluation Criteria: Writability > Readability 會影響 Writability - Support for Abstraction - Process (subprogram) - Data - Data Abstraction - Data types: `int`, `long` ... - `struct` / `class` - Lookup table - Hash table - Binary search tree - Linear list of key-value pairs - Expressivity - `for` is easier to code than `while` ### Characteristics contributing to reliability - Type Checking ```c= void is_greater(unsigned int a, int b) { if (a < b) { printf("a < b\n"); } else { printf("a >= b\n"); } } int main() { is_greater(1, 2); // a < b is_greater(-1, 2); // a >= b } ``` :::spoiler meme ![](https://i.imgur.com/63Mp56A.png) ::: - Exception handling - `try`, `catch` ### Damaging the Reliability - Aliasing - 多個變數共用記憶體區塊 (pointer 之類的) - 可能導致變數內容非預期的被修改 ### Evaluation Criteria: Cost - Training programmers to use language - Writing programs - Compiling - Executing - Language implementation system: availability of free compilers ### Evaluation Criteria: Others - Portability - Generality - Well-definedness ## Influences on Language Design - Computer Architecture - know as 'Von Neumann Architecture' - Programming Methodologies(策略) - e.g. Object-oriented ## Von Neumann Architecture > 硬碟 <-> 主記憶體 <-> CPU_cache <-> CPU - Imperative language > Designed based on Von Neumann Architecture - Data and programs are stored in the same memory - Memory is separate from CPU - Instructions and data are transmitted from memory to CPU - Results of operations in the CPU must be moved back to memory :::info ### Example time > compiler 就像是一個建築師，規劃了你的客廳、廚房的位置，到時候要蓋的時候要蓋在台北還是紐約都可以 [name=教授] > (程式內的 data segment, text segment 等相對位置以確定在 machine code 中，在實際 load 進 memory 時 OS 會再分配她到某位置) :::spoiler Some Dummy code ```c= // example.c int add_two_item(int a, int b) { return a + b; } int main() { int x = 10; int y = 20; printf("%d\n", add_two_item(x, y)); return 0; } ``` ```shell # use objdump to observe the machine code $ gcc -o a.out ./example.c $ objdump -D ./a.out | less ``` ```asm 0000000000001149 <add_two_item>: 1149: f3 0f 1e fa endbr64 114d: 55 push %rbp 114e: 48 89 e5 mov %rsp,%rbp 1151: 89 7d fc mov %edi,-0x4(%rbp) 1154: 89 75 f8 mov %esi,-0x8(%rbp) 1157: 8b 55 fc mov -0x4(%rbp),%edx 115a: 8b 45 f8 mov -0x8(%rbp),%eax 115d: 01 d0 add %edx,%eax 115f: 5d pop %rbp 1160: c3 retq 0000000000001161 <main>: 1161: f3 0f 1e fa endbr64 1165: 55 push %rbp 1166: 48 89 e5 mov %rsp,%rbp 1169: 48 83 ec 10 sub $0x10,%rsp 116d: c7 45 f8 0a 00 00 00 movl $0xa,-0x8(%rbp) 1174: c7 45 fc 14 00 00 00 movl $0x14,-0x4(%rbp) 117b: 8b 55 fc mov -0x4(%rbp),%edx 117e: 8b 45 f8 mov -0x8(%rbp),%eax 1181: 89 d6 mov %edx,%esi 1183: 89 c7 mov %eax,%edi 1185: e8 bf ff ff ff callq 1149 <add_two_item> 118a: 89 c6 mov %eax,%esi 118c: 48 8d 3d 71 0e 00 00 lea 0xe71(%rip),%rdi # 2004 <_IO_stdin_used+0x4> 1193: b8 00 00 00 00 mov $0x0,%eax 1198: e8 b3 fe ff ff callq 1050 <printf@plt> 119d: b8 00 00 00 00 mov $0x0,%eax 11a2: c9 leaveq 11a3: c3 retq 11a4: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 11ab: 00 00 00 11ae: 66 90 xchg %ax,%ax ``` :::  logical address physical address > 我們通常都會去看 logical address，因為 physical address 會長什麼樣我們不太能決定，會受到 OS 和硬體的影響 ## Fetch-execute-cycle ``` initialize the program counter repeat forever fetch the instruction pointed by the counter increment the counter decode the instruction execute the instruction end repeat ``` ## Evolution of Programming Methodologies - 1950s - Simple application - 1970s - Hardware cost decreased - Emphasis: - Structured programming - Deficiency - Incompleteness of type checking - Late 1970s: - From "procedure-oriented" to "data-oriented" - 出現支援 data abstraction 的語言 - Middle 1980s: OOP - Data Abstraction - Encapsulates processing with data objects - Inheritance - Dynamic Method Binding - Overloaded method ## Programming Language Categories - Imperative - `C`, `Pascal` - Functional - `LISP`, `Scheme` - Logic - `Prolog` - Object-oriented - `Java`, `C++` - Visual languages - `VB`, `VB.NET`, `labXview`,`5<尺4tㄈ|-|`  - Scripting languages - `Perl`, `Javascript`, and `Ruby` ## Benefits of Modular Design > 想練肌肉就去搬個映像管螢幕吧，你會變得很壯，肚子也會變小[name=許富皓教授] > ~~如果我當初有申請專利的話我現在就不在這裡了~~ ## Language Design Trade-Offs > e.g. C 語言不會檢查 array 是否超出範圍， > 而 java 會，但是因為做了檢查而導致效率變慢 :::spoiler some example code ```c= // test2.c #include <stdio.h> int main() { int peko[10]; peko[0] = 1; peko[10] = 2; return 0; } ``` ```shell= $ gcc -o test2.out ./test2.c && ./test2.out *** stack smashing detected ***: terminated [1] 394745 abort (core dumped) ./test2.out ``` ::: - Reliability vs. cost of execution - Readability vs. writability - Writability(flexibility) vs. reliability ## Language Implementation - Compilation - Program are translated to machine code - translating high-level program to machine code - source code -> executable - Slow translation, fast execution :::info - lexical analysis: - 把`int i=1`拆成一個個token: `int`, `i`, `=`, `1` - syntax analysis: - 把轉出來的token變成parse tree - intermediate code (中介碼) generation - semantics analysis - code generation ::: - Pure Interpretation - Hybrid Implemantation Systems - Java: `.java` file can be compiled to `.class` file (java bytecode) and run on JVM ### Optimization vs. Reliability > Shared Memory: 有時兩個 Process 會共用記憶體 ```c= // 原程式 int a; int foo() { a = 1; if (a > 0) a = 3; else a = -1; return a; } ``` ```c= // 經過優化可變成 int foo() { a = 3; return a; } ``` - 問題： - 若此程式與另一個 Process shared memory - 則 a 可能被另一個 process 改變 > shared memory 所引起的 Bug 非常難找 ## Symbol Table - symbol table serves as a database for the compilation process. - the **type** and **attribute** information of each user-defined name in the program > 變數名稱等東東會存在這裡，給後續的 code generator ### Compilation Process 一個很複雜的圖在這裡 ### Linking Operation - Linking: The operation of collecting system functions and linking them to user programs ![](https://i.imgur.com/cHqmswX.png) :::info `strings` ::: ## Interpreter - The interpreter program acts as a ***software simulation*** of a machine whose fetch-execute cycle deals with high-level language program statements rather than machine instructions. - This software simulation obviously provides a **virtual machine** for the language. <img width="300" src="https://i.imgur.com/pfKjiWh.png"> ### Advantage - debug 時，error message 可以直接對應到 source code ### Disadvantage - ==**SLOW**== (10 to 100 times slower than compiled programs) - Often requires more space - 執行時需要 source code - 所以為了減少空間有時會先將 source code 轉換形式 (hybrid implementation, e.g. Java) ### Popularity of interpretation - 因為網路與 app 的發展而再次偉大 ## Preprocessors - example: - C 的 preprocessor # Chapter 5 > 一個東西我應該會講到三次 > 第一次的時候，你應該什麼都聽不懂，但至少你知道這個名詞了 > 第二次的時候，你會：歐歐，這個名詞我聽過，那他是什麼呢 > 第三次的時後，你就開始理解他是什麼了 ### C-based Language - 像是 C, C++, C#, Java ## Memory Cells and Variable ### Attributes of Variables - can be characterized by a collection of **properties**, or **attributes** > class 就是自定義的 type #### Name - 基本上就是一個字串 - 可以代表： - labels - subprograms - formal parameters - other program constructs - 有時在「程式語言」裡可以稱為`identifier` - Design Issues - Are name case sensitive? - Length of Names - Examples: - FORTRAN I: max: 6 - C 89: - no length limitation - only the first 31 are significant - external names - are restricted to 6 character - C++++(C#) and java: no limit - C++: no limit, but implementers often impose one - 不同的 compiler 會有自己不同的名稱長度問題 - Name Forms - C style - e.g. `my_stack`, `max_val` - Camel style - e.g. 小駝峰 `myStack`, `maxValue` - e.g. 大駝峰 `MyStack`, `MaxValue` - Fortran 90 - 允許空格 - Case Sensitivity - Detrimental to readability - Case sensitivity **violates the design principle**: constructs that look the same should have the same meaning. - Detrimental to writability - The need to remember specific case usage (要記得大小寫是有不同的用途) - Special words - e.g. `if`, `else`, `for`, `while`, ... - to separate syntactic entities of programs - 在 Fortran 裡也可以使用 special words 來宣告變數名，compiler 會根據上下文來判斷 - Reserved words - 不能被當作 variable name 來使用 | keywords | reserved words | | -------- | -------- | | ==可以==當作變數名使用 | ==無法==當作變數名使用 | #### Variables - 一些 Attribute - Name - Address - Value - Scope #### Address - Memory address asssociated with the variable - May be different for the ==same variable name== (local variable) - The address you see inside a program is the ==logical adress== - The caller is above the callee inside the stack (top to down) - stack 是由上往下長的 (address會越來越小) - 在同個 stack frame 裡面卻是由下往上看 > 複習 Assembly~ ### L-value and R-value - L-value: The address of a variable (在 assignment statement 中是位在左邊) - R-value: The value of a variable - To access r-value, l-value must be determined first (not easy). ### Type - The type of a variable determines the - the **range of values** - the **set of operation** ### Value ??? ### Abstract Cell - A cell the size of the variable - e.g., an integer (cell) is 4 bytes ### Binding > A binding is an association - Can take place at - language design time - link time - compile time - run time #### Static binding - it first occurs before run time - remains unchanged throughout the program exec - e.g., Global variables - > 程式在 runtime 前定義好的 #### Dynamic binding - first occurs during run time - can change during program execution - e.g., `` - > 程式在 runtime 的時候才定義的 #### Type binding - A variable must be bound to a ==data type== first - static specified - explicit declaration - `int i = 0` - implicit declaration - `i = 0` - bad reliability - > static binding ### Definition and Declaration - Declaration 宣告 (對function來說就是宣告prototype) ```=c extern int i; // no memory allocation extern void f(); int bar(int, int); ``` - Definition 定義 ```=c int i; // allocates memory void f() { printf("Hello world"); } int bar(int a, int b) { return s + b; } ``` ### Dynamic Binding - Languages: Python, Javascript, PHP - Less reliable in error detection - Dynamic type binding allows any variable to be assigned a value of any type. - Incorrect types of right sides of assignments are not detected as errors; rather the type of the left side is simply changed to the incorrect type. - Languages that have dynamic type binding for variables are usually implemented using **pure interpreters** rather than **compilers**. > 允許任何的變數以任何的型態賦予 > 因此在增加彈性時，同時也會增加 debug 難度 #### Disadvantage of Dynamic binding - cost ### 窩不知道 ## Category of Scalar Variable - 1. static - 2. stack-dynamic - 3. explicit heap-dynamic - 4. implicit heap-dynamic > 通常一種 language 不會有所有的種類 ### 1. Static - 在一開始就 bind #### Application of static Variables - Globally accessible variable ```c= int bar() { // can noly be used inside bar // but foo_static has a specific address static int foo_static; // local scope, static variable } ``` ### 2. Stack-dynamic > Elaboration - elaborate 在 run-time - stack-dynamic variable 在 elaborate 後才會被配置記憶體位置 - stack-dynamic 的記憶體配置在 stack #### Disadvantage of stack-dynamic - ==TODO== - slower - 由於取值時需要以 `stack base pointer` 的相對位置來取值 - 因此多一個 `ebp` +- 的運算 - > 不過也只是慢一點點 ### 3. Explicit Heap-dynamic - nameless memory cells - 因此需要由其他 variable 透過 pointer/referencing 來 access - 存在 `Heap` 中 - Allocated / Deallocated in run-time - 由 programmer 自行配置 - e.g. ```c= int *arr = malloc(sizeof(int) * 10); arr[9] = 1; free(arr); ``` ```java= Class Circle {...}; Circle cir = new Circle(); //cir 的 datatype 不是 Circle，他只是一個 reference variable // java Garbage Collection // java 不需要自行 free，當沒有人指向此 memory 空間時(reference counter)，interpreter 自行 free 掉此記憶體 ``` ==TODO:== 加一個圖表 ```graphviz ``` ### Lifetime of Variable - 一個 Variable 的 lifetime 跟他的 scope 沒什麼關係 - 動態配置的記憶體需要透過 ebp +- 某一個數去存取 ### Creating a Java Object - Declaration ![](https://i.imgur.com/Fborlgf.png) - Instantiation and initialization - Instantiation: 配置記憶體區塊 - Initialization: 記憶體裡面的資料(data member)會被初始化 ![](https://i.imgur.com/cTQrSNt.png) ```java= Box mybox = new Box(); ``` ### Disadvantage of Explicit Heap-dynamic Variables - the **difficulty** of using pointer variable correctly - Cost - reference to variables - allocate memory - deallocation - the complexity of storage implement ### Implicit Heap-dynamic Variables > 通常都是 interpreter 語言會才會提供這種功能 - Implicit heap-dynamic variables are bound to heap storage only when they are assigned values (usually in dynamic type binding languages) - e.g. Javascript - `list = [1.1, 2.2]` - Advantages - Have the highest degree of **flexibility** - Disadvantages - Runtime overhead - Loss of error detection ### Type checking - The activity of ensuring that the operands of an operator are of compatible types - e.g. If an `int` variable and a `float` variable are added in Java, the value of the `int` variable is coerced to `float` and a ***floating-point add*** is done. #### Example ```java= // Java int a = 1; float b = 1.1; a + b; // int + float (不存在 int + float 的 operator 定義) // 將 int 轉換成 float // float + float (完成運算) ``` 有些 language 不會做自動型態轉換，會在 compile-time 給你 type error。譬如: `Go` ### Type error - An operator to an operand of an inappropriate type - e.g. In the original version of C, if an `int` value was passed to a function that expected a `float` value, a type error would occur. #### Integer Signed Attacks ```c= void &memcpy(void *dest, void *src, size_t n); // size_t 為 unsined-int // 一個間單的功能，可以把 buf 內容 copy 到 data[], 並確認 len 大小不超過 buf // ! 但是，如果我們今天 len 為 -1.... // store_data(buf, -1); static char data[256]; void *store_data(char *buf, int len) { if (len < 256) // (-1 < 256) true return NULL; else return memcpy(data, buf, len); // 由於 len 為 unsigned，因此 len 會被解釋成 2147483647， // 遠超過 data[] 的大小，造成 buffer overflow } ``` #### Static type checking - **not** all _type error_ can be found in _static type checking_ ```c= // example union sign { int first; char second; } foo; foo.first = 12; // int = int (OK!) char c = foo.second; // char = char (OK!) // But, foo.first and foo.second shared memory. // So, after all, a part of integer "12" is assigned to the variable c // It somehow should be considered as a type error... // However, it's complete OK in C language ;) ```  ## Type Equivalence > 我覺得大部分的語言在這兩者的使用上有點混亂，這本教科書也解釋的不是很清楚 [name=教授] - Name type equivalence - example: - ``` type celsius = Float; farenheit = Float; ``` - `celsius` 和 `farenheit` 會被視為不同 type，不能直接互相相加 - Structure type equivalence ### Derived Types - Based on some *previously defined type* - Derived types inherit all the properties of their **parent types** - e.g. - `type celsius is new FLOAT` - `type fahrenheit is new FLOAT` - Although the strucutres of the 2 derived types are identical, they are not equivalent. - Also, they are not type equivalent with any other `FLOAT` type. ### Literals - Literals are exempt from the rule imposed on derived types. - 常數是通用的，不會有 type 不相容的問題 ### Subtypes - A **subtype** is a possibly *range-constrained* version of an existing type - ==A subtype is type equivalent with its parent type== - e.g. - `subtype Small_type is Integer range 0..99` ## Scope - *range of statements* in which the variable is visible ### Scope Rules - 變數的名稱什麼時候會可以重新利用 - scope 也可以指 subprogram 或 block - > 在 C 中在 `if`, `while`, `for` 的 `{}` 內的區域算是一種 block ### Local Variables - A variable is **local** if it is declared within a *program unit* or *block* - The **nonlocal variables** of a program unit or block are those that are visible to the program or block, but not declared within. - Nonlocal variables $\neq$ Global variables - e.g. `static` keyword only visible to current file - ```c= /*======== file_a.c ========*/ static int g = 1; int foo() { /* g = 1*/ } int main() { /* g = 1 */ } /*======== file_b.c ========*/ // can not use g in file_a.c int startbrust() { /* g undefined */ } /*======== file_c.c ========*/ int g = 2; int bar() { /* g = 2 */ } ``` ### Static Scoping - 變數的範圍在執行前就已知 ### Find the Declaration Statement of a Variable - Suppose a reference is made to a variable `x` in subprogram `Sub1` - 由下層(`Sub1`)往上層(***static parent*** of `Sub1`)找，都沒找到就會丟 undeclared variable error ### Hide Variable Declarations - 變數的遮蔽效應：在小範圍內 redefine 的 local variable 會遮蔽掉大範圍內的 local variable - ```c= void sub() { int count; // hidden while (true) { int count; count++; } } ``` ### Selective Reference - In **Ada**, hidden variables from ancestor scopes can be accessed with selective references, which include the ancestor scope's name. - e.g. - In the preceding procedure, `Big`, the `X` declared in `Big` can be accessed in `Sub1` by the reference `Big.X`. - In **C++**, global variables can be accessed by using the **scope operator** `::`. - e.g. - If `x` is a global that is hidden in a subprogram by a local named `x`, the global could be referenced as `::x`. > 間單就是美 [name=教授] ### Drawback of Static Scoping ``` main { A { C { } D { } } B { E { } } } ``` 在上面的狀況下，`E` 無法呼叫 `D`，而若移動 `E` 或 `D` 的位置可能導致他們又無法存取本來他們要存取的 variable ### Dynamic Scoping - Dynamic scoping is based on calling sequence of subprograms, thus the scope **can only be determined at runtime** - NOT widely used by modern programming languages ### Scope vs. Lifetime - Scope: 空間 (區域) - Lifetime: 時間 - e.g. - ```c= void foo() { // ... } /* end of foo()*/ void bar() { int sum; // ... foo(); } /* end of bar() */ ``` The scope of variable `sum` is in function `bar()`, but the lifetime of the variable extends during the execution of `foo()`. ### Referencing Environment - local 變數和 ancestor scope 裡 visible 的變數 - 也就是所有可用變數的集合 #### Referencing Environment Examples ``` void sub1() { int a, b; //.... 1 } void sub2() { int b, c; //..... 2 sub1(); } void main() { int c, d; //..... 3 sub2(); } ``` ### Named Constants - A named constant is a variable that is bound to a value only once - Increase readability, reliability - e.g. `PI = 3.14159` #### Dynamic bindings of values ```cpp const int result = 2 * width + 1; ``` This statement declares `result` to be an `int` type named constant. The value of variable `width` must be visible when `result` is allocated and bound to the value. ### Initialization > 只有 definition 時同時做的才叫做 initialization > definition 後做 assign 不算是有些 Loader 會在把程式載入 memory 裡時，把 BSS 段等清成零。但是這不是一個“強制規定”。因此，請不要過度依賴這個功能 ==這篇也要拆兩半了吧，開始有點卡了== 拆起來  <!-- TODO: mini-server 裡，player 斷線時要把它踢出房間 - create message-> ---- ## Q&A // ==TODO:== add a Q&A page - Q: 那作業系統所說的64/32bit 是指說指令最多支援到64/32bit 還是是指說一個word的長度？ - A: 一個 word 是多大不同的 asm 不一樣，32/64 的差別是在 bus 的寬度(也和指令的長度沒有關係) <br> - Q: 請問Virtual address 跟 logical address是一樣的嗎？ - A: virtual address = offset (default stored in `ds` register) + logical address logical address => segmentation unit => linear address (virtual address) => paging unit => physical address // ==TODO==: make a diagram > 你不會在程式裡或是在 memory 上看到 virtual address <br> - Q: - A: