# 程式語言 Programming Language (CSIE-2A) (Ch.6~) ###### tags: `courses meow` `CSIE-2A` `Programming Language` `2021 Spring` :::success :::spoiler click to open TOC [TOC] ::: :::info ### Links - [資訊網頁link](https://staff.csie.ncu.edu.tw/hsufh/COURSES/SPRING2021/ppl.html) - [線上發問系統](https://tlk.io/ncu_ppl2021) ### Material - Concepts of programming language (Robert W. Sebesta) ::: --- # Chapter 6 -- Data Types - **Data type** defines: - a collection of **data value** - a set of **operation** ## Scalar / Structured Variable - Scalar type - int, float, char... - Structured type - array, list, hash, struct... ## Variables and Descriptors - A descriptor is the collection of the attributes of a variable. - e.g. Name, Address, Value, Type, Lifetime, Scope - In an implementation, a descriptor is an area of memory that stores the attributes of a variable. - In all cases, descriptors are used for type checking, and to build the code for the allocation of deallocation operations. ### Static Attributes - Required only at compile time - Built by the compiler, usually a part of the symbol table and are used during compilation ### Dynamic Attributes - Must be maintained during execution - Used by runtime system ## Numeric Type - Many early programming languages had only numeric primitive types. - Numeric types still play a central role among the collections of types supported by contemporary languages. - e.g. - Integer - Floating Point - Decimal - Boolean Types - Character Types - Complex ## Integer Representation - 三種常見的表示方式 - One's complement - Two's complement - Sign magnitude ## Problem of Floating Point Numbers - The loss of accuracy through arithmetic operations - 理應同義的算式,在電腦運算時,會因為 accuracy loss 而導致結果不同 - (a / b) * c - (a * c) / b ## Decimal - Decimal types are stored very much like character strings, using binary codes for the decimal digits. - The operations on decimal values are done in hardware on machines that have such capabilities; otherwise, they are simulated in software. ## Boolean Type - A Boolean value could be represented by a single bit. - But because a single bit of memory is difficult to access efficiently on many machines, they are often stored in the smallest efficiently addressable cell of memory, typically a byte. ## Character Type ### ASCII - The most commonly used coding is ASCII (American Standard Code for Information Interchange), which uses the values 0 to 127 to code 128 different characters. - Many programming languages include a primitive type for character data. ### Unicode - A 16-bit character set named Unicode has been developed as an alternative. - Unicode includes the characters from most of the world's natural languages. - For example, Unicode includes the Cyrillic alphabet, as used in Serbia, and the Thai digits. ## Character String type - 基本上就是 sequence of characters ### Design issues - `a special kind of character array` or `a primitive type` - 用 array 模擬 - e.g. `C`, `C++` - 直接設成原生的一種 type - e.g. `Python`, `Java` - string have `static` or `dynamic` length ### Examples - `C` - use character array to present string - 字串結尾為 `null`, (compiler 會自己將字串常數的尾巴上加 `null`) - 提供 `string.h` lib 做 string operation ```graphviz digraph { label="array" n [label="<p>h|e|l|l|o| |w|o|r|l|d|!|\\0" shape="record"] ptr->n:p } ``` - `Java` - **String pool** - Each time you create a string literal, the JVM checks the string constant pool first. - If the string already exists in the pool, a reference to the pooled instance is returned. - If string doesn't exist in the pool, a new string instance is created and placed in the pool.  ```graphviz digraph{ rankdir=LR; node [shape="record"] n [label="...|<s2> s2|<s1> s1"] subgraph cluster_1 { label="Heap" subgraph cluster_2 { label="string constant pool" w [label=" \"Welcome\" "] } } n:s1 -> w n:s2 -> w } ``` ```java= String s1 = "Welcome"; String s2 = "Welcome"; // will not create new instance ``` ```graphviz digraph { node[shape="record"] subgraph cluster_method_area { label = "Method Area (string constant pool)" data [label="\"abcd\"" ] } subgraph cluster_heap { label = "heap" data1 [label="\"abcd\""] data2 [label="\"abcd\""] } a -> data b -> data c -> data1 d -> data2 } ``` ```java= String a = "abcd"; String b = "abcd"; // use object in string pool String c = new String("abcd"); // create a new object String d = new String("abcd"); ``` - 有趣的結果 (Strings are immutable!) ```java= String s = "test"; String s2 = s; // referenced to string pool s = s.concat(" hi"); System.out.println(s); // "test hi" System.out.println(s2); // "test" ``` ### Pattern Matching > Pattern matching is another fundamental character string operation ! ### Dynamic Length vs Static Length - s - d ## Array Types - 一組同形態的資料 - 表示法 - 同常都是 brackets: `[]` 或 parenthesis: `()` ### Ordinal Type - A **ordinal type** is one in which the range of possible values can be easily associated with the set of positive integers. - e.g. - Java: `int`, `char`, `boolean` - User-defined: `enum`, `subrange` ### Subscript Range Check - Early programming languages did NOT specify that subscript ranges must be implicitly checked. - Range errors in subscripts are common in programs, so requiring ***range checking*** is an important factor in the **reliability** of languages. > 大部分的程式語言並不會檢查索引值超界的錯誤 - C, C++, Fortran 不會檢查 - Java, C#, ML 會有錯誤 ### Array Categories #### Static arrays - The subscript ranges are statically bound - Storage allocation is static (done before run time). - Advantage - **Efficiency**: No dynamic allocation or deallocation is required. > 缺點是記憶體空間無法另作他用 [name=教授] - Example - Arrays declared in **C and C++** functions that include the `static` modifier are examples of static arrays. #### Fixed stack-dynamic arrays - The subscript ranges are statically bound - The allocation is done at **declaration elaboration** time ++during execution++. - Advantage - **Space Efficiency**: A large array in one procedure can use the same space as a large array in a different procedure, as long as both procedures are not active at the same time. - Example - Arrays that are declared in **C and C++** functions (==without== the `static` specifier) are examples of fixed stack-dynamic arrays. #### Stack-dynamic arrays - The subscript ranges are ***dynamically*** bound at elaboration time - The storage allocation is ***dynamically*** bound at elaboration time. - Advantage - **Flexibility**: The size of an array need not be known until the array is about to be used. - Example - **Ada** arrays can be stack-dynamic. #### Fixed heap-dynamic arrays - Same with fixed stack-dynamic array, but the allocation is done in **heap**. - Example - **Java** arrays - **C and C++** also provide fixed heap-dynamic arrays. #### Heap-dynamic arrays - Often provided by **interpreted languages** - Arrays can grow and shrink during program execution as the need for space changes. - Example - **Perl** and **Javascript** ### Heterogeneous Arrays - A **heterogeneous array** is one in which the elements need not be of the same type. - Example - **Perl**, **Python**, **Javascript**, and **Ruby** - In all of these languages, arrays are heap dynamic. (Interpreted languages) ### Jagged Arrays - A **rectangular array** is a multidimensioned array in which ++all of the rows have the same number of elements, all of the columns have the same number of elements++, and so forth. - Rectangular arrays accurately model tables. - In contrast, a **jagged array** is one in which the lengths of the rows need not be the same. ![](https://i.imgur.com/p4Tn0dE.png) ### Associative Arrays - An **associative array** is an unordered collection of data elements that are indexed by an equal number of values called keys. (map-like structure) - In **Perl**, associative arrays are often called ***hashes***, because in the implementation their elements are stored and retrieved with **hash functions**. ```perl= %salaries = ("Gary" => 75000, "Perry" => 57000, "Mary" => 55750, "Cedric" => 47850); ``` ## Pointer ### dangling pointer