# 程式語言 Programming Language (CSIE-2A) (Ch.6~)
###### tags: `courses meow` `CSIE-2A` `Programming Language` `2021 Spring`
:::success
:::spoiler click to open TOC
[TOC]
:::
:::info
### Links
- [資訊網頁link](https://staff.csie.ncu.edu.tw/hsufh/COURSES/SPRING2021/ppl.html)
- [線上發問系統](https://tlk.io/ncu_ppl2021)
### Material
- Concepts of programming language (Robert W. Sebesta)
:::
---
# Chapter 6 -- Data Types
- **Data type** defines:
- a collection of **data value**
- a set of **operation**
## Scalar / Structured Variable
- Scalar type
- int, float, char...
- Structured type
- array, list, hash, struct...
## Variables and Descriptors
- A descriptor is the collection of the attributes of a variable.
- e.g. Name, Address, Value, Type, Lifetime, Scope
- In an implementation, a descriptor is an area of memory that stores the attributes of a variable.
- In all cases, descriptors are used for type checking, and to build the code for the allocation of deallocation operations.
### Static Attributes
- Required only at compile time
- Built by the compiler, usually a part of the symbol table and are used during compilation
### Dynamic Attributes
- Must be maintained during execution
- Used by runtime system
## Numeric Type
- Many early programming languages had only numeric primitive types.
- Numeric types still play a central role among the collections of types supported by contemporary languages.
- e.g.
- Integer
- Floating Point
- Decimal
- Boolean Types
- Character Types
- Complex
## Integer Representation
- 三種常見的表示方式
- One's complement
- Two's complement
- Sign magnitude
## Problem of Floating Point Numbers
- The loss of accuracy through arithmetic operations
- 理應同義的算式,在電腦運算時,會因為 accuracy loss 而導致結果不同
- (a / b) * c
- (a * c) / b
## Decimal
- Decimal types are stored very much like character strings, using binary codes for the decimal digits.
- The operations on decimal values are done in hardware on machines that have such capabilities; otherwise, they are simulated in software.
## Boolean Type
- A Boolean value could be represented by a single bit.
- But because a single bit of memory is difficult to access efficiently on many machines, they are often stored in the smallest efficiently addressable cell of memory, typically a byte.
## Character Type
### ASCII
- The most commonly used coding is ASCII (American Standard Code for Information Interchange), which uses the values 0 to 127 to code 128 different characters.
- Many programming languages include a primitive type for character data.
### Unicode
- A 16-bit character set named Unicode has been developed as an alternative.
- Unicode includes the characters from most of the world's natural languages.
- For example, Unicode includes the Cyrillic alphabet, as used in Serbia, and the Thai digits.
## Character String type
- 基本上就是 sequence of characters
### Design issues
- `a special kind of character array` or `a primitive type`
- 用 array 模擬
- e.g. `C`, `C++`
- 直接設成原生的一種 type
- e.g. `Python`, `Java`
- string have `static` or `dynamic` length
### Examples
- `C`
- use character array to present string
- 字串結尾為 `null`, (compiler 會自己將字串常數的尾巴上加 `null`)
- 提供 `string.h` lib 做 string operation
```graphviz
digraph {
label="array"
n [label="<p>h|e|l|l|o| |w|o|r|l|d|!|\\0" shape="record"]
ptr->n:p
}
```
- `Java`
- **String pool**
- Each time you create a string literal, the JVM checks the string constant pool first.
- If the string already exists in the pool, a reference to the pooled instance is returned.
- If string doesn't exist in the pool, a new string instance is created and placed in the pool.
```graphviz
digraph{
rankdir=LR;
node [shape="record"]
n [label="...|<s2> s2|<s1> s1"]
subgraph cluster_1 {
label="Heap"
subgraph cluster_2 {
label="string constant pool"
w [label=" \"Welcome\" "]
}
}
n:s1 -> w
n:s2 -> w
}
```
```java=
String s1 = "Welcome";
String s2 = "Welcome"; // will not create new instance
```
```graphviz
digraph {
node[shape="record"]
subgraph cluster_method_area {
label = "Method Area (string constant pool)"
data [label="\"abcd\"" ]
}
subgraph cluster_heap {
label = "heap"
data1 [label="\"abcd\""]
data2 [label="\"abcd\""]
}
a -> data
b -> data
c -> data1
d -> data2
}
```
```java=
String a = "abcd";
String b = "abcd"; // use object in string pool
String c = new String("abcd"); // create a new object
String d = new String("abcd");
```
- 有趣的結果 (Strings are immutable!)
```java=
String s = "test";
String s2 = s; // referenced to string pool
s = s.concat(" hi");
System.out.println(s); // "test hi"
System.out.println(s2); // "test"
```
### Pattern Matching
> Pattern matching is another fundamental character string operation !
### Dynamic Length vs Static Length
- s
- d
## Array Types
- 一組同形態的資料
- 表示法
- 同常都是 brackets: `[]` 或 parenthesis: `()`
### Ordinal Type
- A **ordinal type** is one in which the range of possible values can be easily associated with the set of positive integers.
- e.g.
- Java: `int`, `char`, `boolean`
- User-defined: `enum`, `subrange`
### Subscript Range Check
- Early programming languages did NOT specify that subscript ranges must be implicitly checked.
- Range errors in subscripts are common in programs, so requiring ***range checking*** is an important factor in the **reliability** of languages.
> 大部分的程式語言並不會檢查索引值超界的錯誤
- C, C++, Fortran 不會檢查
- Java, C#, ML 會有錯誤
### Array Categories
#### Static arrays
- The subscript ranges are statically bound
- Storage allocation is static (done before run time).
- Advantage
- **Efficiency**: No dynamic allocation or deallocation is required.
> 缺點是記憶體空間無法另作他用 [name=教授]
- Example
- Arrays declared in **C and C++** functions that include the `static` modifier are examples of static arrays.
#### Fixed stack-dynamic arrays
- The subscript ranges are statically bound
- The allocation is done at **declaration elaboration** time ++during execution++.
- Advantage
- **Space Efficiency**: A large array in one procedure can use the same space as a large array in a different procedure, as long as both procedures are not active at the same time.
- Example
- Arrays that are declared in **C and C++** functions (==without== the `static` specifier) are examples of fixed stack-dynamic arrays.
#### Stack-dynamic arrays
- The subscript ranges are ***dynamically*** bound at elaboration time
- The storage allocation is ***dynamically*** bound at elaboration time.
- Advantage
- **Flexibility**: The size of an array need not be known until the array is about to be used.
- Example
- **Ada** arrays can be stack-dynamic.
#### Fixed heap-dynamic arrays
- Same with fixed stack-dynamic array, but the allocation is done in **heap**.
- Example
- **Java** arrays
- **C and C++** also provide fixed heap-dynamic arrays.
#### Heap-dynamic arrays
- Often provided by **interpreted languages**
- Arrays can grow and shrink during program execution as the need for space changes.
- Example
- **Perl** and **Javascript**
### Heterogeneous Arrays
- A **heterogeneous array** is one in which the elements need not be of the same type.
- Example
- **Perl**, **Python**, **Javascript**, and **Ruby**
- In all of these languages, arrays are heap dynamic. (Interpreted languages)
### Jagged Arrays
- A **rectangular array** is a multidimensioned array in which ++all of the rows have the same number of elements, all of the columns have the same number of elements++, and so forth.
- Rectangular arrays accurately model tables.
- In contrast, a **jagged array** is one in which the lengths of the rows need not be the same.

### Associative Arrays
- An **associative array** is an unordered collection of data elements that are indexed by an equal number of values called keys. (map-like structure)
- In **Perl**, associative arrays are often called ***hashes***, because in the implementation their elements are stored and retrieved with **hash functions**.
```perl=
%salaries = ("Gary" => 75000, "Perry" => 57000,
"Mary" => 55750, "Cedric" => 47850);
```
## Pointer
### dangling pointer