02 Data Structure **1** - Sequential Structures === ###### tags: `Algorithm` 這篇筆記是我參考【演算法圖鑑】和網路資源所製作。如果內容有誤,還請不吝賜教! ## Table of Content [toc] ## **Introduction** In general, data structures are different ways we combine and arrange the data. However, there are various data structures (e.g. stack, queue, heap, tree, ...), different arrangements of data give all sorts of structures. In this chapter, we will see some kinds of **sequential structures**, which means data is put in linear order. > array, list, stack, queue, hash table ## **Array** In the majority of programming languages, **declaration** is needed for variables as well as array, which means computer has to assign specific memory spaces before use of an array. However, in **python** declarations of any variables involving array are never a prerequisite of using them since python is a **dynamically typed** language (i.e. types of all objects are not defined until runtime). After declaration(ask computer for memory space), data is put in a clear order. And if we want to refer or retrieve data, we use **index** to find its location. However if we want to insert or remove data from the array, it cost much more time to be done due to **searching for our goal** and **shifting the remainders**. ### Time Complexity > Add/Delete data from an array : $O(n)$ > Get data by index : $O(1)$ ### Implementation Below is the implementation of array. (In python, an array is called a "list") #### **Make a List** > list = [var1, var2, ... , varN] > var = any ( even another array ) ```python= x = [10, 20, 'string'] # [10, 20, 'string'] y = [1, 2, 3] # [1, 2, 3] z = ['p', 'y', 't', 'h', 'o', 'n'] # ['p', 'y', 't', 'h', 'o', 'n'] print(x, y, z) ``` :::info Unlike tuple, values in a list are still mutable. Hence, it's illegal to use a list as a key in hash tables. ::: #### **Using multiple variables to get values from a list** ```python= x1, x2, x3 = x # 10 20 'string' y1, y2, y3 = y # 1 2 3 ``` Using \*var expression to assign multiple values to a variable(This variable becomes a list). ```python= *xa, xb = x # xa = [10, 20] | xb = 'string' z1, z2, *z3, z4 = z # z1 z2 -> front, z4 -> rear, z3 -> remainders print(z1, z2, z3, z4) # p y ['t', 'h', 'o'] n ``` #### **Get a value from a list via index** ```python= print(z[1]) # 'y' ``` :::info Index of the first item is 0! ::: #### **Other methods in class 'list'** Count / Index ```python= list.count(any) # return how many times a value appears in a list list.index(any) # find the index of a value in a list # return the smallest index if the value appears many times # return -1 if it doesn't exist in the list a = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4] print(a.count(3)) # 3 print(a.index(4)) # 6 ``` Check if a value is in the list ```python= # if any in list: # do ... if 3 in a: print('3 is in a.') ``` Revise an item in a list ```python= x = [1, 2, 3, 4, 5, 6, 7, 8] x[0] = 1111111 x[1] = 2222222 print(x) # [1111111, 2222222, 3, 4, 5, 6, 7, 8] ``` """ Add data ```python= list.append(any) # Add a value at the end list.extend([array]) list += [array] # Add a bunch of values at the end list.insert(index, any) # Insert a value at given position ``` Delete data ```python= list.remove(any) # Remove a value in a list # If it appears many times, remove the frontmost one list.pop(index) # Remove the value in given position # If the argument is not assigned, remove the last value list.clear() # Clear all the values in the list del list # delete the list itself ``` Get length of a list ```python= len(list) ``` #### **Concepts clarification** *1. Wrong way to copy a list* ```python= x = [1, 2, 3] y = x ``` In 'y = x', python consider this statement as 'making a pointer y pointing to the list[1, 2, 3] just like x.' Which means: :::danger When changing values in x, **items in y will also be revised**! ::: Correct ways to copy a list: (1) list comprehention ```python= list_copy = [i for i in list] ``` (2) 'copy()' method ```python= list_copy = list.copy() ``` (3) 'slice' in python ```python= list_copy = list[:] ``` Examples: ```python= a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] b = [i for i in a] b = a.copy() b = a[:] ``` *2. list 'multiplying'* a. list *= n, n > 0 ```python= c = ['p', 'y'] c *= 3 # c = ['p', 'y', 'p', 'y', 'p', 'y'] ``` b. list *= n, n <= 0 = list.clear() ```python= c = ['O', 'w', 'O'] c *= -1 # c = [] ``` ## **Tuple** Most features of a tuple are similar to an array(They both support item retrieving via indexes). However, the most significant difference between array and tuple is that **tuple is immutable**. Thus tuples can be used as keys of hash tables. Futhermore tuples are more efficient than lists in python. ### **Time Complexity** Get data by index : $O(1)$ ### **Implementation** #### Make a tuple ```python= tuple = var1, var2, ... , varN varI = any (even another array) x = 10, 20, 'string' # (10, 20, 'string') y = 1, 2, 3 # (1, 2, 3) z = 'p', 'y', 't', 'h', 'o', 'n' # ('p', 'y', 't', 'h', 'o', 'n') print(x, y) ``` #### **Using multiple variables to get values from a tuple** ```python= x1, x2, x3 = x # 10 20 'string' y1, y2, y3 = y # 1 2 3 ``` #### **Get a value from a tuple via index** ```python= print(z[1]) # 'y' ``` :::info Index of the first item is 0! ::: #### **Other methods in class 'tuple'** Count / Index ```python= tuple.count(any) # return how many times a value appears in a tuple list.index(any) # find the index of a value in a tuple # return the smallest index if the value appears many times # return -1 if it doesn't exist in the tuple a = 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 print(a.count(3)) # 3 print(a.index(4)) # 6 ``` Check if a value exists in a tuple ```python= # if any in tuple: # do ... if 3 in a: print('3 is in a.') ``` Get length of a tuple ```python= len(tuple) ``` Copy a tuple 1. tuple comprehension ```python= x = tuple(i for i in iterator/generator) ``` 2. slice in python ```python= y = x[:] ``` ## **(Linked) List** **Head -> Node -> ... -> Last Node -> Null** (**->** stands for **pointer**) There's no indexing system in list. Instead, lists use **pointer** to store the position of the next data. Hence, it takes time to find the specific data since we have to search linearly. On the other hand, adding as well as deleting can be done with ease because we only have to revise the direction of pointers. *Adding node will look like this* **Head -> A => C -> D -> Null** **Head -> A => B -> C -> D -> Null** (**=>** points toward ~~**C**~~ -> **B**) *Deleting node will look like this* **Head -> A => O -> B -> C -> D -> Null** **Head -> A => B -> C -> D -> Null** (**=>** points toward ~~**O**~~ -> **B**) ### **Time Complexity** Add/Delete a node : $O(1)$ Find a specific data : $O(n)$ ### **Implementation** ```python= class ListNode: def __init__(self, value): self.val = value self.next = None class LinkedList: def __init__(self, head: ListNode): self.head = head self.tail = self.head self.cur = self.head self.last = None # add/insert a node in the linked list def append(self, node: ListNode, behind: ListNode =None): self.cur = self.head if not behind: # if 'behind' is not given, # then add the node at rear end self.tail.next = node self.tail = self.tail.next elif behind.val == self.tail.val: # if 'behind' is simply the tail, # then do the same thing as the above :D self.tail.next = node self.tail = self.tail.next else: while self.cur.next: if self.cur.val == behind.val: # found it! temp = self.cur.next self.cur.next = node self.cur = self.cur.next self.cur.next = temp break self.cur = self.cur.next # if 'behind' doesn't exist in the list, # then just don't do anything def pop(self, node: ListNode =None): # if there's only 1 node in the whole linked list, # then don't pop anything if not self.head.next: return None self.cur = self.head self.last = None while self.cur: try: if self.cur.val == node.val: self.last.next = self.cur.next break except AttributeError: # This means 'node' is not given # then we'll delete the last node # find the new tail self.tail = self.head while self.tail.next.next: self.tail = self.tail.next self.tail.next = None break self.last = self.cur self.cur = self.cur.next def traverse(self) -> list: # returns a list of values of nodes in order self.cur = self.head values = [] while self.cur: values.append(self.cur.val) self.cur = self.cur.next return values def generate(self): # returns a generator self.cur = self.head while self.cur: yield self.cur.val self.cur = self.cur.next linked = LinkedList(ListNode(0)) for i in range(1, 10): linked.append(ListNode(i)) print(linked.traverse()) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] linked.append(ListNode(5.5), behind=ListNode(5)) for i in linked.generate(): print(i, end=' ') # 0 1 2 3 4 5 5.5 6 7 8 9 linked.pop() print(linked.traverse()) # [0, 1, 2, 3, 4, 5, 5.5, 6, 7, 8] linked.pop(ListNode(5.5)) for i in linked.generate(): print(i, end=' ') # 0 1 2 3 4 5 6 7 8 ``` ### **More Extensions** #### ***1. Circular Linked List*** : :::info Last node points toward **the head node**. ::: **Head -> Node -> ... -> Last Node -> Head -> ...** (**->** stands for **pointer**) #### ***2. Doubly Linked List*** : :::info Every node has 2 pointers : one points toward **the next one**, the other points toward **the previous node**. ::: **Head <-> Node <-> ... <-> Last Node <-> Null** (**->** stands for **pointer**) ## **Stack** Stack is also a sequential structure. However, similar to piling up plates, the last data we put in will be the first to be taken out, and this feature is called **"LIFO"** (last in first out). ### Implementation ```python= class Stack: def __init__(self, init_stk: list): self._stk = init_stk # add data - push def push(self, dt): self._stk.append(dt) # delete data - pop def pop(self): self._stk.pop() # get top - top def top(self): return self._stk[-1] # get length - size def size(self): return len(self._stk) stack = Stack([1, 2, 3]) stack.push(4) # 1 2 3 4 stack.push(5) # 1 2 3 4 5 print(stack.size()) # 5 print(stack.top()) # 5 stack.pop() # 1 2 3 4 print(stack.top()) # 4 ``` ## **Queue** Queue is completely different from stack. Just like waiting in line, the first data we push in will also be the first to be poped out, and this feature is called **"FIFO"** (first in first out). ### Implementation ```python= class Queue: def __init__(self, init_qu: list): self._qu = init_qu # 加入資料 - push def push(self, dt): self._qu.append(dt) # 取出資料 - pop def pop(self): self._qu.pop(0) # 取得前面的資料 def get_front(self): return self._qu[0] # 取得長度 def size(self): return len(self._qu) queue = Queue([1, 2, 3]) queue.push(4) # 1 2 3 4 queue.push(5) # 1 2 3 4 5 print(queue.size()) # 5 print(queue.get_front()) # 1 queue.pop() # 2 3 4 5 print(queue.get_front()) # 2 ``` ### **More Extensions** #### ***1. Deque (Double Ends Queue)*** : A normal queue only offers one side for pushing in data and the other for popping out. However, the design of dequeue allows more convenient operations - we can push in and pop out data **at both ends**. > ***Implementation*** ```python= class Deque: def __init__(self, init_deq: list): self._deq = init_deq # 插入資料 - push (front and back) def push(self, deq, command): # command : 'f' -> front ; 'b' -> back self._deq.insert((len(self._deq) + 1) * int(command == 'b'), deq) # 取出資料 - pop (front and back) def pop(self, command): # command : 'f' -> front ; 'b' -> back self._deq.pop(-1 * int(command == 'b')) # 取得頭尾兩端的資料 def get(self, command): # command : 'f' -> front ; 'b' -> back return self._deq[-1 * int(command == 'b')] # 取得長度 def size(self): return len(self._deq) deque = Deque([1, 2, 3]) deque.push(0, 'f') # 0 1 2 3 deque.push(4, 'b') # 0 1 2 3 4 print(deque.size()) # 5 print(deque.get('f')) # 0 print(deque.get('b')) # 4 deque.pop('f') # 1 2 3 4 deque.pop('b') # 1 2 3 print(deque.get('f')) # 1 print(deque.get('b')) # 3 ``` #### ***2. Priority Queue*** : In comparison with queue, we pop out the data with **the biggest weight** from a priority queue instead of taking out the data we first put in. > ***Implementation*** ```python= class PriorityQueue: def __init__(self, init_pq: list): init_pq.sort() self._pq = init_pq self._prior = 'big' # 設定優先條件 def set_prior(self, command): # command : big or small self._prior = command # 加入資料 def insert(self, pq): # 先確定是否要插入在邊界,再確認要放在中間的哪個位置 if pq <= self._pq[0]: self._pq.insert(0, pq) elif pq >= self._pq[-1]: self._pq.insert(-1, pq) else: for i in range(len(self._pq) - 1): if self._pq[i] <= pq <= self._pq[i + 1]: self._pq.insert(i + 1, pq) break # 取出資料 def pop(self): self._pq.pop(-1 * int(self._prior == 'big')) # 取出最優先的資料 def get(self): return self._pq[-1 * int(self._prior == 'big')] # 取得資料長度 def size(self): return len(self._pq) # 'big' pr_qu = PriorityQueue([0, 5, 2, 3, 5, 7]) # 0 2 3 5 5 7 pr_qu.insert(4) # 0 2 3 4 5 5 7 pr_qu.insert(6) # 0 2 3 4 5 5 6 7 print(pr_qu.size()) # 8 print(pr_qu.get()) # 7 pr_qu.pop() # 0 2 3 4 5 5 6 print(pr_qu.get()) # 6 # 'small' pr_qu.set_prior('small') # 0 2 3 4 5 5 6 print(pr_qu.get()) # 0 pr_qu.pop() # 2 3 4 5 5 6 pr_qu.pop() # 3 4 5 5 6 print(pr_qu.size()) # 5 print(pr_qu.get()) # 3 ``` ## **Hash Table** ### **Hash Function** Hash function is a function that turns given data into a fixed-length value, called "hash code". Hash function has the following features: 1. **Fixed Length of Hash Code** - No matter how massive the given data is, the length of hash code is always fixed. 2. **Same Data, Same Hash Code** 3. **Similar Data, Different Hash Code** 4. **Unidirectional** - It's impossible to infer the original data from its hash code. 5. **Hash Collision** - Different data might have the same hash code, when this occurs, we call it "hash collision". However, chances of collision are extremely slim. ### Implementation ```python= def hash(x): value = 0 for i in str(x): value += _pow(ord(i) * 987654321, 123456789, 10 ** 20) code = int(str(value)[:15:]) if code < 10 ** 14: return code + 123456789876543 else: return code def _pow(a, b, n): if b == 0: return 1 if b == 1: return a if b % 2 == 0: return _pow(a, int(b / 2), n) ** 2 % n else: return _pow(a, int((b - 1) / 2), n) ** 2 * a % n print(hash('python')) # 189770782082354 print(hash('pythona') == hash('pythonad')) # True - hash collision occured ``` ### **Hash Table** Hash table is like a dictionary. Each item is composed of a key and a value, at which the key points. The most significant advantage of hash table is that searching can be done in a really short time, rather than doing linear search. This makes hash table a superior data structure. Following shows how to build a hash table : 1. We now have a key and its value. 2. Index = hash(key) mod (length of table) *(complexity : $O(1)$)* 3. Put the value at table[index]. If hash collision occurs (i.e. table[index] is not empty), we use other methods to put in the value. The implementation below uses a linked list to connect these data. *(complexity : $O(1)\ \text{~} \ O(n)$)* This is how hash table works while doing searching : 1. The user gives a key. 2. Index = hash(key) mod (length of table) *(complexity : $O(1)$)* 3. Use the index to find the value and return it if the value exists. *(complexity : $O(1)\ \text{~} \ O(n)$)* ### Time Complexity > Insert/Delete an element : $O(1)$(average) to $O(n)$(worst case - too many collisions) > Searching : $O(1)$(average) to $O(n)$(worst case - too many collisions) ### Implementation ***1. Make a class 'Hash Table'*** ```python= class HashTable: def __init__(self, data: dict): self._data = [] self._index = 0 for self._i in data.keys(): self._data.append([self._i, data[self._i]]) self._table = [[] for self._i in range(int(len(self._data) ** 0.5) + 1)] HashTable.make_table(self, self._data) @staticmethod def __pow(a, b, n) -> int: if b == 0: return 1 if b == 1: return a if b % 2 == 0: return HashTable.__pow(a, int(b / 2), n) ** 2 % n else: return HashTable.__pow(a, int((b - 1) / 2), n) ** 2 * a % n # 雜湊函數 - 除留餘數法 + 字元碼運算 @staticmethod def __hash(x) -> int: value = 0 for i in str(x): value += HashTable.__pow(ord(i) * 987654321, 123456789, 10 ** 20) code = int(str(value)[:15:]) if code < 10 ** 14: return code + 123456789876543 else: return code def make_table(self, data: list): for i in data: # 這裡先不模擬碰撞時指標的存取 # 碰撞時,直接存入二維陣列,稱為 "鏈結法" self._table[HashTable.__hash(i[0]) % len(self._table)].append(i) def search(self, goal): for i in self._table[HashTable.__hash(goal) % len(self._table)]: if i[0] == goal: return i[1] else: return '{} is not in the table'.format(goal) def show(self): print('-' * 40) for i in range(len(self._table)): print('NO.{} :'.format(i)) for j in self._table[i]: print('{} = {}'.format(j[0], j[1])) print('-' * 40) hash_table = HashTable({'Joe': 0, 'Sue': 1, 'Dan': 0, 'Nell': 1, 'Ally': 1, 'Bob': 0}) print(hash_table.search('Sue')) # 1 print(hash_table.search('Ally')) # 1 print(hash_table.search('John')) # John is not in the table hash_table.show() hash_table.make_table([['John', 0], ['Mary', 1]]) print(hash_table.search('Mary')) # 1 hash_table.show() ``` ***2. Using Python's Dictionary*** P.S. Python's dictionary is actually a implementation of hash table. **Make a dictionary** ```python= dict1 = {key: value, key: value, ..., key: value} dict2 = dict(key=value, key=value, ..., key=value) ``` **Get values** ```python= from_dict_get_value = dict[key] ``` **More ways to use a dictionary (dict)** 1. using for loop to make a dictionary in some situation ```python= d = {chr(i): i for i in range(65, 91)} # 建立大寫字母的ASCII字元碼對照表 ``` 2. Add/Revise value `dict[key] = value` :::info **if** key already **exists**: &emsp; &nbsp; revise value **else**: &emsp; &nbsp; add value ::: 3. Delete an item (key + value) ```python= del dict[key] dict.pop(key) ``` 4. Deletion and assignment `var = dict.pop(key)` 5. Get all the keys and values ```python= dict.keys() # return keys dict.values() # return values dict.items() # return items (keys + values) ``` :::info Returnings of *keys()*, *values()*, and *items()* are neither tuples nor lists. Instead, they all have **their own types** respectively. ```python= d = {0: 1, 1: 2} print(type(d.keys())) # <class 'dict_keys'> print(type(d.values())) # <class 'dict_values'> print(type(d.items())) # <class 'dict_items'> ``` ::: 6. Check if a key/value exists `if ... in d.keys()/d.values()/d.items()` :::warning item in dict == item in dict.keys() != item in dict.items() ::: 7. Two dictionaries are equal if and only if there **keys** as well as their **values** are all **in correspondence with one another**.