# Python Data Science Toolbox (Part 1)
###### tags: `Datacamp` `python` `function` `Python Programming`
> **Author: 何彥南**
> Datacamp course: [Python Data Science Toolbox (Part 1)](https://www.datacamp.com/courses/python-data-science-toolbox-part-1)

**Notes:**
1. `df` is short for a pandas DataFrame.
2. `pd` is the usual alias for the pandas package.
3. Treat the official [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/) as the authoritative reference.
4. Check your pandas version; some features may not be available in newer releases.
5. In code blocks, lines marked with `#` (for example `#[Out]:`) show the output.

[TOC]

---

# Writing your own functions
## [1-1]User-defined functions
### 1.Built-in functions
> str(): converts a value to a string

```python=
x = str(5)
print(x)
#[out]: 5
print(type(x))
#[out]:<class 'str'>
```
### 2.Define a function
> Defining a function

```python=
def square():
    new_value = 4**2
    print(new_value)
```
* `def square():` is called the function header.
* The indented code below it is called the function body.

### 3.Function parameters
> Giving the function a parameter

```python=
def square(value):
    new_value = value**2
    print(new_value)
```
* `def square(value):` here `value` is a parameter. It receives whatever is passed in from outside when the function is called, and inside the function it is used like any other variable.
    * An argument is the actual value supplied when calling the function.
    * A parameter is the name declared in the function signature (the function's declaration).

> Using the function we just defined

```python=
# Run
square(4)
#[Out]:16
square(5)
#[Out]:25
```
* `square(4)`: here *4* is the argument; you can pass a different one on each call as needed.

### 4.Return values from functions
> A function's return value

```python=
def square(value):
    new_value = value**2
    return new_value
```
* `return`: sets the value the function hands back to the caller.

```python=
print(square(4))
#[Out]:16
```
### 5.Docstrings
> Document the function with a short docstring

```python=
def square(value):
    """Returns the square of a number."""
    new_value = value**2
    return new_value
```
## [1-2]Multiple parameters and return values
### 1.Multiple function parameters
> Using multiple parameters

```python=
def raise_to_power(value1, value2):
    """Raise value1 to the power of value2."""
    new_value = value1 ** value2
    return new_value
# Run
result = raise_to_power(2, 3)
print(result)
#[Out]:8
```
* Calling `raise_to_power(2, 3)` passes two arguments; inside the function, 2 and 3 are bound to the parameters `value1` and `value2` respectively.

### 2.A quick jump into tuples
> Tuples let a function return multiple values.
* A tuple is like a list, but immutable.
* Built with parentheses `()`.
```python=
even_nums = (2, 4, 6)
print(type(even_nums))
#[Out]:<class 'tuple'>
```
### 3.Unpacking tuples
> Like a list, a tuple can be unpacked into separate variables

```python=
even_nums = (2, 4, 6)
a, b, c = even_nums
# Run
print(a)
#[Out]:2
print(b)
#[Out]:4
print(c)
#[Out]:6
```
* Create a tuple: `even_nums = (2, 4, 6)`
* `a, b, c = even_nums` assigns the values in (2, 4, 6) to `a`, `b` and `c` one by one.

### 4.Accessing tuple elements
> Like a list, a tuple can be indexed by position

```python=
even_nums = (2, 4, 6)
print(even_nums[1])
#[Out]:4
```
### 5.Returning multiple values
> A function can return several values by packing them into a tuple

```python=
def raise_both(value1, value2):
    """Raise value1 to the power of value2 and vice versa."""
    new_value1 = value1 ** value2
    new_value2 = value2 ** value1
    new_tuple = (new_value1, new_value2)
    return new_tuple
# Run
result = raise_both(2, 3)
print(result)
#[Out]:(8, 9)
```

---

# Default arguments, variable-length arguments and scope
* Scope: the part of a program in which a name is visible. Within its scope a variable can be used freely.
    * Global scope:
        * defined in the main body of a script
        * The name can be used anywhere in the script.
    * Local scope:
        * defined inside a function
        * The name can be used only inside that function.
    * Built-in scope:
        * names in the pre-defined built-ins module
        * Names that Python predefines, such as `print` and `len`; they are available everywhere without an import.
* Python variables need no declaration before use. A name belongs to the scope in which it is assigned, unless `global` or `nonlocal` says otherwise.

## [2_1]Scope and user-defined functions
### 1.Global vs. local scope
> Property 1: a local variable defined inside a function exists only inside that function.

```python=
def square(value):
    """Returns the square of a number."""
    new_value = value**2
    return new_value
# Run
square(3)
#[Out]:9
new_value
#[Out]:NameError: name 'new_value' is not defined
```
* `new_value` was never defined in the global scope, so looking it up there raises an error.

> Property 2: a global variable and a local variable with the same name are independent when each has its own definition.

```python=
new_value = 10
def square(value):
    """Returns the square of a number."""
    new_value = value**2
    return new_value
# Run
square(3)
#[Out]:9
new_value
#[Out]:10
```
* `new_value` is defined in both the global and the local scope; despite the shared name, the two are separate variables.

> Property 3: a function can read an already defined global variable directly.

```python=
new_val = 10
def square(value):
    """Returns the square of a number."""
    new_value2 = new_val ** 2
    return new_value2
# Run
square(3)
#[Out]:100
new_val = 20
square(3)
#[Out]:400
```
* Because the function defines no local `new_val` of its own, it uses the global `new_val`, so the result follows the global value at call time.

> Property 4: declaring a name with `global` inside a function lets the function rebind the global variable.

```python=
new_val = 10
def square(value):
    """Returns the square of a number."""
    global new_val
    new_val = new_val ** 2
    return new_val
# Run
square(3)
#[Out]:100
new_val
#[Out]:100
```
* Declaring `global new_val` inside the function means the assignment changes the global `new_val` itself.

## [2_2]Nested functions
### 1.Nested functions
> A nested function is a function defined inside another function.

```python=
def outer( … ):
    """ … """
    x = …
    def inner( … ):
        """ … """
        y = x ** 2
    return …
```
> Nested functions can achieve the same result with less repeated code.

> before

```python=
def mod2plus5(x1, x2, x3):
    """Returns the remainder plus 5 of three values."""
    x1 = x1 % 2 + 5
    x2 = x2 % 2 + 5
    x3 = x3 % 2 + 5
    return (x1, x2, x3)
```
> after

```python=
def mod2plus5(x1, x2, x3):
    """Returns the remainder plus 5 of three values."""
    def inner(x):
        """Returns the remainder plus 5 of a value."""
        return x % 2 + 5
    return (inner(x1), inner(x2), inner(x3))
```
### 2.Returning functions
> Returning functions: a function can return another function

```python=
def raise_val(n):
    """Return the inner function."""
    def inner(x):
        """Raise x to the power of n."""
        raised = x ** n
        return raised
    return inner
```
* `raise_val(n)` returns the function `inner`, which remembers the value of `n`.
* The run below shows that `square(2)` equals `raise_val(2)(2)`: with n=2, it gives the result of `inner(2)`.

```python=
# Run
square = raise_val(2)
print(square)
#[Out]: <function raise_val.<locals>.inner at 0x...>
type(square)
#[Out]:function
cube = raise_val(3)
print(square(2), cube(4))
#[Out]: 4 64
```
### 3.Using nonlocal
> nonlocal: lets an inner function rebind a variable that belongs to the enclosing function

```python=
def outer():
    """Prints the value of n."""
    n = 1
    def inner():
        nonlocal n
        n = 2
        print(n)
    inner()
    print(n)
# Run
outer()
#[Out]:2
#[Out]:2
```
* Both prints show 2: with `nonlocal n`, the assignment in `inner()` changes the `n` of the enclosing `outer()` rather than creating a new local variable. Note that `nonlocal` targets the enclosing function's scope, not the global scope.

## [2_3]Default and flexible
### 1.Add a default argument
> Adding a default argument

```python=
# Add a default argument
def power(number, pow=1):
    """Raise number to the power of pow."""
    new_value = number ** pow
    return new_value
# Run
power(9, 2)
#[Out]:81
power(9)
#[Out]:9
```
* Because `pow` has a default value, the function still runs when the second argument is omitted; the default is used instead.

### 2.Flexible arguments: *args
> With `*args`, the function accepts a variable number of arguments.

```python=
def add_all(*args):
    """Sum all values in *args together."""
    # Initialize sum
    sum_all = 0
    # Accumulate the sum
    for num in args:
        sum_all += num
    return sum_all
# Run
add_all(1, 2)
#[Out]: 3
add_all(5, 10, 15, 20)
#[Out]: 50
```
* `*args` parameter: the function accepts any number of positional arguments.
* Inside the function, `args` can be handled as a tuple.

### 3.Flexible arguments: **kwargs
> With `**kwargs`, keyword arguments are collected into a dict.

```python=
def print_all(**kwargs):
    """Print out key-value pairs in **kwargs."""
    print(kwargs) #[Out1]
    print(type(kwargs)) #[Out2]
    # Print out the key-value pairs
    for key, value in kwargs.items():
        print(key + ": " + value) #[Out3]
# Run
print_all(name="dumbledore", job="headmaster")
#[Out1]: {'name': 'dumbledore', 'job': 'headmaster'}
#[Out2]: <class 'dict'>
#[Out3]: name: dumbledore
#        job: headmaster
```
* `**kwargs` parameter: accepts arguments written in the form `name="dumbledore"`.
* [Out1] and [Out2] show that the input `name="dumbledore"` is automatically stored as `'name': 'dumbledore'`, that is, as an entry in a dict.
* Inside the function, simply treat `kwargs` as a dict.

---

# Lambda functions
## [3-1]Lambda functions
> A lambda is a one-line function expression, also called an anonymous function
> Syntax: `lambda arg1, arg2, ...: expression`

```python=
raise_to_power = lambda x, y: x ** y
raise_to_power(2, 3)
#[Out]:8
```
> Combining with map(): `map(func, seq)`

```python=
nums = [48, 6, 9, 21, 1]
square_all = map(lambda num: num ** 2, nums)
print(square_all)
#[Out]:<map object at 0x103e065c0>
print(list(square_all))
#[Out]:[2304, 36, 81, 441, 1]
```
* `map()`: applies `func` (here a lambda function) to every element of the list. In Python 3 it returns a map object; converting it with `list()` gives a list of the same length as the input.

## [3-2]Introduction to error handling
### 1.Passing an incorrect argument
> Passing the right kind of argument
> We use float() as the example

![](https://i.imgur.com/7wMn6QJ.png)

> Try passing values of different types

```python=
float(2)
#[Out]: 2.0
float('2.3')
#[Out]: 2.3
float('hello')
#[Out]:
'''
------------------------------------------------------------------
ValueError                       Traceback (most recent call last)
<ipython-input-3-d0ce8bccc8b2> in <module>()
----> 1 float('hello')
ValueError: could not convert string to float: 'hello'
'''
```
* When the string does not represent a number, Python raises a ValueError.
* The traceback tells us which line failed ( `----> 1 float('hello')` ) and what went wrong ( `ValueError: could not convert string to float: 'hello'` ).
* The problem is not strings in general, since `float('2.3')` works; it is that `'hello'` cannot be interpreted as a number.

> Now let's try with a function of our own

```python=
def sqrt(x):
    """Returns the square root of a number."""
    return x ** (0.5)
# Run
sqrt(4)
#[Out]:2.0
sqrt(10)
#[Out]:3.1622776601683795
sqrt('hello')
#[Out]:
'''
------------------------------------------------------------------
TypeError                        Traceback (most recent call last)
<ipython-input-4-cfb99c64761f> in <module>()
----> 1 sqrt('hello')
<ipython-input-1-939b1a60b413> in sqrt(x)
      1 def sqrt(x):
----> 2     return x**(0.5)
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'float'
'''
```
* `----> 1 sqrt('hello')`: the error occurred while running `sqrt('hello')`.
* `----> 2 return x**(0.5)`: this is the line inside `sqrt()` that failed.
* The cause is that the `**` operator is not defined between a str and a float.

### 2.Errors and exceptions
> The try-except pattern
> Made up of `try` and `except`: Python attempts the code under `try`, and if an exception is raised, it runs the code under `except`.

```python=
def sqrt(x):
    """Returns the square root of a number."""
    try:
        return x ** 0.5
    except:
        print('x must be an int or float')
# Run
sqrt(4)
#[Out]: 2.0
sqrt(10.0)
#[Out]: 3.1622776601683795
sqrt('hi')
#[Out]: x must be an int or float
```
* The run shows that with try-except, our function can give the user a message when it cannot execute normally.
* Here the message states the problem clearly: x must be an int or float

> We can also name the specific error that should trigger the message

```python=
def sqrt(x):
    """Returns the square root of a number."""
    try:
        return x ** 0.5
    except TypeError: # only catch TypeError
        print('x must be an int or float')
```
> We can also use `raise ___Error('message')` to raise an error with a message ourselves

```python=
def sqrt(x):
    """Returns the square root of a number."""
    if x < 0: # raise a ValueError when the condition holds
        raise ValueError('x must be non-negative')
    try:
        return x ** 0.5
    except TypeError:
        print('x must be an int or float')
# Run
sqrt(-2)
'''
-----------------------------------------------------------------
ValueError                      Traceback (most recent call last)
<ipython-input-2-4cf32322fa95> in <module>()
----> 1 sqrt(-2)
<ipython-input-1-a7b8126942e3> in sqrt(x)
      1 def sqrt(x):
      2     if x < 0:
----> 3         raise ValueError('x must be non-negative')
      4     try:
      5         return x**(0.5)
ValueError: x must be non-negative
'''
```

---

# test
## Question 1
> Fill in the `___` blanks so that the code produces the given output.

```python=
x = ['ROCK', 'PAPER']
def ___(___):
    """Returns a list, with all elements capitalized"""
    def ___(___):
        """Returns a capitalized word"""
        return w.capitalize()
    return ([inner(li[0]), inner(li[1])])
print(caps(x))
#[Out]:['Rock', 'Paper']
```
* Answer

```python=
x = ['ROCK', 'PAPER']
def caps(li):
    """Returns a list, with all elements capitalized"""
    def inner(w):
        """Returns a capitalized word"""
        return w.capitalize()
    return ([inner(li[0]), inner(li[1])])
# Run
print(caps(x))
#[Out]:['Rock', 'Paper']
```

---

## Question 2
> What is the scope of the variable y?
```python=
def cube(x):
    c = x ** 3
    return c
y = 4
print(cube(y))
```
* Answer: Global

---

## Question 3
> Fill in the `___` blanks so that the code produces the given output.

```python=
temp = 40
def convert_temp(___):
    """Converts the temperature from Celsius to Fahrenheit"""
    ___ temp
    ___= (x * 1.8) + 32
# Run
convert_temp(temp)
print(temp)
#[Out]:104.0
```
* Answer

```python=
temp = 40
def convert_temp(x):
    """Converts the temperature from Celsius to Fahrenheit"""
    global temp
    temp = (x * 1.8) + 32
# Run
convert_temp(temp)
print(temp)
#[Out]:104.0
```

---

## Question 4
> What should replace `___` to produce the given output?
> 1. `args,args`
> 2. `**args`
> 3. `args,args,args`
> 4. `*args`
> 5. `x`

```python=
def mean(___):
    """Returns the mean of all the numbers"""
    total_sum = 0 # Initial sum
    n = len(args) # Number of arguments
    for x in args:
        total_sum = total_sum + x
    return total_sum/n
# Run
print((mean(3, 4), mean(10, 15, 20)))
#[Out]: (3.5, 15.0)
```
* Answer: 4
* P.S.: when a function must accept many arguments, adding more and more parameters does not scale. Prefix a parameter with `*` to collect the positional arguments into a tuple, then process them with a `for` loop.
* P.S.: when a call needs so many named arguments that it becomes cluttered, consider using `**` to unpack a dict that holds the parameter names and values.
* Source: [click here](https://skylinelimit.blogspot.com/2018/04/python-args-kwargs.html)

---

## Question 5
> Which option replaces the `___` in the following function definition?

```python=
def sqrt(x):
    """Returns the square root of a number"""
    try:
        return x ** (1/2)
    except ___:
        print('x must be int or float')
# Run
sqrt(4)
#[Out]:2.0
sqrt(str(4)) # the input is a string, so a TypeError occurs
#[Out]:x must be int or float
```
* Answer: TypeError
* P.S.: an `except ___:` clause inside a function lets you show the user a helpful message when a particular error occurs, as this example demonstrates.

---

## Question 6
> What should replace `___` to produce the given output?
> 1. `*x`
> 2. `**x`
> 3. `x`

```python=
def easy_print(___):
    for key, value in x.items():
        print('The value of ' + str(key) + " is " + str(value))
(easy_print(a = 10), easy_print(b = 20))
#[Out]: The value of a is 10
#       The value of b is 20
```
* Answer: 2
* Review: [Flexible-arguments-kwargs](https://hackmd.io/KVJq5V7SRYWRWM38lNhS8A?view#3Flexible-arguments-kwargs)

---

## Question 7
> Within the function definition, the parameter `*args` is turned into a ___.
> 1. dictionary
> 2. tuple
> 3. list
> 4. integer

* Answer: 2
* Review: [Flexible-arguments-args](https://hackmd.io/KVJq5V7SRYWRWM38lNhS8A?view#2Flexible-arguments-args)

---

## Question 8
> Complete the following code

```python=
x = [2, -6, 10, -7, 1]
greater_than_zero = ___( ___ n: (n > 0), x)
print(list(greater_than_zero))
#[Out]:[2, 10, 1]
```
* Answer: filter , lambda
* Supplement:
> filter, map and reduce are special functions for processing collections; they help with data processing and with simplifying code in Python.
>
>> **1.filter(function, sequence)**
>> Uses the boolean function passed in as a condition: it iterates over every element of the sequence and keeps the elements for which function(element) is True. In Python 3 the result is an iterator, so wrap it in `list()` to get a list.
>
>> **2.map(function, sequence)**
>> Iterates over every element of the sequence and applies the function to each one. In Python 3 the result is an iterator, so wrap it in `list()` to get a list.
>
>> **3.reduce(function, sequence)**
>> Requires a binary function (a function with two parameters) and returns a single value. In Python 3 it is imported from `functools`.
>>
>> reduce first takes two elements and applies the function; the returned value is then passed to the function together with the next element, and so on until every element has been consumed.

---

## Question 9
> Complete the following code

```python=
def add_zeros(string):
    """Returns a string padded with zeros to ensure consistent length"""
    updated_string = string + '0'
    def add_more():
        """Adds more zeros if necessary"""
        nonlocal updated_string
        updated_string = updated_string + '0'
    while len(updated_string) < 6:
        add_more()
    return ___
# Run
(add_zeros('3.4'), add_zeros('2.345'))
#[Out]: ('3.4000', '2.3450')
```
* Answer: updated_string
* Key point: the `nonlocal updated_string` statement lets `add_more()` rebind the `updated_string` of the enclosing `add_zeros()` instead of being limited to its own local variables.
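
The `nonlocal` point from the last question can be sketched on its own with a small counter. This is only an illustrative sketch: `make_counter` and `increment` are hypothetical names chosen for this example and are not part of the course material.

```python
def make_counter(start=0):
    """Return a function that counts up from `start` (hypothetical helper)."""
    count = start

    def increment():
        # Without `nonlocal`, the assignment below would make `count` a new
        # local variable, and reading it in `count + 1` would raise
        # UnboundLocalError.
        nonlocal count
        count = count + 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()
print(first, second)
#[Out]: 1 2
```

Each call to `make_counter()` creates its own `count`, so two counters built this way do not affect each other.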