# Pandas
## pandas
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/index.html

### Series
#### Creating a one-dimensional array
```python=
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5])
print(data)
print("Max:", data.max())  # output is 5
print("Median:", data.median())  # output is 3.0
print(data == 20)  # output is a boolean Series (all False here)
```
#### Inspecting a Series
```python=
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(data.dtype)  # data type of the elements
print(data.size)  # number of elements
print(data.index)  # index labels
```
#### Indexing the data (the number of index labels must match the number of elements)
```python=
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(data)
```
```python=
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(data.iloc[2], data.iloc[0])  # access by position (use iloc when the index holds labels)
print(data["e"], data["d"])  # access by index label
```
#### Series math and statistics functions: <span style = "color:red">`sum()`, `max()`, `prod()` (product of all values), `mean()`, `median()`, `std()` (standard deviation), `nlargest(n)` (the n largest values), `nsmallest(n)` (the n smallest values)</span>
```python=
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(data.max())  # maximum
print(data.sum())  # sum
print(data.std())  # standard deviation
print(data.median())  # median
print(data.nlargest(3))  # the three largest values
```
#### Series string operations: <span style = "color:green">`str.lower()`, `str.upper()`, `str.len()`, `str.cat(sep = "n")` (join all strings with n as the separator), `str.contains("n")` (check whether each string contains n), `str.replace("n","k")` (replace n with k in each string)</span>
```python=
import pandas as pd
data = pd.Series(['taiwan', '台灣', 'python'])
print(data.str.lower())  # convert everything to lowercase
print(data.str.len())  # length of each string
print(data.str.cat(sep=','))  # join the strings, separated by commas
print(data.str.contains('t'))  # check whether each string contains 't'
print(data.str.replace('台灣', '中華民國'))  # replace '台灣' with '中華民國'
```
### DataFrame
<span style = "color:red">**`列 = row, 行 = column (columns run vertically, rows run horizontally)`**</span>

Basic form: `DataFrame(dict, index)`
```python=
import pandas as pd
data = pd.DataFrame({'name': ['max', 'tom', 'tank'], 'salary': [30000, 50000, 120000]})
print(data)
print('==========')
print(data['name'])  # get a column
print('==========')
print(data['salary'])  # get a column
print('==========')
print(data.iloc[1])  # get a row (by position)
```
#### Index labels
```python=
import pandas as pd
data = pd.DataFrame({'name': ['max', 'tom', 'steven'], 'salary': [30000, 80000, 120000]}, index=['a', 'b', 'c'])
print(data)
```
#### Inspecting the data
```python=
import pandas as pd
data = pd.DataFrame({'name': ['max', 'tom', 'steven'], 'salary': [30000, 80000, 120000]}, index=['a', 'b', 'c'])
print(data.size)  # total number of elements (rows x columns)
print(data.shape)  # shape as (rows, columns)
print(data.index)  # row index labels
```
<span style = "color:red">Selecting a single column or row from two-dimensional data yields one-dimensional data (a Series), so the Series methods above can be used on it.</span>
#### Getting a row
```python=
import pandas as pd
data = pd.DataFrame({'name': ['max', 'tom', 'steven'], 'salary': [30000, 80000, 120000]}, index=['a', 'b', 'c'])
print(data.iloc[1])  # get the 2nd row by position
print(data.loc['c'])  # get the 3rd row by index label
```
#### Getting a column
```python=
import pandas as pd
data = pd.DataFrame({'name': ['max', 'tom', 'steven'], 'salary': [30000, 80000, 120000]}, index=['a', 'b', 'c'])
print(data['name'])
new_data = data['name']  # a column is a Series, so str methods apply
print(new_data.str.upper())
salary_aver = data['salary']
print(salary_aver.mean())  # average salary
```
#### Creating new columns
```python=
import pandas as pd
data = pd.DataFrame({'name': ['max', 'tom', 'steven'], 'salary': [30000, 80000, 120000]}, index=['a', 'b', 'c'])
data['profit'] = [10000, 20000, 30000]  # data[new column name] = list
data['rank'] = pd.Series([3, 6, 1], index=['a', 'b', 'c'])  # data[new column name] = Series
data['cp'] = data['profit'] / data['salary']  # derive a new column from existing ones
print(data)
```
### Filtering data
#### Series
#### Numbers
```python=
import pandas as pd
data = pd.Series([30, 15, 20])
condition = data > 18
print(condition)
newData = data[condition]
print(newData)
```
The comparison produces a boolean Series. Its True/False values mark which elements satisfy the condition, and indexing with it keeps only the elements marked True.
#### Strings
<span style = "color:red">str.contains</span>
```python=
import pandas as pd
data = pd.Series(["你好", "python", "pandas"])
condition = data.str.contains('p')
newdata = data[condition]
print(newdata)
```
#### DataFrame
```python=
import pandas as pd
data = pd.DataFrame({'name': ['max', 'tom', 'ken'], 'salary': [30000, 50000, 40000]})
condition = data['salary'] > 30000
print(condition)
newdata = data[condition]
print(newdata)
```
```python=
import pandas as pd
data = pd.DataFrame({'name': ['max', 'tom', 'ken'], 'salary': [30000, 50000, 40000]})
condition = data['name'] == 'max'
print(condition)
newdata = data[condition]
print(newdata)
```
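
The same boolean-mask approach can likely be extended to more than one condition. The following is a minimal sketch, assuming the element-wise operators `&` (and) and `|` (or) on boolean Series; the thresholds and the variable names `both`, `either`, `and_result`, and `or_result` are illustrative and not part of the original notes.

```python
import pandas as pd

data = pd.DataFrame({'name': ['max', 'tom', 'ken'], 'salary': [30000, 50000, 40000]})

# Each comparison yields a boolean Series. Wrap each one in parentheses,
# because & and | bind more tightly than > or ==.
both = (data['salary'] > 30000) & (data['name'] != 'tom')    # both conditions must hold
either = (data['salary'] > 40000) | (data['name'] == 'max')  # at least one must hold

and_result = data[both]
or_result = data[either]
print(and_result)
print(or_result)
```

Python's plain `and` / `or` keywords do not work element-wise on a Series, which is why the sketch uses `&` and `|` instead.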