--- pandas --- ```python import psycopg2 import pandas as pd conn = psycopg2.connect(database="mydb", user="test01", password="testpw", host="localhost", port="5432") print(conn.closed) ## 允許python執行postgresql語法 with conn.cursor() as cur: ## 觀察查詢筆數 print(cur.rowcount) ## SQL語法 sql = "select * from public.salary" ## 執行sql語法 cur.execute(sql) ## 取得欄位名稱 name = [desc[0] for desc in cur.description] ## fetchall是將所有收尋的資料都寫入 rows 中 # rows = pd.DataFrame(cur.fetchall()) rows = pd.DataFrame(cur.fetchall(),columns=name) ## 取出一筆資料 # rows = pd.DataFrame(cur.fetchone()) ## 自行設定要幾筆資料 # rows = pd.DataFrame(cur.fetchmany(10)) ## 觀察查詢筆數 print(cur.rowcount) print(rows.shape) rows.head() ``` 0 -1 800 (800, 9) <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>check_n</th> <th>id</th> <th>date</th> <th>clock_in</th> <th>clock_out</th> <th>salary_1</th> <th>salary_2</th> <th>salary_3</th> <th>salary_4</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>1</td> <td>3686</td> <td>2021-06-26</td> <td>7:22:30</td> <td>22:3:39</td> <td>37189781436</td> <td>21040014451</td> <td>64716969172</td> <td>33480505957</td> </tr> <tr> <td>1</td> <td>2</td> <td>3686</td> <td>2021-06-10</td> <td>6:27:32</td> <td>17:43:58</td> <td>9572746424</td> <td>30750054304</td> <td>94795101611</td> <td>26989956772</td> </tr> <tr> <td>2</td> <td>3</td> <td>3686</td> <td>2021-06-26</td> <td>10:23:31</td> <td>19:54:49</td> <td>22131633201</td> <td>24360486310</td> <td>38951440787</td> <td>50535635106</td> </tr> <tr> <td>3</td> <td>4</td> <td>3686</td> <td>2021-06-06</td> <td>9:25:13</td> <td>17:57:49</td> <td>57455281008</td> <td>13692157108</td> <td>28280930472</td> <td>16803796965</td> </tr> <tr> <td>4</td> <td>5</td> <td>3686</td> <td>2021-06-20</td> <td>10:17:18</td> <td>22:21:48</td> <td>83254690126</td> <td>65883681703</td> <td>3426866555</td> <td>38369031374</td> </tr> </tbody> </table> </div> Pandas 主要是讀取、轉換和處理資料,使用表格的方式呈現給使用者觀看。Series與DataFrame是主要的資料結構,分成一維與二維資料。 ```python dfs = pd.read_html('https://www.ubus.com.tw/Booking/FareInquiry') dfs[0] ``` <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead tr th { text-align: left; } </style> <table border="1" class="dataframe"> <thead> <tr> <th></th> <th colspan="6" halign="left">國道路線</th> </tr> <tr> <th></th> <th>路線名稱</th> <th colspan="2" halign="left">全票 / 優惠全票 票價</th> <th>半票票價</th> <th>軍優票價</th> <th>去回票價</th> </tr> <tr> <th></th> <th>路線名稱</th> <th>優惠時段</th> <th>原價時段</th> <th>半票票價</th> <th>軍優票價</th> <th>去回票價</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>【1610】台北-高雄</td> <td>535元</td> <td>590元</td> <td>290元</td> <td>560元</td> <td>1,120元</td> </tr> <tr> <td>1</td> <td>【1611】台北-台南</td> <td>430元</td> <td>480元</td> <td>265元</td> <td>-元</td> <td>930元</td> </tr> <tr> <td>2</td> <td>【1612】台北-北二高-台南</td> <td>430元</td> <td>480元</td> <td>265元</td> <td>-元</td> <td>930元</td> </tr> <tr> <td>3</td> <td>【1612】台北-北二高-北港路</td> <td>400元</td> <td>420元</td> <td>220元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>4</td> <td>【1613】台北-屏東</td> <td>560元</td> <td>620元</td> <td>325元</td> <td>600元</td> <td>1,200元</td> </tr> <tr> <td>5</td> <td>【1615】台北-彰化</td> <td>290元</td> <td>320元</td> <td>165元</td> <td>315元</td> <td>630元</td> </tr> <tr> <td>6</td> <td>【1616】台北-員林</td> <td>320元</td> <td>350元</td> <td>175元</td> <td>345元</td> <td>690元</td> </tr> <tr> <td>7</td> <td>【1617】台北-東勢</td> <td>300元</td> <td>330元</td> <td>165元</td> <td>320元</td> <td>640元</td> </tr> <tr> <td>8</td> <td>【1617】台北-豐原</td> <td>270元</td> <td>285元</td> <td>140元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>9</td> <td>【1618】台北-嘉義</td> <td>400元</td> <td>440元</td> <td>220元</td> <td>-元</td> <td>860元</td> </tr> <tr> <td>10</td> <td>【1619】台北-中港路-台中</td> <td>270元</td> <td>300元</td> <td>160元</td> <td>285元</td> <td>570元</td> </tr> <tr> <td>11</td> <td>【1619】台北-朝馬</td> <td>240元</td> <td>260元</td> <td>140元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>12</td> <td>【1620】台北-中清路-台中</td> <td>270元</td> <td>300元</td> <td>160元</td> <td>285元</td> <td>570元</td> </tr> <tr> <td>13</td> <td>【1620】台北-水湳</td> <td>260元</td> <td>275元</td> <td>135元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>14</td> <td>【1621】台中-高雄</td> <td>320元</td> <td>350元</td> <td>160元</td> <td>330元</td> <td>660元</td> </tr> <tr> <td>15</td> <td>【1621】彰化交流道-高雄</td> <td>300元</td> <td>310元</td> <td>140元</td> <td>300元</td> <td>600元</td> </tr> <tr> <td>16</td> <td>【1623】台中市-中正機場</td> <td>300元</td> <td>300元</td> <td>170元</td> <td>-元</td> <td>560元</td> </tr> <tr> <td>17</td> <td>【1625】台中市-台南市</td> <td>250元</td> <td>270元</td> <td>150元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>18</td> <td>【1625】朝馬-台南市</td> <td>210元</td> <td>230元</td> <td>135元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>19</td> <td>【1627】中壢服務區-中正機場</td> <td>45元</td> <td>45元</td> <td>20元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>20</td> <td>【1628】台北-麻豆佳里漚汪</td> <td>430元</td> <td>480元</td> <td>265元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>21</td> <td>【1629】台北-學甲-苓子寮</td> <td>430元</td> <td>480元</td> <td>265元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>22</td> <td>【1630】台北-西港</td> <td>460元</td> <td>460元</td> <td>220元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>23</td> <td>【1630】台北-二林</td> <td>420元</td> <td>420元</td> <td>210元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>24</td> <td>【1630】台北-溪湖</td> <td>360元</td> <td>360元</td> <td>180元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>25</td> <td>【1631】台北-二水-竹山</td> <td>480元</td> <td>480元</td> <td>240元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>26</td> <td>【1631】台北-二水田中社頭</td> <td>380元</td> <td>380元</td> <td>185元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>27</td> <td>【1632】台北-草屯-竹山</td> <td>480元</td> <td>480元</td> <td>240元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>28</td> <td>【1632】台北-草屯南投</td> <td>360元</td> <td>380元</td> <td>185元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>29</td> <td>【1633】台北-北港-三條崙</td> <td>470元</td> <td>470元</td> <td>250元</td> <td>-元</td> <td>900元</td> </tr> <tr> <td>30</td> <td>【1635】台北-虎尾-台西三條崙</td> <td>470元</td> <td>470元</td> <td>250元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>31</td> <td>【1635】台北-虎尾</td> <td>420元</td> <td>420元</td> <td>190元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>32</td> <td>【1636】台北-西螺四湖三條崙</td> <td>470元</td> <td>470元</td> <td>250元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>33</td> <td>【1636】台北-西螺</td> <td>375元</td> <td>375元</td> <td>190元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>34</td> <td>【1637】台北-西螺林厝寮三條崙</td> <td>470元</td> <td>470元</td> <td>250元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>35</td> <td>【1638】台北-朴子-東石</td> <td>480元</td> <td>480元</td> <td>265元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>36</td> <td>【1639】台北-布袋</td> <td>480元</td> <td>480元</td> <td>265元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>37</td> <td>【1650】高雄-北港口湖三條崙</td> <td>275元</td> <td>275元</td> <td>135元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>38</td> <td>【1650】高雄-麻豆</td> <td>115元</td> <td>115元</td> <td>55元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>39</td> <td>【1650】高雄-北港路</td> <td>175元</td> <td>175元</td> <td>90元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>40</td> <td>【1651】高雄-北港四湖三條崙</td> <td>275元</td> <td>275元</td> <td>135元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>41</td> <td>【1652】台北-鹿港-芳苑</td> <td>395元</td> <td>395元</td> <td>230元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>42</td> <td>【1656】板橋-二高-屏東</td> <td>550元</td> <td>600元</td> <td>290元</td> <td>590元</td> <td>1,180元</td> </tr> <tr> <td>43</td> <td>【1657】南投-高鐵烏日站線</td> <td>85元</td> <td>85元</td> <td>40元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>44</td> <td>【1657】中興國中-高鐵烏日站線</td> <td>70元</td> <td>70元</td> <td>35元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>45</td> <td>【1657】草屯-高鐵烏日站線</td> <td>55元</td> <td>55元</td> <td>25元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>46</td> <td>【1660】板橋-高雄</td> <td>510元</td> <td>570元</td> <td>290元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>47</td> <td>【1661】礁溪轉運站-桃園機場</td> <td>200元</td> <td>200元</td> <td>100元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>48</td> <td>【1661】宜蘭轉運站-桃園機場</td> <td>250元</td> <td>250元</td> <td>125元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>49</td> <td>【1661】羅東轉運站-桃園機場</td> <td>280元</td> <td>280元</td> <td>140元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>50</td> <td>【9019】朝馬-溪湖</td> <td>90元</td> <td>90元</td> <td>45元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>51</td> <td>【9019】朝馬-二林</td> <td>110元</td> <td>110元</td> <td>55元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>52</td> <td>【9019】朝馬-大城</td> <td>140元</td> <td>140元</td> <td>70元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>53</td> <td>【9019】朝馬-麥寮</td> <td>190元</td> <td>190元</td> <td>95元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>54</td> <td>【1663】南港-花蓮</td> <td>290元</td> <td>320元</td> <td>205元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>55</td> <td>【1663】南港-北埔</td> <td>290元</td> <td>320元</td> <td>195元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>56</td> <td>【1663】南港-新城</td> <td>260元</td> <td>290元</td> <td>175元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>57</td> <td>【1650】高雄-北港</td> <td>220元</td> <td>220元</td> <td>110元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>58</td> <td>【1651】高雄-北港</td> <td>220元</td> <td>220元</td> <td>110元</td> <td>-元</td> <td>-元</td> </tr> <tr> <td>59</td> <td>【1651】高雄-北港路</td> <td>175元</td> <td>175元</td> <td>90元</td> <td>-元</td> <td>-元</td> </tr> </tbody> </table> </div> ```python rows.tail(3) ``` <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>check_n</th> <th>id</th> <th>date</th> <th>clock_in</th> <th>clock_out</th> <th>salary_1</th> <th>salary_2</th> <th>salary_3</th> <th>salary_4</th> </tr> </thead> <tbody> <tr> <td>797</td> <td>798</td> <td>8771</td> <td>2021-06-13</td> <td>7:28:37</td> <td>18:12:30</td> <td>11292160995</td> <td>92530661221</td> <td>35926946256</td> <td>18157341512</td> </tr> <tr> <td>798</td> <td>799</td> <td>8771</td> <td>2021-06-10</td> <td>8:50:55</td> <td>22:2:14</td> <td>87552010075</td> <td>38000894493</td> <td>18831773874</td> <td>39658641024</td> </tr> <tr> <td>799</td> <td>800</td> <td>8771</td> <td>2021-06-23</td> <td>8:8:13</td> <td>19:3:16</td> <td>45356737851</td> <td>90791781951</td> <td>57742913264</td> <td>74593282041</td> </tr> </tbody> </table> </div> ```python rows.index ``` RangeIndex(start=0, stop=800, step=1) ```python rows.columns ``` Index(['check_n', 'id', 'date', 'clock_in', 'clock_out', 'salary_1', 'salary_2', 'salary_3', 'salary_4'], dtype='object') ```python rows.to_numpy() ``` array([[1, '3686', datetime.date(2021, 6, 26), ..., '21040014451', '64716969172', '33480505957'], [2, '3686', datetime.date(2021, 6, 10), ..., '30750054304', '94795101611', '26989956772'], [3, '3686', datetime.date(2021, 6, 26), ..., '24360486310', '38951440787', '50535635106'], ..., [798, '8771', datetime.date(2021, 6, 13), ..., '92530661221', '35926946256', '18157341512'], [799, '8771', datetime.date(2021, 6, 10), ..., '38000894493', '18831773874', '39658641024'], [800, '8771', datetime.date(2021, 6, 23), ..., '90791781951', '57742913264', '74593282041']], dtype=object) ```python rows.to_numpy()[0] ``` array([1, '3686', datetime.date(2021, 6, 26), '7:22:30', '22:3:39', '37189781436', '21040014451', '64716969172', '33480505957'], dtype=object) ```python rows.describe() ``` <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>check_n</th> </tr> </thead> <tbody> <tr> <td>count</td> <td>800.0000</td> </tr> <tr> <td>mean</td> <td>400.5000</td> </tr> <tr> <td>std</td> <td>231.0844</td> </tr> <tr> <td>min</td> <td>1.0000</td> </tr> <tr> <td>25%</td> <td>200.7500</td> </tr> <tr> <td>50%</td> <td>400.5000</td> </tr> <tr> <td>75%</td> <td>600.2500</td> </tr> <tr> <td>max</td> <td>800.0000</td> </tr> </tbody> </table> </div> ```python rows["salary_1"].describe() rows.salary_1.describe() ``` count 800 unique 800 top 93368241982 freq 1 Name: salary_1, dtype: object ```python rows.describe(include='all') ``` <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>check_n</th> <th>id</th> <th>date</th> <th>clock_in</th> <th>clock_out</th> <th>salary_1</th> <th>salary_2</th> <th>salary_3</th> <th>salary_4</th> </tr> </thead> <tbody> <tr> <td>count</td> <td>800.0000</td> <td>800</td> <td>800</td> <td>800</td> <td>800</td> <td>800</td> <td>800</td> <td>800</td> <td>800</td> </tr> <tr> <td>unique</td> <td>NaN</td> <td>20</td> <td>30</td> <td>791</td> <td>785</td> <td>800</td> <td>800</td> <td>800</td> <td>800</td> </tr> <tr> <td>top</td> <td>NaN</td> <td>6479</td> <td>2021-06-22</td> <td>10:23:31</td> <td>22:22:28</td> <td>93368241982</td> <td>69597211664</td> <td>23466241348</td> <td>71306233852</td> </tr> <tr> <td>freq</td> <td>NaN</td> <td>40</td> <td>37</td> <td>2</td> <td>2</td> <td>1</td> <td>1</td> <td>1</td> <td>1</td> </tr> <tr> <td>mean</td> <td>400.5000</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> </tr> <tr> <td>std</td> <td>231.0844</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> </tr> <tr> <td>min</td> <td>1.0000</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> </tr> <tr> <td>25%</td> <td>200.7500</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> </tr> <tr> <td>50%</td> <td>400.5000</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> </tr> <tr> <td>75%</td> <td>600.2500</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> </tr> <tr> <td>max</td> <td>800.0000</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> <td>NaN</td> </tr> </tbody> </table> </div> ## 因為裡面只有check_n在設計時,是屬於數值資料,其他都是類別資料或者時間資料 ```python rows.info() ``` <class 'pandas.core.frame.DataFrame'> RangeIndex: 800 entries, 0 to 799 Data columns (total 9 columns): check_n 800 non-null int64 id 800 non-null object date 800 non-null object clock_in 800 non-null object clock_out 800 non-null object salary_1 800 non-null object salary_2 800 non-null object salary_3 800 non-null object salary_4 800 non-null object dtypes: int64(1), object(8) memory usage: 56.4+ KB ```python pd.to_numeric(rows.salary_1) ``` 0 37189781436 1 9572746424 2 22131633201 3 57455281008 4 83254690126 ... 795 63630615608 796 61733968697 797 11292160995 798 87552010075 799 45356737851 Name: salary_1, Length: 800, dtype: int64 ```python rows.salary_1 ``` 0 37189781436 1 9572746424 2 22131633201 3 57455281008 4 83254690126 ... 795 63630615608 796 61733968697 797 11292160995 798 87552010075 799 45356737851 Name: salary_1, Length: 800, dtype: object ```python rows1 = rows rows1["salary_1_num"] = pd.to_numeric(rows.salary_1) rows1.head() ``` <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>check_n</th> <th>id</th> <th>date</th> <th>clock_in</th> <th>clock_out</th> <th>salary_1</th> <th>salary_2</th> <th>salary_3</th> <th>salary_4</th> <th>salary_1_num</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>1</td> <td>3686</td> <td>2021-06-26</td> <td>7:22:30</td> <td>22:3:39</td> <td>37189781436</td> <td>21040014451</td> <td>64716969172</td> <td>33480505957</td> <td>37189781436</td> </tr> <tr> <td>1</td> <td>2</td> <td>3686</td> <td>2021-06-10</td> <td>6:27:32</td> <td>17:43:58</td> <td>9572746424</td> <td>30750054304</td> <td>94795101611</td> <td>26989956772</td> <td>9572746424</td> </tr> <tr> <td>2</td> <td>3</td> <td>3686</td> <td>2021-06-26</td> <td>10:23:31</td> <td>19:54:49</td> <td>22131633201</td> <td>24360486310</td> <td>38951440787</td> <td>50535635106</td> <td>22131633201</td> </tr> <tr> <td>3</td> <td>4</td> <td>3686</td> <td>2021-06-06</td> <td>9:25:13</td> <td>17:57:49</td> <td>57455281008</td> <td>13692157108</td> <td>28280930472</td> <td>16803796965</td> <td>57455281008</td> </tr> <tr> <td>4</td> <td>5</td> <td>3686</td> <td>2021-06-20</td> <td>10:17:18</td> <td>22:21:48</td> <td>83254690126</td> <td>65883681703</td> <td>3426866555</td> <td>38369031374</td> <td>83254690126</td> </tr> </tbody> </table> </div> ```python rows1.info() ``` <class 'pandas.core.frame.DataFrame'> RangeIndex: 800 entries, 0 to 799 Data columns (total 10 columns): check_n 800 non-null int64 id 800 non-null object date 800 non-null object clock_in 800 non-null object clock_out 800 non-null object salary_1 800 non-null object salary_2 800 non-null object salary_3 800 non-null object salary_4 800 non-null object salary_1_num 800 non-null int64 dtypes: int64(2), object(8) memory usage: 62.6+ KB ```python rows.describe() ``` <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>check_n</th> <th>salary_1_num</th> </tr> </thead> <tbody> <tr> <td>count</td> <td>800.0000</td> <td>8.000000e+02</td> </tr> <tr> <td>mean</td> <td>400.5000</td> <td>4.909562e+10</td> </tr> <tr> <td>std</td> <td>231.0844</td> <td>2.908800e+10</td> </tr> <tr> <td>min</td> <td>1.0000</td> <td>2.177264e+08</td> </tr> <tr> <td>25%</td> <td>200.7500</td> <td>2.362071e+10</td> </tr> <tr> <td>50%</td> <td>400.5000</td> <td>4.941501e+10</td> </tr> <tr> <td>75%</td> <td>600.2500</td> <td>7.390700e+10</td> </tr> <tr> <td>max</td> <td>800.0000</td> <td>9.989334e+10</td> </tr> </tbody> </table> </div> ```python ## 轉換時間資料 rows.clock_in[0] ``` '7:22:30' ```python d = pd.Timestamp(rows.clock_in[0], unit='s') d ``` Timestamp('2021-07-29 07:22:30') ```python print(d.time()) ``` 07:22:30 ```python ## 功課,修改格式 pd.Timestamp(rows.clock_in[10], unit='s') rows.clock_in[10] ``` --------------------------------------------------------------------------- OutOfBoundsDatetime Traceback (most recent call last) <ipython-input-20-16f85ae74d68> in <module> 1 ## 功課,修改格式 ----> 2 pd.Timestamp(rows.clock_in[10], unit='s') 3 rows.clock_in[10] pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__() pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject() pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject() pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject() pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject() pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds() OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 07:02:59 ```python rows.clock_in[10] ``` ```python ## 字串切割 rows.clock_in.str.split(r"\:") ``` ```python ## 列出一行,切割訊號 r = [rows.clock_in.str.split(r"\:")][0][10] print(r) for n in range(3): if len(r[n]) <2: r[n] = str(0)+r[n] print(r) ':'.join(r) ``` ```python pd.Timestamp(':'.join(r)) ``` ```python rows1.iloc[0] ``` ```python ## 計算最大值 rows1.columns ``` ```python #5之後,9以前,不包含9 rows1.columns[5:9] ``` ```python ## 取出欄位資料 rows1[rows1.columns[5:9]] ``` ```python ## 刪除資料 欄位 rows1[rows1.columns[5:9]].drop("salary_1",axis=1) ``` ```python ## 刪除資料 列位 rows1[rows1.columns[5:9]].drop(0,axis=0) ``` ```python ## axis : 0 is by row, 1 by is column rows1[rows1.columns[5:9]].max(axis=1) ``` ```python rows1.head() ``` ```python rows1.to_csv("new.csv", header=False,index = False) ``` ```python ## 存入資料庫 conn = psycopg2.connect(database="mydb", user="test01", password="testpw", host="localhost", port="5432") print(conn.closed) ## 允許python執行postgresql語法 with conn.cursor() as cur: cur.execute("DROP TABLE IF EXISTS new_salary") cur.execute("CREATE TABLE new_salary( \ check_n SERIAL PRIMARY KEY, \ id SERIAL, \ date DATE NOT NULL, \ clock_in VARCHAR (10) NOT NULL, \ clock_out VARCHAR (10) NOT NULL, \ salary_1 VARCHAR (11) NOT NULL, \ salary_2 VARCHAR (11) NOT NULL, \ salary_3 VARCHAR (11) NOT NULL, \ salary_4 VARCHAR (11) NOT NULL, \ salary_1_num BIGINT NOT NULL)") f = open("./new.csv", 'r') cur.copy_from(f, "new_salary", columns=rows1.columns, sep=",") print("end") ``` ```python print(conn.closed) ## 允許python執行postgresql語法 with conn.cursor() as cur: ## 觀察查詢筆數 print(cur.rowcount) ## SQL語法 sql = "select * from new_salary" ## 執行sql語法 cur.execute(sql) ## 取得欄位名稱 name = [desc[0] for desc in cur.description] ## fetchall是將所有收尋的資料都寫入 rows 中 # rows = pd.DataFrame(cur.fetchall()) rows2 = pd.DataFrame(cur.fetchall(),columns=name) ## 取出一筆資料 # rows = pd.DataFrame(cur.fetchone()) ## 自行設定要幾筆資料 # rows = pd.DataFrame(cur.fetchmany(10)) ## 觀察查詢筆數 print(cur.rowcount) print(rows2.shape) rows2.head() ``` # 功課 ## 一、輸出完整時間格式,移除不尋常的時間,取代表內格式 ```python ``` ## 二、將薪水1~4改成數值資料,並且計算出最大值,寫到新的欄位 ```python ``` ## 三、以上完成,將新表格存入psql資料庫 ** 正確的datatypes設定 ### 表格內容:check_n, id, date, clock_in, clock_out, maxsalary #### 這部分沒有完成下禮拜作業會無法完成 ```python ```