---
pandas
---
```python
import psycopg2
import pandas as pd
conn = psycopg2.connect(database="mydb", user="test01",
password="testpw", host="localhost",
port="5432")
print(conn.closed)
## 允許python執行postgresql語法
with conn.cursor() as cur:
## 觀察查詢筆數
print(cur.rowcount)
## SQL語法
sql = "select * from public.salary"
## 執行sql語法
cur.execute(sql)
## 取得欄位名稱
name = [desc[0] for desc in cur.description]
## fetchall是將所有收尋的資料都寫入 rows 中
# rows = pd.DataFrame(cur.fetchall())
rows = pd.DataFrame(cur.fetchall(),columns=name)
## 取出一筆資料
# rows = pd.DataFrame(cur.fetchone())
## 自行設定要幾筆資料
# rows = pd.DataFrame(cur.fetchmany(10))
## 觀察查詢筆數
print(cur.rowcount)
print(rows.shape)
rows.head()
```
0
-1
800
(800, 9)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>check_n</th>
<th>id</th>
<th>date</th>
<th>clock_in</th>
<th>clock_out</th>
<th>salary_1</th>
<th>salary_2</th>
<th>salary_3</th>
<th>salary_4</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>3686</td>
<td>2021-06-26</td>
<td>7:22:30</td>
<td>22:3:39</td>
<td>37189781436</td>
<td>21040014451</td>
<td>64716969172</td>
<td>33480505957</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>3686</td>
<td>2021-06-10</td>
<td>6:27:32</td>
<td>17:43:58</td>
<td>9572746424</td>
<td>30750054304</td>
<td>94795101611</td>
<td>26989956772</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>3686</td>
<td>2021-06-26</td>
<td>10:23:31</td>
<td>19:54:49</td>
<td>22131633201</td>
<td>24360486310</td>
<td>38951440787</td>
<td>50535635106</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>3686</td>
<td>2021-06-06</td>
<td>9:25:13</td>
<td>17:57:49</td>
<td>57455281008</td>
<td>13692157108</td>
<td>28280930472</td>
<td>16803796965</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>3686</td>
<td>2021-06-20</td>
<td>10:17:18</td>
<td>22:21:48</td>
<td>83254690126</td>
<td>65883681703</td>
<td>3426866555</td>
<td>38369031374</td>
</tr>
</tbody>
</table>
</div>
Pandas 主要是讀取、轉換和處理資料,使用表格的方式呈現給使用者觀看。Series與DataFrame是主要的資料結構,分成一維與二維資料。
```python
dfs = pd.read_html('https://www.ubus.com.tw/Booking/FareInquiry')
dfs[0]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr>
<th></th>
<th colspan="6" halign="left">國道路線</th>
</tr>
<tr>
<th></th>
<th>路線名稱</th>
<th colspan="2" halign="left">全票 / 優惠全票 票價</th>
<th>半票票價</th>
<th>軍優票價</th>
<th>去回票價</th>
</tr>
<tr>
<th></th>
<th>路線名稱</th>
<th>優惠時段</th>
<th>原價時段</th>
<th>半票票價</th>
<th>軍優票價</th>
<th>去回票價</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>【1610】台北-高雄</td>
<td>535元</td>
<td>590元</td>
<td>290元</td>
<td>560元</td>
<td>1,120元</td>
</tr>
<tr>
<td>1</td>
<td>【1611】台北-台南</td>
<td>430元</td>
<td>480元</td>
<td>265元</td>
<td>-元</td>
<td>930元</td>
</tr>
<tr>
<td>2</td>
<td>【1612】台北-北二高-台南</td>
<td>430元</td>
<td>480元</td>
<td>265元</td>
<td>-元</td>
<td>930元</td>
</tr>
<tr>
<td>3</td>
<td>【1612】台北-北二高-北港路</td>
<td>400元</td>
<td>420元</td>
<td>220元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>4</td>
<td>【1613】台北-屏東</td>
<td>560元</td>
<td>620元</td>
<td>325元</td>
<td>600元</td>
<td>1,200元</td>
</tr>
<tr>
<td>5</td>
<td>【1615】台北-彰化</td>
<td>290元</td>
<td>320元</td>
<td>165元</td>
<td>315元</td>
<td>630元</td>
</tr>
<tr>
<td>6</td>
<td>【1616】台北-員林</td>
<td>320元</td>
<td>350元</td>
<td>175元</td>
<td>345元</td>
<td>690元</td>
</tr>
<tr>
<td>7</td>
<td>【1617】台北-東勢</td>
<td>300元</td>
<td>330元</td>
<td>165元</td>
<td>320元</td>
<td>640元</td>
</tr>
<tr>
<td>8</td>
<td>【1617】台北-豐原</td>
<td>270元</td>
<td>285元</td>
<td>140元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>9</td>
<td>【1618】台北-嘉義</td>
<td>400元</td>
<td>440元</td>
<td>220元</td>
<td>-元</td>
<td>860元</td>
</tr>
<tr>
<td>10</td>
<td>【1619】台北-中港路-台中</td>
<td>270元</td>
<td>300元</td>
<td>160元</td>
<td>285元</td>
<td>570元</td>
</tr>
<tr>
<td>11</td>
<td>【1619】台北-朝馬</td>
<td>240元</td>
<td>260元</td>
<td>140元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>12</td>
<td>【1620】台北-中清路-台中</td>
<td>270元</td>
<td>300元</td>
<td>160元</td>
<td>285元</td>
<td>570元</td>
</tr>
<tr>
<td>13</td>
<td>【1620】台北-水湳</td>
<td>260元</td>
<td>275元</td>
<td>135元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>14</td>
<td>【1621】台中-高雄</td>
<td>320元</td>
<td>350元</td>
<td>160元</td>
<td>330元</td>
<td>660元</td>
</tr>
<tr>
<td>15</td>
<td>【1621】彰化交流道-高雄</td>
<td>300元</td>
<td>310元</td>
<td>140元</td>
<td>300元</td>
<td>600元</td>
</tr>
<tr>
<td>16</td>
<td>【1623】台中市-中正機場</td>
<td>300元</td>
<td>300元</td>
<td>170元</td>
<td>-元</td>
<td>560元</td>
</tr>
<tr>
<td>17</td>
<td>【1625】台中市-台南市</td>
<td>250元</td>
<td>270元</td>
<td>150元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>18</td>
<td>【1625】朝馬-台南市</td>
<td>210元</td>
<td>230元</td>
<td>135元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>19</td>
<td>【1627】中壢服務區-中正機場</td>
<td>45元</td>
<td>45元</td>
<td>20元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>20</td>
<td>【1628】台北-麻豆佳里漚汪</td>
<td>430元</td>
<td>480元</td>
<td>265元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>21</td>
<td>【1629】台北-學甲-苓子寮</td>
<td>430元</td>
<td>480元</td>
<td>265元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>22</td>
<td>【1630】台北-西港</td>
<td>460元</td>
<td>460元</td>
<td>220元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>23</td>
<td>【1630】台北-二林</td>
<td>420元</td>
<td>420元</td>
<td>210元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>24</td>
<td>【1630】台北-溪湖</td>
<td>360元</td>
<td>360元</td>
<td>180元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>25</td>
<td>【1631】台北-二水-竹山</td>
<td>480元</td>
<td>480元</td>
<td>240元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>26</td>
<td>【1631】台北-二水田中社頭</td>
<td>380元</td>
<td>380元</td>
<td>185元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>27</td>
<td>【1632】台北-草屯-竹山</td>
<td>480元</td>
<td>480元</td>
<td>240元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>28</td>
<td>【1632】台北-草屯南投</td>
<td>360元</td>
<td>380元</td>
<td>185元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>29</td>
<td>【1633】台北-北港-三條崙</td>
<td>470元</td>
<td>470元</td>
<td>250元</td>
<td>-元</td>
<td>900元</td>
</tr>
<tr>
<td>30</td>
<td>【1635】台北-虎尾-台西三條崙</td>
<td>470元</td>
<td>470元</td>
<td>250元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>31</td>
<td>【1635】台北-虎尾</td>
<td>420元</td>
<td>420元</td>
<td>190元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>32</td>
<td>【1636】台北-西螺四湖三條崙</td>
<td>470元</td>
<td>470元</td>
<td>250元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>33</td>
<td>【1636】台北-西螺</td>
<td>375元</td>
<td>375元</td>
<td>190元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>34</td>
<td>【1637】台北-西螺林厝寮三條崙</td>
<td>470元</td>
<td>470元</td>
<td>250元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>35</td>
<td>【1638】台北-朴子-東石</td>
<td>480元</td>
<td>480元</td>
<td>265元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>36</td>
<td>【1639】台北-布袋</td>
<td>480元</td>
<td>480元</td>
<td>265元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>37</td>
<td>【1650】高雄-北港口湖三條崙</td>
<td>275元</td>
<td>275元</td>
<td>135元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>38</td>
<td>【1650】高雄-麻豆</td>
<td>115元</td>
<td>115元</td>
<td>55元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>39</td>
<td>【1650】高雄-北港路</td>
<td>175元</td>
<td>175元</td>
<td>90元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>40</td>
<td>【1651】高雄-北港四湖三條崙</td>
<td>275元</td>
<td>275元</td>
<td>135元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>41</td>
<td>【1652】台北-鹿港-芳苑</td>
<td>395元</td>
<td>395元</td>
<td>230元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>42</td>
<td>【1656】板橋-二高-屏東</td>
<td>550元</td>
<td>600元</td>
<td>290元</td>
<td>590元</td>
<td>1,180元</td>
</tr>
<tr>
<td>43</td>
<td>【1657】南投-高鐵烏日站線</td>
<td>85元</td>
<td>85元</td>
<td>40元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>44</td>
<td>【1657】中興國中-高鐵烏日站線</td>
<td>70元</td>
<td>70元</td>
<td>35元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>45</td>
<td>【1657】草屯-高鐵烏日站線</td>
<td>55元</td>
<td>55元</td>
<td>25元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>46</td>
<td>【1660】板橋-高雄</td>
<td>510元</td>
<td>570元</td>
<td>290元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>47</td>
<td>【1661】礁溪轉運站-桃園機場</td>
<td>200元</td>
<td>200元</td>
<td>100元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>48</td>
<td>【1661】宜蘭轉運站-桃園機場</td>
<td>250元</td>
<td>250元</td>
<td>125元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>49</td>
<td>【1661】羅東轉運站-桃園機場</td>
<td>280元</td>
<td>280元</td>
<td>140元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>50</td>
<td>【9019】朝馬-溪湖</td>
<td>90元</td>
<td>90元</td>
<td>45元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>51</td>
<td>【9019】朝馬-二林</td>
<td>110元</td>
<td>110元</td>
<td>55元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>52</td>
<td>【9019】朝馬-大城</td>
<td>140元</td>
<td>140元</td>
<td>70元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>53</td>
<td>【9019】朝馬-麥寮</td>
<td>190元</td>
<td>190元</td>
<td>95元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>54</td>
<td>【1663】南港-花蓮</td>
<td>290元</td>
<td>320元</td>
<td>205元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>55</td>
<td>【1663】南港-北埔</td>
<td>290元</td>
<td>320元</td>
<td>195元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>56</td>
<td>【1663】南港-新城</td>
<td>260元</td>
<td>290元</td>
<td>175元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>57</td>
<td>【1650】高雄-北港</td>
<td>220元</td>
<td>220元</td>
<td>110元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>58</td>
<td>【1651】高雄-北港</td>
<td>220元</td>
<td>220元</td>
<td>110元</td>
<td>-元</td>
<td>-元</td>
</tr>
<tr>
<td>59</td>
<td>【1651】高雄-北港路</td>
<td>175元</td>
<td>175元</td>
<td>90元</td>
<td>-元</td>
<td>-元</td>
</tr>
</tbody>
</table>
</div>
```python
rows.tail(3)
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>check_n</th>
<th>id</th>
<th>date</th>
<th>clock_in</th>
<th>clock_out</th>
<th>salary_1</th>
<th>salary_2</th>
<th>salary_3</th>
<th>salary_4</th>
</tr>
</thead>
<tbody>
<tr>
<td>797</td>
<td>798</td>
<td>8771</td>
<td>2021-06-13</td>
<td>7:28:37</td>
<td>18:12:30</td>
<td>11292160995</td>
<td>92530661221</td>
<td>35926946256</td>
<td>18157341512</td>
</tr>
<tr>
<td>798</td>
<td>799</td>
<td>8771</td>
<td>2021-06-10</td>
<td>8:50:55</td>
<td>22:2:14</td>
<td>87552010075</td>
<td>38000894493</td>
<td>18831773874</td>
<td>39658641024</td>
</tr>
<tr>
<td>799</td>
<td>800</td>
<td>8771</td>
<td>2021-06-23</td>
<td>8:8:13</td>
<td>19:3:16</td>
<td>45356737851</td>
<td>90791781951</td>
<td>57742913264</td>
<td>74593282041</td>
</tr>
</tbody>
</table>
</div>
```python
rows.index
```
RangeIndex(start=0, stop=800, step=1)
```python
rows.columns
```
Index(['check_n', 'id', 'date', 'clock_in', 'clock_out', 'salary_1',
'salary_2', 'salary_3', 'salary_4'],
dtype='object')
```python
rows.to_numpy()
```
array([[1, '3686', datetime.date(2021, 6, 26), ..., '21040014451',
'64716969172', '33480505957'],
[2, '3686', datetime.date(2021, 6, 10), ..., '30750054304',
'94795101611', '26989956772'],
[3, '3686', datetime.date(2021, 6, 26), ..., '24360486310',
'38951440787', '50535635106'],
...,
[798, '8771', datetime.date(2021, 6, 13), ..., '92530661221',
'35926946256', '18157341512'],
[799, '8771', datetime.date(2021, 6, 10), ..., '38000894493',
'18831773874', '39658641024'],
[800, '8771', datetime.date(2021, 6, 23), ..., '90791781951',
'57742913264', '74593282041']], dtype=object)
```python
rows.to_numpy()[0]
```
array([1, '3686', datetime.date(2021, 6, 26), '7:22:30', '22:3:39',
'37189781436', '21040014451', '64716969172', '33480505957'],
dtype=object)
```python
rows.describe()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>check_n</th>
</tr>
</thead>
<tbody>
<tr>
<td>count</td>
<td>800.0000</td>
</tr>
<tr>
<td>mean</td>
<td>400.5000</td>
</tr>
<tr>
<td>std</td>
<td>231.0844</td>
</tr>
<tr>
<td>min</td>
<td>1.0000</td>
</tr>
<tr>
<td>25%</td>
<td>200.7500</td>
</tr>
<tr>
<td>50%</td>
<td>400.5000</td>
</tr>
<tr>
<td>75%</td>
<td>600.2500</td>
</tr>
<tr>
<td>max</td>
<td>800.0000</td>
</tr>
</tbody>
</table>
</div>
```python
rows["salary_1"].describe()
rows.salary_1.describe()
```
count 800
unique 800
top 93368241982
freq 1
Name: salary_1, dtype: object
```python
rows.describe(include='all')
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>check_n</th>
<th>id</th>
<th>date</th>
<th>clock_in</th>
<th>clock_out</th>
<th>salary_1</th>
<th>salary_2</th>
<th>salary_3</th>
<th>salary_4</th>
</tr>
</thead>
<tbody>
<tr>
<td>count</td>
<td>800.0000</td>
<td>800</td>
<td>800</td>
<td>800</td>
<td>800</td>
<td>800</td>
<td>800</td>
<td>800</td>
<td>800</td>
</tr>
<tr>
<td>unique</td>
<td>NaN</td>
<td>20</td>
<td>30</td>
<td>791</td>
<td>785</td>
<td>800</td>
<td>800</td>
<td>800</td>
<td>800</td>
</tr>
<tr>
<td>top</td>
<td>NaN</td>
<td>6479</td>
<td>2021-06-22</td>
<td>10:23:31</td>
<td>22:22:28</td>
<td>93368241982</td>
<td>69597211664</td>
<td>23466241348</td>
<td>71306233852</td>
</tr>
<tr>
<td>freq</td>
<td>NaN</td>
<td>40</td>
<td>37</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>mean</td>
<td>400.5000</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>std</td>
<td>231.0844</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>min</td>
<td>1.0000</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>25%</td>
<td>200.7500</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>50%</td>
<td>400.5000</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>75%</td>
<td>600.2500</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>max</td>
<td>800.0000</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
</tbody>
</table>
</div>
## 因為裡面只有check_n在設計時,是屬於數值資料,其他都是類別資料或者時間資料
```python
rows.info()
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 9 columns):
check_n 800 non-null int64
id 800 non-null object
date 800 non-null object
clock_in 800 non-null object
clock_out 800 non-null object
salary_1 800 non-null object
salary_2 800 non-null object
salary_3 800 non-null object
salary_4 800 non-null object
dtypes: int64(1), object(8)
memory usage: 56.4+ KB
```python
pd.to_numeric(rows.salary_1)
```
0 37189781436
1 9572746424
2 22131633201
3 57455281008
4 83254690126
...
795 63630615608
796 61733968697
797 11292160995
798 87552010075
799 45356737851
Name: salary_1, Length: 800, dtype: int64
```python
rows.salary_1
```
0 37189781436
1 9572746424
2 22131633201
3 57455281008
4 83254690126
...
795 63630615608
796 61733968697
797 11292160995
798 87552010075
799 45356737851
Name: salary_1, Length: 800, dtype: object
```python
rows1 = rows
rows1["salary_1_num"] = pd.to_numeric(rows.salary_1)
rows1.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>check_n</th>
<th>id</th>
<th>date</th>
<th>clock_in</th>
<th>clock_out</th>
<th>salary_1</th>
<th>salary_2</th>
<th>salary_3</th>
<th>salary_4</th>
<th>salary_1_num</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>3686</td>
<td>2021-06-26</td>
<td>7:22:30</td>
<td>22:3:39</td>
<td>37189781436</td>
<td>21040014451</td>
<td>64716969172</td>
<td>33480505957</td>
<td>37189781436</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>3686</td>
<td>2021-06-10</td>
<td>6:27:32</td>
<td>17:43:58</td>
<td>9572746424</td>
<td>30750054304</td>
<td>94795101611</td>
<td>26989956772</td>
<td>9572746424</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>3686</td>
<td>2021-06-26</td>
<td>10:23:31</td>
<td>19:54:49</td>
<td>22131633201</td>
<td>24360486310</td>
<td>38951440787</td>
<td>50535635106</td>
<td>22131633201</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>3686</td>
<td>2021-06-06</td>
<td>9:25:13</td>
<td>17:57:49</td>
<td>57455281008</td>
<td>13692157108</td>
<td>28280930472</td>
<td>16803796965</td>
<td>57455281008</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>3686</td>
<td>2021-06-20</td>
<td>10:17:18</td>
<td>22:21:48</td>
<td>83254690126</td>
<td>65883681703</td>
<td>3426866555</td>
<td>38369031374</td>
<td>83254690126</td>
</tr>
</tbody>
</table>
</div>
```python
rows1.info()
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 10 columns):
check_n 800 non-null int64
id 800 non-null object
date 800 non-null object
clock_in 800 non-null object
clock_out 800 non-null object
salary_1 800 non-null object
salary_2 800 non-null object
salary_3 800 non-null object
salary_4 800 non-null object
salary_1_num 800 non-null int64
dtypes: int64(2), object(8)
memory usage: 62.6+ KB
```python
rows.describe()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>check_n</th>
<th>salary_1_num</th>
</tr>
</thead>
<tbody>
<tr>
<td>count</td>
<td>800.0000</td>
<td>8.000000e+02</td>
</tr>
<tr>
<td>mean</td>
<td>400.5000</td>
<td>4.909562e+10</td>
</tr>
<tr>
<td>std</td>
<td>231.0844</td>
<td>2.908800e+10</td>
</tr>
<tr>
<td>min</td>
<td>1.0000</td>
<td>2.177264e+08</td>
</tr>
<tr>
<td>25%</td>
<td>200.7500</td>
<td>2.362071e+10</td>
</tr>
<tr>
<td>50%</td>
<td>400.5000</td>
<td>4.941501e+10</td>
</tr>
<tr>
<td>75%</td>
<td>600.2500</td>
<td>7.390700e+10</td>
</tr>
<tr>
<td>max</td>
<td>800.0000</td>
<td>9.989334e+10</td>
</tr>
</tbody>
</table>
</div>
```python
## 轉換時間資料
rows.clock_in[0]
```
'7:22:30'
```python
d = pd.Timestamp(rows.clock_in[0], unit='s')
d
```
Timestamp('2021-07-29 07:22:30')
```python
print(d.time())
```
07:22:30
```python
## 功課,修改格式
pd.Timestamp(rows.clock_in[10], unit='s')
rows.clock_in[10]
```
---------------------------------------------------------------------------
OutOfBoundsDatetime Traceback (most recent call last)
<ipython-input-20-16f85ae74d68> in <module>
1 ## 功課,修改格式
----> 2 pd.Timestamp(rows.clock_in[10], unit='s')
3 rows.clock_in[10]
pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()
pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 07:02:59
```python
rows.clock_in[10]
```
```python
## 字串切割
rows.clock_in.str.split(r"\:")
```
```python
## 列出一行,切割訊號
r = [rows.clock_in.str.split(r"\:")][0][10]
print(r)
for n in range(3):
if len(r[n]) <2:
r[n] = str(0)+r[n]
print(r)
':'.join(r)
```
```python
pd.Timestamp(':'.join(r))
```
```python
rows1.iloc[0]
```
```python
## 計算最大值
rows1.columns
```
```python
#5之後,9以前,不包含9
rows1.columns[5:9]
```
```python
## 取出欄位資料
rows1[rows1.columns[5:9]]
```
```python
## 刪除資料 欄位
rows1[rows1.columns[5:9]].drop("salary_1",axis=1)
```
```python
## 刪除資料 列位
rows1[rows1.columns[5:9]].drop(0,axis=0)
```
```python
## axis : 0 is by row, 1 by is column
rows1[rows1.columns[5:9]].max(axis=1)
```
```python
rows1.head()
```
```python
rows1.to_csv("new.csv", header=False,index = False)
```
```python
## 存入資料庫
conn = psycopg2.connect(database="mydb", user="test01",
password="testpw", host="localhost",
port="5432")
print(conn.closed)
## 允許python執行postgresql語法
with conn.cursor() as cur:
cur.execute("DROP TABLE IF EXISTS new_salary")
cur.execute("CREATE TABLE new_salary( \
check_n SERIAL PRIMARY KEY, \
id SERIAL, \
date DATE NOT NULL, \
clock_in VARCHAR (10) NOT NULL, \
clock_out VARCHAR (10) NOT NULL, \
salary_1 VARCHAR (11) NOT NULL, \
salary_2 VARCHAR (11) NOT NULL, \
salary_3 VARCHAR (11) NOT NULL, \
salary_4 VARCHAR (11) NOT NULL, \
salary_1_num BIGINT NOT NULL)")
f = open("./new.csv", 'r')
cur.copy_from(f, "new_salary", columns=rows1.columns, sep=",")
print("end")
```
```python
print(conn.closed)
## 允許python執行postgresql語法
with conn.cursor() as cur:
## 觀察查詢筆數
print(cur.rowcount)
## SQL語法
sql = "select * from new_salary"
## 執行sql語法
cur.execute(sql)
## 取得欄位名稱
name = [desc[0] for desc in cur.description]
## fetchall是將所有收尋的資料都寫入 rows 中
# rows = pd.DataFrame(cur.fetchall())
rows2 = pd.DataFrame(cur.fetchall(),columns=name)
## 取出一筆資料
# rows = pd.DataFrame(cur.fetchone())
## 自行設定要幾筆資料
# rows = pd.DataFrame(cur.fetchmany(10))
## 觀察查詢筆數
print(cur.rowcount)
print(rows2.shape)
rows2.head()
```
# 功課
## 一、輸出完整時間格式,移除不尋常的時間,取代表內格式
```python
```
## 二、將薪水1~4改成數值資料,並且計算出最大值,寫到新的欄位
```python
```
## 三、以上完成,將新表格存入psql資料庫
** 正確的datatypes設定
### 表格內容:check_n, id, date, clock_in, clock_out, maxsalary
#### 這部分沒有完成下禮拜作業會無法完成
```python
```