---
title: HW4-5
tags: dataframe
---
<style>
.green {
padding: 0 2px;
white-space: pre-line;
border-radius: 2px;
background-color: #CCFF99;}
.red {
padding: 0 2px;
white-space: pre-line;
border-radius: 2px;
background-color: #FFA488;}
.blue {
padding: 0 2px;
white-space: pre-line;
border-radius: 2px;
background-color: #77DDFF;}
.purple {
padding: 0 2px;
white-space: pre-line;
border-radius: 2px;
background-color: #D1BBFF;}
</style>
# `HW4-5重點小整理:`
### **計算「財務⾦融學系」106-108 年的女學⽣每年的平均⼈數**
```python
df_Finance=dataset[dataset.dept_name == '財務金融學系']
s_girl = df_Finance.girl.sum() # 女學生人數
mean_Finance = df_Finance.girl.sum()/3
mean_Round = round(mean_Finance,2)
print('106-108 年財務金融學系每年女學生平均人數=',mean_Round)
```
### **處理106-108 年的各科系男學⽣佔該系所有學⽣之比例,並由⼤到⼩排序**
```python
import pandas as pd
df=pd.read_csv('fcu.csv')
df['rate_boy']=df.boy/df.number
df = df.sort_values('rate_boy', ascending=False)
df['rate_boy']= round(df['rate_boy'],2)
```
### **處理106-108年,逢甲⼤學各科系的平均⼈數排名,請印出學⽣總⼈數前10名科系&平均⼈數。**
```python
n_dept = df.groupby('dept_name')['number'].mean()
n_dept = n_dept.sort_values(ascending=False).head(10)
for i in range(0, n_dept.size ):
print(i + 1, '. ', n_dept.index[i], ' : ', round(n_dept.values[i], 2))
```
### **讀取台灣證券交易所(TWSE)每⽇股價資料,列出(寫作業當⽇)的股王&股后(收盤價前兩名)的股票**
```python
link = 'https://www.twse.com.tw/exchangeReport/STOCK_DAY_ALL?response=open_data'
today_twii = pd.read_csv(link)
del_cols=['成交金額', '成交筆數']
df1 = today_twii.drop(columns=del_cols, axis=1)
new_cols={'證券代號': 'stock_id','證券名稱': 'stock_name','成交股數': 'volume','開盤價': 'open', '最高價': 'high','最低價': 'low','收盤價': 'close','漲跌價差':'delta'}
df1.rename(columns=new_cols, inplace=True)
df1.isnull().sum()
df1 = df1.dropna()
df_top2_vol = df1.sort_values('close', ascending=False).head(2)
print('股王:{} (NT${})'.format(df_top2_vol.iloc[0,1], df_top2_vol.iloc[0,6]))
print('股后:{} (NT${})'.format(df_top2_vol.iloc[1,1], df_top2_vol.iloc[1,6]))
```
### **讀取台灣證券交易所(TWSE)每⽇股價資料,篩選出(寫作業當⽇)「漲幅超過 3%」 且「 成交量超過 3000張」的股票**
```python
df1['quote_change']= df1.delta / (df1.close-df1.delta)
df_over_3pct= df1[(df1.quote_change >=0.03) & (df1.volume > 30000000)]
print(df_over_3pct.info)
```
老師解答
```python
import pandas as pd
#load dataset
link = 'http://www.twse.com.tw/exchangeReport/STOCK_DAY_ALL?response=open_data
today_twii = pd.read_csv(link)
#刪除不必要的欄位
del_cols=['成交金額,成交筆數']
dfl = today_twii.drop(columns=del_cols, axis=1)
#修改欄位名稱
new_cols=( "證券代號":"stock_id" ,"證券名稱":"stock_name","成交股數":"volume",
"開盤價":"open","最高價":"high","最低價":"low","收盤價":"close","漲跌價
差":"delta")
df1. rename(columns=new_cols, inplace=True)
#檢查 NAN
```
### **繪製採購數量&總⾦額前5名的國家之折線圖**
```python
import pandas as pd
df_retail = pd.read_csv('Online Retail.csv')
df_retail.isnull().sum()
df1 = df_retail.dropna()
x=df1.duplicated().sum()
print('這個資料集有 {0} 重複的紀錄'.format(x))
df2=df1.drop_duplicates()
df2 = df2[df2.Quantity>0]
df2.info()
df2['InvoiceDate'] = pd.to_datetime(df2['InvoiceDate'])
df2.info()
df2['InvoiceYearMonth'] = df2['InvoiceDate'].dt.strftime('%Y/%m')
df2.info()
df2['Revenue']=df2['UnitPrice'] * df2['Quantity']
df_qr = df2.groupby('Country')[['Quantity', 'Revenue']].sum().reset_index()
df_qr1 = df_qr.sort_values('Revenue', ascending=False).head(5)
df_qr1.plot(x='Country', title='Quantity and Revenue of Each Country')
```