Easy Python Learning | From Basics to Applications, Become a Junior Data Analyst
# Stock Market Prediction in Practice
Environment: Google Colab
[Link](https://colab.research.google.com/drive/10QTeeOnISYaVDYdhOrFQwuw_gHcKmNn3#scrollTo=f6Teq125GChO)
## Steps:
### 1. Environment Setup
```
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from datetime import datetime, timedelta
from google.colab import drive
import os
# Set the plot style
plt.style.use('seaborn-v0_8')
# Font settings (Colab may not render Chinese text by default; if labels look garbled, ignore it or install a CJK font)
plt.rcParams['font.sans-serif'] = ['sans-serif']
plt.rcParams['axes.unicode_minus'] = False
# Mount Google Drive
if not os.path.exists('/content/drive'):
    print("Mounting Google Drive...")
    drive.mount('/content/drive')
else:
    print("Google Drive is already mounted.")
# Set the save path
folder_name = 'ColabNotebooks'
save_dir = f'/content/drive/MyDrive/{folder_name}'
# If the folder does not exist, fall back to the MyDrive root
if not os.path.exists(save_dir):
    print(f"Path {save_dir} does not exist; saving to the MyDrive root instead.")
    save_dir = '/content/drive/MyDrive'
```
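The cell above falls back to the MyDrive root when the folder is missing. If you would rather create the folder, a minimal sketch could look like the following. `resolve_save_dir` is a hypothetical helper name, and the demo uses a temporary directory in place of `/content/drive/MyDrive` so it can run outside Colab.

```python
import os
import tempfile


def resolve_save_dir(base_dir, folder_name, create=True):
    """Return the directory to save into (hypothetical helper).

    If the target folder is missing, either create it (create=True)
    or fall back to base_dir (create=False).
    """
    target = os.path.join(base_dir, folder_name)
    if os.path.isdir(target):
        return target
    if create:
        os.makedirs(target, exist_ok=True)
        return target
    return base_dir


# Demo: a temporary directory stands in for /content/drive/MyDrive
demo_base = tempfile.mkdtemp()
created_dir = resolve_save_dir(demo_base, "ColabNotebooks", create=True)
fallback_dir = resolve_save_dir(demo_base, "OtherFolder", create=False)
print(created_dir)
print(fallback_dir)
```

In the notebook this would be called as `save_dir = resolve_save_dir('/content/drive/MyDrive', folder_name)` after the Drive has been mounted.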
### 2. Interactive Ticker Input
```
# ==========================================
# Interactive ticker input
# ==========================================
print("\n" + "=" * 40)
user_input = input("Enter a Taiwan stock code (e.g. 2330, 2317, 0050): ").strip()
print("=" * 40 + "\n")
# Append the exchange suffix automatically
if user_input.isdigit():
    # Default to exchange-listed stocks (.TW); for OTC stocks, type the full symbol, e.g. 8069.TWO
    ticker = f"{user_input}.TW"
else:
    ticker = user_input.upper()
print(f"Preparing to download: {ticker} ...")
```
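The suffix rule above can also be wrapped in a small function, which makes it easy to check without typing into `input()`. This is only a sketch: `normalize_ticker` and its `otc` flag are hypothetical names, not part of the original notebook.

```python
def normalize_ticker(raw, otc=False):
    """Turn user input into a Yahoo Finance symbol (hypothetical helper).

    Digits-only codes get ".TW" (listed) or ".TWO" (OTC, when otc=True);
    anything else is upper-cased and returned unchanged.
    """
    code = raw.strip()
    if code.isdigit():
        return f"{code}.TWO" if otc else f"{code}.TW"
    return code.upper()


print(normalize_ticker(" 2330 "))
print(normalize_ticker("8069", otc=True))
print(normalize_ticker("8069.two"))
```

Keeping the rule in one function means the same logic can be reused when looping over several stock codes.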
### 3. Web Data Retrieval and Processing
```
# ==========================================
# Web data retrieval and processing
# ==========================================
try:
    end_date = datetime.now()
    start_date = end_date - timedelta(days=5 * 365)  # roughly five years of history
    df = yf.download(ticker, start=start_date, end=end_date)
    if df.empty:
        raise ValueError("Downloaded data is empty; please check the stock code.")
    # Data cleaning: keep a single price column named 'Price'
    if 'Adj Close' in df.columns:
        df = df[['Adj Close']].copy()
        df.columns = ['Price']
    elif 'Close' in df.columns:  # fall back to 'Close' if 'Adj Close' is not available
        df = df[['Close']].copy()
        df.columns = ['Price']
    else:
        raise ValueError("Neither 'Adj Close' nor 'Close' was found in the downloaded data.")
    df = df.dropna()
    print(f"✅ Retrieved {ticker}: {len(df)} trading days of data.")
except Exception as e:
    print(f"❌ Download failed: {e}")
    # Re-raise so this cell stops here and later cells do not run without data
    raise
```
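The cell above prefers `Adj Close` and falls back to `Close`. The same selection can be written as a function and tried on synthetic data, which is handy if the download ever returns two-level columns of the form (field, ticker). The sketch below assumes a single ticker; `extract_price` is a hypothetical helper and the numbers are made up.

```python
import numpy as np
import pandas as pd


def extract_price(df):
    """Return a one-column DataFrame named 'Price' (hypothetical helper).

    Prefers 'Adj Close' and falls back to 'Close'. If the columns are a
    MultiIndex, only the first level (the field name) is kept, which
    assumes the frame holds a single ticker.
    """
    data = df.copy()
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)
    for name in ("Adj Close", "Close"):
        if name in data.columns:
            return data[[name]].rename(columns={name: "Price"}).dropna()
    raise ValueError("No 'Adj Close' or 'Close' column found.")


# Synthetic frames standing in for downloaded data (values are made up)
idx = pd.date_range("2024-01-01", periods=3, freq="D")
flat = pd.DataFrame({"Close": [10.0, np.nan, 12.0], "Volume": [1, 2, 3]}, index=idx)
multi = pd.DataFrame(
    [[10.0, 1], [11.0, 2], [12.0, 3]],
    index=idx,
    columns=pd.MultiIndex.from_tuples([("Close", "2330.TW"), ("Volume", "2330.TW")]),
)
flat_price = extract_price(flat)
multi_price = extract_price(multi)
print(flat_price)
print(multi_price)
```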
### 4. Model Prediction
```
# ==========================================
# Modeling (trend + Monte Carlo)
# ==========================================
# Estimate the statistical parameters
log_ret = np.log(df['Price'] / df['Price'].shift(1)).dropna()  # daily log returns
sigma = log_ret.std() * np.sqrt(252)  # annualized volatility (sigma)
# The mean log return estimates (mu - sigma^2 / 2), so add sigma^2 / 2 back to get the GBM drift mu
mu = log_ret.mean() * 252 + 0.5 * sigma**2
last_price = df['Price'].iloc[-1]  # last observed price (simulation starting point)
# Fit a linear-regression trend line (fitted here; not used by the simulation below)
df['Date_Ordinal'] = df.index.map(pd.Timestamp.toordinal)  # convert dates to ordinal integers
X = df[['Date_Ordinal']]
y = df['Price']
reg = LinearRegression().fit(X, y)
# Monte Carlo settings (forecast through the end of 2026)
target_date = datetime(2026, 12, 31)
# One step per business day (Mon-Fri; exchange holidays are not excluded), consistent with dt = 1/252
future_dates = pd.bdate_range(start=df.index[-1] + pd.Timedelta(days=1), end=target_date)
days_to_predict = len(future_dates)
if days_to_predict == 0:
    raise ValueError("target_date must be later than the last date in the data.")
num_simulations = 1000
dt = 1 / 252
# Monte Carlo simulation with geometric Brownian motion (GBM): generate many future price paths
np.random.seed(42)
random_shocks = np.random.normal(0, 1, (days_to_predict, num_simulations))
drift = (mu - 0.5 * sigma**2) * dt
shock = sigma * np.sqrt(dt)
# Cumulative log steps, so the first row is already one step after last_price
price_paths = last_price * np.exp(np.cumsum(drift + shock * random_shocks, axis=0))
# Summary statistics across the simulated paths
sim_mean = np.mean(price_paths, axis=1)
sim_upper = np.percentile(price_paths, 95, axis=1)
sim_lower = np.percentile(price_paths, 5, axis=1)
```
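The simulation above applies the discretized GBM update `S[t] = S[t-1] * exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z)` with `Z` drawn from a standard normal distribution. As a sanity check, the average simulated price after a horizon `T` should be close to `S0 * exp(mu * T)`. The sketch below tests this on illustrative parameters that are not estimated from any real stock; `simulate_gbm` and its defaults are hypothetical.

```python
import numpy as np


def simulate_gbm(s0, mu, sigma, n_steps, n_paths, dt=1 / 252, seed=42):
    """Simulate GBM price paths; returns an array of shape (n_steps, n_paths).

    Row 0 is the price one step after s0. The function name and defaults
    are illustrative, not taken from the notebook.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_steps, n_paths))
    log_steps = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_steps, axis=0))


# Illustrative parameters (assumptions, not fitted values)
s0, mu, sigma = 100.0, 0.08, 0.20
paths = simulate_gbm(s0, mu, sigma, n_steps=252, n_paths=20000)
simulated_mean = paths[-1].mean()
theoretical_mean = s0 * np.exp(mu * 1.0)  # E[S_T] = S0 * exp(mu * T) with T = 1 year
print(f"simulated {simulated_mean:.2f} vs theoretical {theoretical_mean:.2f}")
```

If the two numbers drift apart as the number of paths grows, the step size `dt` and the number of steps are probably inconsistent, for example calendar days paired with `dt = 1/252`.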
### 5. Visualization
```
# ==========================================
# Visualization
# ==========================================
plt.figure(figsize=(12, 7))
plt.plot(df.index, df['Price'], label='History', color='black')
plt.plot(future_dates, sim_mean, label='Forecast Mean (GBM)', color='red')
# Shaded band: 5th to 95th percentile of the simulated paths (covers 90% of them)
plt.fill_between(future_dates, sim_lower, sim_upper, color='red', alpha=0.1, label='5th-95th Percentile Band (90%)')
plt.axvline(x=datetime(2026, 1, 1), color='green', linestyle=':', label='Start of 2026')
plt.title(f'{ticker} Stock Price Prediction (2025-2026)', fontsize=16)
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True, alpha=0.3)
# Save the figure to Drive (drop the exchange suffix, e.g. 2330.TW -> 2330, 8069.TWO -> 8069)
img_filename = f'{ticker.split(".")[0]}_Prediction_2026.png'
img_path = os.path.join(save_dir, img_filename)
plt.savefig(img_path)
print(f"✅ Forecast chart saved to: {img_path}")
plt.show()
# ==========================================
# Save the forecast data to Google Drive
# ==========================================
# Collect the forecast: simulated mean plus the 5th and 95th percentile bounds
forecast_df = pd.DataFrame({
    'Date': future_dates,
    'Predicted_Price_Mean': sim_mean,
    'Lower_Bound_5%': sim_lower,
    'Upper_Bound_95%': sim_upper
})
forecast_df.set_index('Date', inplace=True)
# Build the CSV path (drop the exchange suffix from the file name)
csv_filename = f'{ticker.split(".")[0]}_Analysis_Data.csv'
csv_path = os.path.join(save_dir, csv_filename)
try:
    # Only the forecast is written here, as a plain CSV;
    # the history could be saved separately or merged in if needed
    forecast_df.to_csv(csv_path)
    print(f"✅ Detailed forecast data saved to: {csv_path}")
    print(f"\nForecast price at the end of 2026 (mean): {sim_mean[-1]:.2f}")
except Exception as e:
    print(f"❌ Failed to save file: {e}")
```
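The cell above writes only the forecast to CSV. If a single file holding both history and forecast is preferred, one option is to stack the two frames with `pd.concat`. The sketch below uses made-up numbers in place of `df[['Price']]` and `forecast_df`, and the `Segment` column is a hypothetical addition for telling the two parts apart.

```python
import pandas as pd

# Made-up numbers standing in for df[['Price']] and forecast_df
history = pd.DataFrame(
    {"Price": [100.0, 101.5, 102.0]},
    index=pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-06"]),
)
forecast = pd.DataFrame(
    {
        "Predicted_Price_Mean": [102.1, 102.3],
        "Lower_Bound_5%": [100.0, 99.5],
        "Upper_Bound_95%": [104.2, 105.0],
    },
    index=pd.to_datetime(["2025-01-07", "2025-01-08"]),
)

# Stack the two frames; columns missing on one side are filled with NaN
combined = pd.concat([history, forecast], axis=0, sort=False)
combined.index.name = "Date"
# 'Segment' (hypothetical column) marks where each row came from
combined["Segment"] = ["history"] * len(history) + ["forecast"] * len(forecast)
print(combined)
```

Calling `combined.to_csv(csv_path)` would then write one file in which historical rows have an empty forecast and forecast rows have an empty `Price`.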
