# **【RFM Analysis_eCommerce Events History in Cosmetics Shop】ft. Kaggle - Python**

```=
- RFM Definition
- RFM Model: Median Split
- RFM Model: Quartiles & Manual Segmentation
```

### RFM Definition

[Reference](https://www.hububble.co/blog/rfm-recency-frequency-monetary)
[Reference 2](https://www.woshipm.com/data-analysis/4194147.html)

![Screenshot 2025-06-04 21.32.05](https://hackmd.io/_uploads/r1llopaMll.png)

```=
R (Recency)    Time since the most recent purchase; more recent is better (R=1)
F (Frequency)  Number of purchases; more frequent is better (F=1)
M (Monetary)   Total amount spent; higher is better (M=1)
=======================
R1, F1, M1, "VVIP (VIP customer)"
R1, F1, M0, "Loyal Customer (potential loyal customer)"
R1, F0, M1, "New Potential VIP (key customer to develop)"
R1, F0, M0, "New Customer (new customer)"
R0, F1, M1, "VIP with No Recent Purchase (key customer to win back)"
R0, F1, M0, "Frequent Customer (regular customer to maintain, dormant regular)"
R0, F0, M1, "Old Potential VIP (churned customer with potential)"
R0, F0, M0, "Lost Customer (churned customer)"
```

<br/>

### RFM Model: Median Split

[Data source: eCommerce Events History in Cosmetics Shop](https://www.kaggle.com/datasets/mkechinov/ecommerce-events-history-in-cosmetics-shop)

![Screenshot 2025-06-15 14.47.54](https://hackmd.io/_uploads/Hkkh31nmgx.png)

```=
import pandas as pd
import numpy as np
import seaborn as sb
import kagglehub
```

```
path = kagglehub.dataset_download("mkechinov/ecommerce-events-history-in-cosmetics-shop")
print("Path to dataset files:", path)
```

Only 2 million rows are sampled for this analysis.

![Screenshot 2025-06-15 14.54.44](https://hackmd.io/_uploads/BJG8AJ37el.png)

![Screenshot 2025-06-15 15.41.23](https://hackmd.io/_uploads/ryqNFenmgx.png)

```=
import plotly.express as px

df['event_time'] = pd.to_datetime(df['event_time'], errors='coerce')
df['event_year_month'] = df['event_time'].dt.strftime('%Y_%m')

# Within each (month, event type) group, count the non-null event_time values
event_type_count = df.groupby(["event_year_month", "event_type"]).count()["event_time"]

fig = px.bar(event_type_count.reset_index(),
             x="event_year_month", y="event_time",
             color="event_type", title="Events by Month")
fig.show()
```

![Screenshot 2025-06-15 15.42.01](https://hackmd.io/_uploads/SkYLKghQge.png)

Compute RFM from purchase events only:

```=
# df_2 is not defined in the text of this note; it is assumed to be the
# DataFrame prepared in the screenshots above
purchase_df = df_2[df_2['event_type'] == 'purchase'].copy()

# Simulated end of the observation period: one day after the last event
today = df['event_time'].max() + pd.Timedelta(days=1)

rfm = purchase_df.groupby('user_id').agg(
    recency=('event_time', lambda x: (today - x.max()).days),  # days since last purchase
    frequency=('user_session', pd.Series.nunique),             # distinct purchase sessions
    monetary=('price', 'sum')                                  # total amount spent
).reset_index()
```

Use the median of each metric as the threshold:

```=
# Recency: smaller is better, so at or below the median scores 1
rfm['R'] = (rfm['recency'] <= rfm['recency'].median()).astype(int)
# Frequency and monetary: larger is better, so at or above the median scores 1
rfm['F'] = (rfm['frequency'] >= rfm['frequency'].median()).astype(int)
rfm['M'] = (rfm['monetary'] >= rfm['monetary'].median()).astype(int)

# Customer labels
rfm_segment_map = {
    (1, 1, 1): "VVIP (VIP customer)",
    (1, 1, 0): "Loyal Customer (potential loyal customer)",
    (1, 0, 1): "New Potential VIP (key customer to develop)",
    (1, 0, 0): "New Customer (new customer)",
    (0, 1, 1): "VIP with No Recent Purchase (key customer to win back)",
    (0, 1, 0): "Frequent Customer (regular customer to maintain, dormant regular)",
    (0, 0, 1): "Old Potential VIP (churned customer with potential)",
    (0, 0, 0): "Lost Customer (churned customer)"
}
```

```=
# Segment label
rfm['segment'] = rfm[['R', 'F', 'M']].apply(lambda x: rfm_segment_map[tuple(x)], axis=1)

rfm_result = rfm[['user_id', 'recency', 'frequency', 'monetary', 'R', 'F', 'M', 'segment']]
rfm_result.head()
```

![Screenshot 2025-06-15 17.27.24](https://hackmd.io/_uploads/rJxEMfhQel.png)

```=
rfm_result['segment'].value_counts(dropna=False)
```

![Screenshot 2025-06-15 17.27.40](https://hackmd.io/_uploads/HJ9EzzhXel.png)

<br/>

### RFM Model: Quartiles & Manual Segmentation

To segment more finely, switch from the median split to quartile scores and classify manually. The conditions assume that `rfm` already has `R_score`, `F_score`, and `M_score` columns holding quartile scores from 1 to 4, where 3 and 4 are the better half.

```=
conditions = [
    # VVIP
    (rfm['R_score'].isin([3, 4])) & (rfm['F_score'].isin([3, 4])) & (rfm['M_score'].isin([3, 4])),

    # Loyal Customer
    # Recent and frequent, but spending has not caught up yet
    (rfm['R_score'].isin([3, 4])) & (rfm['F_score'].isin([3, 4])) & (rfm['M_score'].isin([1, 2])),

    # New Potential VIP
    # Recent and high spending, but frequency is still low; worth cultivating
    (rfm['R_score'].isin([3, 4])) & (rfm['F_score'].isin([1, 2])) & (rfm['M_score'].isin([3, 4])),

    # New Customer
    # Recent, but neither frequency nor spending is high
    (rfm['R_score'].isin([3, 4])) & (rfm['F_score'].isin([1, 2])) & (rfm['M_score'].isin([1, 2])),

    # VIP with No Recent Purchase
    # Used to buy often and spend a lot, but activity has dropped
    (rfm['R_score'].isin([1, 2])) & (rfm['F_score'].isin([3, 4])) & (rfm['M_score'].isin([3, 4])),

    # Frequent Customer (dormant regular)
    # Used to buy often but is now inactive, with low-to-mid spending
    (rfm['R_score'].isin([1, 2])) & (rfm['F_score'].isin([3, 4])) & (rfm['M_score'].isin([1, 2])),

    # Old Potential VIP (lost potential)
    # No recent purchase and low frequency, but high past spending; may be won back
    (rfm['R_score'].isin([1, 2])) & (rfm['F_score'].isin([1, 2])) & (rfm['M_score'].isin([3, 4])),

    # Lost Customer
    # No recent purchase, low frequency, low spending
    (rfm['R_score'].isin([1, 2])) & (rfm['F_score'].isin([1, 2])) & (rfm['M_score'].isin([1, 2]))
]

choices = [
    "VVIP (VIP customer)",
    "Loyal Customer (potential loyal customer)",
    "New Potential VIP (key customer to develop)",
    "New Customer (new customer)",
    "VIP with No Recent Purchase (key customer to win back)",
    "Frequent Customer (regular customer to maintain, dormant regular)",
    "Old Potential VIP (churned customer with potential)",
    "Lost Customer (churned customer)"
]

# Rows matching no condition (for example, a missing score) fall into the default
rfm['Segment_2'] = np.select(conditions, choices, default="Unclassified customer")

rfm_result = rfm[['user_id', 'recency', 'frequency', 'monetary',
                  'R', 'F', 'M', 'R_score', 'F_score', 'M_score',
                  'segment', 'Segment_2']]
rfm_result.head()
```

![Screenshot 2025-06-15 18.00.03](https://hackmd.io/_uploads/SJT3FM3Xll.png)

```=
rfm_result['Segment_2'].value_counts(dropna=False)
```

![Screenshot 2025-06-15 18.01.01](https://hackmd.io/_uploads/ryxx9G3Qll.png)
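
The quartile conditions above use `R_score`, `F_score`, and `M_score`, but the note does not show how those columns were created. The following is only a minimal sketch of one common way to build them, not necessarily what was done here. It assumes `pd.qcut` with four bins, scores where 4 is best, and a rank-based tie-break. The helper name `add_quartile_scores` and the toy data are hypothetical.

```python
import pandas as pd


def add_quartile_scores(rfm: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of rfm with R_score, F_score, M_score in 1..4 (4 = best).

    Hypothetical helper: the original note does not show this step.
    """
    out = rfm.copy()

    # rank(method="first") gives every row a distinct rank, so qcut never
    # fails on duplicate bin edges (frequency is often 1 for most users).
    r_rank = out["recency"].rank(method="first")
    f_rank = out["frequency"].rank(method="first")
    m_rank = out["monetary"].rank(method="first")

    # Recency: fewer days since the last purchase is better, so labels are reversed.
    out["R_score"] = pd.qcut(r_rank, 4, labels=[4, 3, 2, 1]).astype(int)
    # Frequency and monetary: larger is better.
    out["F_score"] = pd.qcut(f_rank, 4, labels=[1, 2, 3, 4]).astype(int)
    out["M_score"] = pd.qcut(m_rank, 4, labels=[1, 2, 3, 4]).astype(int)
    return out


# Toy data for illustration only (not taken from the Kaggle dataset)
toy = pd.DataFrame({
    "user_id": range(1, 9),
    "recency": [1, 3, 5, 8, 13, 21, 34, 55],
    "frequency": [9, 1, 1, 2, 1, 4, 1, 1],
    "monetary": [120.0, 15.5, 8.0, 40.0, 5.0, 300.0, 22.0, 60.0],
})

scored = add_quartile_scores(toy)
print(scored[["user_id", "R_score", "F_score", "M_score"]])
```

One caveat of the rank-based tie-break: users with identical values, such as the many users whose frequency is 1, can land in different quartiles depending on row order. If that is undesirable, fixed manual cut points with `pd.cut` are an alternative.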