# 營收與使用者行為資料分析專案 ###### tags: `Python/SQL商業資料分析` , `Python` ,`專案` , `資料分析` [TOC] # (I)營收與使用者行為資料分析專案 ## 範例環境建置 先將以下 SQL 內容複製貼上到 phpMyAdmin SQL 輸入框執行,即可建立學習使用的資料表和資料集。 預計新增資料表: - users:使用者資訊 - products:商品資訊 - orders:訂單資訊 - order_details:訂單詳細資訊 ```text /* PRIMARY KEY 加入 AUTO_INCREMENT 可以在新增時自動累加 id 數值,就不用手動輸入 */ CREATE table users ( id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(30) NOT NULL, email VARCHAR(120) NOT NULL, age INT NOT NULL, created_at DATETIME ); /* INSERT INTO 語法加入指定要插入的屬性欄位,就可以明確指出要插入的欄位,這樣就可以忽略自動產生或有預設值的欄位 */ INSERT INTO users (name, email, age, created_at) VALUES ('Jack Hung', 'jackh32@gmail.com', 20, '2019-07-24 17:11:01'), ('Tony Liu', 'tonykk@gmail.com', 62, '2020-06-03 17:11:01'), ('Amy Chang', 'amychang@gmail.com', 32, '2020-05-11 17:11:01'); CREATE table products ( id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(30) NOT NULL, price INT NOT NULL, weight DECIMAL(5, 2), category VARCHAR(20), created_at DATETIME ); /* INSERT INTO 資料表 (欄位) VALUES` 後面可以透過 , 分隔多筆資料,就可以一次新增多筆資料 */ INSERT INTO products (name, price, weight, category, created_at) VALUES ('華速 intel i3 筆電', 20000, 2.12, 'NB', '2020-05-11 17:11:01'), ('Mac Pro 筆電', 62000, 1.4, 'NB', '2020-05-11 17:11:01'), ('微興電競筆電', 32000, 3.00, 'NB', '2020-05-11 17:11:01'), ('戈林冰箱', 22000, 13.78, '3C', '2020-05-11 17:11:01'), ('三力冰箱', 52000, 23.18, '3C', '2020-05-11 17:11:01'), ('C 語言入門', 420, 0.31, 'Book', '2020-05-11 17:11:01'), ('python3 實戰', 580, 0.28, 'Book', '2020-05-11 17:11:01'), ('JavaScript 英雄', 1000, 0.12, 'Book', '2020-05-11 17:11:01'), ('Java 資料分析', 340, -1, 'Book', '2020-05-11 17:11:01'), ('python 資料分析', 640, -0.43, 'Book', '2020-05-11 17:11:01'); CREATE table orders ( id INT PRIMARY KEY AUTO_INCREMENT, amount INT NOT NULL, -- 先宣告欄位,再設定外鍵 FOREIGN KEY REFERENCES customer_id INT, created_at DATETIME, FOREIGN KEY (customer_id) REFERENCES users(id) ); INSERT INTO orders (amount, customer_id, created_at) VALUES (188420, 1, '2020-05-11 17:11:01'), (54000, 3, '2019-04-13 17:11:01'), (104420, 2, '2020-06-21 17:11:01'), (52420, 2, '2020-05-01 17:11:01'), (104000, 2, '2020-07-11 17:11:01'), (32420, 2, '2019-03-24 17:11:01'); CREATE table order_details ( id INT PRIMARY KEY AUTO_INCREMENT, -- 先宣告欄位,再設定外鍵 FOREIGN KEY REFERENCES order_id INT, product_id INT, created_at DATETIME, FOREIGN KEY (order_id) REFERENCES orders(id), FOREIGN KEY (product_id) REFERENCES products(id) ); INSERT INTO order_details (order_id, product_id, created_at) VALUES (1, 1, '2020-05-11 17:11:01'), (1, 2, '2020-05-11 17:11:01'), (1, 3, '2020-05-11 17:11:01'), (1, 4, '2020-05-11 17:11:01'), (1, 5, '2020-05-11 17:11:01'), (1, 6, '2020-05-11 17:11:01'), (2, 3, '2020-04-13 17:11:01'), (2, 4, '2020-04-13 17:11:01'), (3, 5, '2020-06-21 17:11:01'), (3, 6, '2020-06-21 17:11:01'), (3, 5, '2020-06-21 17:11:01'), (4, 5, '2020-05-01 17:11:01'), (4, 6, '2020-05-01 17:11:01'), (5, 5, '2020-07-11 17:11:01'), (5, 5, '2020-07-11 17:11:01'), (6, 6, '2020-03-24 17:11:01'), (6, 3, '2020-03-24 17:11:01'); ``` ## 整合查詢資料 ```text SELECT * FROM orders INNER JOIN order_details ON order_details.order_id = orders.id INNER JOIN products ON order_details.product_id = products.id; ``` ![](https://i.imgur.com/Bvnl6bd.png) ## 視窗函式(windows function) > 視窗函式是一個用來**細部分析使用的函式**,要特別注意不要和 View 檢視表搞混,檢視表是用來儲存常用的 SQL 敘述方便重複使用。而 視窗函式則是透過將資料進行更細緻的細分成一個個**窗格**方便更細部的計算。 ```text SELECT category, SUM(price) FROM products GROUP BY category; ``` ![](https://i.imgur.com/vNARCks.png) 由於 GROUP BY 無法進行太細部的運算,例如分類中各自計算或是排名,所以 視窗函式(windows function)就是幫我們解決這個問題。以下為視窗函式語法: - 當作視窗函式使用的彙總函式(SUM、AVG 等) - 特殊視窗函式專用函式:RANK() 排名函式 ## 使用 PARTITION 進行類別各別內部排名 ```text SELECT name, price, category, RANK() OVER(PARTITION BY category ORDER BY price) FROM products; ``` ![](https://i.imgur.com/BJgR2sq.png) ## 不使用 PARTITION 進行總排名 ```text SELECT name, price, category, RANK() OVER(ORDER BY price) FROM products; ``` ![](https://i.imgur.com/PsoSN7O.png) ## 以 category 當作窗格進行累加價格 ```text SELECT name, price, category, SUM(price) OVER(PARTITION BY category ORDER BY price) AS aggregate_price FROM products; ``` ![](https://i.imgur.com/gZ7yvMv.png) **名詞定義** - CURRENT ROW:目前基準列資料 - PRECEDING:基準列前面 - FOLLOWING:基準列後面 ### 移動平均範例 I 透過視窗函數建立移動窗格 Frame 進行每一筆資料和前兩(PRECEDING)筆進行移動平均(使用 id 進行排序)。 ```text SELECT id, name, price, category, AVG(price) OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_price FROM products; -- 簡化寫法2 SELECT id, name, price, category, AVG(price) OVER (ORDER BY id ROWS 2 PRECEDING) AS moving_avg_price FROM products; ``` ![](https://i.imgur.com/L4v8I7U.png) ### 移動平均範例 II 包含本身和後一筆取移動平均: ```text SELECT id, name, price, category, AVG(price) OVER (ORDER BY id ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS moving_avg_price FROM products; ``` ![](https://i.imgur.com/tfdeoFn.png) ### 移動平均範例 III 將目前這筆和資料前面一筆以及後面一筆當作窗格進行移動平均。 ```text SELECT id, name, price, category, AVG(price) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg_price FROM products; ``` ![](https://i.imgur.com/4rlD9I7.png) ## 小計、總計 GROUP 運算子 `ROLLUP` 是一個特別語法,可以讓我們根據條件進行分組小計和總計的計算。 ```text SELECT name, category, SUM(price) FROM products GROUP BY category, name WITH ROLLUP; ``` ## 常用營收分析日期函式 - `DATE()` 取出日期時間中日期的部分:YYYY-MM-DD - `EXTRACT()` 日期函式可以取出 TIMESTAMP 的日期部分(年 YEAR、月 MONTH等) ```text SELECT DATE('2020-03-24 17:11:01'); ``` 執行結果: ```text 2020-03-24 ``` ## 日營收分析 使用 `DATE` 函式可以將時間轉為日期,再透過 `GROUP BY DATE(created_at)` 將訂單進行以日期進行分組,以取得日營收分析。 ```text SELECT DATE(created_at), COUNT(*), SUM(amount), AVG(amount) FROM orders GROUP BY DATE(created_at) ORDER BY DATE(created_at); ``` ![](https://i.imgur.com/r7cctNb.png) ## 月營收平均 使用 `DATE` 函式可以將時間轉為日期,`EXTRACT` 可以抽取出年或月,再透過 `EXTRACT(MONTH FROM DATE(created_at))` 將訂單進行以月進行分組,以取得月營收分析。 ```text SELECT EXTRACT(YEAR_MONTH FROM DATE(created_at)), COUNT(*), SUM(amount), AVG(amount) FROM orders GROUP BY EXTRACT(MONTH FROM DATE(created_at)) ORDER BY DATE(created_at); ``` ![](https://i.imgur.com/1VwNgY5.png) ## 年營收平均 使用 `DATE` 函式可以將時間轉為日期,再透過 `EXTRACT(YEAR FROM DATE(created_at))` 將訂單進行以年進行分組,以取得年營收分析。 ```text SELECT EXTRACT(YEAR FROM DATE(created_at)), COUNT(*), SUM(amount), AVG(amount) FROM orders GROUP BY EXTRACT(YEAR FROM DATE(created_at)) ORDER BY DATE(created_at); ``` ![](https://i.imgur.com/ToUXYPK.png) ## 類別營收 若是希望統計各別類別的營收狀況,可以使用 JOIN 搭配 GROUP BY 方式: ```text SELECT SUM(products.price) FROM orders INNER JOIN order_details ON order_details.order_id = orders.id INNER JOIN products ON order_details.product_id = products.id GROUP BY products.category; ``` # (II)營收與使用者行為資料分析專案 ## 範例環境資料建立 新增一個資料庫 `demo_shop_logs`,專門存放使用者瀏覽我們電子商務網頁的使用行為日誌 event log 相關資訊: ![](https://i.imgur.com/UtC7IrN.png) 同時創建資料表 `user_action_logs` 存放使用者的使用行為日誌: - `user_id`:使用者 id,若沒有登入則為 NULL - `action_type`:使用者做哪些行為所觸發的事件紀錄,有單純瀏覽 VIEW、購買 PURCHASE、搜尋 SEARCH 和 ADD_CART 加入購物車等幾種事件行為 - `view_page`:使用者瀏覽的網頁網址 - `purchase_amount`:消費者購買的商品金額,若無購買商品則為 NULL - `create_at`:事件發生時間 ```text CREATE table user_action_logs ( id INT PRIMARY KEY AUTO_INCREMENT, user_id VARCHAR(30), action_type VARCHAR(60) NOT NULL, view_page VARCHAR(500), purchase_amount INT, create_at DATETIME ); INSERT INTO user_action_logs (user_id, action_type, view_page, purchase_amount, create_at) VALUES (1, 'VIEW', 'https://www.demo-shop/', NULL, '2020-06-27 17:11:01'), (1, 'ADD_CART', 'https://www.demo-shop/products/1343', NULL, '2020-07-12 17:12:01'), (1, 'PURCHASE', 'https://www.demo-shop/products/1343', 7200, '2020-07-12 17:13:01'), (1, 'SEARCH', 'https://www.demo-shop/search', NULL, '2020-07-13 17:11:01'), (2, 'ADD_CART', 'https://www.demo-shop/search', NULL, '2020-07-14 17:11:01'), (2, 'PURCHASE', 'https://www.demo-shop/products/21343', 1000, '2020-07-14 17:13:01'), (4, 'SEARCH', 'https://www.demo-shop/search', NULL, '2020-07-15 17:11:01'), (1, 'ADD_CART', 'https://www.demo-shop/products/71343', NULL, '2020-07-15 17:13:01'), (6, 'ADD_CART', 'https://www.demo-shop/search', NULL, '2020-07-16 17:11:01'), (6, 'PURCHASE', 'https://www.demo-shop/products/711343', 1300, '2020-07-16 17:13:01'), (NULL, 'SEARCH', 'https://www.demo-shop/search', NULL, '2020-07-16 17:11:01'), (5, 'ADD_CART', 'https://www.demo-shop/products/1343', NULL, '2020-07-17 17:13:01'), (5, 'PURCHASE', 'https://www.demo-shop/products/11343', 4300, '2020-07-17 17:13:01'), (3, 'ADD_CART', 'https://www.demo-shop/products/53343', NULL, '2020-07-18 17:13:01'), (5, 'VIEW', 'https://www.demo-shop/products/2343', NULL, '2020-07-18 17:13:01'), (3, 'ADD_CART', 'https://www.demo-shop/products/1343', NULL, '2020-07-18 17:13:01'), (5, 'VIEW', 'https://www.demo-shop/products/133', NULL, '2020-07-18 17:13:01'), (5, 'ADD_CART', 'https://www.demo-shop/products/53343', NULL, '2020-07-20 17:13:01'), (5, 'PURCHASE', 'https://www.demo-shop/products/2343', 4300, '2020-07-20 17:15:01'), (3, 'ADD_CART', 'https://www.demo-shop/products/1343', NULL, '2020-07-20 17:13:01'), (5, 'VIEW', 'https://www.demo-shop/products/133', NULL, '2020-07-20 17:13:01'), (NULL, 'ADD_CART', 'https://www.demo-shop/products/53343', NULL, '2020-07-24 17:13:01'), (NULL, 'PURCHASE', 'https://www.demo-shop/products/53343', 4300, '2020-07-24 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/1343', NULL, '2020-07-24 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/133', NULL, '2020-07-24 17:13:01'), (NULL, 'ADD_CART', 'https://www.demo-shop/products/53343', NULL, '2020-07-25 17:13:01'), (5, 'PURCHASE', 'https://www.demo-shop/products/53343', 4300, '2020-07-25 17:13:01'), (NULL, 'ADD_CART', 'https://www.demo-shop/products/1343', NULL, '2020-07-25 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/133', NULL, '2020-07-25 17:13:01'), (NULL, 'PURCHASE', 'https://www.demo-shop/products/1343', 5300, '2020-07-26 17:13:01'), (5, 'ADD_CART', 'https://www.demo-shop/products/2343', NULL, '2020-07-26 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/1343', NULL, '2020-07-26 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/133', NULL, '2020-07-26 17:13:01'), (5, 'ADD_CART', 'https://www.demo-shop/products/2343', NULL, '2020-07-26 17:13:01'), (5, 'PURCHASE', 'https://www.demo-shop/products/2343', 4300, '2020-07-27 17:13:01'), (NULL, 'ADD_CART', 'https://www.demo-shop/products/1343', NULL, '2020-07-27 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/2343', NULL, '2020-07-27 17:13:01'), (NULL, 'ADD_CART', 'https://www.demo-shop/products/1343', NULL, '2020-07-27 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/8343', NULL, '2020-07-28 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/1343', NULL, '2020-07-28 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/1343', NULL, '2020-07-28 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/133', NULL, '2020-07-28 17:13:01'), (NULL, 'ADD_CART', 'https://www.demo-shop/products/53343', NULL, '2020-07-28 17:13:01'), (5, 'PURCHASE', 'https://www.demo-shop/products/53343', 4300, '2020-07-29 17:13:01'), (2, 'ADD_CART', 'https://www.demo-shop/products/14343', NULL, '2020-07-29 17:13:01'), (2, 'PURCHASE', 'https://www.demo-shop/products/14343', 5300, '2020-07-29 17:13:01'), (1, 'ADD_CART', 'https://www.demo-shop/products/1343', NULL, '2020-07-30 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/8343', NULL, '2020-07-30 17:13:01'), (3, 'VIEW', 'https://www.demo-shop/products/1343', NULL, '2020-07-30 17:13:01'), (NULL, 'VIEW', 'https://www.demo-shop/products/133', NULL, '2020-07-30 17:13:01'); ``` ## 使用者行為分析 ### 使用者行為分布 從使用者行為日誌中有許多值得挖掘的洞見,可以先`全面性理解資料`,再根據商業業務目標進一步分析。首先,先粗略全面了解一下整體每一種事件行為的總數: ```text SELECT action_type, COUNT(*) FROM user_action_logs GROUP BY action_type; ``` ```text /*同時我們也可以試著了解網頁被瀏覽的情況*/ SELECT view_page, COUNT(*) FROM user_action_logs GROUP BY view_page; ``` 可以看到大概使用者行為的分布情形(加入購物車多,可能代表有興趣購買的多): ![](https://i.imgur.com/0sBKVJF.png) ### 統計所有使用者行為 統計每個使用者和沒有登入的使用者的使用行為: ```text SELECT user_id, SUM(CASE WHEN action_type = 'VIEW' THEN 1 ELSE 0 END) AS view_count, SUM(CASE WHEN action_type = 'PURCHASE' THEN 1 ELSE 0 END) AS purchase_count, SUM(CASE WHEN action_type = 'SEARCH' THEN 1 ELSE 0 END) AS search_count, SUM(CASE WHEN action_type = 'ADD_CART' THEN 1 ELSE 0 END) AS add_cart_count FROM user_action_logs GROUP BY user_id; ``` ![](https://i.imgur.com/CQoDQiL.png) ## RFM 顧客分析 I RFM(Recency、Frequency 和 Monetary 三個字開頭縮寫)顧客分析是一種透過顧客消費行為進行顧客分類的一種分析方式,透過細部的分析了解哪些顧客是所謂的 VIP 顧客,哪些是最近有需求的顧客,進而規劃細部的行銷方案。RFM 顧客分析法除了可以在網路電子商務零售通路使用也可以用於一般的數位產品服務顧客分析上。 ![](https://i.imgur.com/Mdqv3lz.png) - **Recency 最近購買日期**:消費者上次消費購買或使用服務時間離目前時間越近分數越高,代表顧客互動的新鮮度 - 上次電子商務消費的日期 - 上次使用者打開 App 的時間 - 上次打開信件的時間 - **Frequency 累計購買次數**:消費者累計購買商品次數愈多或使用服務頻率越高分數越高,代表顧客購買頻率 - 一段時間購買商品的次數 - 一段時間使用 App 的次數 - 一段時間內開啟信件的次數 - **Monetary 累計合計購買金額**:消費者累計購買商品金額或使用服務時間越長愈高分數越高,代表顧客貢獻的營收能力 - 一段時間內總消費金額 - 一段時間內使用 App 的時間 - 一段時間內閱讀電子報的時間 ```text SELECT user_id, /* 最近一次消費時間 */ MAX(create_at) AS recency_dt, /* 最近購買日期和目前相差天數 */ DATEDIFF('2020-07-30', DATE(MAX(create_at))) AS recency, /* 統計消費總次數 */ SUM(CASE WHEN action_type = 'PURCHASE' THEN 1 ELSE 0 END) AS frequency, /* 統計消費總金額 */ SUM(purchase_amount) AS monetary FROM user_action_logs GROUP BY user_id; ``` ![](https://i.imgur.com/HC5QJKA.png) 為了接下來使用方便,我們將 RFM 統計資料建立成一個 View 檢視表: ```text CREATE VIEW RFM AS SELECT user_id, /* 最近一次消費時間 */ MAX(create_at) AS recency_dt, /* 最近購買日期和目前相差天數 */ DATEDIFF('2020-07-30', DATE(MAX(create_at))) AS recency, /* 統計消費總次數 */ SUM(CASE WHEN action_type = 'PURCHASE' THEN 1 ELSE 0 END) AS frequency, /* 統計消費總金額 */ SUM(purchase_amount) AS monetary FROM user_action_logs GROUP BY user_id ``` ## RFM 顧客分析 II 接著進行 RFM 分析排名(排名方式沒有絕對的標準,可以自己根據業務商品總類和屬性和店家的狀況定義 RFM 分類方式)。這邊初步定義網站分析方式如下: ![](https://i.imgur.com/zQxWB1G.png) ```text SELECT user_id, recency, monetary, frequency, CASE WHEN recency < 7 THEN 5 WHEN recency < 14 THEN 4 WHEN recency < 30 THEN 3 WHEN recency < 60 THEN 2 ELSE 1 END AS recency_point, CASE WHEN frequency > 10 THEN 5 WHEN frequency > 5 THEN 4 WHEN frequency > 3 THEN 3 WHEN frequency > 2 THEN 2 ELSE 1 END AS frequency_point, CASE WHEN monetary > 100000 THEN 5 WHEN monetary > 10000 THEN 4 WHEN monetary > 1000 THEN 3 WHEN monetary > 500 THEN 2 ELSE 1 END AS monetary_point FROM RFM; ``` ![](https://i.imgur.com/UKO6V8b.png) ### 補充資訊 ![](https://i.imgur.com/p2gfiF8.jpg) ## RFM 顧客分析 III 為了使用上方便,我們將 RFM 分數統計資料建立成一個 View 檢視表: ```text CREATE VIEW RFM_points AS SELECT user_id, recency, monetary, frequency, CASE WHEN recency < 7 THEN 5 WHEN recency < 14 THEN 4 WHEN recency < 30 THEN 3 WHEN recency < 60 THEN 2 ELSE 1 END AS recency_point, CASE WHEN frequency > 10 THEN 5 WHEN frequency > 5 THEN 4 WHEN frequency > 3 THEN 3 WHEN frequency > 2 THEN 2 ELSE 1 END AS frequency_point, CASE WHEN monetary > 100000 THEN 5 WHEN monetary > 10000 THEN 4 WHEN monetary > 1000 THEN 3 WHEN monetary > 500 THEN 2 ELSE 1 END AS monetary_point FROM RFM; ``` 最後將 RFM 分數加總後進行排序,可以得知哪一些是比較重要需要持續關心的 VIP 客戶: ```text SELECT user_id, recency_point, frequency_point, monetary_point, (recency_point + frequency_point + monetary_point) AS RFM_points FROM RFM_points ORDER BY RFM_points DESC; ``` # 延伸閱讀 - [SQL 資料庫](https://hackmd.io/fQbgqRSYTDi2Ouis0fE3rg?both) - [Python 基礎語法](https://hackmd.io/4FH3w4_pQP6_dsandd45LA) - [Python 進階語法使用](https://hackmd.io/QYVYsxE8QyWNsnIhukt-2Q) - [Python 資料分析](https://hackmd.io/C5I9OzXzQCe6wJuZhwRV3Q) - [Python 網頁爬蟲](https://hackmd.io/JOc4g8AjSZiokS6vZckFSw?view) - [Python 資料科學與探索式資料分析](https://hackmd.io/qSceMWZWQcWsMIA9QiO1Nw?view) - [Python專案實作 資料分析與爬蟲](https://hackmd.io/oh18KsFvSxe5Eh3ECHDJOA?view)