# EUShop Intelligence Project - Report 1. sales trend of all products (2716) between Mar-2022 to July-2022. ![](https://i.imgur.com/ETM8tTR.png) there are 3 classes: one has wide range, one has flat range, and the last one is in the middle situation. 2. K-means with DTW metric to auto-cluster time series. ![](https://i.imgur.com/zYryaQD.png) | class | product id | mampe | | -----|------------| ------| | 0 | 7876 | zero sales in Aug | | 0 | 994 | 1.8815091689006407 | | 0 | 4455 | 7.94163789051627 | | class | product id | mampe | | -----|------------| ------| | 1 | 5421 | 1.4169936512540262| | 1 | 3214 |2.1209650048242286 | | 1 | 869 | 1.7925567882674824 | |class| product id | mampe | |-----|------------| -----------------| | 2 | 1293 | 0.4922285835914211 | | 2 | 1332 | 1.1559538503071034 | | 2 | 1348 | zero sales in Aug | ~2563 ~5 ~148 ![](https://i.imgur.com/hQlrptx.png) ![](https://i.imgur.com/Bz72LEI.png) ![](https://i.imgur.com/vVTFuxe.png) 3. product_id (std, mean) over Mar-July, scatter for cluster (more details in .html file) ![](https://i.imgur.com/1fbVY3k.png) ![](https://i.imgur.com/gnVVlaM.png) 4. K-means with DTW in sale (=quantity x price_eur) ![](https://i.imgur.com/v9pGRp4.png) | class | product id | mampe | | -----|------------| ------| | 0 | 3117 | 0.8698 | | 0 | 9062 | 1.3345 | | 0 | 9387 | 1 | | class | product id | mampe | | -----|------------| ------| | 1 | 1326 | 0.5903| | 1 | 8711 |2.5457 | | 1 | 1361 | 1.0863 | |class| product id | mampe | |-----|------------| -----------------| | 2 | 3172 | 0.6214| | 2 | 4217 | 1.1058 | | 2 | 6062 | 0.9129 | ~2500 ~77 ~20 5. actural data points vs predicted data points 5.1 1293 - 105 ![](https://i.imgur.com/h8Le2MT.png) ![](https://i.imgur.com/jtBWwyD.png) 5.2 1326 - 105 ![](https://i.imgur.com/P1kxoi3.png) ![](https://i.imgur.com/cDxk9qg.png) 6. generated by algorithm vs manully(average 7, 14, one month) (need to ask colleagues) ![](https://i.imgur.com/TniZMMp.png) ![](https://i.imgur.com/jT4LCd0.png) # Report-A In this AI-project, we aim to provide more reasonable and sustainable procurement suggestions by using data-driven technology. Our teachnologies will include forecasting and decision-making. Currently, we have a successful application of prediction model in our specific task. In this task, the focus is on an optimal prediction model for predicting future sales of each product. In the past, the future sale is calcualted by Moving Average (MA) method. $y_t = \frac{x_t + x_{t-1} + x_{t-2} + ... + x_{win-t-1}}{win}$ Where $t$ represents the time stamp and $win$ the size of a sliding window. For example, the sale in the tommorow is the avergae of the last $win$ days. Here, $win$ could be 7 days, 14 days and 30 days. But now, we introduce the data-driven technology - Machine Learning (ML). In our task, we are using the Facbook Prophet (FB) prediction model. This model especially performs well in dealing with promotion/holiday events that commonly happens in the retail area. $y(t) = g(t) + s(t) + h(t) + e_t$ where growth$g(t)$, seasonality $s(t)$, holidays $h(t)$ and error $e_t$. Here are two examples, and we can find the prediction difference among these four motheds. In product-1293 and product-1326, the FB model (see red-predicted line) can be much closer to the actual sale (see the blue-actual line) than other three common MA models(see green-win_7, purpel-win_14 and orange-win_30 lines). Especially, when the actual sales reach peaks and valleys, the FB model can detect these situations very well, which will help us make more reasonable decisions for the future procurement. ![](https://i.imgur.com/TniZMMp.png) ![](https://i.imgur.com/jT4LCd0.png) There are around 2500 products from March-first to September-30 this year. According to our analysis, FB model has the cloest performance to the actual sale. | Method | Sales (€) | | -------- | -------- | | actual | ≈ 114,892 | | FB | ≈ 125,620 | | win7 | ≈ 148,225 | | win14 | ≈ 137,355 | | win30 | ≈ 129,518 | In the future, we will improve the FB model performance by adding promotion/holiday events factor, and we plan to assign different FB model to different products. # Report B In this phase, the focus is on the decision-making. In the Report A, we concluded that FB model has more flexibility and higher accuracy in forecasting the future quantities/sales of each product. For example, adding promotion factors in FB model apprentately help improve the forecasting accuracy. Therefore, the predicted quantities/sales derived from the FB model will be used in this phase for operation research. The objective problem is formulated as $f(X)=\sum_{i}^{n}(s_i - c_i)x_i$ s.t. $\sum_{i=1}^{n}c_ix_i \leq w$ $x_i \in Z$ $x_i \leq R_i-N_i$ $i=1,2,...,n$ where $n$ is the number of products $s_i$ is the selling price of product $i$ $N_i$ is the predicted number of product $i$ that will be sold $c_i$ is the cost price of product $i$ $x_i$ is the number of product $i$ that may be procured. $R_i$ is the current stock at the date of the end of past time series. For example, considering the supplier 'Herman kuijper', there are 152 valid products ($n=152$), and the budget is 10,000 Euro ($w=10000$). After solving the objective problem with required constraints, we conclude that 90 products should be procured with corresponding quantity, and the maximum profit will be around 1793 Euro. An exmaple is shown as follow, and check out.xlsx for details. | product_id | name | num | | -------- | -------- | -------- | |1026 |葱伴侣大酱 |24| |1098 |台湾福成关庙面| 5| |1104 |台湾亲亲阿萨姆奶茶| 5| |1105 |台湾亲亲柠檬红茶 |4| |1108 |台湾亲亲蜂蜜绿茶 |1| |1170 |黄飞红麻辣花生 |39| # Report-C In this phase, the stock is considered for the decision-making. In this specific case, the stock info is from 2022-11-11.xlsx, and the historical data is considered for forecasting the future 10 days sales. The objective problem which will be maximized is formulated as $f(X)=\sum_{i}^{n}(s_i - c_i)x_i$ s.t. $\sum_{i=1}^{n}c_ix_i \leq w$ $x_i \in Z$ $x_i \leq N_i - R_i$ $i=1,2,...,n$ where $n$ is the number of products $s_i$ is the selling price of product $i$ $N_i$ is the predicted number of product $i$ that will be sold $c_i$ is the cost price of product $i$ $x_i$ is the number of product $i$ that may be procured. $R_i$ is the current stock at the date of the end of past time series. For example, considering the supplier 'Herman kuijper', there are 152 valid products ($n=152$), and the budget is 10,000 Euro ($w=10000$). Meanwhil, the products will be ignored if there are enough stock for selling in the next 10 days. Five time-period historical datasets are considered for forecasting, and the maximum profits are concluded as the following table. | method | 2022/05/01 to 2022/11/11 | 2022/06/01 to 2022/11/11 | 2022/07/01 to 2022/11/11 | 2022/08/01 to 2022/11/11 | 2022/09/01 to 2022/11/11 | | -------- | -------- | -------- | --------| --------| --------| | fb | 1097.11 | 1167.04 | 1266.87 | 1369.32 | 1207.74 | | win7 | 1009.89 | 1009.89 | 1009.89 | 1009.89 | 1009.89 | | win14 | 1155.11 | 1155.11 | 1155.11 | 1155.11 | 1155.11 | | win30 | 756 | 756.37 | 756.37 | 756.37 | 756.37 | Therefore, we conclude that the past 3 or 4 months are better used as the historical dataset for forecasting, and the FB model can perform better than other three methods in maximizing the profit and the flexibility to use. Details about which products with the quantity can be found in out_fb_pre.xlsx file. # Report-D In this phase, the number of products in one box (box number, $p_i$) is considered for the decision-making. Furthermore, the updated objective problem is formualated as: $f(x)=\sum_{i}^{n}(s_i - c_i)x_ip_i$ s.t. $\sum_{i=1}^{n}c_ix_ip_i \leq w$ $x_i \in Z$ $x_i \leq \frac{N_i - R_i}{p_i}$ $i=1,2,...,n$ where $n$ is the number of products $s_i$ is the selling price of product $i$ $N_i$ is the predicted number of product $i$ that will be sold $c_i$ is the cost price of product $i$ $x_i$ is the **box number** of product $i$ that may be procured. $p_i$ is the pieces number of product $i$ that may be procured. $R_i$ is the current stock at the date of the end of past time series. Therefore, when there is only one supplier 'Herman kuijper' and the budget is 10,000 Euro ($w=10000$), the maximum profits optimized by Integer Programming would be as shown in the following table. $g(x)=\sum_{i}^{n}(s_i - c_i)z_i$ $z_i = min(x_ip_i, X_i-R_i)$ where $n$ is the number of products $s_i$ is the selling price of product $i$ $c_i$ is the cost price of product $i$ $x_i$ is the **box number** of product $i$ generated by $f(x)$. $p_i$ is the pieces number of product $i$ that may be procured. $R_i$ is the current stock at the date of the end of past time series. $X_i$ is the actual total number of product $i$ that have been sold. | methods | $f(x)$ | $g(x)$ | | -------- | -------- | -------- | | fb | 1224.11 | 163.17 | | win7 | 883.53 | 236.75 | | win14 | 1010.67 | 162.43 | | win30 | 1179.26 | 212.50 | ==================== Instance ID -Open an SSH client. -Locate your private key file. The key used to launch this instance is eushop.pem -Run this command, if necessary, to ensure your key is not publicly viewable. chmod 400 eushop.pem -Connect to your instance using its Public DNS: ec2-54-152-190-48.compute-1.amazonaws.com Example: ssh -i "eushop.pem" ubuntu@ec2-54-152-190-48.compute-1.amazonaws.com https://www.digitalocean.com/community/tutorials/how-to-install-postgresql-on-ubuntu-20-04-quickstart ======================