# LIDA-ATI DSG July 2021 - Morrisons challenge
## Group members, email addresses and brief profile
* Arkady Wey: arkady.wey@maths.ox.ac.uk
* Brief Profile:
* Jon Ward: j.a.ward@leeds.ac.uk
* Brief Profile:
* Xingna Zhang: xingna.zhang@liverpool.ac.uk
* Brief Profile:
* Festus Adedoyin: fadedoyin@bournemouth.ac.uk
* Brief Profile:
* Alexandru Puiu (Alex): ioan.puiu@maths.ox.ac.uk
* Brief Profile:
* Marko Tesic: m.tesic@bbk.ac.uk
* Brief Profile:
* Sophie Ayling: sophie.ayling.10@ucl.ac.uk
* Brief Profile:
* Idris Zakariyya: i.zakariyya@rgu.ac.uk
* Brief Profile:
* Kingsuk Jana: kingsukjana11@gmail.com
* Brief Profile:
*Daily Task Table*
| Work plan - July 12 2021 Monday |
| -------- |
| Jon - get data access sorted, initial literature review |
| Nina - data description & exploration & visualisation |
| Marko - data exploration & working on a supervised learning model |
| Idris - exploring python module to model the integer programming problem |
| Festus - revisit objectives & potential literature review thematic areas |
| Alex and Arkady - implement the MILP model and get results |
## Work plan 13th July 2021 Tuesday
* Jon (literature review)
* Nina (obtain required constants for the MILP model; quantify variance and bias for every time series)
* Marko (PCA, thinking about differences between stores)
* Idris (identifying data structure in an unsupervised scenario (PCA for feature selection))
* Festus
* Sophie (obtain required constants for the MILP model)
* Alex (could get some help with) possible tasks: 1.review model with the team, 2. make required changes, 3. write standard formulation
* Arkady - MILP implementation, then analysis of MILP results, then exploring reinforcement learning problem (for hopefully at least a week)
* Kingsuk (Thinking about implementation of PCA)
## Work plan 14th of July 2021 Wednessday
* Alex: debug implementation, results
* Marko: finish PCA (add more features), start on bulding an XGBoost model
* Jon: finish literature review, discuss PCA with Idris
* Arkady: debug implementation of milp
* Nina: implement a simple/naive linear and non-linear ML model (waste on forecast)
* Sophie: clean up descriptive stat visualizations, maybe play with scipy optimize
* Festus: support Jon with LR
## Work Plan 15th of July 2021 Thursday
* Alex and Arkady: finish code checks,try relaxations, make things computationally feasible, get some results (hopefully)
* Nina: make the slide for the naive model and see whether I could do any improvement on the modelling
* Jon: single product, store and period `optimisation'
* Marko: finishing boosting modeling and start on k-means clustering
* Festus: chase computation/VM issues; check report writing
## Work Plan 16th of July 2021 Friday
* Sophie and Arkady and Alex: Presentations, results etc
* Marko: add more features to the boosting trees algorithm; generate some plots for the presentation
* Nina: generate a plot for the presentation
## Work Plan 19th of July 2021 Monday
* Nina: 1. covariance between products in terms of sales; 2. the variance of the error forecast; plot the distribution of the error; remove all stocked of zeros; stock of day + delivery <= sale/forecast; 3. look at correlation as well
* Marko: change prediction vars from past forecast to future forecast and compare boosting tree models
* Alex: finalise formulation of RL, start implementation, MILP overleaf report.
* Kingsuk: Studying the impact of forecast has on waste by using Zero-Inflated Negativebinomial Regression.
*https://hackmd.io/ktgcjx9gRzKHA6ef_zHMSw?
*
* Festus: Work with Arkady on report; harmonise report with everyone to have a clear structure.
*
## Work Plan 20th of July 2021 Tuesday
* Nina: look into the forecast error distribution by products and find out what happened to the mean value lines; incorporate day of the week in the models.
## Possible work streams
* Preprocessing the Data:
1. obtain required constants for the MILP model, of particular interest are a. volumes and mases per case of product (case and not unit), b. the lengths of the roads from the depot to the stores
2. quantify variance and bias for every time series corresponding to (p,s), look at the difference and could perform statistical tests to confirm that the residuals are white noise. Could compute the whole covariance matrix - this might be expensive, so maybe marginalise over p and or s.
3. other streams
* Mixed Integer - Linear Program (MILP):
1. review model with the team,
2. make required changes,
3. write standard (matrix) formulation
4. possible relaxations as required
5. implement in Code: Python, R, etc.
6. Review results and discuss.
7. compare results when using sales forecast with using actual sales as inputs 8. iterate
* Machine Learning (ML):
* ML approach to map inputs to waste, and then optimise over the learned function: Linear Approach, Nonlinear: could try SVM, GP, Random Forest, XG Boost, Deep Learning, etc other ideas. Could be used to compare against or get starting solution for MILP. This approach could naturally incorporate variance, and the computational cost might scale better than MILP as the size of the problem grows.
* Could also look at unsupervised techniques to try identify data structure: PCA, clustering etc, and see if there are any particularly enlightening features. This could help with RL or MILP.
* Reinforcement Learning (RL):
* on the fly optimisation: sitting at time t, what is the next optimal move given forecast expected value, forecast variance and operational constraints.
* Note that this problem does not blow up in complexity with the time horison considered, which is an issue with MILP.
* In principle, given the "true" action value function this would give an optimal decision pattern but since the action value function will be learned from the data, this is in principle suboptimal.
* Very nice to compare with MILP since if it performs comparably, this method is much more scalable!
* Could use the MILP to build artificial problems and their optimal solution to properly train a RL algorithm, since the data we have might not be sufficient for proper training. The idea is not necessarily($\star$) to beat the optimial value of MILP since in principle a sufficiently complex MILP would have the true optimum as the global solution, but rather to enable for better scaling. ($\star$): surely, if we cant solve for MILP to optimality RL could potentially give better solutions.
* It would be interesting to see how to combine RL with MILP to get the best of both worlds
* Possible future avenues: extend MILP to incorporate variance, generalisation to sending items between stores, generalisation to multiple depots + other suggestions of extensions
* any other ideas **(maybe Computational charts; A unique solution – optimization with constraints (which we’ve already started with); Machine learning; Smart options? (Maybe); Feature engineering)**
# Contents (??)
## 1 Summary
1.1 Challenge overview
1.2 Data overview
1.3 Main objectives
1.4 Approaches (Model, Methods, Techniques)
1.5 Main conclusions
1.6 Limitations
1.7 Recommendations and future work
## 2 Data overview
2.1 Dataset description
2.2 Data quality
## 3 Data exploration & analysis
Kingsuk:
## 4 Experiments
## 5 Future work and research avenues
## 6 Team members
## References
# 1.2 Data overview
The data subset consists of 55 products across 3 stores. There are 91 days of data in total, with day 0 being a Monday. The first 63 days are included as historical operational data for training, and therefore have the corresponding actual sales and waste data along with depot data. The last 28 days are meant for prediction of the best logistical optimisation.
For null or missing sale forecasts, we impute as zeros. 3.6% of missing values are imputed in this way.
# 1.3 Main Objectives
The principal aim of this challenge is to optimise the flow of products from suppliers through depots to the stores. To achieve this, this report explains how to optimise the transportation of goods to ensure stock is always available for customers, whilst minimising excess product waste and carbon emissions from transportation. Specific objectives are:
1. To **minimise the surplus** and corresponding waste of products in stores whilst **minimising the delivery mileage**
2. To **identify these products** and offer suggestions that could reduce the waste for these products
3. To identify how **changing global system parameters** could improve the overall output:
* the cost of transportation
* the minimum order quantity and
* the accuracy of the forecast
# 1.4 Approaches (Model, Methods, Techniques)
The team has considered two main families of approach:
1. Linear/Integer Programming
2. Reinforcement learning
[Brief description of each]
**Step 1**
After discussion we decided to begin with addressing the simplest version of the problem using a linear programing approach.
Specifically, this involves looking at identifying an algorithm that can meet the objective function of minimizing waste in stores, subject to the 12 logistical supply chain constraints provided in the long description (see below). We decided to first start to calibrate the optimization algorithm on the **historical operation data** of the **63 days period**.
1. For the 28-day period, the accuracy of the sales forecast is in line with the historical data provided.
2. Supply of products to stores should be at minimum equal to expected demand.
3. Products have a minimum order quantity from manufacturing plants and so can only be ordered into the depot in specific quantities. Existing minimum order quantities are provided, but this constraint is encouraged to be relaxed/altered to see if easier or better optimal solutions can be found.
4. The lead time from manufacturing to distribution is assumed to be 3 days, so an order placed on day 1 is received in the depot on day 4.
5. The cost of delivering goods from the manufacturer to distribution depot is assumed to be included in the cost of goods.
6. The cost of processing one case in the depot is 10p. This represents the cost of receiving the case and processing the case for dispatch to the store.
7. One case contains multiple units of the same product. A pallet can contain multiple cases, and does not have to contain all the same product.
8. The cost of shipping one pallet from the distribution depot to a store is 30p per mile.
9. One pallet can hold 1.92 m$^3$ of volume and can hold no more than 1,500 kg.
10. Each product has a minimum life code. For example, a product with a minimum life code of 10 days will be wasted if it is not sold within 10 days of it arriving into the depot.
11. Products must be sent to a store when at least 80% of the code life remains when it reaches the store. For example, if the minimum life code is 10 days it must reach the store with at least 8 days remaining.
12. Stock can be held in the depot for multiple days.
Note
## Definitions
Variables:
* $y_{p,s,t}$ quantity of item $p$ in store $s$ at time $t$. measured in units
* $x_{p,t}$ quantity of item $p$ in the depot at time $t$. Measured in cases
* $o_{p,t}$ orders to arive at depot for product $p$ at time $t$. Measured in cases
* $z_{p,s,t}$ number of cases of $p$ delivered from the depot to store $s$ at time $t$
* $n_{s,t}$ number of pallets going from depot to store $s$ at time $t$
*
Constants:
* $C_{pr}$ case processing cost (10p)
* $CPM$ cost per mile (30p)
* $V$ max volume in pallet
* $m$ max mass per pallet
* $V_p$ volume of case for product p
* $m_p$ mass of case for product p
* $c_p$ production cost of case $p$
* $d_s$ distance from depot to store
* $q_{p}$ number of items per case for product p
* $Q^F_{p,s,t}$ forecast for product p in store t at time t - quantity
Model:
(Objective function (linear) $$
f(x,y,z,o,n)= \sum_{p,t}o_{p,t}{c_p}+ CPM\sum_{s,t}n_{s,t}d_s + C_{pr}\sum_{s,p,t}z_{p,s,t}
$$
)
$$
\min_{x,y,z,o,n} \sum_{p,t}o_{p,t}{c_p} + CPM\sum_{s,t}n_{s,t}d_s + \sum_{s,p,t}(C_{pr}z_{p,s,t}-Q^F_{p,s,t}c_p/q_p)
$$
Constraints
* Amount of stock at depot is what was there yesterday, plus what is ordered in, minus what goes to store,
$$
x_{p,t} = x_{p,t-1} - \sum_s z_{p,s,t}+o_{p,t} \geq 0\\
$$
* Amount of stock in store is what was there yesterday plus what goes to the store, minus what is sold,
$$
y_{p,s,t} = y_{p,s,t-1} + q_{p}z_{p,s,t} - Q^F_{p,s,t}
$$
* What's in the depot, store and what's ordered need to be non-negative,
$$
z_{p,s,t}, o_{p,s,t},y_{p,s,t} \geq 0
$$
* Volume of stock must be less than a whole number of pallets
$$
n_{s,t}V \geq \sum_{p} V_p z_{p,t}
$$
* Mass of pallet must be less than a whole number of maximum pallet mass,
$$
n_{s,t}m \geq \sum_{p} m_p z_{p,t}
$$
* Items, pallets and orders much be integers,
$$
x_{p,t},n_{s,t}, o_{p,t} \in \mathbb{N}
$$
## Mixed Linear Integer Program with Expiry constraints (cleaner notation)
### Variables
* $x_{p,t,\tau}$ is the number of cases of product $p$ born at time $\tau$ that are in the depot at time $t$. Notice that the dimensions are units.
* $y_{p,s,t,\tau}$ is the number of units of product $p$ that were born at time $\tau$ that are in store $s$ at time $t$. Notice that the dimensions are units.
* $u_{p,t}$ is the number of cases of product $p$ coming in to the depot at time $t$. Note that all units in these cases are born at time $\tau=t$. Notice that the dimensions are cases.
* $v_{p,s,t,\tau}$ is the number of cases of product $p$ born at time $\tau$ leaving the depot and arriving at store $s$ at time $t$. Notice that we are assuming that the delivery time is zero. Notice that the dimensions are cases.
* $n_{t,s}$ is the number of pallets leaving the depot and arrivig at store $s$ at time $t$.
* $q^F_{p,s,t,\tau}$ is the number of units of product $p$ that were born at time $\tau$ that are forecasted to sell at time $t$ in store $s$.
### Constants:
* $Q^F_{p,s,t}$ is the forecasted number of sales of product $p$ at store $s$ at time $t$. Note that $\sum_{\tau} q^F_{p,s,t,\tau} = Q^F_{p,s,t}$.
* $T_p$ is the number of timesteps that product $p$ lives. It expires instantaneously at $t=\tau+T_p$.
* $C^{\text{tra}}$ is the cost for a pallet to travel one mile (30p).
* $C^{\text{pro}}$ is the cost of other processing of one case (10p).
* $V^{\text{pal}}$ is the maximum volume of a pallet (1.92m$^3$).
* $M^{\text{pal}}$ is the maximum mass per pallet (1,500kg).
* $C_{p}$ is the cost production of a case of product $p$.
* $V_p$ is the volume of a case of product $p$.
* $M_p$ is the mass of a case of product $p$.
* $D_s$ is the distance from the depot to store $s$.
* $N_{p}$ is the number of units of product $p$ that fit in a case.
### Model
Minimise
$$
\sum_s \sum_p \sum_{t \geq T_p}\bigg(\frac{C_p}{N_p}\bigg)y_{p,s,t,t-T_p}
+\sum_{s}\sum_{p}\sum_{\tau > T-T_p}\sum_{t\geq \tau} \bigg(\frac{C_p}{N_p}\bigg) y_{p,s,t,\tau}\\ + C^{\text{tra}} \sum_{s,t}n_{s,t}D_s + C^{\text{pro}} \sum_{s,p,t,\tau}v_{p,s,t, \tau}
$$
* We assume that $\tau \geq 0$, so that all products that were really created before or at $t=0$ are created at time $0$ so that $\tau=0$ for these products.
* We note that, strictly speaking, for the second term, $\tau\geq max(0, T-T_p)$ but since $T$ is expected to be greater than $T_p$ we usually have $\tau > T-T_p$)
### Constraints:
* The following variables are greater than zero.
$$\tau \geq 0,
\\
q^F_{p,s,t,\tau} \geq 0. \hspace{5mm} \forall p,s,\tau, t \in [\tau,\tau+1,...,\tau+T_p] $$
$$
\sum_\tau q^F_{p,s,t,\tau} = Q^F_{p,s,t} \hspace{5mm} \forall \quad p,s,\tau, t \in [\tau,\tau+1,...,\tau+T_p]
$$
$$
y_{p,s,t,\tau},v_{p,s,t,\tau},x_{p,t,\tau}, n_{s,t}, u_{p,t,\tau} \geq 0
$$
#### Stuff at depot
* Conservation at depot. For each product, for each born date $\tau$, the amount of stock at depot is what was there yesterday, plus what comes in, minus what goes out to all stores.
$$
x_{p,t,\tau} = x_{p,t-1,\tau} - \sum_s v_{p,s,t,\tau} + \delta_{t,\tau} u_{p,\tau} \hspace{5mm} t \geq 1, \tau \geq 0 \hspace{5mm} t\geq \tau \hspace{20mm} ; (1)
$$
* Products can not last in store for more than 20% of their lifetime ($T_p$). So call $\delta_p = [0.2T_p]$. Then
$$
x_{p,t,\tau} =0 \hspace{5mm} \text{for} \hspace{5mm} t > \tau + \delta_p \hspace{20mm} (2)
$$
* We know the amount of stuff in the depot initially, and in the store.
$$x_{p,0,0} , y_{p,s,0,0} \hspace{5mm} \text{are known for every } p \text{ and } s$$
Notice that, combining (1) and (2) gives
$$
x_{p,t,\tau} = x_{p,t-1,\tau} - \sum_s v_{p,s,t,\tau} + \delta_{t,\tau}u_{p,\tau} \hspace{5mm} max(1,\tau) \leq t \leq \tau + \delta_p
$$
$$
0 = x_{p,t-1,\tau} - \sum_s v_{p,s,t,\tau} + \delta_{t,\tau}u_{p,\tau} \hspace{5mm} t =\tau + \delta_p +1
$$
#### Stuff at store
* Conservation at store. For each product, for each born date $\tau$, the amount of stock at store is what was there yesterday, plus what comes in, minus what goes out due to forecasted sales.
$$
y_{p,s,t,\tau} = y_{p,s,t-1,\tau} + \delta_{t \leq \tau +\delta_p}N_p v_{p,s,t,\tau} - q^F_{p,s,t,\tau}
\quad \tau +1 \leq t \leq \tau+T_p
$$
**Note need to sort these constraints out**
* Volume of stock must be less than a whole number of pallets
$$
n_{s,t} V^{\text{pal}} \geq \sum_{p}\sum_{t-\delta_p\leq \tau\leq t} V_p v_{p,t,\tau}
$$
* Mass of pallet must be less than a whole number of maximum pallet mass,
$$
n_{s,t} M^{\text{pal}} \geq \sum_{p}\sum_{t-\delta_p\leq \tau\leq t} M_p v_{p,t,\tau}
$$
alternatively can write
$$
\tau \leq t \leq\tau+\delta_p; \hspace{5mm} t-\delta_p\leq \tau\leq t
$$
* Items, pallets and orders much be integers,
$$
y_{p,s,t,\tau},v_{p,s,t,\tau},x_{p,t,\tau}, n_{s,t}, u_{p,t,\tau} \in \mathbb{N}
$$
## Complexity
We have four variables that are indexed by $p,s,t,\tau$ namely $x_.,y_.,v_.,q^F_.$ If the number of days considered is $T$, number of products is $P$, number of stores $S$, and $\delta_p$ and $T_p$ as before. The total number of variables of problem are then:
$$
size = 2(\delta_p+2T_p)PST +P*S+S*T = 2*12*55*3*63 + 55*63+3*63 = 253,134
$$
which is clearly very large. This may mean the problem takes O(days) to solve.
We may make some simplifations to make this problem smaller, as follows.
## Bounds
Let $\alpha>1$ upper bound on max quantity in store. For example, $\alpha=2$.
$$
Y_p = \alpha \max_{s,t}(Q^F_{p,s,t})
$$
Note that once we see the solution, this will help us to choose $\alpha$.
Bound on $v_p$:
$$
\frac{Y_p}{N_p}.
$$
Bound on $q^F_.$
$$
Q^F_.
$$
Bound on $x_.$ depot:
$$
\delta_pSY_p/N_p
$$
Bound on $u_.$ is the same as the depot bound.
Bound on $n$ is
$$
max(Y_p/N_p*V_p/V^{\text{pal}}, Y_p/N_p*M_p/M^{\text{pal}})
$$
This is the upper bound on the number of cases multiplied by the volume of a case which is the max volume. Divide by the pallete volume and we have the max number of pallets.
## Background
A highly simplified version of the problem is called the ``newsvendor'' problem, which has an analytical solution:
https://en.wikipedia.org/wiki/Newsvendor_model
# Simple optimisation
Consider a single product (item) in a single store. One strategy is to deliver enough product to last until its shelf life (period), after which any remaining stock is wasted and a new delivery for the next period. Thus the problem becomes a series of single period news vendor problems, so we can consider how to optimise a single period. Define the following variables:
* $v$, the number of items delivered in a period, which assume is a multiple of cases;
* $Q$, the forecasted number of items that will be sold;
* $D$, the actual number of items sold.
Define the following parameters:
* $C$, the cost of producing one case;
* $C^{\rm pro}$, the cost of processing one case;
* $C^{\rm tra}$, the cost of transporting one pallet per mile;
* $D_{\rm s}$, the distance from the store to the depot in miles;
* $N$, the number of items per case;
* $N_{\rm pal}$, the number of pallets per case.
The cost function is therefore
$$c(v,Q)=\frac{C}{N}\left|v-Q\right|+C^{\rm pro}\frac{v}{N}+C^{\rm tra}D_{\rm s}{\rm ceil}\left(\frac{v}{NN_{\rm pal}}\right).$$
The first term in the cost function, $\frac{C}{N}\left|v-Q\right|$, is the cost of producing stock above or below the forecasted demand. Here we are assuming that the cost of producing one unit is the same as the store price, since we were only given production costs. Thus ordering one too many cases costs the same as ordering one too few, which would be lost revenue. Note that this assumption could effect our results, although we suspect that it will only increase or decrease the optimal number of cases by one. Here we are costing the number of units wasted, where the cost of wasting one unit is $\frac{C}{N}$.
The second term, $C^{\rm pro}\frac{v}{N}$, describes the processing cost, where ${\rm ceil}\left(\frac{v}{N}\right)$ is the number of cases processed. The third term, $C^{\rm tra}D_{\rm s}{\rm ceil}\left(\frac{v}{NN_{\rm pal}}\right)$, is the transport cost, where ${\rm ceil}\left(\frac{v}{NN_{\rm pal}}\right)$ is the number of pallets transported.
If $Q$ is an integer number of cases, then that is the optimal number of cases to send to the store, since the production cost is zero when supply equals demand and any production above or below demand increases cost.
The processing and transport costs are both non-decreasing functions, so if these were the only costs, then it would also be less costly to reduce the number of cases ordered.
The cost $c(v,Q)$ is the cost given the forecast, i.e. the expected cost.
not an integer number of
Results with case cost in pence (which is wrong):




83 optimal-delivered sales cost <0
82 optimal-delivered sales cost >0

* Not accounting for minimum order quantity (from supplier), although could be distributed across stores.
Optimal forecast underestimates total cost by 0.028x10^5 (about 2.5%)
Delivered forecast overestimates total cost by 0.018x10^5 (about 1.5%)
------
## Some basics of hackmd:
*Equations can be written as LaTeX...*
$$\frac{\partial {\bf u}}{\partial t} + \left({\bf u}\cdot\nabla\right){\bf u} - \nu \nabla^2 {\bf u} = -\nabla w + {\bf g}$$
or simply $x^2 + y^2 = z^2$
*Tables are simpler forms:*
| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
| Text | Text | Text |
You can use <font color=blue> colour </font> if you want <font color=red> emphasis </font>
and
## Add titles
and
page breaks
------
*Drag and drop images:*

* Bullet 1
* Bullet 2