# Adaptive Market Planning
## Introduction
Adaptive market planning addresses problems involving resource allocation under uncertain demand. A classic example is stocking perishable inventory, like fresh fish, where unsold stock cannot be carried over.
## The Newsvendor Problem
The newsvendor problem, a cornerstone in optimization under uncertainty, is formulated as:
$$
\max_{x} \mathbb{E}F(x,W) = \mathbb{E}\left[p\min\{x,W\} - cx \right]
$$
where:
* $x$: Decision variable (order quantity)
* $W$: Uncertain demand
* $p$: Selling price
* $c$: Unit cost ($p>c$)
When demand $W$ is known, the optimal solution is simply $x=W$. However, under uncertainty, where $W$ is a random variable with distribution $f^W(w)$, finding the optimal order quantity becomes more complex.
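To make the expectation concrete, the following minimal sketch estimates $\mathbb{E}F(x,W)$ by Monte Carlo for a few candidate order quantities, assuming (purely for illustration) Poisson demand with mean 100 and the price and cost values used in the experiments later in this note:
```python=
import numpy as np

# Minimal Monte Carlo sketch: estimate E[F(x, W)] = E[p*min(x, W) - c*x]
# for a few candidate order quantities x.  Illustrative assumptions:
# demand W ~ Poisson(100), price p = 26, cost c = 20.
rng = np.random.RandomState(20180613)
p, c = 26.0, 20.0
demand = rng.poisson(100, size=100000)            # samples of W

for x in [80, 90, 100, 110]:
    profit = p * np.minimum(x, demand) - c * x    # F(x, W) for every sample
    print(f"x = {x:3d}, estimated E[F(x, W)] = {profit.mean():8.2f}")
```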
## Modeling Adaptive Market Planning
**Objective:** Maximize net income from resource allocation under uncertain demand.
### Mathematical Model
* **State Variables:** $S^n = x^n$ (current order quantity)
* **Decision Variables:** $\alpha_n$ (step size)
* **Exogenous Information:** $W^{n+1}$ (random demand observed after the order is placed)
* **Transition Function:** Updates the order quantity using a stochastic gradient step followed by projection onto the non-negative integers:
$$
x^{n+1} = \max\{0, \lfloor x^{n} + \alpha_{n} \nabla_x F(x^{n}, W^{n+1})\rfloor\}
$$
* **Objective Function:** The net profit earned at each iteration: $$F(x^{n}, W^{n+1}) = p\min\{x^{n}, W^{n+1}\} - cx^{n}$$
### Uncertainty Modeling
To model demand uncertainty in the newsvendor problem, we assume that the demand $W$ follows a Poisson distribution with unknown mean $\mu$:
$$
f^{W}(w) = \frac{\mu^{w}e^{-\mu}}{w!},\quad w = 0,1,2,\dots
$$
While we assume a specific distribution for $W$, the mean demand $\mu$ is unknown but follows a known discrete distribution:
$$
p_{k}^{\mu} = \text{Prob}[\mu = \mu_{k}], \quad k=1,2,\dots, K
$$
This distribution $p^{\mu} = (p^{\mu}_{k})_{k = 1}^{K}$ is incorporated into our initial state $S^0$, reflecting the prior knowledge we have about the possible values of the mean demand.
Here we set $K = 1$ and $\mu_{1} = 100$, so the demand is effectively Poisson with mean 100.
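A minimal sketch of this two-stage uncertainty model: draw the mean $\mu$ from the discrete prior $(p^{\mu}_{k})$, then draw the demand $W$ from a Poisson distribution with that mean. With $K = 1$ and $\mu_{1} = 100$ it reduces to the plain Poisson(100) sampler used in the model class below.
```python=
import numpy as np

# Two-stage demand sampler: mu ~ discrete prior, then W ~ Poisson(mu).
# With K = 1 and mu_1 = 100, as assumed here, this is just Poisson(100).
rng = np.random.RandomState(20180613)
mu_values = np.array([100.0])   # the mu_k, k = 1, ..., K
mu_probs = np.array([1.0])      # the prior p_k^mu (must sum to 1)

def sample_demand():
    mu = rng.choice(mu_values, p=mu_probs)    # draw the unknown mean
    return rng.poisson(mu)                    # draw the demand W

print([sample_demand() for _ in range(5)])
```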
### Implementation
```python=
from collections import namedtuple

import numpy as np


class AdaptiveMarketPlanningModel():
    """
    Base class for the adaptive market planning model
    """
    def __init__(self, state_names, x_names, s_0, T, reward_type, price=1.0, cost=1.0,
                 exog_info_fn=None, transition_fn=None, objective_fn=None, seed=20180613):
        self.init_args = {'seed': seed}
        self.prng = np.random.RandomState(seed)
        self.init_state = s_0
        self.T = T
        self.reward_type = reward_type
        self.state_names = state_names
        self.x_names = x_names
        self.State = namedtuple('State', state_names)
        self.state = self.build_state(s_0)
        self.Decision = namedtuple('Decision', x_names)
        self.obj = [] if reward_type == 'Cumulative' else 0.0
        self.past_derivative = 0.0
        self.cost = cost
        self.price = price
        self.t = 0
        self.learning_list = []

    # this function builds a state namedtuple containing all the state information needed
    def build_state(self, info):
        return self.State(*[info[k] for k in self.state_names])

    # this function builds a decision namedtuple
    def build_decision(self, info):
        return self.Decision(*[info[k] for k in self.x_names])

    # this function returns the exogenous information (the new demand),
    # drawn from a Poisson distribution with mean 100
    def exog_info_fn(self, decision):
        return {"demand": self.prng.poisson(100)}

    # this function takes in the decision and exogenous information and returns the new state
    def transition_fn(self, decision, exog_info):
        self.learning_list.append(self.state.order_quantity)
        # compute the stochastic gradient of the profit function
        derivative = self.price - self.cost if self.state.order_quantity < exog_info['demand'] else -self.cost
        # update the order quantity (projected onto the non-negative integers)
        new_order_quantity = max(0, int(self.state.order_quantity + decision.step_size * derivative))
        # count the number of times the derivative changes sign (used by Kesten's rule)
        new_counter = self.state.counter + 1 if self.past_derivative * derivative < 0 else self.state.counter
        self.past_derivative = derivative
        return {"order_quantity": new_order_quantity, "counter": new_counter}

    # this function calculates how much money we make in one period
    def objective_fn(self, decision, exog_info):
        self.order_quantity = self.state.order_quantity
        obj_part = self.price * min(self.order_quantity, exog_info['demand']) - self.cost * self.order_quantity
        return obj_part

    # this method steps the process forward by one time increment by updating the sum of the
    # contributions, the exogenous information and the state variable
    def step(self, decision):
        self.t_update()
        exog_info = self.exog_info_fn(decision)
        onestep_contribution = self.objective_fn(decision, exog_info)
        # check if cumulative or terminal reward
        if self.reward_type == 'Cumulative':
            self.obj.append(onestep_contribution)
        else:
            if self.t == self.T:
                self.obj = onestep_contribution
        transition_info = self.transition_fn(decision, exog_info)
        self.state = self.build_state(transition_info)

    # update method for the time counter
    def t_update(self):
        self.t += 1
        return self.t
```
## Stochastic Gradient Algorithm
In the newsvendor problem, when the demand distribution $W$ is known, the optimal order quantity $x^*$ can be determined using the cumulative distribution function (CDF) of the demand:
$$
F^{W}(x^{\ast}) = \frac{p-c}{p}.
$$
where
* $F^W(w)$ is the CDF of the demand
* $p$ is the selling price
* $c$ is the unit cost
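For reference, if the demand really were Poisson with mean 100 and we used the price and cost from the experiments below ($p = 26$, $c = 20$), the optimal order quantity could be read off the Poisson CDF directly. A small sketch using `scipy.stats` (not part of the original code) illustrates this:
```python=
from scipy.stats import poisson

# Critical-fractile benchmark: smallest x with F_W(x) >= (p - c) / p,
# assuming (for illustration) demand W ~ Poisson(100), p = 26, c = 20.
p, c, mu = 26.0, 20.0, 100.0
fractile = (p - c) / p                    # = 6/26, roughly 0.231
x_star = int(poisson.ppf(fractile, mu))   # ppf inverts the Poisson CDF
print("critical fractile:", round(fractile, 3), " optimal order quantity:", x_star)
```
This closed-form benchmark is a useful sanity check on where the stochastic gradient iterates should settle.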
However, in real-world scenarios, the true demand distribution is often unknown, making it impossible to directly apply this formula.
To address this uncertainty, we utilize the stochastic gradient algorithm. This iterative method updates the order quantity $x^n$ at each time step $n$ based on the observed demand $W^{n+1}$ and a step size $\alpha_n$:
$$
x^{n+1} = \max\{0, \lfloor x^{n} + \alpha_{n} \nabla_x F(x^{n}, W^{n+1}) \rfloor \}
$$
where
* $\nabla_x F(x^{n}, W^{n+1})$ is the gradient of the profit function with respect to the order quantity
* The $\max$ and $\lfloor \rfloor$ operators ensure that the order quantity remains non-negative and integer
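For the newsvendor profit function, the gradient has a simple closed form: ordering one more unit earns $p - c$ when demand exceeds the current order quantity, and loses $c$ otherwise:
$$
\nabla_x F(x^{n}, W^{n+1}) =
\begin{cases}
p - c & \text{if } x^{n} < W^{n+1}\\
-c & \text{otherwise}
\end{cases}
$$
This is exactly the `derivative` computed in `transition_fn` of the model class above.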
### Conditions for Convergence
For the algorithm to converge to the optimal order quantity, the step sizes $\alpha_{n}$ must satisfy the following conditions:
* $\alpha_{n} > 0$ (positive step sizes)
* $\sum_{n = 1}^{\infty}\alpha_{n} = \infty$ (step sizes do not shrink so quickly that the iterates stall before reaching the optimum)
* $\sum_{n = 1}^{\infty}(\alpha_{n})^{2} < \infty$ (step sizes shrink quickly enough to average out the noise in the gradients)
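For instance, the harmonic sequence $\alpha_{n} = 1/n$ satisfies all three conditions:
$$
\alpha_{n} = \frac{1}{n} > 0, \qquad \sum_{n=1}^{\infty}\frac{1}{n} = \infty, \qquad \sum_{n=1}^{\infty}\frac{1}{n^{2}} = \frac{\pi^{2}}{6} < \infty.
$$
A constant step size, by contrast, violates the third condition, and the iterates keep oscillating around the optimum instead of converging to it.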
### Handling Constraints
Order quantities may be subject to constraints, such as being non-negative or having a limited inventory capacity. These constraints are incorporated directly into the update equation by using the projection operator:
* The $\max\{0, \dots\}$ ensures non-negativity
* The $\lfloor \dots \rfloor$ ensures the order quantity is an integer

### Designing Step Size Policies
The step size $\alpha_n$ significantly impacts the algorithm's convergence speed and accuracy. Two common step size policies are:
#### Harmonic Stepsize Rule
This deterministic policy predefines the step size for each iteration:
$$
\alpha^{harmonic}(S^{n}|\theta^{step}) = \frac{\theta^{step}}{\theta^{step} + n - 1}
$$
The parameter $\theta^{step}$ controls the initial step size and the rate at which it decreases.
#### Kesten's Rule
This adaptive policy adjusts the step size based on the gradient's behavior:
$$
\alpha^{kesten}(S^{n}|\theta^{step}) = \frac{\theta^{step}}{\theta^{step} + K^n- 1}
$$
where $K^n$ counts how many times the gradient changes direction:
$$
K^{n+1} =
\begin{cases}
K^{n} + 1 & \text{if } (\nabla_x F(x^{n}, W^{n+1}))^{\top}\nabla_x F(x^{n-1}, W^{n}) < 0\\
K^{n} & \text{otherwise}
\end{cases}
$$
A high count of $K^n$ indicates oscillation around the optimum, prompting a smaller step size to refine the search.
To implement Kesten's rule, we track the sign changes of the gradient and update the counter $K^n$ accordingly. This requires augmenting the state variable to include $K^n$:
$$
S^{n} = (x^{n}, K^{n})
$$
### Implementation
```python=
from copy import deepcopy


class AdaptiveMarketPlanningPolicy():
    """
    Base class for the adaptive market planning policy
    """
    def __init__(self, AdaptiveMarketPlanningModel, theta_step):
        """
        Initializes the policy

        :param AdaptiveMarketPlanningModel: AdaptiveMarketPlanningModel - model to construct decisions for
        :param theta_step: float - theta step parameter
        """
        self.M = AdaptiveMarketPlanningModel
        self.theta_step = theta_step

    # returns a decision based on the harmonic step size policy
    def harmonic_rule(self):
        return self.M.build_decision({'step_size': self.theta_step / (self.theta_step + self.M.t - 1)})

    # returns a decision based on Kesten's rule
    def kesten_rule(self):
        return self.M.build_decision({'step_size': self.theta_step / (self.theta_step + self.M.state.counter - 1)})

    # returns a decision based on a constant step size policy
    def constant_rule(self):
        return self.M.build_decision({'step_size': self.theta_step})

    # runs the model for T steps under Kesten's rule and returns the objective and learning list;
    # n_trial re-seeds the copied model so that each trial sees a different demand sequence
    def run_policy(self, n_trial=0):
        model_copy = deepcopy(self.M)
        model_copy.prng = np.random.RandomState(model_copy.init_args['seed'] + n_trial)
        for t in range(model_copy.T):
            model_copy.step(AdaptiveMarketPlanningPolicy(model_copy, self.theta_step).kesten_rule())
        return (model_copy.obj, model_copy.learning_list)
```
### Experimental Evaluation
#### Scenario Setup
We evaluate the performance of the stochastic gradient algorithm with Kesten's rule step size policy in the context of the newsvendor problem. The specific scenario parameters are as follows:
* Price $p$: 26
* Cost $c$: 20
* Trial size $K$: 1000
* Time steps $T$: 800
* Step Size Policy: Kesten's Rule
* Evaluation Metric: Cumulative Reward
* Policy Parameter Set $\theta^{step}$: $\{10, 20, \dots, 100\}$
The objective is to identify the optimal value of the policy parameter $\theta^{step}$ that maximizes the cumulative reward, representing the total profit over the simulation period.
#### Optimization Procedure
To determine the best policy, we conduct multiple simulations for each value of $\theta^{step}$ in the policy set. Within each simulation, we run the stochastic gradient algorithm for the specified number of time steps ($T = 800$). The order quantity is updated at each step based on the observed demand and Kesten's rule.
The cumulative reward for each simulation is calculated as the sum of profits over all time steps. The average cumulative reward is then computed across all trials ($K = 1000$) for each value of $\theta^{step}$.
$$
\max_{\pi}\frac{1}{K}\sum_{k = 1}^{K}\sum_{t = 0}^{T-1}F(X^{\pi}(S_{t}^{k}), W_{t + 1}^{k}).
$$
The optimal policy is identified as the one that yields the highest average cumulative reward.
#### Implementation in Python
```python=
# Adaptive Market Planning Driver Script
import numpy as np
import matplotlib.pyplot as plt

state_names = ['order_quantity', 'counter']
init_state = {'order_quantity': 0, 'counter': 0}
decision_names = ['step_size']
cost = 20
price = 26
trial_size = 1000
theta_step = np.linspace(10, 100, 10)
T = 800
reward_type = 'Cumulative'
reward_per_policy = []

fig1, pic_first = plt.subplots()
fig2, pic_last = plt.subplots()
for i in range(len(theta_step)):
    M = AdaptiveMarketPlanningModel(state_names, decision_names, init_state, T, reward_type, price, cost)
    P = AdaptiveMarketPlanningPolicy(M, theta_step[i])
    rewards_per_iteration = []
    for ite in range(trial_size):
        reward, learning_list = P.run_policy(ite)
        rewards_per_iteration.append(reward)
    # average the per-step contributions across all trials
    reward = np.array(rewards_per_iteration).sum(axis=0) / trial_size
    pic_first.plot(range(100), reward[0:100], label=theta_step[i])
    pic_last.plot(range(T - 100, T), reward[T - 100:T], label=theta_step[i])
    reward_per_policy.append(reward.sum())

print("the best policy is", theta_step[np.argmax(reward_per_policy)])
pic_first.legend()
pic_first.set_xlabel('Iteration')
pic_first.set_ylabel('Objective function')
pic_last.legend()
pic_last.set_xlabel('Iteration')
pic_last.set_ylabel('Objective function')
plt.show()
```
```
The best policy is 100
```


## Parametric Model for Adaptive Market Planning
We can extend the adaptive market planning model by introducing a parametric representation of the order quantity. This allows us to model the relationship between order quantity and price more flexibly. We define the order quantity $x^{\pi,N}$ as a parametric function of the price $p$ and a parameter vector $\theta$:
$$
x^{\pi,N}(p|\theta) = \theta_{0} + \theta_{1}p + \theta_{2}p^{\theta_{3}}.
$$
In this model, the state variable is augmented to include the parameter vector $\theta_n$ along with the gradient change counter $K_n$ and the current price $p_n$:
$$
S_{n} = (K_{n}, p_{n}, \theta_{n}),
$$
where $\theta_{n} = (\theta_{0}, \theta_{1}, \theta_{2}, \theta_{3})$ is the current estimate of the coefficients.
The price process $p_n$ is modeled as a random walk confined between a lower and an upper price bound:
$$
p_{n+1} =\min\{p^{\text{high}},\max\{p^{\text{low}}, p_{n} + \delta \}\},
$$
where:
* $\delta$ is a random variable with $\text{Prob}(\delta=1)=0.2$, $\text{Prob}(\delta=−1)=0.2$, and $\text{Prob}(\delta=0)=0.6$.
* $p^{\text{low}}$ and $p^{\text{high}}$ are deterministic lower and upper bounds on the price.
The parameter vector $\theta$ is updated with a stochastic gradient step, now taken with respect to $\theta$:
$$
\theta_{n+1} = \theta_{n} + \alpha_{n}\nabla_{\theta} F(x(p_{n}|\theta_{n}), W_{n+1}).
$$
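By the chain rule, this gradient is the newsvendor gradient scaled by the derivative of the parametric order-quantity function with respect to $\theta$; this is what `derivative_fn` and `transition_fn` compute in the subclass below:
$$
\nabla_{\theta} F(x(p_{n}|\theta_{n}), W_{n+1}) =
\begin{cases}
(p_{n} - c)\,\nabla_{\theta}x(p_{n}|\theta_{n}) & \text{if } x(p_{n}|\theta_{n}) < W_{n+1}\\
-c\,\nabla_{\theta}x(p_{n}|\theta_{n}) & \text{otherwise,}
\end{cases}
\qquad
\nabla_{\theta}x(p|\theta) = \left(1,\; p,\; p^{\theta_{3}},\; \theta_{2}\,p^{\theta_{3}}\ln p\right).
$$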
### Implementation in Python
```python=
import math


class ParametricModel(AdaptiveMarketPlanningModel):
    """
    Subclass for adaptive market planning with a parametric order-quantity function
    """
    def __init__(self, state_names, x_names, s_0, T, reward_type, cost=1.0, price_low=1.0, price_high=10.0,
                 exog_info_fn=None, transition_fn=None, objective_fn=None, seed=20180613):
        """
        Initializes the model

        See AdaptiveMarketPlanningModel for more details
        """
        super().__init__(state_names, x_names, s_0, T, reward_type, cost=cost, exog_info_fn=exog_info_fn,
                         transition_fn=transition_fn, objective_fn=objective_fn, seed=seed)
        self.past_derivative = np.array([0, 0, 0, 0])
        self.low = price_low
        self.high = price_high

    # returns the order quantity for a given price and theta vector
    def order_quantity_fn(self, price, theta):
        price = float(price)
        return max(0, int(theta[0] + theta[1] * price + theta[2] * (price ** theta[3])))

    # returns the gradient of the order quantity with respect to theta for a given price
    def derivative_fn(self, price, theta):
        price = float(price)
        return np.array([1, price, price ** theta[3], theta[2] * (price ** theta[3]) * math.log(price)])

    # this function takes in the decision and exogenous information and returns the new state
    def transition_fn(self, decision, exog_info):
        self.learning_list.append(self.state.theta)
        # compute the gradient with respect to theta and update theta
        if self.order_quantity_fn(self.state.price, self.state.theta) < exog_info['demand']:
            derivative = (self.state.price - self.cost) * self.derivative_fn(self.state.price, self.state.theta)
        else:
            derivative = (-self.cost) * self.derivative_fn(self.state.price, self.state.theta)
        new_theta = self.state.theta + decision.step_size * derivative
        # count the number of times the gradient changes direction (used by Kesten's rule)
        new_counter = self.state.counter + 1 if np.dot(self.past_derivative, derivative) < 0 else self.state.counter
        self.past_derivative = derivative
        # random walk on the price, kept within the lower and upper bounds
        delta = self.prng.choice([-1, 0, 1], p=[0.2, 0.6, 0.2])
        new_price = min(self.high, max(self.low, self.state.price + delta))
        return {"counter": new_counter, "price": new_price, "theta": new_theta}

    # this function calculates how much money we make in one period
    def objective_fn(self, decision, exog_info):
        self.price = self.state.price
        self.order_quantity = self.order_quantity_fn(self.state.price, self.state.theta)
        obj_part = self.state.price * min(self.order_quantity, exog_info['demand']) - self.cost * self.order_quantity
        return obj_part
```
### Experimental Evaluation
We evaluate the parametric model under the following conditions:
* $p^{\text{low}}$: 20
* $p^{\text{high}}$: 45
* Trial Size: 1000
* Time Steps: 8000
* Policy: Kesten's Rule
* Evaluation: Cumulative Reward
* Policy Parameter Set $\theta^{step}$: $\{2, 3, \dots, 10\}$
```python=+
# Parametric Model Driver Script
# Define state variables
state_names = ['counter', 'price', 'theta']
init_state = {'counter': 0, 'price': 26, 'theta': np.array([0.1, 0.1, 0.1, -2])}
decision_names = ['step_size']
# Problem and experiment parameters
cost = 20
trial_size = 1000
price_low = 20
price_high = 45
theta_step = np.linspace(2, 10, 9)
T = 8000
reward_type = 'Cumulative'
reward_per_policy = []
# Initialize the model and run the simulations
fig1, pic_first = plt.subplots()
fig2, pic_last = plt.subplots()
for i in range(len(theta_step)):
    M = ParametricModel(state_names, decision_names, init_state, T, reward_type, cost, price_low=price_low, price_high=price_high)
    P = AdaptiveMarketPlanningPolicy(M, theta_step[i])
    rewards_per_iteration = []
    for ite in range(trial_size):
        reward, learning_list = P.run_policy(ite)
        rewards_per_iteration.append(reward)
    # average the per-step contributions across all trials
    reward = np.array(rewards_per_iteration).sum(axis=0) / trial_size
    pic_first.plot(range(T), reward[0:T], label=theta_step[i])
    pic_last.plot(range(T - 100, T), reward[T - 100:T], label=theta_step[i])
    reward_per_policy.append(reward.sum())

print("the best policy is", theta_step[np.argmax(reward_per_policy)])
pic_first.legend()
pic_first.set_xlabel('Iteration')
pic_first.set_ylabel('Objective function')
pic_last.legend()
pic_last.set_xlabel('Iteration')
pic_last.set_ylabel('Objective function')
plt.show()
```
```
The best policy is 2
```


## References
* Warren B. Powell (2022), *Sequential Decision Analytics and Modeling: Modeling with Python*, Chapter 3. [link](https://castle.princeton.edu/sdamodeling/)
* https://github.com/djanka2/stochastic-optimization/tree/master/AdaptiveMarketPlanning