# Adaptive Market Planning
## Introduction
Adaptive market planning addresses problems involving resource allocation under uncertain demand. A classic example is stocking perishable inventory, like fresh fish, where unsold stock cannot be carried over.
## The Newsvendor Problem
The newsvendor problem, a cornerstone in optimization under uncertainty, is formulated as:
$$
\max_{x} \mathbb{E}F(x,W) = \mathbb{E}\left[p\min\{x,W\} - cx \right]
$$
where:
* $x$: Decision variable (order quantity)
* $W$: Uncertain demand
* $p$: Selling price
* $c$: Unit cost ($p>c$)
When demand $W$ is known, the optimal solution is simply $x=W$. However, under uncertainty, where $W$ is a random variable with distribution $f^W(w)$, finding the optimal order quantity becomes more complex.
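To make the expectation concrete, the following minimal sketch estimates $\mathbb{E}F(x,W)$ by Monte Carlo for a few candidate order quantities, assuming (purely for illustration) Poisson demand with mean 100 and the price and cost values used in the experiments later in this note:
```python=
import numpy as np

# Minimal Monte Carlo sketch: estimate E[F(x, W)] = E[p*min(x, W) - c*x]
# for a few candidate order quantities x.  Illustrative assumptions:
# demand W ~ Poisson(100), price p = 26, cost c = 20.
rng = np.random.RandomState(20180613)
p, c = 26.0, 20.0
demand = rng.poisson(100, size=100000)            # samples of W

for x in [80, 90, 100, 110]:
    profit = p * np.minimum(x, demand) - c * x    # F(x, W) for every sample
    print(f"x = {x:3d}, estimated E[F(x, W)] = {profit.mean():8.2f}")
```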
## Modeling Adaptive Market Planning
**Objective:** Maximize net income from resource allocation under uncertain demand.
### Mathematical Model
* **State Variables:** $S^n = x^n$ (current order quantity)
* **Decision Variables:** $\alpha_n$ (step size)
* **Exogenous Information:** $W^{n+1}$ (random demand observed after the order is placed)
* **Transition Function:** Updates the order quantity using a stochastic gradient step followed by projection onto the non-negative integers:
$$
x^{n+1} = \max\{0, \lfloor x^{n} + \alpha_{n} \nabla_x F(x^{n}, W^{n+1})\rfloor\}
$$
* **Objective Function:** The net profit earned at each iteration: $$F(x^{n}, W^{n+1}) = p\min\{x^{n}, W^{n+1}\} - cx^{n}$$
### Uncertainty Modeling
To model demand uncertainty in the newsvendor problem, we assume that the demand $W$ follows a Poisson distribution with unknown mean $\mu$:
$$
f^{W}(w) = \frac{\mu^{w}e^{-\mu}}{w!},\quad w = 0,1,2,\dots
$$
While we assume a specific distribution for $W$, the mean demand $\mu$ is unknown but follows a known discrete distribution:
$$
p_{k}^{\mu} = \text{Prob}[\mu = \mu_{k}], \quad k=1,2,\dots, K
$$
This distribution $p^{\mu} = (p^{\mu}_{k})_{k = 1}^{K}$ is incorporated into our initial state $S^0$, reflecting the prior knowledge we have about the possible values of the mean demand.
Here we set $K = 1$ and $\mu_{1} = 100$, so the demand is effectively Poisson with mean 100.
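A minimal sketch of this two-stage uncertainty model: draw the mean $\mu$ from the discrete prior $(p^{\mu}_{k})$, then draw the demand $W$ from a Poisson distribution with that mean. With $K = 1$ and $\mu_{1} = 100$ it reduces to the plain Poisson(100) sampler used in the model class below.
```python=
import numpy as np

# Two-stage demand sampler: mu ~ discrete prior, then W ~ Poisson(mu).
# With K = 1 and mu_1 = 100, as assumed here, this is just Poisson(100).
rng = np.random.RandomState(20180613)
mu_values = np.array([100.0])   # the mu_k, k = 1, ..., K
mu_probs = np.array([1.0])      # the prior p_k^mu (must sum to 1)

def sample_demand():
    mu = rng.choice(mu_values, p=mu_probs)    # draw the unknown mean
    return rng.poisson(mu)                    # draw the demand W

print([sample_demand() for _ in range(5)])
```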
### Implementation
```python=
from collections import namedtuple

import numpy as np


class AdaptiveMarketPlanningModel():
    """
    Base class for the adaptive market planning model
    """
    def __init__(self, state_names, x_names, s_0, T, reward_type, price=1.0, cost=1.0,
                 exog_info_fn=None, transition_fn=None, objective_fn=None, seed=20180613):
        self.init_args = {'seed': seed}
        self.prng = np.random.RandomState(seed)
        self.init_state = s_0
        self.T = T
        self.reward_type = reward_type
        self.state_names = state_names
        self.x_names = x_names
        self.State = namedtuple('State', state_names)
        self.state = self.build_state(s_0)
        self.Decision = namedtuple('Decision', x_names)
        self.obj = [] if reward_type == 'Cumulative' else 0.0
        self.past_derivative = 0.0
        self.cost = cost
        self.price = price
        self.t = 0
        self.learning_list = []

    # this function builds a state namedtuple containing all the state information needed
    def build_state(self, info):
        return self.State(*[info[k] for k in self.state_names])

    # this function builds a decision namedtuple
    def build_decision(self, info):
        return self.Decision(*[info[k] for k in self.x_names])

    # this function returns the exogenous information (the new demand),
    # drawn from a Poisson distribution with mean 100
    def exog_info_fn(self, decision):
        return {"demand": self.prng.poisson(100)}

    # this function takes in the decision and exogenous information and returns the new state
    def transition_fn(self, decision, exog_info):
        self.learning_list.append(self.state.order_quantity)
        # compute the stochastic gradient of the profit function
        derivative = self.price - self.cost if self.state.order_quantity < exog_info['demand'] else -self.cost
        # update the order quantity (projected onto the non-negative integers)
        new_order_quantity = max(0, int(self.state.order_quantity + decision.step_size * derivative))
        # count the number of times the derivative changes sign (used by Kesten's rule)
        new_counter = self.state.counter + 1 if self.past_derivative * derivative < 0 else self.state.counter
        self.past_derivative = derivative
        return {"order_quantity": new_order_quantity, "counter": new_counter}

    # this function calculates how much money we make in one period
    def objective_fn(self, decision, exog_info):
        self.order_quantity = self.state.order_quantity
        obj_part = self.price * min(self.order_quantity, exog_info['demand']) - self.cost * self.order_quantity
        return obj_part

    # this method steps the process forward by one time increment by updating the sum of the
    # contributions, the exogenous information and the state variable
    def step(self, decision):
        self.t_update()
        exog_info = self.exog_info_fn(decision)
        onestep_contribution = self.objective_fn(decision, exog_info)
        # check if cumulative or terminal reward
        if self.reward_type == 'Cumulative':
            self.obj.append(onestep_contribution)
        else:
            if self.t == self.T:
                self.obj = onestep_contribution
        transition_info = self.transition_fn(decision, exog_info)
        self.state = self.build_state(transition_info)

    # update method for the time counter
    def t_update(self):
        self.t += 1
        return self.t
```
## Stochastic Gradient Algorithm
In the newsvendor problem, when the demand distribution $W$ is known, the optimal order quantity $x^*$ can be determined using the cumulative distribution function (CDF) of the demand:
$$
F^{W}(x^{\ast}) = \frac{p-c}{p}.
$$
where
* $F^W(w)$ is the CDF of the demand
* $p$ is the selling price
* $c$ is the unit cost
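For reference, if the demand really were Poisson with mean 100 and we used the price and cost from the experiments below ($p = 26$, $c = 20$), the optimal order quantity could be read off the Poisson CDF directly. A small sketch using `scipy.stats` (not part of the original code) illustrates this:
```python=
from scipy.stats import poisson

# Critical-fractile benchmark: smallest x with F_W(x) >= (p - c) / p,
# assuming (for illustration) demand W ~ Poisson(100), p = 26, c = 20.
p, c, mu = 26.0, 20.0, 100.0
fractile = (p - c) / p                    # = 6/26, roughly 0.231
x_star = int(poisson.ppf(fractile, mu))   # ppf inverts the Poisson CDF
print("critical fractile:", round(fractile, 3), " optimal order quantity:", x_star)
```
This closed-form benchmark is a useful sanity check on where the stochastic gradient iterates should settle.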
However, in real-world scenarios, the true demand distribution is often unknown, making it impossible to directly apply this formula.
To address this uncertainty, we utilize the stochastic gradient algorithm. This iterative method updates the order quantity $x^n$ at each time step $n$ based on the observed demand $W^{n+1}$ and a step size $\alpha_n$:
$$
x^{n+1} = \max\{0, \lfloor x^{n} + \alpha_{n} \nabla_x F(x^{n}, W^{n+1}) \rfloor \}
$$
where
* $\nabla_x F(x^{n}, W^{n+1})$ is the gradient of the profit function with respect to the order quantity
* The $\max$ and $\lfloor \rfloor$ operators ensure that the order quantity remains non-negative and integer
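For the newsvendor profit function, the gradient has a simple closed form: ordering one more unit earns $p - c$ when demand exceeds the current order quantity, and loses $c$ otherwise:
$$
\nabla_x F(x^{n}, W^{n+1}) =
\begin{cases}
p - c & \text{if } x^{n} < W^{n+1}\\
-c & \text{otherwise}
\end{cases}
$$
This is exactly the `derivative` computed in `transition_fn` of the model class above.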
### Conditions for Convergence
For the algorithm to converge to the optimal order quantity, the step sizes $\alpha_{n}$ must satisfy the following conditions:
* $\alpha_{n} > 0$ (positive step sizes)
* $\sum_{n = 1}^{\infty}\alpha_{n} = \infty$ (step sizes do not shrink so quickly that the iterates stall before reaching the optimum)
* $\sum_{n = 1}^{\infty}(\alpha_{n})^{2} < \infty$ (step sizes shrink quickly enough to average out the noise in the gradients)
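For instance, the harmonic sequence $\alpha_{n} = 1/n$ satisfies all three conditions:
$$
\alpha_{n} = \frac{1}{n} > 0, \qquad \sum_{n=1}^{\infty}\frac{1}{n} = \infty, \qquad \sum_{n=1}^{\infty}\frac{1}{n^{2}} = \frac{\pi^{2}}{6} < \infty.
$$
A constant step size, by contrast, violates the third condition, and the iterates keep oscillating around the optimum instead of converging to it.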
### Handling Constraints
Order quantities may be subject to constraints, such as being non-negative or having a limited inventory capacity. These constraints are incorporated directly into the update equation by using the projection operator:
* The $\max\{0, \dots\}$ ensures non-negativity
* The $\lfloor \dots \rfloor$ ensures the order quantity is an integer

### Designing Step Size Policies
The step size $\alpha_n$ significantly impacts the algorithm's convergence speed and accuracy. Two common step size policies are:
#### Harmonic Stepsize Rule
This deterministic policy predefines the step size for each iteration:
$$
\alpha^{harmonic}(S^{n}|\theta^{step}) = \frac{\theta^{step}}{\theta^{step} + n - 1}
$$
The parameter $\theta^{step}$ controls the initial step size and the rate at which it decreases.
#### Kesten's Rule
This adaptive policy adjusts the step size based on the gradient's behavior:
$$
\alpha^{kesten}(S^{n}|\theta^{step}) = \frac{\theta^{step}}{\theta^{step} + K^n- 1}
$$
where $K^n$ counts how many times the gradient changes direction:
$$
K^{n+1} =
\begin{cases}
K^{n} + 1 & \text{if } (\nabla_x F(x^{n}, W^{n+1}))^{\top}\nabla_x F(x^{n-1}, W^{n}) < 0\\
K^{n} & \text{otherwise}
\end{cases}
$$
A high count of $K^n$ indicates oscillation around the optimum, prompting a smaller step size to refine the search.
To implement Kesten's rule, we track the sign changes of the gradient and update the counter $K^n$ accordingly. This requires augmenting the state variable to include $K^n$:
$$
S^{n} = (x^{n}, K^{n})
$$
### Implementation
```python=
from copy import deepcopy


class AdaptiveMarketPlanningPolicy():
    """
    Base class for the adaptive market planning policy
    """
    def __init__(self, AdaptiveMarketPlanningModel, theta_step):
        """
        Initializes the policy

        :param AdaptiveMarketPlanningModel: AdaptiveMarketPlanningModel - model to construct decisions for
        :param theta_step: float - theta step parameter
        """
        self.M = AdaptiveMarketPlanningModel
        self.theta_step = theta_step

    # returns a decision based on the harmonic step size policy
    def harmonic_rule(self):
        return self.M.build_decision({'step_size': self.theta_step / (self.theta_step + self.M.t - 1)})

    # returns a decision based on Kesten's rule
    def kesten_rule(self):
        return self.M.build_decision({'step_size': self.theta_step / (self.theta_step + self.M.state.counter - 1)})

    # returns a decision based on a constant step size policy
    def constant_rule(self):
        return self.M.build_decision({'step_size': self.theta_step})

    # runs the model for T steps under Kesten's rule and returns the objective and learning list;
    # n_trial re-seeds the copied model so that each trial sees a different demand sequence
    def run_policy(self, n_trial=0):
        model_copy = deepcopy(self.M)
        model_copy.prng = np.random.RandomState(model_copy.init_args['seed'] + n_trial)
        for t in range(model_copy.T):
            model_copy.step(AdaptiveMarketPlanningPolicy(model_copy, self.theta_step).kesten_rule())
        return (model_copy.obj, model_copy.learning_list)
```
### Experimental Evaluation
#### Scenario Setup
We evaluate the performance of the stochastic gradient algorithm with Kesten's rule step size policy in the context of the newsvendor problem. The specific scenario parameters are as follows:
* Price $p$: 26
* Cost $c$: 20
* Trial size $K$: 1000
* Time steps $T$: 800
* Step Size Policy: Kesten's Rule
* Evaluation Metric: Cumulative Reward
* Policy Parameter Set $\theta^{step}$: $\{10, 20, \dots, 100\}$
The objective is to identify the optimal value of the policy parameter $\theta^{step}$ that maximizes the cumulative reward, representing the total profit over the simulation period.
#### Optimization Procedure
To determine the best policy, we conduct multiple simulations for each value of $\theta^{step}$ in the policy set. Within each simulation, we run the stochastic gradient algorithm for the specified number of time steps ($T = 800$). The order quantity is updated at each step based on the observed demand and Kesten's rule.
The cumulative reward for each simulation is calculated as the sum of profits over all time steps. The average cumulative reward is then computed across all trials ($K = 1000$) for each value of $\theta^{step}$.
$$
\max_{\pi}\frac{1}{K}\sum_{k = 1}^{K}\sum_{t = 0}^{T-1}F(X^{\pi}(S_{t}^{k}), W_{t + 1}^{k}).
$$
The optimal policy is identified as the one that yields the highest average cumulative reward.
#### Implementation in Python
```python=
# Adaptive Market Planning Driver Script
import numpy as np
import matplotlib.pyplot as plt

state_names = ['order_quantity', 'counter']
init_state = {'order_quantity': 0, 'counter': 0}
decision_names = ['step_size']
cost = 20
price = 26
trial_size = 1000
theta_step = np.linspace(10, 100, 10)
T = 800
reward_type = 'Cumulative'
reward_per_policy = []

fig1, pic_first = plt.subplots()
fig2, pic_last = plt.subplots()
for i in range(len(theta_step)):
    M = AdaptiveMarketPlanningModel(state_names, decision_names, init_state, T, reward_type, price, cost)
    P = AdaptiveMarketPlanningPolicy(M, theta_step[i])
    rewards_per_iteration = []
    for ite in range(trial_size):
        reward, learning_list = P.run_policy(ite)
        rewards_per_iteration.append(reward)
    # average the per-step contributions across all trials
    reward = np.array(rewards_per_iteration).sum(axis=0) / trial_size
    pic_first.plot(range(100), reward[0:100], label=theta_step[i])
    pic_last.plot(range(T - 100, T), reward[T - 100:T], label=theta_step[i])
    reward_per_policy.append(reward.sum())

print("the best policy is", theta_step[np.argmax(reward_per_policy)])
pic_first.legend()
pic_first.set_xlabel('Iteration')
pic_first.set_ylabel('Objective function')
pic_last.legend()
pic_last.set_xlabel('Iteration')
pic_last.set_ylabel('Objective function')
plt.show()
```
```
The best policy is 100
```


## Parametric Model for Adaptive Market Planning
We can extend the adaptive market planning model by introducing a parametric representation of the order quantity. This allows us to model the relationship between order quantity and price more flexibly. We define the order quantity $x^{\pi,N}$ as a parametric function of the price $p$ and a parameter vector $\theta$:
$$
x^{\pi,N}(p|\theta) = \theta_{0} + \theta_{1}p + \theta_{2}p^{\theta_{3}}.
$$
In this model, the state variable is augmented to include the parameter vector $\theta_n$ along with the gradient change counter $K_n$ and the current price $p_n$:
$$
S_{n} = (K_{n}, p_{n}, \theta_{n}),
$$
where $\theta_{n} = (\theta_{0}, \theta_{1}, \theta_{2}, \theta_{3})$ is the current estimate of the coefficients.
The price process $p_n$ is modeled as a random walk confined between a lower and an upper price bound:
$$
p_{n+1} =\min\{p^{\text{high}},\max\{p^{\text{low}}, p_{n} + \delta \}\},
$$
where:
* $\delta$ is a random variable with $\text{Prob}(\delta=1)=0.2$, $\text{Prob}(\delta=−1)=0.2$, and $\text{Prob}(\delta=0)=0.6$.
* $p^{\text{low}}$ and $p^{\text{high}}$ are deterministic lower and upper bounds on the price.
The parameter vector $\theta$ is updated with a stochastic gradient step, now taken with respect to $\theta$:
$$
\theta_{n+1} = \theta_{n} + \alpha_{n}\nabla_{\theta} F(x(p_{n}|\theta_{n}), W_{n+1}).
$$
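By the chain rule, this gradient is the newsvendor gradient scaled by the derivative of the parametric order-quantity function with respect to $\theta$; this is what `derivative_fn` and `transition_fn` compute in the subclass below:
$$
\nabla_{\theta} F(x(p_{n}|\theta_{n}), W_{n+1}) =
\begin{cases}
(p_{n} - c)\,\nabla_{\theta}x(p_{n}|\theta_{n}) & \text{if } x(p_{n}|\theta_{n}) < W_{n+1}\\
-c\,\nabla_{\theta}x(p_{n}|\theta_{n}) & \text{otherwise,}
\end{cases}
\qquad
\nabla_{\theta}x(p|\theta) = \left(1,\; p,\; p^{\theta_{3}},\; \theta_{2}\,p^{\theta_{3}}\ln p\right).
$$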
### Implementation in Python
```python=
import math


class ParametricModel(AdaptiveMarketPlanningModel):
    """
    Subclass for adaptive market planning with a parametric order-quantity function
    """
    def __init__(self, state_names, x_names, s_0, T, reward_type, cost=1.0, price_low=1.0, price_high=10.0,
                 exog_info_fn=None, transition_fn=None, objective_fn=None, seed=20180613):
        """
        Initializes the model

        See AdaptiveMarketPlanningModel for more details
        """
        super().__init__(state_names, x_names, s_0, T, reward_type, cost=cost, exog_info_fn=exog_info_fn,
                         transition_fn=transition_fn, objective_fn=objective_fn, seed=seed)
        self.past_derivative = np.array([0, 0, 0, 0])
        self.low = price_low
        self.high = price_high

    # returns the order quantity for a given price and theta vector
    def order_quantity_fn(self, price, theta):
        price = float(price)
        return max(0, int(theta[0] + theta[1] * price + theta[2] * (price ** theta[3])))

    # returns the gradient of the order quantity with respect to theta for a given price
    def derivative_fn(self, price, theta):
        price = float(price)
        return np.array([1, price, price ** theta[3], theta[2] * (price ** theta[3]) * math.log(price)])

    # this function takes in the decision and exogenous information and returns the new state
    def transition_fn(self, decision, exog_info):
        self.learning_list.append(self.state.theta)
        # compute the gradient with respect to theta and update theta
        if self.order_quantity_fn(self.state.price, self.state.theta) < exog_info['demand']:
            derivative = (self.state.price - self.cost) * self.derivative_fn(self.state.price, self.state.theta)
        else:
            derivative = (-self.cost) * self.derivative_fn(self.state.price, self.state.theta)
        new_theta = self.state.theta + decision.step_size * derivative
        # count the number of times the gradient changes direction (used by Kesten's rule)
        new_counter = self.state.counter + 1 if np.dot(self.past_derivative, derivative) < 0 else self.state.counter
        self.past_derivative = derivative
        # random walk on the price, kept within the lower and upper bounds
        delta = self.prng.choice([-1, 0, 1], p=[0.2, 0.6, 0.2])
        new_price = min(self.high, max(self.low, self.state.price + delta))
        return {"counter": new_counter, "price": new_price, "theta": new_theta}

    # this function calculates how much money we make in one period
    def objective_fn(self, decision, exog_info):
        self.price = self.state.price
        self.order_quantity = self.order_quantity_fn(self.state.price, self.state.theta)
        obj_part = self.state.price * min(self.order_quantity, exog_info['demand']) - self.cost * self.order_quantity
        return obj_part
```
### Experimental Evaluation
We evaluate the parametric model under the following conditions:
* $p^{\text{low}}$: 20
* $p^{\text{high}}$: 45
* Trial Size: 1000
* Time Steps: 8000
* Policy: Kesten's Rule
* Evaluation: Cumulative Reward
* Policy Parameter Set $\theta^{step}$: $\{2, 3, \dots, 10\}$
```python=+
# Parametric Model Driver Script
# Define state variables
state_names = ['counter', 'price', 'theta']
init_state = {'counter': 0, 'price': 26, 'theta': np.array([0.1, 0.1, 0.1, -2])}
decision_names = ['step_size']
# Problem and experiment parameters
cost = 20
trial_size = 1000
price_low = 20
price_high = 45
theta_step = np.linspace(2, 10, 9)
T = 8000
reward_type = 'Cumulative'
reward_per_policy = []
# Initialize the model and run the simulations
fig1, pic_first = plt.subplots()
fig2, pic_last = plt.subplots()
for i in range(len(theta_step)):
    M = ParametricModel(state_names, decision_names, init_state, T, reward_type, cost, price_low=price_low, price_high=price_high)
    P = AdaptiveMarketPlanningPolicy(M, theta_step[i])
    rewards_per_iteration = []
    for ite in range(trial_size):
        reward, learning_list = P.run_policy(ite)
        rewards_per_iteration.append(reward)
    # average the per-step contributions across all trials
    reward = np.array(rewards_per_iteration).sum(axis=0) / trial_size
    pic_first.plot(range(T), reward[0:T], label=theta_step[i])
    pic_last.plot(range(T - 100, T), reward[T - 100:T], label=theta_step[i])
    reward_per_policy.append(reward.sum())

print("the best policy is", theta_step[np.argmax(reward_per_policy)])
pic_first.legend()
pic_first.set_xlabel('Iteration')
pic_first.set_ylabel('Objective function')
pic_last.legend()
pic_last.set_xlabel('Iteration')
pic_last.set_ylabel('Objective function')
plt.show()
```
```
The best policy is 2
```


## References
* Warren B. Powell (2022), *Sequential Decision Analytics and Modeling: Modeling with Python*, Chapter 3. [link](https://castle.princeton.edu/sdamodeling/)
* https://github.com/djanka2/stochastic-optimization/tree/master/AdaptiveMarketPlanning