# An Asset Selling Problem

## Introduction

This note addresses the sequential decision problem of optimizing the selling time for a single asset in a fluctuating market. We assume the seller is a price taker and aim to develop a profit-maximizing policy. The study explores various selling strategies, their performance under different market conditions, and extensions that incorporate time-series price dynamics and multiple assets. This work contributes to sequential decision-making under uncertainty, offering insights for effective asset management and trading.

## Basic Model

### State Variables

The model uses two state variables to characterize the system at any given time:

* **Physical State $R_t$:** A binary variable indicating whether the asset is currently held (1) or not held (0):
$$R_t = \begin{cases} 1 & \text{if we are holding the stock at time } t \\ 0 & \text{if we are no longer holding the stock at time } t \end{cases}$$
* **Information State $p_t$:** Represents the current market price of the asset.

The complete state of the system at time $t$ is represented by the tuple:
$$ S_t = (R_t, p_t) $$

### Decision Variables

The decision variable $x_t$ represents the action taken at time $t$:

* $x_t = 1$: Sell the asset
* $x_t = 0$: Hold the asset

The constraint $x_t \leq R_t$ ensures the asset can only be sold if it is currently held. A policy guides the decision-making process, mapping the current state $S_t$ to a decision $x_t$.

### Exogenous Information

Exogenous information $W_{t+1} = \hat{p}_{t+1}$ captures the change in the asset price from time $t$ to $t+1$:
$$ \hat{p}_{t+1} = p_{t+1} - p_t $$

### Transition Function

The transition function describes how the state evolves over time:
\begin{split} R_{t+1} &= R_t - x_t \\ p_{t+1} &= p_t + \hat{p}_{t+1} \end{split}

* The first equation updates the asset ownership status based on the sell decision.
* The second equation updates the asset price based on the exogenous price change.

### Objective Function

The objective is to maximize the expected total earnings from selling the asset:
\begin{split} \max_{x_0, x_1, \dots, x_{T-1}} & \quad \mathbb{E} \left[ \sum_{t = 0}^{T-1} p_t x_t \right] \\ \text{subject to}& \quad \sum_{t = 0}^{T-1} x_t = 1, \quad x_t \in \{0, 1\} \end{split}

The constraint enforces selling the asset exactly once within the given time horizon.

### Modeling Uncertainty

Price changes are assumed to follow a normal distribution. The mean of this distribution reflects the market trend (upward, neutral, or downward).

* A `bias` variable within the state indicates the current market trend.
* A `bias_df` DataFrame stores the probabilities of how the bias changes from one time step to the next, for example:

```python=
import pandas as pd

# Columns: current bias; index: next bias
# (e.g. P(next = "Up" | current = "Up") = 0.9)
bias_df = pd.DataFrame(
    {"Up": [0.9, 0.1, 0], "Neutral": [0.2, 0.6, 0.2], "Down": [0, 0.1, 0.9]},
    index=["Up", "Neutral", "Down"],
)
```

### Code Implementation

* The `SDPModel` class provides a general framework for sequential decision problems.
* The `AssetSellingModel` class extends `SDPModel` to specifically model the asset selling problem, defining state variables, decision variables, transition dynamics, and the objective function.
* The `AssetSellingModelHistorical` class further extends the model to incorporate historical price data.

#### Class `SDPModel`

```python=
# Postpone annotation evaluation: the State/Decision type hints below refer to
# namedtuples that are only created in __init__.
from __future__ import annotations

from collections import namedtuple
import numpy as np
from abc import ABC, abstractmethod
from typing import List, Dict, Any, Optional


class SDPModel(ABC):
    """
    Sequential Decision Problem base class

    This class represents a base class for sequential decision problems.
It provides methods for initializing the problem, resetting the state, performing a single step in the problem, and updating the time index. """ def __init__( self, state_names: List[str], decision_names: List[str], S0: Dict[str, Any], t0: float = 0, T: float = 1, seed: int = 42, ) -> None: """Initialize an instance of the SDPModel class.""" self.State = namedtuple("State", state_names) self.Decision = namedtuple("Decision", decision_names) self.state_names = state_names self.decision_names = decision_names self.initial_state = self.build_state(S0) self.state = self.initial_state self.objective = 0.0 self.t0 = t0 self.t = t0 self.T = T self.seed = seed self.prng = np.random.RandomState(seed) self.episode_counter = 0 def reset(self, reset_prng: bool = False) -> None: """Reset the SDPModel to its initial state.""" self.state = self.initial_state self.objective = 0.0 self.t = self.t0 if reset_prng: self.prng = np.random.RandomState(self.seed) def build_state(self, info: Dict[str, Any]) -> State: """Set the new state values using the provided information.""" return self.State(**{k: info[k] for k in self.state_names}) def build_decision(self, info: Dict[str, Any]) -> Decision: """Build a decision object using the provided information.""" return self.Decision(**{k: info[k] for k in self.decision_names}) @abstractmethod def exog_info_fn(self, decision: Decision) -> Dict[str, Any]: """Generate exogenous information.""" pass @abstractmethod def transition_fn(self, decision: Decision, exog_info: Dict[str, Any]) -> Dict[str, Any]: """Compute the state transition.""" pass @abstractmethod def objective_fn(self, decision: Decision, exog_info: Dict[str, Any]) -> float: """Compute the objective value contribution.""" pass def is_finished(self) -> bool: """Check if the model run is finished.""" return self.t >= self.T def update_t(self) -> float: """Update the value of the time index t.""" self.t += 1 return self.t def step(self, decision: Decision) -> State: """Perform a single step in the sequential decision problem.""" exog_info = self.exog_info_fn(decision) self.objective += self.objective_fn(decision, exog_info) exog_info.update(self.transition_fn(decision, exog_info)) self.state = self.build_state(exog_info) self.update_t() return self.state ``` #### Class `AssetSellingModel(SDPModel)` ```python= import pandas as pd import numpy as np from typing import Dict, Optional from sdp_model import SDPModel # Assuming SDPModel is in a file named sdp_model.py class AssetSellingModel(SDPModel): def __init__( self, S0: Dict[str, float], t0: float = 0, T: float = 1, seed: int = 42, alpha: float = 0.7, var: float = 2, bias_df: Optional[pd.DataFrame] = None, upstep: float = 1, downstep: float = -1, ) -> None: state_names = ["price", "bias", "price_smoothed", "resource"] decision_names = ["sell"] S0 = { "price": S0.get("price", 0.0), "bias": S0.get("bias", "Neutral"), "price_smoothed": S0.get("price_smoothed", S0.get("price", 0.0)), "resource": S0.get("resource", 1), } super().__init__(state_names, decision_names, S0, t0, T, seed) self.alpha = alpha self.var = var self.upstep = upstep self.downstep = downstep self.bias_df = bias_df if bias_df is not None else pd.DataFrame({ "Up": [0.9, 0.1, 0], "Neutral": [0.2, 0.6, 0.2], "Down": [0, 0.1, 0.9] }, index=["Up", "Neutral", "Down"]) def is_finished(self) -> bool: """Check if the model run (episode) is finished.""" return super().is_finished() or self.state.resource == 0 def exog_info_fn(self, decision) -> Dict[str, float]: """Generates exogenous information for the 
        asset selling model."""
        biasprob = self.bias_df[self.state.bias]
        coin = self.prng.uniform()
        if coin < biasprob["Up"]:
            new_bias, bias = "Up", self.upstep
        elif coin < biasprob["Up"] + biasprob["Neutral"]:
            new_bias, bias = "Neutral", 0
        else:
            new_bias, bias = "Down", self.downstep

        # Draw the price change around the current drift; self.var is passed to
        # prng.normal as the scale (standard deviation) of the change.
        price_delta = self.prng.normal(bias, self.var)
        new_price = max(0.0, self.state.price + price_delta)

        return {
            "price": new_price,
            "bias": new_bias,
        }

    def transition_fn(self, decision, exog_info: Dict[str, float]) -> Dict[str, float]:
        """Computes the state transition."""
        new_resource = 0 if decision.sell == 1 else self.state.resource
        new_price_smoothed = (1 - self.alpha) * self.state.price_smoothed + self.alpha * exog_info["price"]
        return {
            "resource": new_resource,
            "price_smoothed": new_price_smoothed
        }

    def objective_fn(self, decision, exog_info: Dict[str, float]) -> float:
        """Computes the objective value."""
        # Earn the current price only if we sell while still holding the asset.
        return self.state.price * (decision.sell == 1 and self.state.resource != 0)


class AssetSellingModelHistorical(AssetSellingModel):
    def __init__(
        self,
        hist_data: pd.DataFrame,
        alpha: float = 0.7,
    ) -> None:
        super().__init__(S0={"price": 0.0}, alpha=alpha)
        self.hist_data = hist_data
        self.episode_data = []

    def reset(self, reset_prng: bool = False) -> None:
        """Resets the model and prepares data for the next episode."""
        self.episode_data = self.hist_data.loc[self.hist_data["N"] == self.episode_counter, "price"].tolist()
        self.episode_data.pop(0)  # Remove the first element
        self.T = len(self.episode_data)
        super().reset(reset_prng)
        self.episode_counter += 1

    def exog_info_fn(self, decision) -> Dict[str, float]:
        """Returns the next historical price."""
        return {"price": self.episode_data.pop(0), "bias": "Neutral"}
```

## Designing Policies

In this context, a policy is a decision-making rule that dictates whether to sell the asset at any given time, based on the system's current state. We present three illustrative policies:

### Sell-Low Policy

This policy triggers a sell action when the current asset price $p_t$ drops below a predetermined lower threshold $\theta^{low}$. Additionally, it enforces a sell action at the end of the time horizon $t = T$ if the asset is still held ($R_t = 1$).

$$ X_{t}^{sell-low}(S_t | \theta^{low}) = \begin{cases} \,1 & \text{if } p_t < \theta^{low} \text{ and } R_t = 1 \\ \,1 & \text{if } t = T \text{ and } R_t = 1 \\ \,0 & \text{Otherwise} \end{cases} $$

### High-Low Selling Policy

This policy extends the sell-low policy by incorporating an upper threshold $\theta^{high}$. A sell action is triggered if the price falls below the lower threshold or exceeds the upper threshold while the asset is held. It also includes the mandatory sale at the end of the time horizon if the asset remains unsold.

$$ X_{t}^{high-low}(S_t | \theta^{high}, \theta^{low}) = \begin{cases} \,1 & \text{if } (p_t < \theta^{low} \text{ or } p_t > \theta^{high}) \text{ and } R_t = 1 \\ \,1 & \text{if } t = T \text{ and } R_t = 1 \\ \,0 & \text{Otherwise} \end{cases} $$

### Tracking Policy

This policy compares the current asset price $p_t$ with a smoothed price estimate $\bar{p}_t$, often calculated using an exponential moving average. If the actual price deviates from the smoothed estimate by at least a specified threshold $\theta^{track}$, a sell action is triggered. It also mandates selling at the end of the time horizon if the asset is still held. The smoothed price is calculated as:

$$ \bar{p}_t = (1 - \alpha)\bar{p}_{t-1} + \alpha p_t $$

where $\alpha$ is a smoothing factor.
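As a quick numeric illustration of the smoothing update (the numbers below are chosen for the example only and are not taken from the experiments that follow):

```python=
# Illustrative exponential smoothing step with alpha = 0.7
alpha = 0.7
p_smoothed_prev = 20.0   # previous smoothed price, \bar{p}_{t-1}
p_t = 22.0               # newly observed price, p_t
p_smoothed = (1 - alpha) * p_smoothed_prev + alpha * p_t
print(p_smoothed)        # 0.3 * 20 + 0.7 * 22 = 21.4
```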
The policy is defined as:

$$ X_{t}^{track}(S_t | \theta^{track}) = \begin{cases} \,1 & \text{if } |p_t - \bar{p}_t| \geq \theta^{track} \text{ and } R_t = 1 \\ \,1 & \text{if } t = T \text{ and } R_t = 1 \\ \,0 & \text{Otherwise} \end{cases} $$

### Code Implementation

#### Class `SDPPolicy`

The `SDPPolicy` class serves as an abstract base class for implementing and evaluating policies for sequential decision problems (SDPs), interacting with a specified SDP model.

```python=
from copy import deepcopy
from abc import ABC, abstractmethod
import pandas as pd
from typing import Dict, Any, Optional

from sdp_model import SDPModel  # Assuming SDPModel is in a file named sdp_model.py


class SDPPolicy(ABC):
    def __init__(self, model: SDPModel, policy_name: str = ""):
        self.model = model
        self.policy_name = policy_name
        self.results: pd.DataFrame = pd.DataFrame()
        self.performance: Optional[pd.Series] = None

    @abstractmethod
    def get_decision(self, state: Any, t: float, T: float) -> Dict[str, Any]:
        """
        Returns the decision made by the policy based on the given state.

        Args:
            state (Any): The current state of the system.
            t (float): The current time step.
            T (float): The end of the time horizon / total number of time steps.

        Returns:
            Dict[str, Any]: The decision made by the policy.
        """
        pass

    def run_policy(self, n_iterations: int = 1) -> float:
        """
        Runs the policy over the time horizon [0,T] for a specified number of iterations
        and returns the mean performance.

        Args:
            n_iterations (int): The number of iterations to run the policy. Default is 1.

        Returns:
            float: The mean performance across all iterations.
        """
        result_list = []
        for i in range(n_iterations):
            model_copy = deepcopy(self.model)
            model_copy.episode_counter = i
            model_copy.reset(reset_prng=False)

            while not model_copy.is_finished():
                state_t = model_copy.state
                decision_t = model_copy.build_decision(self.get_decision(state_t, model_copy.t, model_copy.T))
                results_dict = {
                    "N": i,
                    "t": model_copy.t,
                    "C_t sum": model_copy.objective,
                    **state_t._asdict(),
                    **decision_t._asdict()
                }
                result_list.append(results_dict)
                model_copy.step(decision_t)

            # Log final state
            result_list.append({
                "N": i,
                "t": model_copy.t,
                "C_t sum": model_copy.objective,
                **model_copy.state._asdict()
            })

        self.results = pd.DataFrame(result_list)
        self.results["t_end"] = self.results.groupby("N")["t"].transform("max")
        # The performance of an iteration is the cumulative objective logged at its final time step.
        self.performance = self.results.loc[self.results["t"] == self.results["t_end"], ["N", "C_t sum"]]
        self.performance = self.performance.set_index("N")
        self.results["C_t"] = self.results.groupby("N")["C_t sum"].diff().shift(-1)

        nan_count = self.results["C_t sum"].isna().sum()
        if nan_count > 0:
            print(f"Warning! For {nan_count} iterations the performance was NaN.")

        return self.performance["C_t sum"].mean()
```

#### Concrete Policy Classes (`SellLowPolicy`, `HighLowPolicy`, `TrackPolicy`)

These classes inherit from `SDPPolicy` and provide concrete implementations of the `get_decision` method, each encoding its specific decision rule.
```python= from typing import Dict, Any from sdp_model import SDPModel from sdp_policy import SDPPolicy class SellLowPolicy(SDPPolicy): def __init__(self, model: SDPModel, policy_name: str = "SellLow", theta_low: float = 10): super().__init__(model, policy_name) self.theta_low = theta_low def get_decision(self, state: Any, t: float, T: float) -> Dict[str, int]: return {"sell": 1} if t == T - 1 or state.price < self.theta_low else {"sell": 0} class HighLowPolicy(SDPPolicy): def __init__(self, model: SDPModel, policy_name: str = "HighLow", theta_low: float = 10, theta_high: float = 30): super().__init__(model, policy_name) self.theta_low = theta_low self.theta_high = theta_high def get_decision(self, state: Any, t: float, T: float) -> Dict[str, int]: return {"sell": 1} if t == T - 1 or state.price < self.theta_low or state.price > self.theta_high else {"sell": 0} class TrackPolicy(SDPPolicy): def __init__(self, model: SDPModel, policy_name: str = "Track", theta: float = 10): super().__init__(model, policy_name) self.theta = theta def get_decision(self, state: Any, t: float, T: float) -> Dict[str, int]: return {"sell": 1} if (t == T - 1 or abs(state.price - state.price_smoothed) >= self.theta) else {"sell": 0} ``` ## Policy Evaluation To assess the effectiveness of different selling policies, we simulate their application across numerous potential price trajectories (sample paths). For a given sample path $\omega$, the performance of a policy $\pi$ is evaluated by calculating the total earnings achieved: $$ \hat{F}^\pi(\omega) = \sum_{t = 0}^{T - 1}p_t(\omega)X^\pi(S_t(\omega)) $$ This equation sums the product of the asset price at each time step $p_t(\omega)$ and the policy's decision to sell or hold $X^\pi(S_t(\omega))$ along the sample path. To obtain a more robust estimate of a policy's performance, we average its performance over multiple sample paths $(\omega^1, \omega^2, ..., \omega^N)$: $$ \bar{F}^\pi(\omega) = \frac1N \sum_{n=1}^N \hat{F}^\pi(\omega^n) $$ This average performance represents the expected earnings under policy $\pi$. Additionally, we quantify the variability in performance by calculating the variance: $$ (\bar{\sigma}^\pi)^2 = \frac{1}{N} (\hat{\sigma}^\pi)^2 $$ where $\hat{\sigma}^\pi$ is the estimated standard deviation of the performance across sample paths: $$ (\hat{\sigma}^\pi)^2 = \frac{1}{N-1} \sum_{n=1}^N (\hat{F}^\pi(\omega^n) - \bar{F}^\pi)^2 $$ ### Experiments We conducted a series of experiments to evaluate and compare the performance of the three proposed policies: Sell Low, High Low, and Track. For each policy, we systematically searched for the optimal parameter values that would yield the best possible average performance. #### Parameter Optimization To identify the ideal parameter values for each policy, we utilized a combination of grid search and linear search techniques. * **Grid Search:** * This method systematically explores all possible combinations of parameter values within a predefined grid. * The policy's performance is evaluated through multiple simulation runs for each parameter combination. * The combination resulting in the highest average performance is selected as the optimal one. * **Linear Search:** * This approach is particularly well-suited for policies that have a single parameter to tune. * It involves sequentially testing a range of values for the parameter. 
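Both search routines below rank parameter settings by the mean performance returned by `run_policy()`. The spread estimates $\hat{\sigma}^\pi$ and $\bar{\sigma}^\pi$ defined above can be recovered from the per-episode results that the policy object stores. A minimal sketch, assuming `policy` is any `SDPPolicy` instance on which `run_policy()` has already been called:

```python=
import numpy as np

# policy.performance holds one row per sample path with column "C_t sum".
F_hat = policy.performance["C_t sum"].to_numpy()   # \hat{F}^\pi(\omega^n) for each sample path
N = len(F_hat)
F_bar = F_hat.mean()                               # \bar{F}^\pi
sigma_hat = F_hat.std(ddof=1)                      # \hat{\sigma}^\pi
sigma_bar = sigma_hat / np.sqrt(N)                 # \bar{\sigma}^\pi = \hat{\sigma}^\pi / sqrt(N)
print(f"Mean performance {F_bar:.3f} +/- {sigma_bar:.3f}")
```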
```python= import pandas as pd import numpy as np from itertools import product from copy import deepcopy from typing import Dict, List, Any from sdp_policy import SDPPolicy # Assuming SDPPolicy is in a file named sdp_policy.py def linspace_search(parameter_values: Dict[str, List[float]], policy: SDPPolicy, n_iterations: int) -> Dict[str, Any]: """ Performs a linear search over a single parameter specified as a dictionary. Args: parameter_values (Dict[str, List[float]]): A dictionary where keys are parameter names and values are lists of values to test. policy (SDPPolicy): An instance of the SDPPolicy class representing the policy to evaluate. n_iterations (int): Number of iterations to run for each parameter value. Returns: Dict[str, Any]: A dictionary containing the best parameter, best performance, and all runs. """ best_performance = float('-inf') best_value = None all_runs = [] for parameter_name, values_to_test in parameter_values.items(): for value in values_to_test: policy_copy = deepcopy(policy) setattr(policy_copy, parameter_name, value) performance = policy_copy.run_policy(n_iterations=n_iterations) all_runs.append({parameter_name: value, 'performance': performance}) if performance > best_performance: best_performance = performance best_value = value return { "best_parameter": best_value, "best_performance": best_performance, "all_runs": pd.DataFrame(all_runs) } def grid_search(grid: Dict[str, List[float]], policy: SDPPolicy, n_iterations: int, ordered: bool = False) -> Dict[str, Any]: """ Performs a grid search over multiple parameters specified as a dictionary. Args: grid (Dict[str, List[float]]): A dictionary where keys are parameter names and values are lists of values to test. policy (SDPPolicy): An instance of the SDPPolicy class representing the policy to evaluate. n_iterations (int): Number of iterations to run for each parameter combination. ordered (bool): If True, ensures the first parameter is always less than or equal to the second (for two-parameter searches only). Returns: Dict[str, Any]: A dictionary containing the best parameters, best performance, and all runs. 
""" if len(grid) != 2 and ordered: ordered = False print("Warning: Grid search for ordered parameters only works if there are exactly two parameters.") best_performance = float('-inf') best_parameters = None rows = [] params = list(grid.keys()) for v in product(*grid.values()): if ordered and len(v) == 2 and v[0] >= v[1]: continue policy_copy = deepcopy(policy) for param, value in zip(params, v): setattr(policy_copy, param, value) performance = policy_copy.run_policy(n_iterations=n_iterations) row = dict(zip(params, v)) row["performance"] = performance rows.append(row) if performance > best_performance: best_performance = performance best_parameters = dict(zip(params, v)) return { "best_parameters": best_parameters, "best_performance": best_performance, "all_runs": pd.DataFrame(rows), } ``` #### Sell Low Policy ```python= # Initialize the model model = asm.AssetSellingModel(S0={"price": 20}, T=30) # Experiment 1: Sell Low Policy sell_low_policy = asp.SellLowPolicy(model=model) parameter_values = {"theta_low": np.linspace(0, 20, 21)} sell_low_result = linspace_search(parameter_values, sell_low_policy, n_iterations=10000) print(f"Sell Low Policy - Best parameter: theta_low = {sell_low_result['best_value']} with an objective of {round(sell_low_result['best_performance'], 3)}.") fig_sell_low = px.line(sell_low_result['all_runs'], x='theta_low', y='performance', title='Performance of the Sell Low Policy') fig_sell_low.show() ``` ``` Sell Low Policy - Best parameter: theta_low = 14.0 with an objective of 23.156. ``` ![newplot](https://hackmd.io/_uploads/HyB8yHy5R.png) * **Result:** The optimal theta_low value was found to be 14.0, leading to an average objective value of 23.156. * **Interpretation:** The line plot visualizes the performance of the Sell Low policy across different theta_low values. #### High Low Policy ```python= # Experiment 2: High Low Policy high_low_policy = asp.HighLowPolicy(model=model) grid = {"theta_low": np.linspace(10, 20, 11), "theta_high": np.linspace(20, 40, 21)} high_low_result = FBP.grid_search(grid, high_low_policy, n_iterations=1000, ordered=True) print(f"High Low Policy - Best parameters: {high_low_result['best_parameters']} with an objective of {round(high_low_result['best_performance'], 3)}.") res_grid = high_low_result["all_runs"].pivot(index="theta_low", columns="theta_high", values="performance") fig_high_low = px.imshow(res_grid, title='Performance of the High Low Policy', labels=dict(x="theta_high", y="theta_low", color="Performance")) fig_high_low.show() ``` ``` High Low Policy - Best parameters: {'theta_low': 14.0, 'theta_high': 40.0} with an objective of 22.234. ``` ![newplot](https://hackmd.io/_uploads/r1uSWq1cR.png) * **Result:** The optimal parameter combination was `theta_low = 14.0` and `theta_high = 40.0`, resulting in an average objective value of 22.234. * **Interpretation:** The heatmap illustrates the performance of the High Low policy across various combinations of `theta_low` and `theta_high`. The best performance is observed at the upper boundary of the tested `theta_high` values, suggesting that further exploration of higher values might lead to even better results. This aligns with the intuition that the Sell Low policy, which can be viewed as a High Low policy with an infinitely high `theta_high`, tends to outperform the High Low policy within the tested parameter range. 
#### Track Policy

```python=
# Experiment 3: Track Policy
track_result = []
best_alpha = None
best_theta = None
best_performance = -np.inf

# All the trajectories begin with initial price = 20 and the terminal time T = 30
for i in np.linspace(0.1, 0.9, 9):
    model = asm.AssetSellingModel(S0={"price": 20}, T=30, alpha=i)
    for j in range(1, 21):
        track_policy = asp.TrackPolicy(model=model, theta=j)
        track_policy.run_policy(n_iterations=1000)
        performance = track_policy.performance.mean().iloc[0]
        track_result.append({"alpha": i, "theta": j, "performance": performance})
        if performance > best_performance:
            best_alpha, best_theta, best_performance = i, j, performance

print(f"Track Policy - Best alpha: {best_alpha}, Best theta: {best_theta}, Best performance: {best_performance}")

track_result_df = pd.DataFrame(track_result)
fig_track = px.imshow(track_result_df.pivot(index="alpha", columns="theta", values="performance"),
                      title='Performance of the Track Policy',
                      labels=dict(x="Best Theta", y="Alpha", color="Performance"),
                      aspect="auto")
fig_track.update_layout(xaxis=dict(tickmode='linear', tick0=0, dtick=1),
                        yaxis=dict(tickmode='linear', tick0=0.1, dtick=0.1),
                        width=1000, height=600)
fig_track.show()
```

```
Track Policy - Best alpha: 0.9, Best theta: 18, Best performance: 23.116
```

![newplot](https://hackmd.io/_uploads/Bk80151c0.png)

* **Result:** The optimal `alpha` (smoothing factor) was 0.9, and the optimal `theta` (threshold) was 18, resulting in an average performance of 23.116.
* **Interpretation:** The heatmap demonstrates how the Track policy's performance is influenced by different combinations of `alpha` and `theta`.

#### Comparison of Policies

We compare the three policies using 10,000 simulation runs each. The Sell Low and Track policies use their best-found parameters ($\theta^{low} = 14$; $\alpha = 0.9$, $\theta^{track} = 18$), while the High Low policy is run with $\theta^{low} = 17$ and $\theta^{high} = 30$.
```python=
# Comparison of Policies
model = asm.AssetSellingModel(S0={"price": 20}, T=30, alpha=0.9)
sell_low_policy = asp.SellLowPolicy(model=model, theta_low=14)
high_low_policy = asp.HighLowPolicy(model=model, theta_low=17, theta_high=30)
track_policy = asp.TrackPolicy(model=model, theta=18)

policies = [sell_low_policy, high_low_policy, track_policy]
policy_names = ['Sell Low', 'High Low', 'Track']
comparison_results = []
data = []

for policy, name in zip(policies, policy_names):
    policy.run_policy(n_iterations=10000)
    mean_performance = policy.performance.mean().iloc[0]
    std_performance = policy.performance.std().iloc[0]
    data.append(policy.performance)
    comparison_results.append({
        'Policy': name,
        'Mean Performance': round(mean_performance, 3),
        'Std Dev': round(std_performance, 3)
    })
    print(f"{name} Policy - Mean performance: {mean_performance:.3f}, Standard deviation: {std_performance:.3f}")

comparison_df = pd.DataFrame(comparison_results)

fig_comparison = go.Figure(data=[
    go.Bar(name='Mean Performance', x=comparison_df['Policy'], y=comparison_df['Mean Performance']),
    go.Bar(name='Std Dev', x=comparison_df['Policy'], y=comparison_df['Std Dev'])
])
fig_comparison.update_layout(title='Comparison of Policy Performances', barmode='group')
fig_comparison.show()

data_df = pd.concat(data, axis=1)
data_df.columns = policy_names

fig_histogram = go.Figure()
for policy_name in data_df.columns:
    fig_histogram.add_trace(go.Histogram(x=data_df[policy_name], name=policy_name))
fig_histogram.update_layout(title='Histogram of Policy Performance', xaxis_title='Performance', yaxis_title='Frequency')
fig_histogram.show()

print("\nComparison Results:")
print(comparison_df.to_string(index=False))
```

```
Sell Low Policy - Mean performance: 23.156, Standard deviation: 14.519
High Low Policy - Mean performance: 20.909, Standard deviation: 7.417
Track Policy - Mean performance: 21.858, Standard deviation: 17.267
```

![newplot](https://hackmd.io/_uploads/Bk6yg3k50.png)

![newplot](https://hackmd.io/_uploads/S1pbenkqR.png)

```
Comparison Results:
  Policy  Mean Performance  Std Dev
Sell Low            23.156   14.519
High Low            20.909    7.417
   Track            21.858   17.267
```

* **Results:**
    * The Sell Low policy achieved the highest mean performance (23.156) but also a high standard deviation (14.519), indicating volatile returns.
    * The High Low policy had the lowest mean performance (20.909) but by far the lowest standard deviation (7.417), suggesting more stable returns.
    * The Track policy fell in between on mean performance (21.858) while exhibiting the highest standard deviation (17.267).
* **Interpretation:**
    * The bar chart and histogram provide a clear visual comparison of the policies' performance.
    * The choice between the Sell Low and High Low policies depends on the investor's risk tolerance. Risk-averse investors might prefer High Low for its stability, while risk-tolerant investors might opt for Sell Low for potentially higher gains.

#### Conclusion

These experiments highlight the fundamental trade-off between risk and return in asset selling strategies. The optimal policy choice depends on an investor's individual risk tolerance and financial objectives.

## Extensions

### Time Series Price Processes with Learning

The basic model assumes that price changes are independent and identically distributed (i.i.d.). However, in reality, asset prices often exhibit autocorrelation, where past prices influence future prices.
To capture this behavior, we introduce a time series model for the price process:

$$ p_{t+1} = \theta_0 p_t + \theta_1 p_{t-1} + \theta_2 p_{t-2} + \epsilon_{t+1} $$

Here, $\epsilon_{t+1}$ represents i.i.d. random noise, and the coefficients $(\theta_0, \theta_1, \theta_2)$ quantify the impact of past prices on the future price. This necessitates an expansion of the state variable to include past prices:

$$ S_t = (R_t, p_t, p_{t-1}, p_{t-2}) $$

### Basket of Assets

The model can be further extended to handle a basket of multiple assets. Recognizing that asset prices are often correlated, we introduce a covariance matrix $\Sigma$, where $\Sigma_{ij}$ represents the covariance between the random price changes of assets $i$ and $j$.

To generate correlated price samples, we employ the Cholesky decomposition, which factors the covariance matrix $\Sigma$ into a lower triangular matrix $L$ such that $\Sigma = LL^T$. The correlated price changes are then obtained as:

$$ p_{t+1} = p_t + L\hat{Z} $$

where $p_t$ and $p_{t+1}$ are column vectors of asset prices at times $t$ and $t+1$, respectively, and $\hat{Z}$ is a column vector of independent standard normal random variables.

In this multi-asset model, the state variable is expanded to include either the current prices or both the current prices and the covariance matrix:

$$ S_t = (R_t, p_t) \quad \text{or} \quad S_t = (R_t, p_t, \Sigma_t) $$

where:

* $R_t$: Tracks the ownership status of each asset
* $p_t$: Represents the asset prices
* $\Sigma_t$: (Optional) The covariance matrix at time $t$, allowing for modeling scenarios where the covariance structure itself is dynamic.

### Example: Time-Series Based Trading Policy

This section presents a more sophisticated trading policy that leverages time-series forecasting and dynamic thresholds to guide asset selling decisions. By directly modeling the evolution of asset prices over time, we move beyond simplistic price representations and aim to capture the inherent trends and patterns in the market.

#### Time Series Price Model

We utilize the following time-series model to forecast asset prices:

$$ \hat{p}_{t+1} = 0.7p_t + 0.2p_{t-1} + 0.1p_{t-2} $$

This model predicts the price at the next time step $\hat{p}_{t+1}$ as a weighted average of the current and two previous prices $(p_t, p_{t-1}, p_{t-2})$. The weights $(0.7, 0.2, 0.1)$ reflect the relative importance of each past price in predicting the future price.

#### Defining the State Variable

To inform effective trading decisions, we expand the state variable to include historical price information:

$$ S_t = (R_t, p_t, p_{t-1}, p_{t-2}) $$

where

* $R_t$: Represents the current asset ownership status (1 if held, 0 if not held)
* $p_t$, $p_{t-1}$, $p_{t-2}$: The current and past two prices of the asset, providing context for price forecasting

#### The Time-Series Policy

The time-series policy triggers a sell action when the actual price deviates significantly from its forecast, or at the end of the trading period. We introduce a threshold value $\theta$ to define what constitutes a "significant" deviation, and write $\bar{p}_t$ for the forecast of the current price produced by the time-series model at the previous step (the `price_estimate` carried in the state in the implementation below).

$$ X^{\text{time-series}}(S_t | \theta) = \begin{cases} 1 & \text{if } p_t < \bar{p}_t - \theta \text{ or } p_t > \bar{p}_t + \theta \\ 1 & \text{if } t = T \\ 0 & \text{otherwise} \end{cases} $$

This policy initiates a sell when:

1. The absolute difference between the actual price $p_t$ and its forecast $\bar{p}_t$ exceeds the threshold $\theta$, and the asset is currently held ($R_t = 1$).
2. It's the end of the trading period ($t = T$), and the asset is still held ($R_t = 1$).

```python=
from typing import List, Dict, Any

# Assuming the SDPPolicy / SDPModel base classes from above live in
# sdp_policy.py and sdp_model.py, as in the earlier code blocks.
from sdp_policy import SDPPolicy
from sdp_model import SDPModel
import AssetSellingModel as asm


class TimeSeriesPolicy(SDPPolicy):
    def __init__(self, model: SDPModel, policy_name: str = "TimeSeries", theta: float = 10):
        super().__init__(model, policy_name)
        self.theta = theta

    def get_decision(self, state: Any, t: int, T: int) -> Dict[str, int]:
        if not hasattr(state, 'price_estimate') or state.price_estimate is None:
            return {"sell": 0, "hold": 1}  # Hold the asset if no price estimate is available

        price_difference = abs(state.price - state.price_estimate)

        if t == T - 1:
            # Force selling at the final time step
            return {"sell": 1, "hold": 0}
        elif price_difference > self.theta:
            return {"sell": 1, "hold": 0}  # Sell if the price difference exceeds the threshold
        else:
            return {"sell": 0, "hold": 1}  # Otherwise, hold

    def set_theta(self, theta: float) -> None:
        """Update the threshold value."""
        self.theta = theta
```

##### Experiment: Finding the Optimal Threshold

We conduct experiments to determine the optimal $\theta$ value that maximizes the policy's performance.

```python=
import numpy as np
import pandas as pd
import plotly.express as px
from typing import List, Dict, Any

import AssetSellingModel as asm
from TimeSeriesPolicy import TimeSeriesPolicy


def run_threshold_optimization(
    model: asm.AssetSellingModel,
    theta_range: range,
    n_iterations: int = 10000
) -> Dict[str, Any]:
    time_series_results: List[Dict[str, float]] = []
    best_theta = None
    best_performance = -np.inf

    for theta in theta_range:
        time_series_policy = TimeSeriesPolicy(model=model, theta=theta)
        performance = time_series_policy.run_policy(n_iterations=n_iterations)
        time_series_results.append({"theta": theta, "performance": performance})
        if performance > best_performance:
            best_performance = performance
            best_theta = theta

    return {
        "results": time_series_results,
        "best_theta": best_theta,
        "best_performance": best_performance
    }


def plot_results(results: List[Dict[str, float]], title: str) -> None:
    df = pd.DataFrame(results)
    fig = px.line(df, x='theta', y='performance', title=title)
    fig.show()


# Experiment Execution

# Initialize the model.
# Note: the policy reads a `price_estimate` field from the state; this assumes a model
# variant that carries the time-series forecast in its state (the basic AssetSellingModel
# defined earlier does not include this field).
model = asm.AssetSellingModel(S0={"price": 20}, T=30)

# Run the optimization for TimeSeriesPolicy
optimization_results = run_threshold_optimization(model, range(1, 21))

# Print the results
print(f"Time Series Policy - Best parameter: theta = {optimization_results['best_theta']} "
      f"with an objective of {round(optimization_results['best_performance'], 3)}.")

# Plot the results
plot_results(optimization_results['results'], 'Performance of the Time Series Policy')
```

```
Time Series Policy - Best parameter: theta = 15 with an objective of 22.473.
```

![newplot](https://hackmd.io/_uploads/Byr0ZcksC.png)

* **Result:** The optimal $\theta$ was found to be 15, resulting in an average objective value of 22.473.
* **Interpretation:** The line plot visualizes how the performance of the Time-Series policy varies with different $\theta$ values.

#### Price-Dependent Thresholds

To enhance adaptability to market conditions, we introduce thresholds that vary with the current asset price, $\theta(p_t)$.
The policy is modified accordingly:

$$ X^{\text{time-series}}(S_t | \theta) = \begin{cases} 1 & \text{if } p_t < \bar{p}_t - \theta(p_t) \text{ or } p_t > \bar{p}_t + \theta(p_t) \\ 1 & \text{if } t = T \\ 0 & \text{otherwise} \end{cases} $$

The function $\theta(p_t)$ can be designed to reflect various market behaviors. For simplicity, we assume a step function for $\theta(p_t)$:

$$ \theta(p_t) = \theta_i \quad\text{ for } p_t \in I_i $$

where $\theta_i$ is a constant threshold for the price interval $I_i = [5i, 5(i+1))$.

```python=
from typing import List, Dict, Any, Optional

from sdp_policy import SDPPolicy  # Base classes from the earlier code blocks
from sdp_model import SDPModel


class ModifiedTimeSeriesPolicy(SDPPolicy):
    def __init__(self, model: SDPModel, policy_name: str = "ModifiedTimeSeries", theta: Optional[List[float]] = None):
        super().__init__(model, policy_name)
        self.theta = theta if theta is not None else [10] * 20

    def get_decision(self, state: Any, t: int, T: int) -> Dict[str, int]:
        if not hasattr(state, 'price_estimate') or state.price_estimate is None:
            return {"sell": 0, "hold": 1}

        price_difference = abs(state.price - state.price_estimate)
        # Note: the threshold is looked up at the price estimate rather than the raw price.
        threshold = self.get_threshold(state.price_estimate)

        if t == T - 1:
            # Force selling at the final time step
            return {"sell": 1, "hold": 0}
        elif price_difference > threshold:
            return {"sell": 1, "hold": 0}  # Sell if the price difference exceeds the threshold
        else:
            return {"sell": 0, "hold": 1}  # Otherwise, hold

    def get_threshold(self, price: float) -> float:
        """Get the threshold based on the current price."""
        index = min(int(price / 5), len(self.theta) - 1)
        return self.theta[index]

    def set_theta(self, theta: List[float]) -> None:
        """Update the threshold list."""
        self.theta = theta
```

##### Experiment: Exploring Threshold Strategies

We explore various threshold strategies, ranging from a static threshold to more complex price-dependent functions.

* **L0:** The baseline: a constant threshold function with all $\theta_i$ values set to 13, close to the optimal single threshold found in the previous experiment.
* **L1-L5:** Different price-dependent threshold strategies, including linearly increasing, bounded linear, decreasing, and decreasing with initial risk aversion.
```python= import numpy as np import pandas as pd import plotly.graph_objects as go from typing import List, Dict, Any import AssetSellingModel as asm from ModifiedTimeSeriesPolicy import ModifiedTimeSeriesPolicy def define_threshold_strategies() -> Dict[str, List[float]]: return { "L0": [13] * 20, # Baseline: Constant threshold "L1": list(range(20)), # Linearly increasing threshold "L2": [min(x, 4) for x in range(20)], # Bounded linear increase (max 4) "L3": [min(x, 3) for x in range(20)], # Bounded linear increase (max 3) "L4": [max(4 - x//5, 0) for x in range(20)], # Decreasing threshold "L5": [max(2 - x//5, 0) + 2 for x in range(20)] # Decreasing with initial risk aversion } def run_experiment(model: asm.AssetSellingModel, strategies: Dict[str, List[float]], n_iterations: int = 1000) -> pd.DataFrame: data = [] for name, theta in strategies.items(): policy = ModifiedTimeSeriesPolicy(model=model, theta=theta) policy.run_policy(n_iterations=n_iterations) data.append(policy.performance) mean_performance = round(policy.performance.mean().iloc[0], 3) std_dev = round(policy.performance.std().iloc[0], 3) print(f"{name}: Mean = {mean_performance}, Std Dev = {std_dev}") data_df = pd.concat(data, axis=1) data_df.columns = strategies.keys() return data_df def plot_histogram(data_df: pd.DataFrame, title: str) -> None: fig_data = [] for column in data_df.columns: fig_data.append(go.Histogram(x=data_df[column], name=column)) fig = go.Figure(data=fig_data) fig.update_layout( title=title, xaxis_title='Performance', yaxis_title='Frequency', barmode='overlay', bargap=0.1) fig.show() def plot_comparison(data: pd.DataFrame, strategy1: str, strategy2: str) -> None: fig = go.Figure() fig.add_trace(go.Histogram(x=data[strategy1], name=strategy1, opacity=0.75)) fig.add_trace(go.Histogram(x=data[strategy2], name=strategy2, opacity=0.75)) fig.update_layout( title=f'Comparison of {strategy1} and {strategy2}', xaxis_title='Performance', yaxis_title='Frequency', barmode='overlay' ) fig.show() # Experiment Execution # Initialize the model model = asm.AssetSellingModel(S0={"price": 20}, T=30) # Define strategies strategies = define_threshold_strategies() # Run the experiment results = run_experiment(model, strategies) # Plot overall histogram plot_histogram(results, 'Histogram of Policy Performance') # Plot individual comparisons comparisons = [("L1", "L2"), ("L2", "L3"), ("L3", "L4"), ("L4", "L5")] for strategy1, strategy2 in comparisons: plot_comparison(results, strategy1, strategy2) ``` ``` L0: Mean = 21.761, Std Dev = 17.404 L1: Mean = 21.067, Std Dev = 10.178 L2: Mean = 20.399, Std Dev = 8.36 L3: Mean = 20.045, Std Dev = 6.587 L4: Mean = 19.701, Std Dev = 9.48 L5: Mean = 19.569, Std Dev = 9.319 ``` ![newplot](https://hackmd.io/_uploads/B1EkO3Ji0.png) * `L0` ![newplot](https://hackmd.io/_uploads/rJm2K2kjC.png) * `L1` ![newplot](https://hackmd.io/_uploads/BJ26F3yjA.png) * `L2` ![newplot](https://hackmd.io/_uploads/SyPk531sR.png) * `L3` ![newplot](https://hackmd.io/_uploads/r1GEq3ysA.png) * `L4` ![newplot](https://hackmd.io/_uploads/HyASchksA.png) * `L5` ![newplot](https://hackmd.io/_uploads/HJrwc3yjA.png) The results demonstrate the trade-off between performance and risk. The baseline policy $L_0$ has the highest average performance but also the highest risk. Other policies offer lower risk but with slightly reduced potential returns. ## Reference * Chap 2 of Warren B. 
Powell (2022), Sequential Decision Analytics and Modeling: Modeling with Python [link](https://castle.princeton.edu/sdamodeling/) * https://github.com/djanka2/stochastic-optimization/tree/master/AssetSelling