Multi-Objective RL
---
- Parisi, Simone, Matteo Pirotta, and Marcello Restelli. 2016. “Multi-Objective Reinforcement Learning through Continuous Pareto Manifold Approximation.” The Journal of Artificial Intelligence Research 57 (October): 187–227.
- Pirotta, Matteo, Simone Parisi, and Marcello Restelli. 2014. “Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation Supplementary Material.” arXiv [cs.AI]. arXiv. http://arxiv.org/abs/1406.3497.
- Improving the Pareto UCB1 Algorithm on the Multi-Objective Multi-Armed Bandit (http://vision.gel.ulaval.ca/~cgagne/pubs/bayesopt-nips2014.pdf)
- Multi-Objective Reinforcement Learning using Sets of Pareto Dominating Policies (https://dl.acm.org/doi/pdf/10.5555/2627435.2750356)
- Hypervolume-based Multi-Objective Reinforcement Learning (https://ai.vub.ac.be/sites/default/files/EMO_2013_1.pdf)
- Multi-Task Learning as Multi-Objective Optimization (https://papers.nips.cc/paper/7334-multi-task-learning-as-multi-objective-optimization.pdf)
- Policy Gradient Approaches for Multi–Objective Sequential Decision Making (https://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/Parisi_ADPRL14.pdf)
- A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation (http://papers.nips.cc/paper/9605-a-generalized-algorithm-for-multi-objective-reinforcement-learning-and-policy-adaptation.pdf) [Narasimhan]
- Q-learning for multi-objective optimization.
- Maintains an "envelope" of Q-values; comes with theory showing the algorithm converges to $Q^*$ for any linear scalarization.
- Scalable, but needs to sample from the set of possible scalarizations; a minimal sketch of the update follows.
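A minimal tabular sketch of the envelope update, assuming a small discrete MDP with vector rewards and a finite set of sampled preference weights (all sizes, step sizes, and the `prefs` set below are illustrative, not from the paper):

```python
import numpy as np

n_states, n_actions, n_obj = 10, 4, 2
prefs = np.random.dirichlet(np.ones(n_obj), size=8)     # sampled scalarization weights w
Q = np.zeros((len(prefs), n_states, n_actions, n_obj))  # vector-valued Q(s, a | w)

def envelope_update(w_idx, s, a, r_vec, s_next, alpha=0.1, gamma=0.95):
    """One envelope Q-learning backup for preference prefs[w_idx]."""
    w = prefs[w_idx]
    # Envelope step: maximize the scalarized next-state value over BOTH the
    # next action a' and every sampled preference w', then back up the full
    # reward vector (not just its scalarization).
    scalarized = np.einsum('k,jbk->jb', w, Q[:, s_next])   # (n_prefs, n_actions)
    j, b = np.unravel_index(np.argmax(scalarized), scalarized.shape)
    target = r_vec + gamma * Q[j, s_next, b]               # vector-valued target
    Q[w_idx, s, a] += alpha * (target - Q[w_idx, s, a])

envelope_update(w_idx=0, s=3, a=1, r_vec=np.array([1.0, 0.0]), s_next=4)
```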
- Max-value Entropy Search for Multi-Objective Bayesian Optimization (https://papers.nips.cc/paper/8997-max-value-entropy-search-for-multi-objective-bayesian-optimization.pdf)
- Pure exploration of the Pareto front using Bayesian optimization.
- One GP per objective. Samples a function from each GP and uses an evolutionary algorithm to maximize the sampled multi-objective proxy.
- Computes entropies and builds an acquisition function (a sum of per-objective entropy terms) to determine the next point to query; a sketch follows this list.
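A hedged sketch of such an acquisition, following the usual max-value entropy-search closed form summed over objectives. Here `mu` and `sigma` (per-objective GP posterior mean and standard deviation at candidate points) and `ystar_samples` (per-objective maxima of Pareto fronts obtained from GP samples, e.g. via an evolutionary solver) are assumed inputs:

```python
import numpy as np
from scipy.stats import norm

def mo_mes_acquisition(mu, sigma, ystar_samples, eps=1e-9):
    """mu, sigma: (n_points, n_obj); ystar_samples: (n_samples, n_obj)."""
    acq = np.zeros(mu.shape[0])
    for ystar in ystar_samples:                        # average over sampled fronts
        gamma = (ystar - mu) / np.maximum(sigma, eps)  # standardized improvement
        cdf = np.maximum(norm.cdf(gamma), eps)
        # Per-objective entropy-reduction term, summed over objectives.
        acq += np.sum(gamma * norm.pdf(gamma) / (2 * cdf) - np.log(cdf), axis=1)
    return acq / len(ystar_samples)                    # next query: argmax of acq
```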
- Multi-objective Bayesian optimisation with preferences over objectives (https://papers.nips.cc/paper/9390-multi-objective-bayesian-optimisation-with-preferences-over-objectives.pdf)
- Multi-objective, but with a preference ordering among the objectives.
- Only cares about the Pareto-optimal solutions that satisfy these preferences.
- Derives an acquisition function to be used with the GPs.
- Explicit Pareto Front Optimization for Constrained Reinforcement Learning (https://openreview.net/attachment?id=pOHW7EwFbo9&name=pdf)
- Under review at ICLR 2021.
- Multi-Objective Deep Reinforcement Learning (https://arxiv.org/pdf/1610.02707.pdf)
- Iterative scalarization: picks a "corner" weight to approximate the Convex Coverage Set (CCS).
- Given a scalarization, solves the resulting single-objective MDP using DQN; see the outer-loop sketch below.
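A minimal sketch of that outer loop, where `train_dqn` is a hypothetical stand-in for any DQN trainer that accepts a scalarized reward function and returns the learned policy together with its vector-valued return:

```python
import numpy as np

def approximate_ccs(corner_weights, train_dqn):
    """One scalarized DQN run per corner weight of the current CCS estimate."""
    ccs = []  # (weight, policy, value vector) triples found so far
    for w in corner_weights:
        # Scalarize the vector reward with this corner weight and solve the
        # resulting single-objective MDP.
        policy, value_vec = train_dqn(reward_fn=lambda r, w=w: float(np.dot(w, r)))
        ccs.append((w, policy, value_vec))
    return ccs
```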
- A Distributional View on Multi-Objective Policy Optimization (https://arxiv.org/pdf/2005.07513.pdf)
- Uses a separate Q-function for each objective $i \in [K]$.
- Finds the new policy by minimizing $\sum_{i=1}^{K} \exp(-Q_i(s,a))$.
- Multi-Objective Value Iteration with Parameterized Threshold-Based Safety Constraints (https://openreview.net/attachment?id=HyeS73ActX&name=pdf)
- Converts the multi-objective problem into: minimize the primary objective subject to all other objectives staying below fixed, known thresholds (written out below).
- Heuristic; no theoretical guarantees.
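For reference, this conversion is the standard $\epsilon$-constraint scalarization; writing the objectives (to be minimized) as $f_1, \dots, f_K$ with thresholds $\epsilon_2, \dots, \epsilon_K$:

$$\min_{\pi} \; f_1(\pi) \quad \text{s.t.} \quad f_i(\pi) \le \epsilon_i, \qquad i = 2, \dots, K.$$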
- Stochastic Convex Optimization with Multiple Objectives (https://papers.nips.cc/paper/4942-stochastic-convex-optimization-with-multiple-objectives) [Mahdavi]
- Same conversion as above: minimize the primary objective subject to all other objectives staying below fixed, known thresholds.
- Constrained problem -> Lagrangian formulation -> primal-dual updates -> two-player no-regret learning; a toy sketch follows.
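A toy primal-dual sketch of that reduction (the quadratic objectives, threshold, and step size below are illustrative assumptions, not from the paper): gradient descent on the primal variable plays against projected gradient ascent on the multiplier.

```python
import numpy as np

f0 = lambda x: np.sum((x - 1.0) ** 2)          # primary objective
f1 = lambda x: np.sum(x ** 2)                  # secondary objective, constrained
g0 = lambda x: 2 * (x - 1.0)                   # grad f0
g1 = lambda x: 2 * x                           # grad f1
eps = 0.5                                      # known threshold on f1

x, lam, eta = np.zeros(2), 0.0, 0.05
for t in range(2000):
    x = x - eta * (g0(x) + lam * g1(x))        # primal player: descend the Lagrangian
    lam = max(0.0, lam + eta * (f1(x) - eps))  # dual player: ascend, project to lam >= 0

print(f0(x), f1(x))                            # f1(x) should end up near the threshold
```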
- Balduzzi, David, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, and Thore Graepel. 2018. “The Mechanics of N-Player Differentiable Games.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/1802.05642.
- Bohez, Steven, Abbas Abdolmaleki, Michael Neunert, Jonas Buchli, Nicolas Heess, and Raia Hadsell. 2019. “Value Constrained Model-Free Continuous Control,” February. https://arxiv.org/abs/1902.04623v1.
- Bokrantz, Rasmus, and Albin Fredriksson. n.d. “Necessary and Sufficient Conditions for Pareto Efficiency in Robust Multiobjective Optimization.” https://people.kth.se/~albfre/robustmco.pdf.
- Abdallah, C., and V. Koltchinskii. 2003. “Applications of Statistical-Learning Control in Systems and Control.” In IFAC Workshop. http://www.eece.unm.edu/controls/papers/Ari_CTA_Kol.pdf.
- Dong, Chaosheng, and Bo Zeng. 2020. “Inverse Multiobjective Optimization Through Online Learning.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/2010.06140.
- Eichfelder, Gabriele. 2007. “Scalarizations for Adaptively Solving Multi-Objective Optimization Problems.” Computational Optimization and Applications 44 (2): 249.
- Emmerich, Michael T. M., and André H. Deutz. 2018. “A Tutorial on Multiobjective Optimization: Fundamentals and Evolutionary Methods.” Natural Computing 17 (3): 585–609.
- Hu, J., and T. Homem-de-Mello. 2010. “Multi-Criterion Robust and Stochastic Dominance-Constrained Models with Application to Budget Allocation in Homeland Security.” Manuscript, January. http://www.iems.northwestern.edu/docs/working_papers/WP_10-01.pdf.
- Jones, P. W., A. M. Lewis, and R. Hartley. 1995. “Some Designs for Multicriteria Bandits.” In Adaptive Statistical Procedures and Related Topics, 88–94. Hayward, CA: Institute of Mathematical Statistics.
- Li, Xiaowei, Wei Wang, Chengcheng Xu, Zhibin Li, and Baojie Wang. 2015. “Multi-Objective Optimization of Urban Bus Network Using Cumulative Prospect Theory.” Journal of Systems Science & Complexity 28 (3): 661–78.
- Liao, Li-Zhi, and Duan Li. 2002. “Adaptive Differential Dynamic Programming for Multiobjective Optimal Control.” Automatica: The Journal of IFAC, the International Federation of Automatic Control 38 (6): 1003–15.
- Lu, Shiyin, Guanghui Wang, Yao Hu, and Lijun Zhang. 2019. “Multi-Objective Generalized Linear Bandits.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/1905.12879.
- Finds a Pareto-optimal arm in the generalized linear bandit (GLB) regret setting.
- Maintains an approximate Pareto front and eliminates an arm if its UCB value is Pareto-dominated.
- For exploration, picks a random arm from the set of non-eliminated arms.
- Proves an $O(\sqrt{T})$ Pareto regret bound.
- For Pareto regret, the gap of an arm is the minimal $\epsilon$ that must be added to every coordinate of its reward vector to make it Pareto optimal; see the helper sketch below.
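Helper sketches for the two notions used above, brute-forced over a finite set of mean-reward vectors (purely illustrative, not the paper's code):

```python
import numpy as np

def dominates(u, v):
    """u Pareto-dominates v: at least as good everywhere, strictly better somewhere."""
    return np.all(u >= v) and np.any(u > v)

def pareto_gap(arm, others):
    """Infimal eps >= 0 such that arm + eps * 1 is no longer Pareto-dominated
    by any other arm (boundary ties at the infimum are ignored)."""
    gaps = [np.min(v - arm) for v in others if dominates(v, arm)]
    return max(gaps) if gaps else 0.0

arms = [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.4, 0.4])]
print(pareto_gap(arms[2], arms[:2]))  # 0.1: dominated only by [0.5, 0.5]
```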
- Mannor, S., V. Perchet, and G. Stoltz. 2014. “Approachability in Unknown Games: Online Learning Meets Multi-Objective Optimization.” In COLT. http://arxiv.org/abs/1402.2043.
- Miettinen, Kaisa. 1998. Nonlinear Multiobjective Optimization. Springer, Boston, MA.
- Mossalam, Hossam, Yannis M. Assael, Diederik M. Roijers, and Shimon Whiteson. 2016. “Multi-Objective Deep Reinforcement Learning.” arXiv.org cs.AI (October). http://arxiv.org/abs/1610.02707v1.
- Parambath, Shameem A. Puthiya, Nicolas Usunier, and Yves Grandvalet. 2015. “Theory of Optimizing Pseudolinear Performance Measures: Application to F-Measure.” http://arxiv.org/abs/1505.00199v1.
- Paria, Biswajit, Kirthevasan Kandasamy, and Barnabás Póczos. 2018. “A Flexible Framework for Multi-Objective Bayesian Optimization Using Random Scalarizations.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/1805.12168.
- Roijers, Diederik M., Peter Vamplew, Shimon Whiteson, and Richard Dazeley. 2013. “A Survey of Multi-Objective Sequential Decision-Making.” The Journal of Artificial Intelligence Research, January. http://jair.org/index.php/jair/article/view/10836.
- Ruzika, S., and M. M. Wiecek. 2005. “Approximation Methods in Multiobjective Programming.” Journal of Optimization Theory and Applications 126 (3): 473–501.
- Salmei, Hossein, and Mohammad Ali Yaghoobi. 2020. “Improving the Min–max Method for Multiobjective Programming.” Operations Research Letters 48 (4): 480–86.
- Sawaragi, Yoshikazu, Hirotaka Nakayama, and Tetsuzo Tanino. 1985. Theory of Multiobjective Optimization. Vol. 176. Mathematics in Science and Engineering. Academic Press.
- Wakuta, Kazuyoshi. 2014. “A Multi-Objective Shortest Path Problem.” Mathematical Methods of Operations Research 54 (3): 445–54.
- Wiering, Marco A., and Edwin D. de Jong. 2007. “Computing Optimal Stationary Policies for Multi-Objective Markov Decision Processes.” In 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 158–65. IEEE.
- Yahyaa, S., and B. Manderick. 2015. “Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem.” https://books.google.ca/books?hl=en&lr=&id=USGLCgAAQBAJ&oi=fnd&pg=PA47&dq=thompson+sampling+linear+bandits&ots=FtcbhoKUUQ&sig=ZH61AkZrTyZI9KNnXZ5DErbA6gY.