# Mathematics Workshop: Theory of Electromagnetic Activity

## Challenge 2 - Game Theory and EMA Uncertainty

![Screenshot 2024-02-04 at 12.27.40](https://hackmd.io/_uploads/H1uwXzTcT.png)

| Name | Contact | Role |
| -------- | -------- | -------- |
| Stephan Weiss | stephan.weiss@strath.ac.uk | Challenge lead |
| Joe Gedge | jgedge@dstl.gov.uk | Challenge sponsor |
| Mark Osborne | mosborne2@dstl.gov.uk | Challenge sponsor |
| Jess Enright | jessica.enright@glasgow.ac.uk | Project member |
| John Beasley | john.beasley@brunel.ac.uk | Project member |
| Richard Claridge | richard.claridge@paconsulting.com | Project member |

## Slide about challenge:

![image](https://hackmd.io/_uploads/S1Nd6-Zi6.png)

There is a separate document by John Beasley dealing with: introduction to game theory, identifying irrational players, uncertainty, cooperation and discovery strategies to resolve uncertainty, available from:
https://www.dropbox.com/scl/fi/9rmj9qr5j3spt8xydxpau/paper.pdf?rlkey=zbfgb8yxaar555jafnu14ec82&dl=0

#### Standard Game

Typically a game is described by players $P_i$, $i=1,2,\dotsc$, their actions, e.g. $X_i$, $Y_i$, and the rewards / outcomes --- typically constants / numerical values:

| | $X_2$ | $Y_2$ |
| -------- | -------- | -------- |
| $X_1$ | $(\;0,-2)$ | $(-1,-6)$ |
| $Y_1$ | $(\;4,1\;)$ | $(\;2,\;2)$ |

An entry $(o_1,\; o_2)$ means that a particular combination of actions results in an outcome $o_1$ for $P_1$, and in $o_2$ for $P_2$. An equilibrium is where no single player can improve their outcome by unilaterally changing their action.

#### Initial Challenge Remits/Questions:

1. Is embedding uncertainty a useful framework for capturing/exploiting seemingly "irrational" players, who don’t initially seem to act to increase their overall reward?
2. Is it viable to use uncertainty in this model to capture the “character” of the opposing player?
3. Is it possible to organise players into cooperative and non-cooperative groups such that rewards are maximised in a more localised or globalised fashion?
4. In the above game-theoretic framework with uncertainty, which methods lend themselves to computational efficiency?

## Embedding Uncertainty

Outcomes as unknowns or as probability distributions:

| $P_1$ \ $P_2$ | $X_2$ | $Y_2$ |
| -------- | -------- | -------- |
| $X_1$ | $(\;0,p(x))$ | $(-1,-6)$ |
| $Y_1$ | $(\;4,1\;)$ | $(\;2,\; \mathrm{?})$ |

### Solving Games with/despite Uncertainty

* Problems with embedded uncertainty via PDFs have been addressed in the literature, see e.g. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6678800
  D. B. Smith, M. Portmann, W. L. Tan and W. Tushar, "Multi-Source–Destination Distributed Wireless Networks: Pareto-Efficient Dynamic Power Control Game With Rapid Convergence," IEEE Transactions on Vehicular Technology, vol. 63, no. 6, pp. 2744-2754, July 2014.
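As a concrete illustration (not part of the workshop deliverables), the minimal Python sketch below finds the pure-strategy equilibria of the standard game above and shows one simple way of handling an uncertain outcome entry: replacing it with the mean of an assumed distribution. The normal distribution used for $p(x)$ is purely hypothetical.

```python
# Minimal sketch: pure-strategy equilibria of the 2x2 standard game above, and a
# crude treatment of uncertainty when P2's outcome for (X1, X2) is only known as
# a probability distribution p(x). The choice of distribution is an assumption.
import numpy as np

# Outcome matrices: rows = P1's actions (X1, Y1), columns = P2's actions (X2, Y2).
A1 = np.array([[0, -1],
               [4,  2]])   # outcomes for P1
A2 = np.array([[-2, -6],
               [ 1,  2]])  # outcomes for P2

def pure_equilibria(A1, A2):
    """Return all (row, col) pairs where neither player gains by deviating unilaterally."""
    eq = []
    for r in range(A1.shape[0]):
        for c in range(A1.shape[1]):
            if A1[r, c] >= A1[:, c].max() and A2[r, c] >= A2[r, :].max():
                eq.append((r, c))
    return eq

print(pure_equilibria(A1, A2))   # [(1, 1)] -> (Y1, Y2) with outcomes (2, 2)

# Embedding uncertainty: suppose A2[0, 0] is only known as a distribution p(x),
# e.g. (hypothetically) normal with mean -2 and standard deviation 1. One simple
# option is to reason with the expected outcome matrix instead.
rng = np.random.default_rng(0)
samples = rng.normal(-2.0, 1.0, size=10_000)
A2_expected = A2.astype(float)
A2_expected[0, 0] = samples.mean()
print(pure_equilibria(A1, A2_expected))  # equilibrium unchanged in expectation here
```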
### Reducing Uncertainty

* Concept of explore & exploit, typically via sequential/iterative/repeated games; John has some ideas on discovery given a limited number of iterations, or a limit on the accumulated loss;
* Inverse equilibrium problem --- try to infer the outcome matrix from actions taken at equilibria
  https://icml.cc/Conferences/2011/papers/599_icmlpaper.pdf
  http://reports-archive.adm.cs.cmu.edu/anon/anon/usr0/ftp/usr/ftp/2023/CMU-CS-23-121.pdf
* Simulating your way out
  https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118127
* Opponent modelling
  https://www.jair.org/index.php/jair/article/download/12889/26762/29406

## Cooperation / Teams of Players

* [Contained in the general problem definition provided by Mark] Khanafer, A.; Bhattacharya, S.; Başar, T., "Adaptive Resource Allocation in Jamming Teams Using Game Theory," 2011 International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), pp. 395-400, May 2011.
* Having teams of players enables a combination of actions to be taken; the outcome matrix becomes high-dimensional, but the complexity can be absorbed;
* Example by John Beasley: player $P_1$ is joined by $P_3$ cooperatively. Together, they can take actions $\{X_1,X_3\}$, $\{X_1,Y_3\}$, $\{Y_1,X_3\}$, or $\{Y_1,Y_3\}$:

| $\{P_1,P_3\}$ \ $P_2$ | $X_2$ | $Y_2$ |
| -------- | -------- | -------- |
| $\{X_1,X_3\}$ | (0,-4) | (-2,-12) |
| $\{X_1,Y_3\}$ | (-3,-5) | (3,-9) |
| $\{Y_1,X_3\}$ | (4,-1) | (-7,-4) |
| $\{Y_1,Y_3\}$ | (1,-2) | (-2,-1) |

## (One possible) Setting

- $n$ agents $g_0, \ldots, g_n$
- for each agent we have a set of actions $\mathcal{A}_i = \{a_0, \ldots, a_{|\mathcal{A}_i|}\}$
- for each agent $g_i$ we have a function that records our agent's current belief about the distribution of the payoff associated with each possible joint action.
  - consider different families of these: e.g. local, introspective only, count-based, etc.
- a function that takes a joint action $\{$action for $g_0$, action for $g_1$, ..., action for $g_n\}$ and gives back a function that relates each agent to a payoff.
- A turn goes like this:
  - agents choose an action based on the function that describes their beliefs
  - agents receive payoffs in accordance with the payoff function
  - agents update their beliefs using a specified method (these could vary by player? could be interesting to investigate different kinds of these?)
    - strawman update: update the chosen action based on the payoff received, decay other actions toward zero
    - possible method: Bayesian update
    - possible update: regret-based

## Coded game results

### Coordination

- agents have two actions, and start with a roughly uniform preference over the two
- payoffs are either:
  - payoff of 1 if all agents match, else 0
  - payoff in proportion to the number of agents that you match
- Q: will agents learn to 'match' even though they do not model others' actions?
- A: Yes, and they do this better with the all-agents-match payoff for a small number of agents, and with the proportional payoff for a larger number of agents

![image](https://hackmd.io/_uploads/rkfvYDMsp.png)
![image](https://hackmd.io/_uploads/B1UOKPMop.png)

---

Question: how many agents do I need to initially influence to drive the system specifically to one of the two equilibria?

Experiment: 'tell' a small number of agents that a particular action is good. Measure: what proportion of agents end up taking that good action in the coordination game? A sketch of this setup is given below.
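The following Python sketch is an assumed, minimal version of the coordination experiment above, not the workshop's actual code: it uses the "strawman" update (reinforce the chosen action by the payoff received, decay the other), the proportional-matching payoff, and a few "seeded" agents with a strong initial preference for action 0. The decay constant and seeding mechanism are illustrative choices only.

```python
# Minimal sketch of the coordination game with seeded agents (all parameters assumed).
import numpy as np

def run_coordination(n_agents=50, n_seeded=5, n_turns=200, decay=0.95, seed=0):
    rng = np.random.default_rng(seed)
    # beliefs[i, a]: agent i's current preference weight for action a (two actions).
    beliefs = np.ones((n_agents, 2)) + 0.01 * rng.standard_normal((n_agents, 2))
    beliefs[:n_seeded, 0] += 5.0          # 'tell' the seeded agents that action 0 is good

    for _ in range(n_turns):
        probs = beliefs / beliefs.sum(axis=1, keepdims=True)
        actions = np.array([rng.choice(2, p=p) for p in probs])
        # Proportional payoff: fraction of *other* agents taking the same action.
        same = np.array([(actions == a).sum() - 1 for a in actions]) / (n_agents - 1)
        beliefs *= decay                                   # decay all preferences
        beliefs[np.arange(n_agents), actions] += same      # reinforce chosen action
        beliefs = np.clip(beliefs, 1e-6, None)             # keep weights positive

    return (actions == 0).mean()   # proportion matching the seeded ("desired") action

if __name__ == "__main__":
    for k in (0, 2, 5, 10):
        print(k, run_coordination(n_seeded=k))
```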
*Note: in the plots below, the y-axis should read: proportion of agents matching the desired action*

![image](https://hackmd.io/_uploads/S1zkGFMj6.png)
![image](https://hackmd.io/_uploads/SJdxGYzjT.png)
![image](https://hackmd.io/_uploads/ryvbfKGip.png)

Next questions:
- what is the expression for this number? Can something from control problems tell us?
- what is the analogous result for anti-coordination?
- how does this vary as systems get larger?

---

## Assessing behaviour against a known game

#### Scenario:

- we have a matrix giving a classic game in the usual way.
- we can compute the mixed Nash equilibria.
- if we have a sequence of our opponent's moves, we can assess the likelihood that one of these equilibria has generated the observed actions.
- this helps us know if we are wrong about the game (and we have either an irrational opponent or one playing a different game).

#### Background

* Have a player's characteristics changed? E.g., has a player been replaced by another?

#### Example

* Assume a game with player $P_1$ having available actions $\{X_1, \; Y_1\}$ and an opponent player $P_2$ with actions $\{X_2, \; Y_2\}$, and the following outcome matrix:

| | $X_2$ | $Y_2$ |
| -------- | -------- | -------- |
| $X_1$ | $(\;0,-2)$ | $(-1,-6)$ |
| $Y_1$ | $(\;4,1\;)$ | $(\;2,\;2)$ |

* We gather the outcomes for $P_1$ in a matrix $\mathbf{A}_1 = [0,\; -1;\; 4,\; 2]$ and for $P_2$ in a matrix $\mathbf{A}_{2} = [-2,\; -6;\; 1,\; 2]$.
* Now assume that $\hat{\mathbf{A}}_2 = [A_{2,11},\; A_{2,12}; \; A_{2,21}, \; A_{2,22}]$ is not known. By probing the game we establish:
  - $P_1$ picks $X_1$ $\longrightarrow$ $P_2$ picks $X_2$; hence we know that $A_{2,11}>A_{2,12}$;
  - $P_1$ picks $Y_1$ $\longrightarrow$ $P_2$ picks $Y_2$; hence we know that $A_{2,22}>A_{2,21}$.
* Let's say we look for $\hat{\mathbf{A}}_2$ to be as close as possible to $\mathbf{A}_1$ in a least-squares sense; then
\begin{align}
\hat{\mathbf{A}}_{2,\ast} & = \mathrm{arg} \min_{\hat{\mathbf{A}}_2} \| \mathbf{A}_1 - \hat{\mathbf{A}}_2\|^2_{\mathrm{F}} \qquad \mathrm{s.t.} & A_{2,11}&>A_{2,12} \nonumber \\
& & A_{2,22}&>A_{2,21}
\end{align}
* This is a quadratic programming problem with inequality constraints. Using e.g. CVX (Stephen Boyd, Stanford) as a solver, we obtain
\begin{align}
\hat{\mathbf{A}}_{2,\ast} & = \left[ \begin{array}{rr} 0 & -1 \\ 3 & 3+\epsilon \end{array} \right] \; .
\end{align}
This matrix is consistent with the observed strategy of player $P_2$, and yields the metric $\| \mathbf{A}_1 - \hat{\mathbf{A}}_{2,\ast} \|^2_{\mathrm{F}}=2$.
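A short sketch of this quadratic program is given below. CVXPY is assumed here as a stand-in for the CVX solver mentioned above, and the strict inequalities are relaxed with a small margin since the solver only accepts non-strict constraints.

```python
# Minimal sketch of the constrained least-squares fit above, using CVXPY
# (an assumption; strict inequalities approximated with a small margin eps).
import cvxpy as cp
import numpy as np

A1 = np.array([[0.0, -1.0],
               [4.0,  2.0]])
eps = 1e-3

A2_hat = cp.Variable((2, 2))
constraints = [A2_hat[0, 0] >= A2_hat[0, 1] + eps,   # P2 prefers X2 when P1 plays X1
               A2_hat[1, 1] >= A2_hat[1, 0] + eps]   # P2 prefers Y2 when P1 plays Y1
problem = cp.Problem(cp.Minimize(cp.sum_squares(A1 - A2_hat)), constraints)
problem.solve()

print(np.round(A2_hat.value, 3))   # approx. [[0, -1], [3, 3]]
print(round(problem.value, 3))     # approx. 2
```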
---

## Summary and Potential Next Steps

* Some aspects of the challenge appear to be already addressed in the literature (embedding uncertainty, reducing uncertainty, trying to infer an opponent's outcomes, games with teams of players / cooperation);
* Example of games with a larger number of players and iterations to learn a solution with different levels of coordination;
* We have attempted to assess the behaviour of an unknown player through matching observed actions to a game;
* We have addressed a number of computationally viable schemes.

---

## Other Material

## Questions/Ideas: (notes, nothing real)

- minimum viable example?
- uncertainty in outcome vs. uncertainty in observation/behaviour
- idea: graphical game on an unknown graph
- idea: regret-based calculation when payoffs are unknown and learned over time
- can you force someone to learn the wrong set of payoffs?
- analogue to guessing the weights of a neural/reservoir network
- Is this multi-objective? We want to:
  - Maximise our own reward
  - Minimise the opponent's reward
  - Gain information about the game
  - Reduce the opponent's information about the game
  - Gain information about the opponent
  - Minimise the opponent's information about us

#### Other potential routes (under this heading?)

* [Dynamic games] https://www.sciencedirect.com/science/article/abs/pii/S1367578822000128
* [Related - Evolutionary games] https://boulderschool.yale.edu/sites/default/files/files/frey_lecture_notes_games.pdf
* [Non-equilibrium centipede, bounded rationality] https://www.sciencedirect.com/science/article/pii/S0899825620300087

### "Irrational" Behaviour / Change of Player

Any behaviour in terms of actions can be explained by some game (as defined by the outcome matrix) --- we expect that there is a theorem for this. We want to explain this behaviour by a matrix that is as close as possible to an expected matrix (e.g. that of "our" player). Can we use the norm as a metric to assess "irrationality" / classify different players / etc.?

From probing the game with selected actions, we obtain a set of relations for (some) entries of the outcome matrix, leading to a constrained optimisation problem. Q: what type of optimisation problem do we have? We might be able to solve this using CVX (Stephen Boyd, Stanford).
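Related to the assessment idea above (matching observed actions to a presumed game), the sketch below scores an observed action sequence by its log-likelihood under a candidate mixed strategy. Everything here is illustrative: the mixing probabilities and the observed sequence are hypothetical, not taken from the example game.

```python
# Minimal sketch (an assumption, not workshop output): score an observed action
# sequence against a candidate mixed strategy (e.g. the mixing probabilities at a
# presumed equilibrium). A very low score suggests we are wrong about the game,
# or that the player has changed.
import numpy as np

def log_likelihood(observed_actions, mixed_strategy):
    """observed_actions: sequence of action indices; mixed_strategy: probability per action."""
    p = np.clip(np.asarray(mixed_strategy, dtype=float), 1e-12, 1.0)   # avoid log(0)
    counts = np.bincount(observed_actions, minlength=len(p))
    return float(counts @ np.log(p))

# Hypothetical example: the presumed strategy says P2 plays X2 and Y2 with
# probabilities (0.2, 0.8), but we observe P2 playing X2 far more often.
observed = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]          # mostly action 0 (X2)
print(log_likelihood(observed, [0.2, 0.8]))        # low    -> poor fit
print(log_likelihood(observed, [0.8, 0.2]))        # higher -> better explanation
```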