# Mechanism design for goal-seeking agents

Let's consider a fully observed MDP and $K$ agents, each trained to optimise a private reward function $r_k$. For simplicity, let's say that the reward is a function of the terminal state of the MDP.

> *Example:* The MDP can be an online chat with a human, where the actions are the chatbot's responses and the state is the transcript of the chat so far. The $K$ agents are different chatbot LLMs, each of which might optimise for a slightly different outcome. It is possible that one or more agents have a sinister goal, i.e. that their private reward function rewards misleading or deceiving the user.

Our goal is to design a mechanism, a game, whereby the $K$ agents collaboratively control the MDP in such a way that they reveal information about their private reward functions along the way. We seek an incentive compatible mechanism: a mechanism in which the rational behaviour of each agent is to honestly share the private information it has. In this case, if an agent wants to optimise for reward function $r$, its optimal behaviour involves honestly sharing information relating to $r$.

## Related: Vickrey–Clarke–Groves mechanism

The [VCG mechanism](https://en.wikipedia.org/wiki/Vickrey%E2%80%93Clarke%E2%80%93Groves_mechanism) is a truthful mechanism in which a set of agents bid on a set of items or outcomes. There are $K$ agents who have to collectively choose one out of $N$ possible outcomes $\{x_n\}$. Each agent has its own private value function $v_k$, which assigns a personal value to each outcome: $v_k(x_n)$ is the degree to which outcome $x_n$ would be useful for agent $k$. By the value functions being private, we mean that each $v_k$ is initially known only to agent $k$. In the VCG mechanism, each agent $k$ places a bid $b_{k,n}$ for each outcome $x_n$.
Then, these bids are combined, an outcome is selected, and each agent receives a payoff equal to its private value $v_k(x_n)$ for the selected outcome plus an additional bonus based on everyone's bids and the selected outcome. The mechanism is designed to be incentive compatible, meaning that each agent maximises its payoff when its bids match its private valuation of the outcomes, i.e. when $b_{k,n} = v_k(x_n)$.

## Research Question

Is it possible to design an incentive compatible mechanism for the goal-seeking agents, such that there is a bidding process involved in selecting the next action, and each agent's rational behaviour involves revealing its true private Q-value for each possible action at each step?
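
As a reference point for this question, here is a minimal sketch of the one-shot VCG mechanism described above. It is not a mechanism for the MDP setting. It assumes the Clarke pivot rule, which is one common choice for the "bonus" term: agent $k$ receives the others' total bid for the selected outcome, minus the best total the others could have achieved without agent $k$. Under this rule the bonus is never positive. The function names `vcg`, `payoff` and `max_gain_from_misreporting`, and the toy `values` table, are made up for illustration.

```python
from itertools import product


def vcg(bids):
    """bids[k][n] is agent k's bid for outcome n.

    Returns (index of selected outcome, list of per-agent bonuses),
    using the Clarke pivot rule (an assumption of this sketch).
    """
    num_agents = len(bids)
    num_outcomes = len(bids[0])

    def total_bid(n, exclude=None):
        # Sum of bids for outcome n, optionally leaving one agent out.
        return sum(bids[k][n] for k in range(num_agents) if k != exclude)

    # Select the outcome with the highest total bid (first one on ties).
    chosen = max(range(num_outcomes), key=total_bid)

    bonuses = []
    for k in range(num_agents):
        best_without_k = max(total_bid(n, exclude=k) for n in range(num_outcomes))
        bonuses.append(total_bid(chosen, exclude=k) - best_without_k)
    return chosen, bonuses


def payoff(k, values, bids):
    """Agent k's payoff: true value of the selected outcome plus its bonus."""
    chosen, bonuses = vcg(bids)
    return values[k][chosen] + bonuses[k]


def max_gain_from_misreporting(k, values, grid=range(8)):
    """Brute-force search over integer misreports by agent k (others truthful)."""
    truthful = payoff(k, values, values)
    best = truthful
    for misreport in product(grid, repeat=len(values[k])):
        bids = [list(row) for row in values]
        bids[k] = list(misreport)
        best = max(best, payoff(k, values, bids))
    return best - truthful


# Toy example: 3 agents, 2 outcomes; values[k][n] plays the role of v_k(x_n).
values = [[5, 0], [0, 3], [0, 3]]
chosen, bonuses = vcg(values)  # everyone bids truthfully
print(chosen, bonuses)
print([max_gain_from_misreporting(k, values) for k in range(len(values))])
```

In this toy example no agent gains from any misreport on the searched grid, which is the property the research question asks to carry over to per-step Q-value bids. The brute-force check is only a sanity test on a small grid, not a proof.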