## New Response to Reviewer aVLs
Thank you for reading our rebuttal. We are happy that it clarified some of your concerns.
We agree that in some contexts, if the scenario consists of repeated interactions and we have access to historical data, one could learn the most likely value of $\beta$. Considering different interventions in scenarios with repeated centralized matching is an important future direction.
The focus of this paper, following a long line of prior works (see e.g. [39, 19, 26, 20, 52]), is to analyze the effect of representational constraints in the one-round setting. Our main result significantly extends this line of work (that studied the set selection problem) to the setting of assignments.
We additionally note that interventions requiring estimates of $\beta$ deteriorate as those estimates become inaccurate, and they are not robust to explicit biases (as may arise with human evaluators). In contrast, the interventions we consider (which are also used in the real world) do not require estimating $\beta$ and are robust to explicit biases.
## New Response to the Area Chair
Thanks for your time in reading our comment and for your response.
We have two comments on the concerns raised by Reviewer F4rj in their response.
**A) Concerns regarding the model:** We have already performed a simulation, extending one in the paper, that considers “noise that depends on the group” (the differential-variance bias model). We did not include it in our first rebuttal due to space constraints. Given that Reviewer F4rj did not raise this in the "Questions" part of their review, it seems unfair that they now use it in their argument against the paper.
We have included results from this simulation in our new response to them. The results are similar to the observations presented in the paper; please take a look. We also have additional figures from this simulation and would be happy to share an anonymous link to them, if required.
More broadly, the model in the paper has multiple components (the models of bias, assignment, and preferences), each of which follows existing works in algorithmic fairness and/or centralized assignment; the reviewer, however, seems to have reduced the model studied in the paper to just the model of bias.
For the bias model, we consider the most well-established model, proposed by [39] (and used by several other works, e.g., [19, 20, 51, 52]), and prove theoretical results for it. We nevertheless verify the robustness of our results with respect to different models and have now extended the simulations to check robustness with respect to the model the reviewer suggested.
**B) Concerns regarding the re-writing:** We find Reviewer F4rj's comment that the “writing needs to be completely revised” unjustified. Besides fixing typos, the changes we promised in our response to Reviewer F4rj are as follows:
- Add a proof overview (which we already composed in our response);
- Update theorem statements to state results that hold with high probability (these are already proved in the appendix);
- Add two remarks on (i) improving the dependence on parameters and (ii) extending one theorem to other distributions; and
- Include the additional empirical results mentioned in the new comment.
These changes would lead to almost no changes in the other sections, and we believe they can be implemented in a day or two to produce the final version.
## New Response to Reviewer F4rj (Around 4800 characters)
<!-- I thank the authors for their responses. It clarifies some things. Unfortunately, I am not willing to increase my score because I feel the main weaknesses remains:
- The model is limited. (e.g., I saw no response to "what happens if there is noise that depends on the group.")
- The results are not very significant; in particular the upper bounds are very large
- The writing needs to be completely revised. Sure it is possible and I trust the authors can do it but I feel it would be a quite significantly different paper and in my opinion that justifies rejecting the current version. -->
**It would be more meaningful to include differential variance as in, e.g., [26], [29]** and **"what happens if there is noise that depends on the group."**
Thanks for reading our rebuttal and responding. We are sorry we had to omit the response to your comment about the group-dependent noise as we ran out of space responding to your questions. We present a response below:
We extended the simulations presented in the Supplementary Material to consider group-dependent noise. Concretely, the noise for group $A$ now has variance $\sigma_A^2$ and the noise for group $B$ has variance $\sigma_B^2$. We vary $\sigma_A, \sigma_B \in [0,2]$.
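To make the setup concrete, here is a minimal sketch of the noise model (illustrative only, not our exact experimental code; it assumes the multiplicative bias model of [39] with additive Gaussian noise, and the names and values are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 1000, 0.5  # illustrative population size and bias parameter

# Latent utilities drawn i.i.d. from D (here: Pareto with alpha = 3, as in Table A below)
u = rng.pareto(3.0, size=n) + 1.0
in_B = rng.random(n) < 0.5  # group membership (True -> group B)

# Group-dependent noise: variance sigma_A^2 for group A, sigma_B^2 for group B
sigma_A, sigma_B = 1.0, 0.5  # varied over [0, 2] in the simulation
noise = rng.normal(0.0, np.where(in_B, sigma_B, sigma_A))

# Estimated utilities: multiplicative bias beta against group B, plus the noise
u_hat = np.where(in_B, beta * u, u) + noise
```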
Our main observations are similar to those presented in the paper:
1. Institution-wise constraints always have higher preference-based fairness than group-wise constraints: the difference between the preference-based fairness achieved by the institution-wise constraints and by the group-wise constraints is at least 0.1 and up to 0.8 (please see the tables below); and
2. The utility ratios of institution-wise and group-wise constraints are similar in all simulations (within 0.01 of each other).
We are appending tables that show the differences between the preference-based fairness of institution-wise constraints and group-wise constraints (note that all the entries are positive). We will include this simulation and the tables in the final version.
A) Results when $\mathcal{D}$ is the Pareto distribution (with $\alpha=3$) and the fairness metric is $\mathscr{P}_{\rm top1}$
| | $\sigma_B=0.0$ | $\sigma_B=0.5$ | $\sigma_B=1.0$ | $\sigma_B=1.5$ | $\sigma_B=2.0$ |
|----:|------:|------:|------:|------:|------:|
| $\sigma_A=0$ | 0.707 | 0.58 | 0.372 | 0.123 | 0.221 |
| $\sigma_A=0.5$ | 0.744 | 0.606 | 0.576 | 0.325 | 0.106 |
| $\sigma_A=1$ | 0.757 | 0.672 | 0.660 | 0.537 | 0.353 |
| $\sigma_A=1.5$ | 0.767 | 0.679 | 0.716 | 0.558 | 0.539 |
| $\sigma_A=2$ | 0.808 | 0.706 | 0.663 | 0.654 | 0.68 |
B) Results when $\mathcal{D}$ is the Pareto distribution (with $\alpha=3$) and the fairness metric is $\mathscr{P}_{\rm top5}$
| | $\sigma_B=0.0$ | $\sigma_B=0.5$ | $\sigma_B=1.0$ | $\sigma_B=1.5$ | $\sigma_B=2.0$ |
|----:|------:|------:|------:|------:|------:|
| $\sigma_A=0$ | 0.607 | 0.56 | 0.393 | 0.223 | 0.23 |
| $\sigma_A=0.5$ | 0.67 | 0.616 | 0.466 | 0.339 | 0.197 |
| $\sigma_A=1$ | 0.647 | 0.621 | 0.537 | 0.439 | 0.297 |
| $\sigma_A=1.5$ | 0.661 | 0.604 | 0.569 | 0.452 | 0.358 |
| $\sigma_A=2$ | 0.664 | 0.621 | 0.589 | 0.499 | 0.462 |
C) Results when $\mathcal{D}$ is the normal distribution truncated to the interval $[0,\infty)$ and the fairness metric is $\mathscr{P}_{\rm top1}$
| | $\sigma_B=0.0$ | $\sigma_B=0.5$ | $\sigma_B=1.0$ | $\sigma_B=1.5$ | $\sigma_B=2.0$ |
|----:|------:|------:|------:|------:|------:|
| $\sigma_A=0$ | 0.680 | 0.580 | 0.501 | 0.302 | 0.228 |
| $\sigma_A=0.5$ | 0.703 | 0.678 | 0.528 | 0.356 | 0.164 |
| $\sigma_A=1$ | 0.655 | 0.648 | 0.599 | 0.500 | 0.340 |
| $\sigma_A=1.5$ | 0.755 | 0.642 | 0.522 | 0.587 | 0.315 |
| $\sigma_A=2$ | 0.764 | 0.745 | 0.525 | 0.511 | 0.503 |
D) Results when $\mathcal{D}$ is the normal distribution truncated to the interval $[0,\infty)$ and the fairness metric is $\mathscr{P}_{\rm top5}$
| | $\sigma_B=0.0$ | $\sigma_B=0.5$ | $\sigma_B=1.0$ | $\sigma_B=1.5$ | $\sigma_B=2.0$ |
|----:|------:|------:|------:|------:|------:|
| $\sigma_A=0$ | 0.520 | 0.512 | 0.435 | 0.333 | 0.339 |
| $\sigma_A=0.5$ | 0.457 | 0.457 | 0.400 | 0.310 | 0.248 |
| $\sigma_A=1$ | 0.456 | 0.414 | 0.362 | 0.300 | 0.243 |
| $\sigma_A=1.5$ | 0.494 | 0.469 | 0.371 | 0.299 | 0.235 |
| $\sigma_A=2$ | 0.539 | 0.470 | 0.406 | 0.345 | 0.268 |
<!-- We are attaching a one-page PDF file containing a plot of the preference-based fairness of institution-wise and group-wise constraints in this simulation. -->
<span style='color:blue'>AM: The corresponding figure is available here: https://www.dropbox.com/scl/fo/gu6yko1sie9l4c5dpjfhq/h?rlkey=7m63pkqjduce3t29atxiafy6r&dl=0</span>
## Response to the AC (well within 6000 characters)
Thank you for your time and effort in engaging with the reviewers and considering our rebuttal.
We take the feedback of the reviewers seriously and have addressed their specific questions and concerns. In particular, based on the request of Reviewer Wcsd, we have included a detailed discussion of the relation of our work to mechanism design and centralized assignment with distributional constraints.
As a whole, we found the reviews useful; however, there are a couple of issues that we would like to bring to your attention. It would be great if you could consider them in the subsequent review process.
* aVLs: The score of this review seems inconsistent with its content. The review is enthusiastic and gives 3 for contribution, 4 for presentation, and 3 for soundness, yet the overall score is 4. We have addressed all of the reviewer's questions and concerns, in particular their concern about the i.i.d. assumption on the true utilities in Theorem 4.2. We would like to clarify that this i.i.d. assumption is not specific to this work: it also underlies the results of prior works [39, 19, 26, 20, 52] for the special case of subset selection (where there is only one institution and, hence, the preferences of the agents are vacuous). Results obtained under the i.i.d. assumption are useful to inform policy, and extending these prior works to the non-i.i.d. setting is an interesting technical direction. Moreover, we explain in our rebuttal why one cannot assume knowledge of $\beta$ in the algorithms; in short, doing so would make the solution unusable. This point is also rooted in prior work on mitigating implicit biases.
* F4rj: We found this review's assessment of our contributions a bit unfair. Much of its criticism also applies to prior work (on which the current paper makes significant progress). E.g., the reviewer criticizes Theorem 4.1 for being specific to uniform distributions. However, this is a lower-bound result and, hence, fixing any distribution suffices to argue what is not achievable in general. Nevertheless, we explain in our rebuttal how to extend it to other distributions. Moreover, the upper-bound result (Theorem 4.2) is stated for a general distribution, yet the reviewer picks specific settings of parameters to argue against it; naturally, the parameters can be tightened for specific distributions. They also find the i.i.d. assumption on utilities a limitation, even though prior works on this bias model rely on the same assumption. The paper presents empirical results with other distributions, such as Gaussian and Pareto, and we would be happy to include more theoretical discussion of the Gaussian and Pareto cases in the final version. Finally, the reviewer seems to have misunderstood our results as holding only in expectation; in fact, they hold with high probability, an important distinction that considerably strengthens them.
## Response to Reviewer aVLs (well within 6000 characters)
Thank you for your feedback – we are happy that you found the paper well-written, and appreciate the problem and the solution presented in our paper. We address your questions below and hope that you will strengthen your support for the paper.
**“negative results….On the other hand, the positive results…”** We apologize for the confusion: while the negative result (Theorem 4.1) applies to algorithms that ignore the bias parameter $\beta$, both the negative result (Theorem 4.1) and the positive result (Theorem 4.2) are benchmarked in terms of the expected utility ratio ($\mathscr{U}$), and the algorithms in both results only have access to the estimated utilities ex post ($\hat{u}_1, \hat{u}_2, \dots, \hat{u}_n$) (Line 268 and Algorithm 2, respectively). We will add a clarification in the final version.
**“positive results…depend on the assumption of true utilities being i.i.d.” ... “i.i.d…true utility assumption”** Yes, our proof of Theorem 4.2 relies on the assumption that the utilities of individuals are independently and identically distributed. This i.i.d. assumption is not specific to our work: it also underlies the results of prior works [39, 19, 26, 20, 52] for the special case of subset selection (where there is only one institution and, hence, the preferences of the agents are vacuous). Extending these prior works to the non-i.i.d. setting is already an interesting direction.
That said, our positive results allow the latent utilities of agents to be drawn from a "mixture" of two or more distributions: each agent’s latent utility could be drawn from distribution $\mathcal{D}_1$ with probability $\frac{1}{2}$ and from distribution $\mathcal{D}_2$ otherwise. We will clarify this in the final version.
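For concreteness, a short illustration of such a mixture (the distribution choices are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Each latent utility is drawn from D1 (Pareto, alpha = 3) w.p. 1/2 and
# from D2 (standard normal truncated to [0, inf)) otherwise.
from_d1 = rng.random(n) < 0.5
u = np.where(from_d1, rng.pareto(3.0, n) + 1.0, np.abs(rng.normal(0.0, 1.0, n)))
```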
**“unrealistic to assume…ignorance of the beta”** Indeed, if we have the observed utility of all the candidates, and if we know the bias parameters, then we can rescale the scores to obtain the latent utilities. However, in some contexts this may not be a robust solution. For instance, consider a setting where an interviewer estimates the utilities of the agents. We could ask the interviewer to report their estimated utilities and later rescale them. However, we may not have an accurate estimate of $\beta$. As the algorithm we consider (Algorithm 2) does not rely on $\beta$, it can be applied in such a setting and would still recover the optimal utility.
Furthermore, if the interviewer is explicitly biased, they can give arbitrarily low scores to one group of candidates; this would make any attempt to learn $\beta$ and rescale utilities insufficient.
By instead imposing representation-based constraints, the interventions we consider are also robust against other forms of bias; this is why they have been used in prior works [39, 19, 26, 20, 52] and in practice, e.g., representational constraints in college admissions in India (Lines 115-117) and the Rooney Rule (Lines 291-294). We will add a remark on this in the final version.
**Does theorem 4.2 hold if we drop i.i.d on true utility assumption?** Please see our response to *“positive results…depend on the assumption of true utilities being i.i.d.”* above.
**“Is there any work on “guessing” beta and reverse engineering?”** To the best of our knowledge, there is no work on guessing $\beta$ and rescaling the estimated utilities. Perhaps this is because guessing $\beta$ and rescaling the estimated utilities can lead to non-robust solutions in contexts where the bias arises due to an explicitly biased individual; please also see our response to your concern above.
## Response to Reviewer F4rj (10 characters under)
Thanks for appreciating the paper. We address your questions below. We hope that you will strengthen your support for the paper.
**“best one can get…”** In some cases, we can improve the dependence on $p!$ and $\eta$. E.g., if the number of unique agent preferences is $d$, then $(3p)!$ can be replaced by $d^3$. The dependence on $\eta^{-2}$ arises because we use the Chebyshev inequality and a union bound to establish a concentration bound on the order statistics of the uniform distribution (e.g., Eqs. 15-17). Improving the dependence on $\eta$ would follow from tighter concentration inequalities on these order statistics. We will add a discussion.
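For context, the standard calculation behind this dependence (the constants in Eqs. 15-17 may differ): for the order statistics $U_{(1)} \le \dots \le U_{(n)}$ of $n$ i.i.d. $\mathrm{Uniform}(0,1)$ samples,

$$\mathbb{E}[U_{(i)}] = \frac{i}{n+1}, \qquad \mathrm{Var}[U_{(i)}] = \frac{i(n+1-i)}{(n+1)^2(n+2)} \le \frac{1}{4(n+2)},$$

so the Chebyshev inequality and a union bound over all $n$ order statistics give

$$\Pr\Big[\exists\, i\colon \big|U_{(i)} - \tfrac{i}{n+1}\big| \ge t\Big] \le \frac{n}{4(n+2)\,t^2};$$

the $t^{-2}$ tail is what produces the $\eta^{-2}$ dependence, and a sub-exponential tail bound would improve it.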
**“…Theorem 4.1…specific to the uniform distribution…”** The first two steps in the proof of Thm 4.1, which estimate $\mathscr{R}$ and $\mathscr{P}$ respectively, hold for any continuous distribution $\mathcal{D}$ of latent qualities. Hence, the bounds on the fairness metrics $\mathscr{R}$ and $\mathscr{P}$ are not specific to the uniform distribution. The proof of the bound on the utility ratio $\mathscr{U}$ uses a high probability bound on the order statistics of the uniform distribution (Eq. 17). If, for a distribution $\mathcal{D}$, one can establish an analogous high probability bound, then the same proof technique can be generalized to $\mathcal{D}$. For specific distributions, e.g., Gaussian or Pareto, one can obtain such bounds and we will include a remark in the final version.
**“…results…are all in expectation…”** Sorry for the confusion. All of our proofs first prove a high-probability bound on the utility and fairness metrics and then use it to bound the expectation (e.g., Eq. 13). We will clarify this.
**“paper only studies…two groups.”** As in prior work [19], the bias model extends to multiple sensitive groups by considering group-specific bias parameters. The fairness metrics and the proof of Thm 4.2 can also be generalized to more than two protected groups. We will include a discussion of this in the final version.
**"proof in appendix…main ideas."**
The proof of Thm 4.1 has three steps; we will add this overview to the main body.
1. *Estimating $\mathscr{R}$:* Let $S$ be the set of agents “selected”. To compute $\mathscr{R}$ it suffices to know $|S\cap A|$ and $|S\cap B|$. We establish high-probability bounds on these (Lemma C.1). The proof of Lemma C.1 analyzes an urn model (Lines 860-867) and uses the concentration of the hypergeometric distribution (equation after Line 871; see also the tail bound after this overview). These high-probability bounds are used to estimate $\mathscr{R}$ (Lines 879-883).
2. *Estimating $\mathscr{P}$:* For any ranking $\tau$ of the institutions, let $S_\tau$ be the set of agents with preference $\tau$. To compute $\mathscr{P}$, it suffices to know $|S_\tau \cap S \cap A|$ and $|S_\tau \cap S \cap B|$ for all $\tau$. For each $\tau$, we compute high-probability bounds on these by repeating the computations from Step 1.
3. *Estimating $\mathscr{U}$:* Since algorithm $\mathcal{A}$ maximizes the estimated utility, if it selects $k$ agents from group $G$, then these are the $k$ agents with the highest estimated utilities in $G$. Hence, if we know the estimated utilities of the top $k$ agents in $A$ (resp. $B$) for each $k$ and we know $|S\cap A|$ (resp. $|S\cap B|$), then we can compute $\mathscr{U}(\mathcal{A})$.
We already have estimates of $|S\cap A|$ and $|S\cap B|$ from Step 1.
To bound the estimated utilities of the top $k$ agents, we prove a high probability bound on the order statistics of the uniform distribution (Eq. 17). This bound, in turn, gives the required estimate (e.g., Eq. 19).
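For reference, a tail bound of the type used in Step 1 (the proof may use a different form): by Hoeffding's inequality for sampling without replacement, a hypergeometric random variable $H$ counting successes among $m$ draws satisfies

$$\Pr\big[\,|H - \mathbb{E}[H]| \ge t\,\big] \le 2\exp\!\Big(-\frac{2t^2}{m}\Big).$$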
**“Gale-Shapley…which one is it here?”** The variant (Algorithm 1 in the Supplementary Material) where, at each iteration, all (unmatched) agents propose to the institution that they most prefer and that has not rejected them yet.
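For concreteness, a minimal sketch of this variant (illustrative only; the variable names are ours, and institutions are assumed to rank agents by their common estimated utility, as in the paper's model):

```python
def deferred_acceptance(prefs, utility, capacity):
    """Round-based agent-proposing Gale-Shapley (illustrative sketch).

    prefs[a]    : agent a's ranking of institutions, most preferred first
    utility[a]  : the institutions' common estimated utility for agent a
    capacity[i] : number of seats at institution i
    """
    rejected = {a: set() for a in prefs}   # institutions that have rejected each agent
    held = {i: set() for i in capacity}    # tentatively held agents per institution
    while True:
        # Every unmatched agent proposes to their top choice not yet rejected.
        matched = {a for agents in held.values() for a in agents}
        proposals = {i: set(agents) for i, agents in held.items()}
        any_proposal = False
        for a in prefs:
            if a in matched:
                continue
            options = [i for i in prefs[a] if i not in rejected[a]]
            if options:
                proposals[options[0]].add(a)
                any_proposal = True
        if not any_proposal:
            return held  # no agent can propose further: the matching is final
        # Each institution keeps its top-capacity proposers, rejecting the rest.
        for i in capacity:
            ranked = sorted(proposals[i], key=lambda b: utility[b], reverse=True)
            held[i] = set(ranked[:capacity[i]])
            for b in ranked[capacity[i]:]:
                rejected[b].add(i)
```

For example, on `prefs = {1: ['x', 'y'], 2: ['x', 'y']}`, `utility = {1: 0.9, 2: 0.4}`, and `capacity = {'x': 1, 'y': 1}`, both agents first propose to `x`, agent 2 is rejected and then proposes to `y`, yielding `{'x': {1}, 'y': {2}}`.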
**“…stability of the output?”** Algorithm 2 preserves stability within *each group*. We will explain this in the final version. See also the response to Reviewer Wcsd.
**“…use normal gaussian distributions…”** This is a typo; it should say *truncated* normal distribution, whose support is non-negative.
**”…algorithm itself be random…”** Yes, the algorithm in Thm 4.1 can be randomized and we will include it in the expectations. The algorithm in Thm 4.2 is deterministic.
**“…at least ½ trivially…”** Yes, in the example below Thm 4.1, the utility ratio is always bounded away from 0 even if $\beta=0$; we only intended to give intuition for this. As shown, the utility ratio is at least $\frac{2}{3}$.
**“result…rely on the fact that preferences are independent of scores”** Yes, Thm 4.2 relies on the assumption that agents' preferences are independent of their scores. Your intuition is also correct that, as $n$ grows, the number of agents from each group assigned to each institution without bias is similar to the number assigned with bias and constraints. However, to prove a bound on preference-based fairness, we need to prove more: that the number of agents assigned to their $\ell$-th preference without bias is similar to the same number with bias and constraints (for each $\ell$). This turns out to be challenging because, in general, the institution an agent is assigned to can be a complicated function of the preferences of all agents. To overcome this, we show that in our model the set of agents assigned to their $t$-th preference is exactly the set of agents matched at the $t$-th iteration of the Gale-Shapley algorithm we consider (i.e., these agents are rejected in each of the first $t-1$ iterations and not rejected at the $t$-th iteration) (Lines 355-361).
**“…\gamma≃0…”** Sorry, the $\gamma$ in Section 5 should be denoted differently from the $\gamma$ in Section 4. If we denote the $\gamma$ in Section 5 by $\gamma'$, then the relation between them is $\gamma = e^{\gamma'}$. Thus, since $\gamma'\geq 0$, we have $\gamma\geq 1$.
## Response to Reviewer pA9c (well within 6000 characters)
Thanks for appreciating the theoretical and empirical analysis. We have answered your question below. We hope you will further strengthen your support for the paper.
**“the setting is $\hat{u}_i≤u_i$. Why this is natural?”** The motivation to consider $\hat{u}_i≤u_i$ comes from contexts where the estimated utilities $\hat{u}_i$ of agents in certain (disadvantaged) groups are systematic underestimates of their true utility $u_i$. Such systematic underestimation of utilities, for instance, has been observed in standardized test results for minoritized and low-income candidates [25], and in the evaluation of women in peer review [73] and at work [54, 48] (Lines 41-45).
That said, even in the considered model, one can allow for $\hat{u}_i \geq u_i$ by setting $\beta \geq 1$. Our main result (Theorem 4.2) continues to hold when $\beta \geq 1$. In fact, given any monotonic function $f$, Theorem 4.2 also holds if $\hat{u}_i = f(u_i)$ for each $i$ (Line 362-364). This allows $\hat{u}_i$ to be either larger or smaller than $u_i$, e.g., for $f(x):=\sqrt{x}$ it holds that $\hat{u}_i > u_i$ for any $u_i\in (0,1)$ and $\hat{u}_i < u_i$ for $u_i > 1$.
Regarding the determinism of $\hat{u}_i$: we inherit the assumption that the estimated utility $\hat{u}_i$ is a deterministic function of $u_i$ from prior works on selection with bias [19, 39]. This assumption is unrealistic, as there is bound to be some additional noise in real-world evaluations. Considering noise in the estimated utilities, in addition to bias, is an important direction, and we explore it in our simulations (Lines 127-128 and Supplementary Material D.4).
## Response to Reviewer Wcsd (0 characters under)
We are happy that you appreciate the characterization of the problem. Thanks for pointing us to the literature on mechanism design and centralized assignment with distributional constraints. Below we answer your questions regarding this connection and will include these details in the final version. We hope that you will further strengthen your support for the paper.
**“…it's unclear how this work relates…”** Both the works on distributional constraints in matching and this work consider the task of computing a centralized assignment between two sets of agents (representing, e.g., individuals, institutions, or items).
*Difference in objective considered: Preference satisfaction and utility maximization.* Works on distributional constraints in matching consider the formulation where agents on both sides have preferences, whereas we consider the problem where agents on one side (denoting individuals) have individual preferences and the agents on the other side (denoting institutions) have certain (common) utilities for the individuals and their goal is utility maximization.
Our motivation comes from centralized admission contexts such as the admissions administered by the Joint Seat Allocation Authority in India where institutions share a common utility, e.g., scores in a standardized exam, and the goal of a centralized body is to match individuals to institutions in order to maximize the (total) utility of the institutions while respecting the preferences of the individuals as far as possible (Lines 22-25).
*Difference in types of constraints: group-specific or group-agnostic.* The constraints considered by works in matching and mechanism design are similar to the ones considered in this work: the constraints are lower bounds or upper bounds on the number of individuals that can be matched to one or a set of institutions. The key difference is that the constraints in these works apply to all individuals, whereas we consider different constraints for individuals in different socially-salient groups (in order to counter bias against certain groups).
The motivation to consider these constraints also comes from the aforementioned contexts. As a concrete example, the institution-wise constraints we consider are already in use in the admissions administered by the Joint Seat Allocation Authority in India (Lines 311-314).
*Biases in utilities.* Finally, to the best of our knowledge, unlike this work, works on distributional constraints in matchings do not consider biases in the preferences and/or utilities.
**“…without regard for these other properties?…”** Since the proposed algorithm invokes the Gale-Shapley algorithm group-wise, it inherits (relaxations of) several desirable properties of the Gale-Shapley algorithm.
Concretely, the following holds in our case:
1. *Group-wise Stability.* The proposed algorithm preserves stability within *each group*: for each group $G$, there is no pair of agent $A\in G$ and institution $I$, such that $A$ prefers to be matched to $I$ over their current match and $I$ prefers to be matched to $A$ over one of the agents from $G$ matched to $I$. This group-wise definition of stability adds to the relaxations of stability that have been considered by works on matching with distributional constraints (e.g., (Kamada and Kojima, 2015) and (Kamada and Kojima, 2017)).
2. *Strategy proofness.* Assuming agents' protected groups are exogenous, the proposed algorithm is strategyproof for the agents: agents have no incentive to misreport their preferences (Lines 122-123).
3. *Pareto-Efficiency.* The assignment output by our algorithm is Pareto-Efficient among all assignments that satisfy the institution-wise constraints.
*Relation between group-wise stability and the usual definition of stability.* The above group-wise notion of stability can be seen as a relaxation of the standard notion. The standard notion requires that there is no pair of edges $E_1:=(A_1,I_1)$ and $E_2:=(A_2,I_2)$ in the matching $M$ such that $A_1$ prefers to be matched to $I_2$ over $I_1$ and $I_2$ prefers to be matched to $A_1$ over $A_2$. Here, $(A_1, I_2)$ is said to be a blocking pair.
Given the specific fairness constraints used, group-wise stability can be shown to be equivalent to (standard) stability with the *additional requirement* that the matching constructed after “matching” the blocking pair, namely $M\backslash\{E_1, E_2\}\cup\{(A_1,I_2),(A_2,I_1)\}$, satisfies the institution-wise fairness constraints.
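For illustration, a sketch of a check for group-wise blocking pairs, following the definition in item 1 above (all names are ours; `utility` is the institutions' common estimated utility):

```python
def groupwise_blocking_pairs(match, prefs, utility, group):
    """Return violations of group-wise stability (illustrative sketch).

    match[a]  : institution agent a is matched to (or None if unmatched)
    prefs[a]  : agent a's ranking of institutions, most preferred first
    utility[a]: the institutions' common estimated utility for agent a
    group[a]  : protected group of agent a
    """
    blocking = []
    for a, ranking in prefs.items():
        cur = match[a]
        # Institutions that a strictly prefers to their current match
        better = ranking[:ranking.index(cur)] if cur in ranking else ranking
        for i in better:
            # Agents from a's own group currently matched to i
            peers = [b for b in match if match[b] == i and group[b] == group[a]]
            # i prefers a over one of these same-group agents: (a, i) blocks
            if peers and utility[a] > min(utility[b] for b in peers):
                blocking.append((a, i))
    return blocking
```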
Y. Kamada and F. Kojima. "Efficient matching under distributional constraints: Theory and applications." American Economic Review, 2015.
Y. Kamada and F. Kojima. "Stability concepts in matching under distributional constraints." Journal of Economic Theory, 2017.
**“…What are other reasonable fairness metrics…and how does the proposed algorithm fare under those…”** We consider a family of fairness metrics (parameterized by $w$) that aim to capture a proportional representation objective: a proportional number of individuals from each group receive a desirable outcome. A related objective–equal representation–requires an equal number of individuals from different groups to receive the desirable outcome. The two are at odds when the groups have different sizes. Since our algorithm “satisfies” proportional representation, it cannot satisfy equal representation (except in the special case where groups have equal size).
There are numerous other notions of fairness as well, however, they are known to be at odds with each other (Kleinberg-Mullainathan-Raghavan'17) and, hence, we cannot hope to design one algorithm that satisfies all, or even multiple, notions of fairness. The choice to use proportional representation is grounded in prior work and the fact that it is a common and useful notion of fairness in many contexts.
J. Kleinberg, S. Mullainathan, M. Raghavan. "Inherent tradeoffs in the fair determination of risk scores" ITCS 2017
---
## <span style="color:red">Copy of the older response to</span> Reviewer F4rj
Thanks for your feedback and appreciating the paper. We address your questions and concerns below. We hope that you will strengthen your support for the paper.
**“If these bounds are the best one can get…”** In some cases, we can improve the dependence of the theoretical bounds on $p!$ and we believe the dependence on $\eta$ can be improved by proving a tighter concentration inequality.
Concretely, if the number of unique agent preferences is $d$, then $(3p)!$ in the results can be replaced by $d^3$. This improves the dependence when $d<(p!)$.
The dependence on $\eta^{-2}$ arises because we use the Chebyshev Inequality and the union bound to establish a concentration bound on the order statistics of the uniform distribution (e.g., Equations 15-17 in the proof of Theorem 4.1 and Equations 30-32 in the proof of Theorem 4.2). We do not believe this concentration inequality is tight. Proving tighter concentration inequalities on the order statistics of the uniform distribution would improve the dependence of the theoretical bounds on $\eta$. We will add a remark on the tightness of our bounds in the final version.
**“...Theorem 4.1 is…specific to the uniform distribution of latent qualities…”** The first two steps in the proof of Theorem 4.1, which estimate $\mathscr{R}$ and $\mathscr{P}$ respectively, hold for any continuous distribution $\mathcal{D}$ of latent qualities (and not just the uniform distribution). Hence, the bounds on the fairness metrics $\mathscr{R}$ and $\mathscr{P}$ are not specific to the uniform distribution. Please see the proof overview below for a high-level description of the results.
The proof of the bound on the utility ratio $\mathscr{U}$ is specific to the uniform distribution: it relies on a high-probability bound on the order statistics of the uniform distribution (Equation (17)).
If, for a continuous distribution $\mathcal{D}$, one can establish an analogous high-probability bound, then the same proof technique can be used to generalize the bound on $\mathscr{U}$ to $\mathcal{D}$. While we cannot hope to establish such a concentration bound for an arbitrary continuous distribution $\mathcal{D}$, it is possible for many continuous distributions, e.g., Gaussian or Pareto distributions with bounded parameters. Concretely, if the CDF of $\mathcal{D}$ is sufficiently “smooth,” then we believe that our high-probability bound for the order statistics of the uniform distribution can be converted to a high-probability bound for the order statistics of $\mathcal{D}$ using the inverse CDF method.
We will include a discussion of this generalization in the final version.
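For reference, the conversion we have in mind: if $U_{(i)}$ denotes the $i$-th order statistic of $n$ i.i.d. $\mathrm{Uniform}(0,1)$ samples and $F^{-1}$ is the inverse CDF of $\mathcal{D}$, then $X_{(i)} \stackrel{d}{=} F^{-1}(U_{(i)})$, so

$$\big|U_{(i)} - \tfrac{i}{n+1}\big| \le t \quad\Longrightarrow\quad \big|X_{(i)} - F^{-1}\big(\tfrac{i}{n+1}\big)\big| \le L\,t$$

whenever $F^{-1}$ is $L$-Lipschitz on $\big[\tfrac{i}{n+1}-t,\, \tfrac{i}{n+1}+t\big]$; this Lipschitz property is the “smoothness” of the CDF referred to above.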
**“...results…are all in expectation…”** Sorry for the confusion. All of our proofs first prove high-probability bounds on the utility and fairness metrics and then use these bounds to bound the expectation. For simplicity, following prior works on subset selection [39, 19, 26, 20, 52], we only state the bounds on the expectation. We will state high-probability versions of our results in the final version.
**“paper only studies…two groups.”** Thank you for pointing this out. Indeed, we limit our discussion and simulations to binary sensitive groups. However, the bias model we consider can be extended to multiple sensitive groups by considering group-specific bias parameters. Moreover, the definitions of the fairness metrics and the proof of Theorem 4.2 can also be generalized to more than two protected groups, and we will include a discussion of this in the final version.
That said, related works that study subset selection in the presence of biased densities (e.g., [13, 26, 39]) also limit their study to binary-sensitive groups, and similar extensions (with group-specific parameters) have also been proposed for them (e.g., [19]).
**"proof in appendix...was not able to grasp the main ideas."** Sorry for the difficulty in understanding the main ideas of the proof of Theorem 4.1. We provide a short overview below and will also include an overview in the main body of the final version.
The proof of Theorem 4.1 is divided into three steps that compute bounds on $\mathscr{R}$, $\mathscr{P}$, and $\mathscr{U}$ respectively.
*(Step 1: Estimating $\mathscr{R}$):* Let $S$ be the set of agents “selected” (i.e., matched to an institution). Then to compute $\mathscr{R}$ it suffices to know $|S\cap A|$ and $|S\cap B|$. We establish high-probability bounds on $|S\cap A|$ and $|S\cap B|$ in Lemma C.1. To do so, the proof of Lemma C.1 analyzes an underlying urn model (Lines 860-867) and uses concentration properties of hypergeometric random variables (equation after Line 871). We convert the high-probability bounds on $|S\cap A|$ and $|S\cap B|$ to an estimate of $\mathscr{R}$ by using the fact that the denominator in the definition of $\mathscr{R}$ is bounded away from 0 (Lines 879-883). (This is where we use the assumption that $|A|, |B|, k_1, k_2,\dots,k_p\geq \eta n$.)
*(Step 2: Estimating $\mathscr{P}$):* For any ranking $\tau$ of the institutions, let $S_\tau$ be the set of all agents $a$ such that $\sigma_a=\tau$. To compute $\mathscr{P}$, it suffices to know $|S_\tau \cap S \cap A|$ and $|S_\tau \cap S \cap B|$ for all preferences $\tau$. For each $\tau$, we compute high-probability bounds on $|S_\tau \cap S\cap A|$ and $|S_\tau\cap S \cap B|$ by repeating the computations from Step 1 for the set of agents in $S_\tau$.
We note that all the computations in the first two steps apply to any continuous distribution $\mathcal{D}$ and, hence, the bounds on $\mathscr{R}$ and $\mathscr{P}$ in Theorem 4.1 hold for any continuous distribution $\mathcal{D}$ (and not just for the uniform distribution).
*(Step 3: Estimating $\mathscr{U}$):* Since the algorithm $\mathcal{A}$ maximizes the total estimated utility, one can show that if $\mathcal{A}$ selects $k$ agents from group $G$, then these are the $k$ agents with the highest estimated utilities in $G$. Hence, if we know the estimated utilities of the top $k$ agents in $A$ (respectively $B$) for each $k$ and we know $|S\cap A|$ (respectively $|S\cap B|$), then we can compute $\mathscr{U}(\mathcal{A})$.
We already have a high probability estimate of $|S\cap A|$ and $|S\cap B|$ from Step 1 (Lemma C.1).
It remains to bound the estimated utilities of the top $k$ agents. To do so, we prove a high probability bound on the order statistics of the uniform distribution (Equation (17)). Since the estimated utilities of agents are uniformly distributed, this bound, in turn, gives the required estimate (e.g., Equation (19)).
**“...good to more clearly state…”** Thanks for the suggestion, we will state this more clearly.
**“several Gale-Shapley algorithms…which one is it here?”** We use the variant of Gale-Shapley where, at each iteration, all (unmatched) agents propose to the institution that they most prefer and that has not rejected them yet. Each institution, on the other hand, retains the top $C$ agents that propose to it (where $C$ is the institution's capacity) and rejects the remaining agents (if any). This algorithm is formally stated as Algorithm 1 in the Supplementary Material.
**“Does it preserve the stability of the output?”** The proposed algorithm preserves stability within *each group*. Concretely, for each group $G$, there is no pair of agent $A\in G$ and institution $I$, such that $A$ prefers to be matched to $I$ over their current match and $I$ prefers to be matched to $A$ over one of the agents from $G$ matched to $I$. We will explain this in the final version.
**“...experiments…use normal gaussian distributions…”** Sorry about the confusion, the experiments use the truncated normal distribution whose support is non-negative. We will clarify this in the final version.
**“...Can the algorithm itself be random…”** Thanks for pointing out this issue. The algorithm in Theorem 4.1 can be randomized, and we will include its randomness in the expectations. The algorithm in Theorem 4.2 is deterministic.
**“utility ratio would be at least ½ trivially…”** Yes, in the example below Theorem 4.1, the utility ratio is always bounded away from 0 even if $\beta=0$; we only intended to give intuition for this and to establish a stronger bound than $\frac{1}{2}$.
Concretely, the utility ratio can be shown to be at least $\frac{2}{3}$. To see this, note that in the example below Theorem 4.1, the expected latent utility is at least $\frac{1}{2}\cdot (k_1+k_2)$ and the optimal expected latent utility is $\frac{3}{4}\cdot (k_1+k_2)$ (the latter is achieved by selecting the top $\frac{1}{2}\cdot (k_1+k_2)$ candidates from groups $A$ and $B$ respectively). Assuming that the latent utility is concentrated around its expected value, this implies that the utility ratio is at least $\frac{1/2}{3/4}=\frac{2}{3}$.
**“...index $\ell$ is overloaded…”** Thanks for pointing out this issue. We will correct this in the final version.
**“result…rely on the fact that preferences are independent of scores”** Yes, Theorem 4.2 relies on the assumption that agents' preferences are independent of their scores.
Your intuition is correct that, as $n$ grows, the number of agents from each group assigned to each institution without bias is similar to the number of agents from each group assigned to each institution with bias and constraints.
However, to prove a bound on preference-based fairness, we need to prove more: we need to prove that the number of agents assigned to their $\ell$-th preference without bias is similar to the same number with bias and constraints (for each $\ell$).
The bulk of the proof is devoted to formally establishing this for any pair of distributions of preferences and utilities. This turns out to be challenging because, in general, the institution an agent is assigned to can be a complicated function of the preferences of all agents. To overcome this, we show that in our model the set of agents assigned to their $t$-th preference is exactly the set of agents matched at the $t$-th iteration of the Gale-Shapley algorithm we consider (i.e., these agents are rejected in each of the first $t-1$ iterations and not rejected at the $t$-th iteration) (Lines 355-361).
**“...paper discusses \gamma≃0…”** Sorry for the confusion. The $\gamma$ in the empirical results section (Section 5), say $\gamma_{\textrm{experiments}}$, is different from the $\gamma$ in theoretical results. The relation between them is: $\gamma = e^{\gamma_{\textrm{experiments}}}$. In the simulations, $\gamma_{\textrm{experiments}}\geq 0$ and, hence, the corresponding $\gamma$ is at least 1. We will update the notation and explain the connection between the two quantities in the final version.