Incorporating multiple notions of fairness
This page contains a quick overview of the proposed project on incorporating multiple notions/definitions of fairness in the context of a system that allocates limited resources.
Under Construction
This page is still under construction. In particular, nothing here is final while this notice remains.
A Request
I know I am biased in favor of references that appear in the computer science literature. If you think I am missing a relevant reference (outside or even within CS), please email it to me.
Mentors
Kenny Joseph and Atri.
Background
Existing work
Several recent works have considered how to capture and leverage perceptions of fairness from many individuals. First, Jung et al. begin with the notion of individual fairness of Dwork et al., where the idea is that similar individuals should be treated similarly. They account for multiple perceptions of fairness by allowing multiple definitions of similarity: they ask human judges whether or not pairs of subjects should be treated similarly by a classifier. They then propose, and prove the correctness of, an algorithm that uses these pairwise judgments of similarity to train a linear classifier which, within bounds defined by two slack parameters, treats individuals judged to be similar similarly.
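To give a feel for these pairwise constraints, here is a minimal sketch, assuming a linear scoring function and a single slack parameter (Jung et al. use two); everything below is our own illustration, not their actual algorithm.

```python
# A minimal sketch of the flavor of pairwise fairness constraints (our own
# illustration): a linear classifier is penalized whenever two individuals
# judged similar receive scores further apart than a slack parameter gamma.
import numpy as np

def fairness_violations(w, X, similar_pairs, gamma=0.1):
    """Count judged-similar pairs (i, j) whose scores differ by more than gamma."""
    scores = X @ w                      # one linear score per individual
    return sum(abs(scores[i] - scores[j]) > gamma for i, j in similar_pairs)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))             # 5 individuals, 3 features each
w = rng.normal(size=3)                  # candidate classifier weights
print(fairness_violations(w, X, similar_pairs=[(0, 1), (2, 4)]))
```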
Second, in a more applied vein, Lee et al. propose an approach that obtains a set of moral judgments and then calculates the "societally preferred" response for decisions in the context of food bank donations. To do so, they develop a method to aggregate individual beliefs into a voting-based decision system. Notably, their work involves a variety of stakeholders in the domain, whereas most other works use a convenience sample from Mechanical Turk. However, there is no discussion of the tradeoffs between fairness and accuracy, and no formal proofs of correctness.
Two general concerns can be raised about existing work. First, work in this area often admits as a limitation the potential for fairness annotation bias, that is, biases arising from who provides the fairness judgments that drive algorithmic decisions. For example, Grgić-Hlača et al. show that liberals and conservatives have different moral perspectives on procedural fairness, but do not incorporate this into their algorithm. Second, a large gap exists between theoretical and practical approaches. In particular, Jung et al. prove rigorous guarantees on the accuracy and fairness of their algorithms under minimal assumptions about how data or judgments are structured. On the other hand, the more applied papers do exploit certain structural assumptions but provide no theoretical guarantees. Thus, theoretical work with rigorous guarantees that also exploits the inherent structure is missing.
Proposed project
High-level goal
The very high-level goal of this project is to develop an algorithm that efficiently captures different people’s perspectives on what is fair when allocating/distributing a particular resource (e.g., a limited number of food stamps) to individuals in a community.
The expected outcome of this project is a method to efficiently identify multiple perspectives of fairness for a service allocation task. We assume there is a population, $X$, of individuals who are eligible to receive some service $s$. Due to budget constraints, however, only $K_s$ members of $X$ will receive the service. We will solicit fairness annotations from a pool of annotators, $F$, to determine the annotators' perception of the most deserving $K_s$ individuals, under the assumption that annotators with different attributes may have different perceptions of what is fair. For example, $X$ may be the people in Buffalo, $s$ a set number of food stamps, and $F$ a collection of UB undergraduates.
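To fix the notation, here is a minimal sketch of the setup in code; `AllocationTask` and all the values below are our own placeholders, not an existing system or dataset.

```python
# Illustrative notation only: X = population, K_s = budget, F = annotators.
from dataclasses import dataclass

@dataclass
class AllocationTask:
    population: list   # X: individuals eligible for service s
    budget: int        # K_s: how many members of X can receive s
    annotators: list   # F: pool providing fairness judgments

task = AllocationTask(
    population=[f"person_{i}" for i in range(100)],   # e.g., people in Buffalo
    budget=10,                                        # e.g., food-stamp slots
    annotators=[f"annotator_{j}" for j in range(20)], # e.g., UB undergraduates
)
```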
A less high-level goal
Fairness annotators will be given multiple annotation tasks. Each task asks: given a limited budget, should person $x_i$ or person $x_j$ receive some service $s$? Our goal is to minimize the number of pairwise comparisons made by annotators while ensuring we faithfully represent their view of the $K_s$ individuals in $X$ most deserving to receive service $s$.
Another line of related work
Before we proceed, we would like to highlight another line of related work: the literature on identifying top-$K$ sets from pairwise comparisons (see, e.g., Braverman et al., Chen et al., and Chen and Suh). In this literature, the general idea is that each individual in $X$ has some utility value $\mu_{x_i}$. The probability that some $x_i$ is selected to receive the service over the other individuals in some subset $M$ of $X$ is defined via a multinomial logit model, i.e., $p(x_i \mid M) = \frac{\exp(\mu_{x_i})}{\sum_{x_k \in M} \exp(\mu_{x_k})}$. The goal is to identify some minimal set of comparisons (i.e., the minimal number of times we have to sample a new set $M$ and ask annotators to make a service allocation decision) that provably identifies the top $K_s$ subjects in terms of their utility values.
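The following is a small simulation of this MNL choice model; the win-counting heuristic at the end is a deliberately naive illustration of our own, not the provably sample-efficient algorithms of Chen et al. or Chen and Suh.

```python
# Simulate annotator choices under the multinomial logit (MNL) model and
# recover the top-K_s individuals by naively counting pairwise wins.
import numpy as np

rng = np.random.default_rng(42)
n, K_s = 10, 3
mu = rng.normal(size=n)                 # latent utility mu_{x_i} per individual

def mnl_choice(M):
    """Sample the annotator's pick from subset M with p(x_i | M) ∝ exp(mu_i)."""
    p = np.exp(mu[M])
    return rng.choice(M, p=p / p.sum())

# Naive estimator: show random pairs, count wins, take the K_s most-chosen.
# Provable algorithms aim to achieve this with far fewer comparisons.
wins = np.zeros(n)
for _ in range(5000):
    M = rng.choice(n, size=2, replace=False)
    wins[mnl_choice(M)] += 1

print("estimated top-K:", set(np.argsort(-wins)[:K_s]))
print("true top-K:     ", set(np.argsort(-mu)[:K_s]))
```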
The primary research question is, how should we make pairwise queries such that we can identify the $K_s$ most deserving individuals, when there may be multiple perspectives on who is most deserving? You will extend existing approaches for this (e.g. Chen et al.) to the case where annotators have different attributes (e.g. political affiliation, gender) that lead to different perspectives on fairness.
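As a hypothetical formalization of this extension (one plausible model to start from, not a settled design), an annotator's group could select which utility vector drives their MNL responses:

```python
# Hypothetical formalization: each annotator group g has its own utilities
# mu_g, so the same query (x_i vs. x_j) can yield different answer
# distributions depending on who is asked.
import numpy as np

rng = np.random.default_rng(7)
n, G = 10, 2
group_mu = rng.normal(size=(G, n))       # one utility vector per group

def mnl_choice(group, M):
    """Annotator from `group` picks from subset M with p ∝ exp(mu_g)."""
    p = np.exp(group_mu[group, M])
    return rng.choice(M, p=p / p.sum())

M = np.array([0, 1])
print([mnl_choice(g, M) for g in range(G)])  # answers may differ across groups
```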
Proposed project-specific deliverables
We first list all the steps that would need to be done to take this project to its logical conclusion:
1. Read the relevant literature to identify a starting point (ask the mentors for pointers).
2. Come up with a mathematical model that can be used to efficiently extract a single set of $K_s$ individuals that maximally satisfies all annotators, under the assumption that the annotators are drawn from $G$ different groups, each of which has its own perception of what is fair. (One naive baseline for this step is sketched after this list.)
3. Do one of the following:
   (a) design and implement an experiment to test your model, or
   (b) prove bounds on the sample complexity (number of annotations) needed to identify this set of people under any assumptions made.
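As promised above, here is one naive baseline for step 2, purely as an assumption-laden illustration of ours: each group "votes" for its own top-$K_s$, and the $K_s$ individuals with the most votes are selected. Any serious model would need to handle ties and fairness-accuracy tradeoffs far more carefully.

```python
# Naive "consensus" baseline (illustrative only): score each individual by how
# many of the G groups place them in their own top-K_s, then allocate to the
# K_s highest-scoring individuals.
import numpy as np

def consensus_top_k(group_mu, K_s):
    G, n = group_mu.shape
    votes = np.zeros(n)
    for g in range(G):
        votes[np.argsort(-group_mu[g])[:K_s]] += 1  # group g votes for its top-K_s
    return np.argsort(-votes)[:K_s]

rng = np.random.default_rng(1)
print(sorted(consensus_top_k(rng.normal(size=(2, 10)), K_s=3).tolist()))
```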
What is expected from you in this project
To get full credit on this project, you have to complete steps 1 and 2, plus either 3(a) or 3(b).
References
- Christopher Jung, Michael Kearns, Seth Neel, Aaron Roth, Logan Stapleton, and Zhiwei Steven Wu. Eliciting and Enforcing Subjective Individual Fairness. Manuscript, 2019.
- Min Kyung Lee, Daniel Kusbit, Anson Kahng, Jitae Kim, Xinran Yuan, Allissa Chan, Daniel See, Ritesh Noothigattu, Siheon Lee, and Alexandros Psomas. WeBuildAI: Participatory Framework for Algorithmic Governance. In CSCW 2019. [Author paper copy]
- Nina Grgić-Hlača, Elissa M. Redmiles, Krishna P. Gummadi, and Adrian Weller. Human Perceptions of Fairness in Algorithmic Decision Making: A Case Study of Criminal Risk Prediction. In WWW 2018. [More details]
- Mark Braverman, Jieming Mao, and Yuval Peres. Sorted Top-$k$ in Rounds. In COLT 2019.
- Xi Chen, Yuanzhi Li, and Jieming Mao. A Nearly Instance Optimal Algorithm for Top-$k$ Ranking under the Multinomial Logit Model. In SODA 2018. [More details]
- Yuxin Chen and Changho Suh. Spectral MLE: Top-$K$ Rank Aggregation from Pairwise Comparisons. In ICML 2015. [More details]