Fairness and Algorithms
This page contains a quick glimpse into various notions of fairness for binary classification.
Under Construction
This page is still under construction. In particular, nothing here is final while this sign still remains here.
A Request
I know I am biased in favor of references that appear in the computer science literature. If you think I am missing a relevant reference (outside or even within CS), please email it to me.
COMPAS reminder
As a reminder, we reproduce the earlier text where we talked about issues related to bias and fairness as they relate to the use of the COMPAS tool in Broward County, FL.
In 2016, ProPublica published an article titled Machine Bias, which studied software called COMPAS that was used to predict recidivism. It showed a bias against black defendants when compared to white defendants:
Both the main article as well as the accompanying methodology article have more details on how they came to the above conclusion. In short, here is what happened. Broward County, Florida was using COMPAS scores in pre-trial release decisions. ProPublica got the COMPAS scores from the county via a public record request-- these scores predict the likelihood of a defendant committing a crime in the future. Then, via public criminal records from the county, ProPublica figured out which of the defendants actually went on to commit a crime in the next two years, compared those outcomes with the COMPAS scores, and did a bunch of statistical analyses (see the accompanying article for more details).
The ProPublica article is considered a landmark article, especially among folks who study algorithms and their interactions with society. Further, the COMPAS dataset created by ProPublica has been used in many analyses and courses (and indeed, we will also use the dataset). I encourage y'all to read the above articles just to get a sense of how the ProPublica investigation was done.
A Rejoinder
There has been some push-back against the ProPublica analysis. In particular, the work of Flores et al. showed that the COMPAS tool is fair in the sense that it is well "calibrated" (which is also a natural notion of fairness).
The rejoinder above might seem contradictory to the ProPublica investigation, but we will now come back to these diverging analyses and talk about fairness in algorithms in general. In particular, we will show that "both" of the analyses are correct and, further, that such contradictory conclusions are inevitable in a very strong mathematical sense.
What is a fair outcome?
In these notes, we are interested in when a binary classifier is "fair." On hearing this, the first question that will pop into your head is:
What is fair?
Our goal is to figure out when we should say that the output of a binary classifier is fair. But before we go down that rabbit-hole, let us step back and consider what fairness means in general.
It turns out that kids develop a notion of fairness pretty early . Here is one example of kids talking about what is fair vs. what is not:
The inherent notion above is that two kids who are "similar" should be treated in a similar way. In the context of a binary classifier, this very roughly corresponds to the binary classifier treating humans (or more generally datapoints) that are "similar" in a "similar" way. This is the notion of individual fairness, which we will come back to later in the notes.
Here is a related notion of fairness where the idea is that resources should be shared equally. First a sesame street video:
and then a video about kids dividing up a resource equally:
In the context of binary classifiers, this very roughly corresponds to its outcome being similar across people. In particular, we will consider a coarser notion where we would like the outcome of the binary classifier to be similar across groups of people (or datapoints that share some characteristics). This is the notion of group fairness, which is the notion we will start off with in these notes. And indeed, this is the notion of fairness that is at the heart of the COMPAS story above.
Our coverage will be very limited
Fairness in ML is perhaps the most well-studied aspect of the interaction of ML and society that we will cover in this course and we certainly will not be able to do it justice.
Book (in progress) on fairness and ML
If you are interested in more details, we refer you to the book by Barocas, Hardt and Narayanan on this topic. As of the time of writing of these notes, this book is still not complete but it does contain a lot of the known material that we will not cover in this course.
Will we get the definition of fairness?
In case you stumbled by here looking for the definition of fairness: unfortunately you will be disappointed. In these notes we will consider some definitions of fairness that seem natural and consider some of their properties. If you are interested in more (in addition to the book above), let us refer you to this talk by Arvind Narayanan on 21(!) definitions of fairness:
What is bias?
Another loaded term that we will use is the term bias. In particular, there are roughly three notions of bias that are relevant to these notes:
- The first notion (which might be the least known) occurs in a dataset where certain specific collections of input variable values occur more often than others. This essentially measures how far the given dataset is from a truly random dataset. Note that this notion of bias is necessary for ML to work. If all the datapoints are completely random (i.e. both their input and target variable values are completely random), then there is no bias for a classifier to "exploit"-- in other words, one might as well just output a random label for prediction.
- The second notion of bias is that of statistical bias, where in our setting this would mean that the binary classifier outcome does not reflect the distribution of the underlying target variable. A classifier is well calibrated if this does not happen. One could consider a well-calibrated binary classifier to be fair in some sense. This will be one notion of fairness that will come up in the COMPAS story. (This is the notion of fairness used in the rejoinder to the ProPublica article.)
- The final notion of bias is the colloquial use of the term, which is meant to denote an outcome that is not fair. Most of the definitions of fairness in the literature deal with this notion of bias. And a couple of definitions of this kind of fairness will also play a part in the COMPAS story (this is the notion of fairness used in the ProPublica article).
Digression: Going beyond the accuracy measure
So far we have really only seen one notion of accuracy of a binary classifier: the number of misclassified points. Before we delve into more formal definitions of fairness that are relevant to the COMPAS story, it turns out that we will need some more nuanced notions of accuracy, which have names like false positive rate and false negative rate . In this section, we will define these terms and get familiar with them.
Definitions independent of fairness
We would like to stress that these notions of rates are independent of notions of fairness. We will define them in terms of the output of a binary classifier and in the next section we will use them to define some concrete notions of fairness. For now, you should just think of them as different notions of accuracy.
Some setup
We start by setting up some notation that we will use throughout these notes:
Some notation and figures
We will use three functions to denote various aspects of a given datapoint $x$:
- We will use $Y(x)$ to denote the true label of point $x$ (as before, we will assume these labels are negative (denoted by $Y(x)=-1$) or positive (denoted by $Y(x)=1$)). Thus, $Y$ denotes the ground truth. Note that this is the function that we do not have access to and are trying to predict using a binary classifier. Pictorially this can be represented as:
These are not two-dimensional points
A word of caution-- even though we are representing the datapoints in a 2D rectangle, we are not assuming that the datapoints have only two input variables. We are embedding the datapoints in two dimensions since it'll make the followup figures easier to parse.
- We will use $S(x)$ to denote the output of the binary classifier on the datapoint $x$. By definition, recall that a binary classifier will label a point as negative ($S(x)=-1$) or positive ($S(x)=1$). We use $S$ to parallel the use of "scores" in the COMPAS tool. Recall that the classifier divides up the space into two parts: yellow (for negative points) and blue (for positive points):
- We will only need this definition later, but let us put it up here so that all the definitions are in one place. We will use $R(x)$ to denote a "sensitive" attribute of the datapoints. Since we will focus on the COMPAS setup, we will assume that this sensitive attribute is race and that every datapoint has a race of black ($R(x)=b$) or white ($R(x)=w$).
No use of probability
Typically, one uses probability to define the various notions of rate. However, it turns out that for our purposes we do not have to use probability: instead we will just use definitions from first principles. In particular, our definitions might not be mathematically precise but they should be enough for the followup discussion to make sense.
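Since all of the definitions that follow are just counting, it may help to also see them in code. Below is a minimal Python setup with a small, hypothetical toy dataset (not the COMPAS data) that represents $Y$, $S$ and $R$ as arrays; the later snippets in this section use the same conventions ($\pm 1$ labels and race values 'b'/'w').

```python
import numpy as np

# Hypothetical toy data (not the COMPAS dataset): 10 datapoints.
# Y: ground-truth labels, +1 (positive) or -1 (negative).
# S: labels output by the binary classifier, also +1 or -1.
# R: sensitive attribute (race), 'b' (black) or 'w' (white).
Y = np.array([ 1,  1,  1, -1, -1,  1, -1, -1,  1, -1])
S = np.array([ 1, -1,  1, -1,  1,  1,  1, -1, -1, -1])
R = np.array(['b', 'b', 'b', 'b', 'b', 'w', 'w', 'w', 'w', 'w'])
```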
Confusion matrix and various notions of rate
The confusion matrix arises if we combine the definitions of $Y$ and $S$ into one "matrix" or "rectangle." In particular, if we combine the figures corresponding to the definitions of $Y$ and $S$, we get something like this:
We get a matrix by thinking of the two "rows" to correspond to the datapoints that are labeled positive and negative in the ground truth (or equivalently by $Y$) and the columns correspond to the datapoints that are labeled positive and negative by the classifier (or equivalently by $S$). Each of the four "entries" or "boxes" gives rise to four notions of rates, which we will define next. However, before we define the rates, we would like to note that the figure above has points distributed uniformly but that of course need not happen in general. For example, it could look something like this (or something else altogether):
True Positive Rate (TPR)
The true positive rate (or TPR) of a binary classifier is the ratio of the number of points it correctly identifies as positive (i.e. the number of points with $Y(x)=S(x)=1$) to the number of points that are labeled as positive in the ground truth (i.e. the number of points $x$ with $Y(x)=1$). Looking back at the visual representation of the confusion matrix, here is a pictorial definition of TPR:
In the above and similar figures below, each of the boxes represents the number of points that are represented by the pattern in the box (if there is no color then it includes all points with the pattern irrespective of the labeling from the classifier).
True Negative Rate (TNR)
The true negative rate (or TNR) of a binary classifier is the ratio of the number of points it correctly identifies as negative (i.e. the number of points with $Y(x)=S(x)=-1$) to the number of points that are labeled as negative in the ground truth (i.e. the number of points $x$ with $Y(x)=-1$). Looking back at the visual representation of the confusion matrix, here is a pictorial definition of TNR:
False Positive Rate (FPR)
The false positive rate (or FPR) of a binary classifier is the ratio of the number of points it incorrectly identifies as positive (i.e. the number of points with $Y(x)=-1$ and $S(x)=1$) to the number of points that are labeled as negative in the ground truth (i.e. the number of points $x$ with $Y(x)=-1$). Looking back at the visual representation of the confusion matrix, here is a pictorial definition of FPR:
In the context of the COMPAS setup, talking about FPR makes sense because incorrectly identifying a person as likely to reoffend could have huge consequences. Incarceration has well-documented effects on the individual being incarcerated, their family and society in general. Hence, from a social perspective one would like to make FPR as small as possible (and perhaps not care as much about, for example, the overall accuracy).
False Negative Rate (FNR)
The false negative rate (or FNR) of a binary classifier is the ratio of the number of points it incorrectly identifies as negative (i.e. the number of points with $Y(x)=1$ and $S(x)=-1$) to the number of points that are labeled as positive in the ground truth (i.e. the number of points $x$ with $Y(x)=1$). Looking back at the visual representation of the confusion matrix, here is a pictorial definition of FNR:
In the context of the COMPAS setup, talking about FNR makes sense because incorrectly stating that someone is not going to reoffend (when they actually will) could also have potential societal costs. From a typical law enforcement point of view (i.e. if one wants to reduce the occurrence of crime), it might make sense to make FNR as small as possible.
Finally, we present our usual notion of accuracy (i.e. number of misclassified points) in terms of the confusion matrix:
Fraction of correctly classified points (Accuracy)
Since we have seen this notion earlier, we will not re-define it here, but as a matter of notation, from now on accuracy will mean the fraction of correctly classified points. Here is a pictorial representation:
Exercise
Compute the TPR, TNR, FPR, FNR and Accuracy for the following instance (that we have seen before):
It will be useful to count the number of points in each of the four "boxes":
It is not hard to check that all the rates are $\frac{21}{21+21}=\frac 12$.
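If it helps to see the above definitions as code, here is a short sketch of these counting-based rates (reusing the toy $\pm 1$-valued arrays from the earlier setup; any arrays of that form would do):

```python
import numpy as np

def rates(Y, S):
    """Compute TPR, TNR, FPR, FNR and accuracy from +1/-1 label arrays."""
    tp = np.sum((Y == 1) & (S == 1))    # correctly labeled positive
    fn = np.sum((Y == 1) & (S == -1))   # positives the classifier misses
    tn = np.sum((Y == -1) & (S == -1))  # correctly labeled negative
    fp = np.sum((Y == -1) & (S == 1))   # negatives flagged as positive
    return {
        "TPR": tp / (tp + fn),          # denominator: ground-truth positives
        "TNR": tn / (tn + fp),          # denominator: ground-truth negatives
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "Accuracy": (tp + tn) / len(Y),
    }

# Toy arrays from the earlier setup (hypothetical data):
Y = np.array([ 1,  1,  1, -1, -1,  1, -1, -1,  1, -1])
S = np.array([ 1, -1,  1, -1,  1,  1,  1, -1, -1, -1])
print(rates(Y, S))
```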
Some more fun with the rates
If you have not thought about these various notions of rate before, it is helpful to think more about them. In that vein, we consider some potential properties of these rates and how they relate to each other. To make y'all think more, we will present them as exercises (do not click on the "See Answer" buttons before you have thought about the answer yourself!). While you think about the questions, it would be useful to keep the confusion matrix in mind:
We first begin by looking at the rates that essentially correspond to the first row in the confusion matrix:
Exercise
Is it true that \[TPR+FNR = 1?\] Here is a pictorial representation of the question:
If so, why? If not why not (and under what circumstances could the equality hold)?
The answer is yes. This is because the sum of the two numerators is the same as the denominator.
Next we look at the rates that essentially correspond to the second row in the confusion matrix:
Exercise
Is it true that \[FPR+TNR = 1?\] Here is a pictorial representation of the question:
If so, why? If not why not (and under what circumstances could the equality hold)?
The answer is yes. This is because the sum of the two numerators is the same as the denominator.
Next we look at the rates that essentially correspond to the "anti-diagonal" elements in the confusion matrix:
Exercise
Is it true that \[FPR+FNR = 1?\] Here is a pictorial representation of the question:
If so, why? If not why not (and under what circumstances could the equality hold)?
The answer is no. The equality only holds if $FPR=TPR$.
Next we look at the rates that essentially correspond to the "diagonal" elements in the confusion matrix:
Exercise
Is it true that \[TPR+TNR = 1?\] Here is a pictorial representation of the question:
If so, why? If not why not (and under what circumstances could the equality hold)?
The answer is no. The equality only holds if $FPR=TPR$.
Finally, we ask if the accuracy is related to the rates corresponding to the "diagonal" elements in the confusion matrix:
Exercise
Is it true that \[TPR+TNR = 2\cdot \text{Accuracy}?\] Here is a pictorial representation of the question:
If so, why? If not why not (and under what circumstances could the equality hold)?
The answer is no. We'll leave it as a further exercise to figure out when the equality holds!
One more measure
Before we move back to the COMPAS story, we would like to define another measure (the positive predictive value), which will be useful in our discussion of fairness in the COMPAS context:
Positive Predictive Value (PPV)
The positive predictive value (or PPV) of a binary classifier is the ratio of the number of points it correctly identifies as positive (i.e. the number of points with $Y(x)=S(x)=1$) to the number of points that are labeled as positive by the classifier (i.e. the number of points $x$ with $S(x)=1$). Looking back at the visual representation of the confusion matrix, here is a pictorial definition of PPV:
Exercise
Compute the PPV for the following instance (that we have seen before):
The answer is $\frac {21}{21+21}=\frac 12$.
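In the same counting style, PPV only changes the denominator (the points the classifier labels positive); a minimal sketch:

```python
import numpy as np

def ppv(Y, S):
    """Positive predictive value: correct positives / predicted positives."""
    tp = np.sum((Y == 1) & (S == 1))
    return tp / np.sum(S == 1)

# Toy arrays from the earlier setup (hypothetical data):
Y = np.array([ 1,  1,  1, -1, -1,  1, -1, -1,  1, -1])
S = np.array([ 1, -1,  1, -1,  1,  1,  1, -1, -1, -1])
print(ppv(Y, S))  # 3 correct positives out of 5 predicted positives = 0.6
```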
Back to fairness definitions
We will now consider three definitions of fairness, which are relevant to our earlier discussion on COMPAS. As mentioned earlier, we will first be considering the notion of group fairness (which is the relevant notion for the COMPAS setup).
Protected/Sensitive attribute
To define group fairness, we have to, well, define a group first. Towards this, we will use the notion of a protected attribute or sensitive attribute (we will use both terms interchangeably): this will be a special attribute $R$ (which takes a few pre-defined values, i.e. is a categorical variable)-- and each choice of the value of $R$ defines a separate group. There is precedent in US law: grouping this way is used in the concept of a protected class in US anti-discrimination law-- i.e. one cannot discriminate on the basis of any protected class.
Coming back to the COMPAS example, we will use $R$ to denote the race and for simplicity we will assume the two values $R$ can take are $b$ (for black) and $w$ (for white). While clearly these are not the only racial classifications, the results of ProPublica mentioned earlier focus on these two values of race and hence we concentrate on these two possibilities.
For the rest of the section, we will only consider groups corresponding to $R(x)=b$ and $R(x)=w$ (i.e. groups based on whether race of $x$ is black or white).
It turns out that all the notions of group fairness we consider are forms of statistical parity:
Statistical parity
At a high level we would like the accuracy of the binary classifier to be the same across groups. Since in real life false positives and false negatives have different costs, various instantiations of statistical parity follow by asking that different notions of accuracy be the same across groups.
Rates for groups
So far we have defined the false/true positive rates, false/true negative rates and the positive predictive value for the whole ground truth. However, it is not too hard to see that one can define these notions for each group separately. For example, we could "split" the overall ground truth:
We get $TPR, FPR, TNR, FNR$ and $PPV$ defined for the two groups (which we denote by $TPR_b, FPR_b, TNR_b, FNR_b, PPV_b$ and $TPR_w, FPR_w, TNR_w, FNR_w, PPV_w$ respectively) by considering the ground truth to be the set of points in each group respectively (i.e. the left "confusion matrix" in the figure above for $R=b$ and the right "confusion matrix" in the figure above for $R=w$). For the sake of completeness, we explicitly define the terms $FNR_b,FPR_b,PPV_b, FNR_w, FPR_w$ and $PPV_w$ since those will be of interest in the COMPAS setup.
FPRb and FPRw
The false positive rate for the black group (denoted by $FPR_b$) and for the white group (denoted by $FPR_w$) is the ratio of the number of points in that group the binary classifier incorrectly identifies as positive (i.e. the number of points with $Y(x)=-1$ and $S(x)=1$) to the number of points in that same group that are labeled as negative in the ground truth (i.e. the number of points $x$ with $Y(x)=-1$). Looking back at the visual representation of the confusion matrix, here is a pictorial definition of $FPR_b$ and $FPR_w$:
FNRb and FNRw
The false negative rate for the black group (denoted by $FNR_b$) and for the white group (denoted by $FNR_w$) is the ratio of the number of points in that group the binary classifier incorrectly identifies as negative (i.e. the number of points with $Y(x)=1$ and $S(x)=-1$) to the number of points in that same group that are labeled as positive in the ground truth (i.e. the number of points $x$ with $Y(x)=1$). Looking back at the visual representation of the confusion matrix, here is a pictorial definition of $FNR_b$ and $FNR_w$:
PPVb and PPVw
The positive predictive value for the black group (denoted by $PPV_b$) and for the white group (denoted by $PPV_w$) is the ratio of the number of points in that group the binary classifier correctly identifies as positive (i.e. the number of points with $Y(x)=S(x)=1$) to the number of points in that same group that are labeled as positive by the classifier (i.e. the number of points $x$ with $S(x)=1$). Looking back at the visual representation of the confusion matrix, here is a pictorial definition of $PPV_b$ and $PPV_w$:
Exercise
Compute the $FPR_b, FPR_w, FNR_b, FNR_w, PPV_b$ and $PPV_w$ for the following instance (that we have seen before):
It is not hard to check that all the false positive and false negative rates are $\frac 12$. Further, it can be verified that $PPV_b=\frac 47$ and $PPV_w= \frac 37$.
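Computationally, the group-wise rates are just the earlier rates restricted to the datapoints in each group; here is a self-contained sketch, again on the hypothetical toy arrays from the earlier setup:

```python
import numpy as np

def group_rates(Y, S, R, group):
    """FPR, FNR and PPV restricted to the datapoints x with R(x) == group."""
    mask = (R == group)                 # keep only the points in this group
    Yg, Sg = Y[mask], S[mask]
    tp = np.sum((Yg == 1) & (Sg == 1))
    fn = np.sum((Yg == 1) & (Sg == -1))
    tn = np.sum((Yg == -1) & (Sg == -1))
    fp = np.sum((Yg == -1) & (Sg == 1))
    return {
        "FPR": fp / (fp + tn),          # over the group's ground-truth negatives
        "FNR": fn / (fn + tp),          # over the group's ground-truth positives
        "PPV": tp / (tp + fp),          # over the group's points flagged positive
    }

# Toy arrays from the earlier setup (hypothetical data, not COMPAS):
Y = np.array([ 1,  1,  1, -1, -1,  1, -1, -1,  1, -1])
S = np.array([ 1, -1,  1, -1,  1,  1,  1, -1, -1, -1])
R = np.array(['b', 'b', 'b', 'b', 'b', 'w', 'w', 'w', 'w', 'w'])
print("black:", group_rates(Y, S, R, 'b'))
print("white:", group_rates(Y, S, R, 'w'))
```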
Back to fairness definitions
We now state three instantiations of statistical parity:
Equal FPR
We say a classifier is fair with respect to FPR if \[FPR_b=FPR_w.\]
In the COMPAS context, a classifier is fair with respect to FPR if the chances of black and white defendants being identified as reoffending, when they actually did not end up reoffending, are the same. This is one of the notions of fairness that ProPublica used.
Equal FNR
We say a classifier is fair with respect to FNR if \[FNR_b=FNR_w.\]
In the COMPAS context, a classifier is fair with respect to FNR if the chances of black and white defendants being identified as not reoffending, when they actually did end up reoffending, are the same. This is one of the notions of fairness that ProPublica used.
Well-calibrated
We say a classifier is well-calibrated if \[PPV_b=PPV_w.\]
In the COMPAS context, a classifier is fair (or does not have any statistical bias) if the chances of black and white defendants being correctly identified as reoffending, given that the classifier identified them as such, are the same. This is the notion of fairness used in the rejoinder to the ProPublica article.
Why no Equal TPR and TNR?
You might be wondering why we did not define notions of fairness with respect to TPR and TNR. The answer is that we already have! Do you see why?
Hint: We claim those follow from fairness with respect to FNR and FPR respectively.
It follows from an earlier exercise that $TPR_b=TPR_w$ if and only if $FNR_b=FNR_w$. Similarly, we have $TNR_b=TNR_w$ if and only if $FPR_b=FPR_w$.
Exercise
For each notion of being fair with respect to FPR, FNR and well-calibrated, decide if it holds for the following instance (that we have seen before):
Since the false positive and false negative rates for both groups are $\frac 12$, the classifier is fair with respect to FPR and FNR. However, since $PPV_b=\frac 47$ and $PPV_w= \frac 37$, the classifier is not well-calibrated.
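In code, the three notions above amount to three equality checks on the group-wise rates (in practice one would allow a small tolerance); a sketch using the numbers from the exercise above:

```python
def check_group_fairness(rates_b, rates_w, tol=1e-9):
    """rates_b, rates_w: dicts with keys 'FPR', 'FNR', 'PPV' for the two groups."""
    return {
        "fair w.r.t. FPR": abs(rates_b["FPR"] - rates_w["FPR"]) <= tol,
        "fair w.r.t. FNR": abs(rates_b["FNR"] - rates_w["FNR"]) <= tol,
        "well-calibrated": abs(rates_b["PPV"] - rates_w["PPV"]) <= tol,
    }

# Numbers from the exercise above: both FPRs and FNRs are 1/2,
# but PPV_b = 4/7 and PPV_w = 3/7.
rates_b = {"FPR": 1/2, "FNR": 1/2, "PPV": 4/7}
rates_w = {"FPR": 1/2, "FNR": 1/2, "PPV": 3/7}
print(check_group_fairness(rates_b, rates_w))
# -> fair w.r.t. FPR and FNR, but not well-calibrated
```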
ProPublica vs. its Rejoinder
First let us recap the notions of fairness used by the ProPublica article and its rejoinder. The ProPublica article used fairness with respect to FPR and FNR as its notion of fairness, while the rejoinder used being well-calibrated as its notion of fairness. Here are the values of the corresponding rates, taken directly from the accompanying article to the original ProPublica article ("Low" and "High" correspond to $S=-1$ and $S=1$ while "Survived" and "Recidivated" correspond to $Y=-1$ and $Y=1$ resp.):
By looking at the table above, it can be seen that they are both right. In particular, the COMPAS classifier is fair with respect to neither FPR (denoted by "FP rate" in the above table) nor FNR (denoted by "FN rate" in the above table). On the other hand, the COMPAS classifier seems well-calibrated since the PPV values are essentially the same for both groups.
Perhaps COMPAS can be made better?
Acknowledgments
The arguments in this section are based on a 2016 FAT/ML paper by Chouldechova.
One natural followup question to the fact that COMPAS does not satisfy the definitions of fairness proposed by ProPublica but is well-calibrated is the following:
Can we get all three notions of fairness?
Is it possible to build a tool that is better than COMPAS in that it satisfies all three notions of fairness?
Chouldechova argued that this, in general, is not possible:
NO, you can't!
It is impossible for a binary classifier to satisfy all three notions of fairness (i.e. fairness with respect to FPR, fairness with respect to FNR and being well-calibrated) unless the fraction of positives among the overall number of points is the same in both groups.
In the COMPAS dataset, the recidivism rates for blacks and whites are $50\%$ and $39\%$ respectively. Hence, the fact that COMPAS could not satisfy all three notions of fairness is mathematically unavoidable.
The above kind of result is also known as an impossibility theorem: see e.g. this impossibility theorem for voting systems for a more well-known such result.
Digression: How do you measure recidivism?
This is a good time to clarify/remind you that the recidivism rates being higher for blacks than whites does not imply that blacks necessarily reoffend at a higher rate than whites. Think about why this could be the case.
Hint: How would you measure whether someone reoffended or not?
The issue is the question mentioned in the hint. There is no way to know for sure whether a person reoffended in a certain time frame or not: e.g. what about the case when someone commits a crime but never gets caught for it? On the other hand, if someone is caught/arrested for committing a crime, that can be recorded.
The notion of recidivism in the COMPAS dataset was whether someone was arrested for another crime in a two-year period. Thus, while it is true that more blacks than whites were arrested for a reoffense, this does not mean that the same holds for actually committing a repeat offense.
In the remainder of the section, we will argue Chouldechova's impossibility theorem. We start with the special case when all the false positive/negative rates are $\frac 12$ for both groups:
Argument for special case
The way we will argue this result is to assume that fairness with respect to FPR and FNR holds. From there we will derive the fact that the classifier is not well-calibrated unless the prevalence of positives is the same in both groups.
To simplify our arguments, let us assume that \[FPR_b=FPR_w=FNR_b=FNR_w = \frac 12.\] like so:
Since $FNR_b=FNR_w=\frac 12$ in both the groups, among the people who will reoffend, the same number of people are flagged by the classifier as going to re-offend vs. not (let this number be $x$ for blacks and $y$ for whites), like so:
Since $FPR_b=FPR_w=\frac 12$ in both the groups, among the people who will not reoffend, the same number of people are flagged by the classifier as going to re-offend vs. not (let this number be $z$ for blacks and $u$ for whites), like so:
To summarize, here is the current situation:
Looking at the above annotated confusion matrices, it can be checked that we have \[PPV_b=\frac x{x+z} \text{ and } PPV_w=\frac y{y+u}.\]
On the other hand, the fraction of blacks who reoffend (denoted by $p_b$, where $p$ stands for "prevalence") is given by (where the last equality follows from the value of $PPV_b$ that we just computed above) \[p_b=\frac {x+x}{z+z+x+x}= \frac x{x+z} = PPV_b.\] Similarly, the fraction of whites who reoffend (denoted by $p_w$) is given by (where the last equality follows from the value of $PPV_w$ that we just computed above) \[p_w=\frac {y+y}{y+y+u+u}= \frac y{y+u} = PPV_w.\]
Thus, to have $PPV_b=PPV_w$, we must have $p_b=p_w$. This completes the argument for the special case.
Argument for the general case (left as an exercise)
Argue the impossibility theorem for the more general case of \[FPR_b=FPR_w=q \text{ and } FNR_b=FNR_w=p,\] where $p$ and $q$ are arbitrary numbers strictly between $0$ and $1$ (i.e. they need not even be the same, let alone both be equal to $\frac 12$).
Hint: The situation in this case will look like this (the roles of $x,y,z$ and $u$ are a bit different from the earlier special case and the algebra is slightly more involved-- but nothing too bad!):
Click here to see the argument
By definition, we have \[PPV_b = \frac{(1-p)\cdot x}{(1-p)\cdot x+q\cdot z}~~~ \text{ and } ~~~PPV_w =\frac{(1-p)\cdot y}{(1-p)\cdot y+q\cdot u}.\] We need to show that $PPV_b=PPV_w$ if and only if $\frac x{x+z}=\frac y{y+u}$. This can be done by just re-arranging terms and verifying that the claim indeed holds. We will instead solve it using a result that turns out to be very useful in such situations (so a math bonus for y'all!). We will first state the result and then use it to prove our desired claim (and hence the total length of our argument will be longer than the "just do it!" argument).
A useful result
Let $a,b,c$ and $d$ be numbers such that $0\lt a \lt b$ and $0\lt c\lt d$. Then the following is true \[\frac ab =\frac cd \text{ if and only if } \frac a{b-a}=\frac c{d-c} \text{ and } \frac a{a+b} = \frac c{c+d}.\]
We just prove the first part and leave the argument for the second part as an exercise. Now note that we have $\frac a{b-a}=\frac c{d-c}$ if and only if $a\cdot(d-c)=c\cdot (b-a)$ (here we have used the fact that $d-c\gt 0$ and $b-a\gt 0$). This equality is true if and only if $a\cdot d -a\cdot c=c\cdot b-a\cdot c$, which in turn is true if and only if $a\cdot d = c\cdot b$. Using the facts that $b,d\gt 0$, we have that this equality is true if and only if $\frac ab =\frac cd$, as required.
Using the expression for $PPV_b$ and $PPV_w$ above, we note that we will have $PPV_b=PPV_w$ if and only if \[\frac{(1-p)\cdot x}{(1-p)\cdot x+q\cdot z} = \frac{(1-p)\cdot y}{(1-p)\cdot y+q\cdot u}.\] Using the useful result, the equality above holds if and only if \[\frac{(1-p)\cdot x}{(1-p)\cdot x+q\cdot z - (1-p)\cdot x} = \frac{(1-p)\cdot y}{(1-p)\cdot y+q\cdot u - (1-p)\cdot y}.\] By simple algebra the above equality holds if and only if \[\frac{(1-p)\cdot x}{q\cdot z } = \frac{(1-p)\cdot y}{q\cdot u} .\] Using the fact that $1-p$ and $q$ are both-non-zero, the equality above holds if and only if \[\frac xz =\frac yu.\] Using the useful result one more time, the equality above holds if and only if \[\frac x{x+z}=\frac y{y+u},\] as desired.
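To see the impossibility numerically, note that when both groups share the same FNR $p$ and FPR $q$, the PPV of a group is determined by its prevalence alone. Here is a small sketch with made-up counts that mirrors the setup of the general case ($x,y$ are the numbers of reoffenders and $z,u$ the numbers of non-reoffenders in the two groups):

```python
from fractions import Fraction

def ppv_from_counts(p, q, reoffenders, non_reoffenders):
    """PPV when a fraction (1-p) of reoffenders and q of non-reoffenders are
    flagged positive, i.e. when the group's FNR is p and its FPR is q."""
    flagged_true = (1 - p) * reoffenders    # true positives
    flagged_false = q * non_reoffenders     # false positives
    return flagged_true / (flagged_true + flagged_false)

p, q = Fraction(1, 4), Fraction(1, 5)       # shared FNR and FPR

# Unequal prevalence (1/2 vs. 39/100, roughly the COMPAS numbers): the PPVs differ.
print(ppv_from_counts(p, q, 50, 50), ppv_from_counts(p, q, 39, 61))

# Equal prevalence: the PPVs coincide, whatever the group sizes are.
print(ppv_from_counts(p, q, 40, 60), ppv_from_counts(p, q, 200, 300))
```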
Can we get around the impossibility result?
One of the first things one should try when confronted with an impossibility theorem is to try and circumvent it! For example, in our current case:
Relaxing the requirements
The way to circumvent an impossibility theorem is to relax its requirements. Here are two possibilities in the COMPAS setup:
- The first thing to note is that when we looked at the PPV values earlier, they were not exactly the same for blacks and whites. So instead of asking that all three rates be exactly equal, perhaps we should only ask that they be approximately equal?
- The arguments (esp. the figures we used) seem specific to binary classification. Perhaps we could ask for classification beyond just binary labels.
One could hope that these changes would make the impossibility results go away. Alas, not so:
NO, you still can't!
A 2017 ITCS paper by Kleinberg, Mullainathan and Raghavan shows that even with the above two relaxations, we cannot simultaneously satisfy (appropriately defined) approximate versions of all three notions of fairness (even for non-binary classification).
We would like to mention that the impossibility result due to Chouldechova that we just argued was also independently obtained by the paper of Kleinberg, Mullainathan and Raghavan.
Taking stock
We went through a bit of math, so perhaps it is a good idea to zoom back out a bit and figure out, at a high level, what we learned from the impossibility result above. (Though we will then come back to having a bit more fun with the math.)
Alternate conclusion from the impossibility result
We will present two equivalent descriptions of the impossibility result:
- If we want all three forms of fairness (fairness with respect to FPR, fairness with respect to FNR and a well-calibrated classifier), then that is only possible when the prevalence of the target variable (i.e. the fraction of points $x$ with $Y(x)=1$) is the same across groups.
- Suppose we want fairness with respect to FPR and FNR (we will see shortly why this is a desired outcome). Then if the prevalence of the target variable is not the same across groups (i.e. the data itself is biased), the binary classifier has to "correct" for the bias and hence cannot be well-calibrated.
We note that the first interpretation is the literal interpretation of the impossibility result while the second interpretation is the equivalent contrapositive of the impossibility theorem.
From a practical point of view, what the second interpretation implies is that, given that in real life data is (almost always) biased, if we want fair outcomes (in the sense of fairness with respect to FPR and FNR), then the classifier has to "correct" for the bias. We will briefly come back to this point in a bit.
Why should we care about fairness with respect to FNR and FPR?
The impossibility result above states that we cannot have a well-calibrated classifier that is also fair with respect to FNR and FPR. This basically means we have to choose one of the two options. The rejoinder to the ProPublica article basically rests on the thesis that one should choose the well-calibrated classifier (since, e.g., it matches the traditional notion of a classifier not having statistical bias). So the question becomes:
Why should we care about fairness with respect to FPR and FNR?
As alluded to earlier: there are good reasons to ask for fairness with respect to FPR and FNR. In particular, one strong motivation comes from the legal principle of disparate impact . The basic principle is that the outcome of a decision maker should not impact one group under a protected class (e.g. blacks) more than another group in the same protected class (e.g. whites). Thus, if one group is disproportionately impacted by a decision process then that process can be said to be legally discriminatory.
We will see shortly that under a reasonable (but very simplified) model, differences in FPR and FNR lead to disproportionate impact.
Acknowledgments
The results (and essentially the arguments) in the rest of this section are based on a 2016 FAT/ML paper by Chouldechova.
The model
Consider the following simple model that abstracts out how a typical application of a tool like COMPAS works. Say a defendant $x$ is flagged as being high risk (i.e. $S(x)=1$); then that typically manifests itself as a penalty (which could be monetary, like a bail amount, or some indirect cost, e.g. the cost to family/society of incarceration) of amount $t_H$. Similarly, when defendant $x$ is flagged as being low risk, the corresponding penalty is denoted by $t_L$. We will assume that
\[t_H \gt t_L.\]
In fact, in many situations we have $t_H\gg t_L$.
Now again consider the two groups (based on whether $R(x)=b$ or $R(x)=w$). Then the disparate impact of the classifier output $S$ is defined as the difference between these two quantities:
- The average penalty cost for black defendants (i.e. sum up the penalties for all black defendants: $t_H$ times the number of black defendants $x$ with $S(x)=1$ plus $t_L$ times the number of black defendants $x$ with $S(x)=-1$, divided by the total number of black defendants).
- The average penalty cost for white defendants (i.e. sum up the penalties for all white defendants: $t_H$ times the number of white defendants $x$ with $S(x)=1$ plus $t_L$ times the number of white defendants $x$ with $S(x)=-1$, divided by the total number of white defendants).
In the model above, Chouldechova proved the following result, which basically says that a difference in FPR (or FNR) translates directly into disparate impact even if we just consider folks who reoffended (or did not reoffend):
Disparate impact theorem
Both of the following statements are true (in the above model):
- (Disparate impact for reoffenders) The disparate impact for defendants who reoffend (i.e. only considering defendants $x$ such that $Y(x)=1$) is exactly given by: \[\left(t_H-t_L\right)\cdot \left(FNR_w-FNR_b\right).\]
- (Disparate impact for non-reoffenders) The disparate impact for defendants who do not reoffend (i.e. only considering defendants $x$ such that $Y(x)=-1$) is exactly given by: \[\left(t_H-t_L\right)\cdot \left(FPR_b-FPR_w\right).\]
Before we argue the result above, let us see what it implies. As we saw earlier, in the COMPAS dataset we have $FNR_w-FNR_b\approx 20\%$, and hence this means that among defendants who reoffend, blacks have a larger penalty on average than whites. One could make the argument (though I'm not making this argument!) that, since the prevalence of reoffending is higher among blacks (see the earlier caveat), it would follow that blacks would have a higher penalty.
Note, however, that as we saw earlier, in the COMPAS dataset we have $FPR_b-FPR_w\approx 20\%$ as well, and hence among defendants who are not going to reoffend, blacks have a larger penalty on average than whites. This conclusion is, hopefully, something we can all agree is not good.
Next, we argue the above disparate impact theorem:
Proof Idea of Disparate impact theorem
We will focus on the argument for the first part of the theorem. Consider the following general situation:
Let us first consider the average penalty for black defendants. Note that $v$ black defendants are marked as high risk and these defendants incur a total penalty of $v\cdot t_H$. Further, $y$ black defendants are marked as low risk and these defendants incur a total penalty of $y\cdot t_L$. Thus, overall there are $v+y$ defendants who in total incur a penalty of $v\cdot t_H+y\cdot t_L$. This implies that the average penalty for black defendants is \[\frac{ v\cdot t_H+y\cdot t_L}{v+y} = t_H\cdot\frac v{v+y} + t_L\cdot \frac y{v+y}=t_H\cdot TPR_b + t_L\cdot FNR_b,\] where the last equality follows from definition of $TPR_b$ and $FNR_b$.
Using a similar accounting for white defendants, we get that the average penalty for white defendants is \[\frac{ z\cdot t_H+u\cdot t_L}{z+u} = t_H\cdot\frac z{z+u} + t_L\cdot \frac u{z+u}=t_H\cdot TPR_w + t_L\cdot FNR_w.\]
The above two expressions for the average penalties for black and white defendants, along with the definition of disparate impact, imply that in this case the disparate impact is given by \[t_H\cdot TPR_b + t_L\cdot FNR_b -\left(t_H\cdot TPR_w + t_L\cdot FNR_w\right).\] Collecting the terms for $t_H$ and $t_L$ separately, we get that the disparate impact is \[ t_H\left(TPR_b - TPR_w\right) + t_L\left(FNR_b-FNR_w\right).\] Recalling that $TPR+FNR=1$, we have that $TPR_b - TPR_w= 1-FNR_b-\left(1-FNR_w\right) = FNR_w-FNR_b$. Applying this to the above, we get that the disparate impact is \[ t_H\left(FNR_w - FNR_b\right) + t_L\left(FNR_b-FNR_w\right) = t_H\left(FNR_w - FNR_b\right) - t_L\left(FNR_w-FNR_b\right) = \left(t_H-t_L\right)\cdot \left(FNR_w - FNR_b\right),\] where the equalities follow from basic algebraic manipulations.
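Here is a small sketch that computes the average penalties directly and checks the first identity on made-up numbers (the penalty amounts $t_H, t_L$ and the labels below are hypothetical):

```python
import numpy as np

def avg_penalty(S, t_H, t_L):
    """Average penalty when flagged-positive defendants pay t_H and the rest t_L."""
    return np.mean(np.where(S == 1, t_H, t_L))

# Hypothetical classifier labels for defendants who reoffend (i.e. Y(x) = 1).
S_b = np.array([ 1,  1,  1, -1])    # black reoffenders: FNR_b = 1/4
S_w = np.array([ 1,  1, -1, -1])    # white reoffenders: FNR_w = 2/4
t_H, t_L = 100.0, 10.0

FNR_b, FNR_w = np.mean(S_b == -1), np.mean(S_w == -1)
disparate_impact = avg_penalty(S_b, t_H, t_L) - avg_penalty(S_w, t_H, t_L)

print(disparate_impact)               # 77.5 - 55.0 = 22.5
print((t_H - t_L) * (FNR_w - FNR_b))  # (100 - 10) * (1/2 - 1/4) = 22.5, as the theorem says
```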
We will not do the proof of the second part of the disparate impact theorem since it is very similar to the above and leave it as an exercise:
Exercise
Argue the second part of the disparate impact theorem.
Hint: The following figure might be useful:
What if one is happy with a specific fairness definition?
So far we have basically shown that if one wants fairness with respect to FNR and FPR, then we will not have a well-calibrated binary classifier. However, what if we were OK with this:
How do we incorporate a fairness notion?
Here is the technical problem: given that we want a binary classifier with equal (or approximately equal) FNR and equal (or approximately equal) FPR across groups, can we train a model that has the best accuracy subject to these fairness constraints?
It turns out that one can express these fairness constraints as "linear constraints", and we can use existing techniques from optimization to get a model satisfying the required fairness constraints. Agarwal et al. show how to do this for a broad class of fairness constraints (i.e. even beyond fairness with respect to FNR and FPR) in a fairly generic way.
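As a concrete (hedged) illustration: the reductions approach of Agarwal et al. is implemented in the open-source fairlearn library, and a rough sketch of using it looks like the following. The synthetic data is made up, and the exact API may differ across library versions, so treat this as a sketch rather than a recipe; the EqualizedOdds constraint asks for (approximately) equal FPR and FNR across groups.

```python
# A rough sketch of the reductions approach via fairlearn (check the docs for the exact API).
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, EqualizedOdds

# Made-up synthetic data: 3 input variables, a binary sensitive attribute and 0/1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
A = rng.choice(["b", "w"], size=200)
y = (X[:, 0] + 0.5 * (A == "b") + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Train an ordinary classifier subject to (approximate) equal FPR and FNR across groups.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=EqualizedOdds(),
)
mitigator.fit(X, y, sensitive_features=A)
y_pred = mitigator.predict(X)
```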
The result of Agarwal et al. is a great technical one, but while it might seem to give a "definitive" answer so that we have "solved" the fairness question, we will come back to the potential shortcomings of handling fairness as a purely technical problem later on in the course.
Beyond group fairness
Acknowledgments
The examples and definitions in this section are based on a 2012 paper by Dwork, Hardt, Pitassi, Reingold and Zemel and a 2018 ICML paper by Kearns, Neel, Roth and Wu.
Shortcomings of group fairness
In this section, we outline two natural examples where our notion of fairness (in particular, fairness with respect to FPR and FNR) holds but the outcomes are not "fair" by any reasonable standard. The first example shows that "fairness through blindness" can lead to a result that is actually not favorable even to the party running the classifier:
Fairness through blindness
This example is taken from Dwork et al.
One common notion of fairness used is that of "fairness through blindness"-- the idea here is that the classifier explicitly does not use the sensitive attribute ($R$, which in our case is race). In particular, such classifiers explicitly exclude the sensitive attribute as one of the input variables. However, in practice it turns out that, e.g., zip code is a very good predictor of race. For the roots of why this is true, see this video:
We now consider the case where talented black students are steered towards engineering and less talented black students are steered towards finance. On the other hand, talented white students are steered towards finance while less talented white students are steered towards engineering. (Note: we are not claiming this is actually the case; this is a made-up scenario.)
A company that just wants to hire talented students decides to go for fairness through blindness and pick a recruiting strategy that completely ignores membership of a student in either race. In particular, it decides that it will consider the grades of students in economics (since they are a finance company) and choose to hire students based on how well they did in their economics course.
To make things more concrete, we will also assume that a talented student, no matter what course they are taking, is thrice as likely to do well as not. Conversely, a less talented student, no matter what course they are taking, is thrice as likely to not do well in the course.
With all these assumptions, the following situation is a possibility:
Since the figure setup has changed a bit from before, the horizontal rows now correspond to whether a student is talented or not. The columns now correspond to whether a student takes an economics course or an engineering course. (Note that, as per the setup, more of the talented black students take the engineering course and more of the less talented black students take the economics course-- and the distribution is flipped for white students.) A green triangle means that a student does well in whatever course they are taking and a red circle means that they did not do well in the course they are taking. The actual number of points in each box does not matter-- just the relative proportions matter. (Note that the fraction of green triangles to red circles reflects how well a talented or less talented student will do in a course.)
Finally, note that since the company hires based on economics course performance, the green triangles in the economics part are colored blue and the rest is colored yellow. The engineering part is not colored since the company does not look at students who took the engineering course. In other words, the setup of the binary classifier used by the company looks like this:
It turns out that the classifier above is fair with respect to FPR and FNR, where the target variable is whether a student is talented or not-- i.e. $Y(x)=1$ means student $x$ is talented and $Y(x)=-1$ means student $x$ is less talented:
Exercise
Show that in the situation above we have $FNR_b=FNR_w=FPR_b=FPR_w=\frac 14$.
However, if one looks at the overall picture, the outcome is clearly not "right", even from the perspective of how well the company did in its recruiting, since it excluded many talented students who took the engineering course (who in this case happened to be mostly black students) because its hiring policy did not explicitly take race into account.
Fairness gerrymandering
The general idea of this example is from Dwork et al., though the specific version (and indeed the term fairness gerrymandering) is from Kearns et al.
The basic idea behind this "attack" is that while a classifier's output may be fair with respect to FPR and FNR for race and gender individually, it might no longer be fair when we combine race and gender. Before going into the details of an example, we would like to point out that, at a high level, this is the same issue as that of intersectionality, a term coined by Kimberlé Crenshaw. Here is a TED talk by Crenshaw on this (warning: there are some graphic violence scenes towards the end of the video):
Let us now come back to the notion of fairness and consider the following situation:
A quick overview of the figure setup: the green triangles and red circles again denote positive and negative points in the ground truth. The rows divide up the set of points based on whether a point corresponds to a black or white person while the columns break up the points based on whether a point corresponds to a man or a woman. As you will notice the points (and their labels) are very evenly distributed. Now consider a classifier that colors points blue and yellow as mentioned above.
The first thing to note is that FPR and FNR along each axis are the same (i.e. $FNR_{\text{Man}}=FNR_{\text{Woman}}=\frac 12$ and $FNR_{\text{Black}}=FNR_{\text{White}}=\frac 12$, and the same holds for the FPRs). In other words, when the sensitive attribute is just race or just gender, the classifier is fair with respect to both FNR and FPR.
However, it is clear that the classifier is very unfair to black women (and white men in this example). In particular, note that the $FNR$ for black women is $1$, which is not the same as the FNR for white women or black men (for whom the FNR is $0$)-- in a very quantitative sense this is the worst possible unfair outcome.
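Here is a small sketch with made-up data that mimics the figure: each (race, gender) cell has two ground-truth-positive points, and the classifier misses exactly the black women and the white men. The FNR is $\frac 12$ for each race and for each gender on its own, but it is $1$ for black women:

```python
import numpy as np

def fnr(Y, S, mask):
    """False negative rate restricted to the datapoints selected by mask."""
    pos = mask & (Y == 1)               # ground-truth positives in the subgroup
    return np.sum(pos & (S == -1)) / np.sum(pos)

# Made-up data: all eight points are ground-truth positive (enough for the FNR check).
race   = np.array(['b', 'b', 'b', 'b', 'w', 'w', 'w', 'w'])
gender = np.array(['woman', 'woman', 'man', 'man', 'woman', 'woman', 'man', 'man'])
Y      = np.ones(8, dtype=int)
S      = np.array([-1, -1,  1,  1,  1,  1, -1, -1])   # misses black women and white men

print(fnr(Y, S, race == 'b'), fnr(Y, S, race == 'w'))            # 0.5 0.5
print(fnr(Y, S, gender == 'woman'), fnr(Y, S, gender == 'man'))  # 0.5 0.5
print(fnr(Y, S, (race == 'b') & (gender == 'woman')))            # 1.0
```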
Individual fairness
Using the above examples (and others), Dwork et al. proposed the notion of individual fairness, which at a high level is defined as follows:
Individual fairness
We say that a classifier is fair if it treats two "similar" individuals "similarly." Note that this is the first notion of fairness that we started off with in these notes.
The natural followup questions are:
- How do we determine how similar two individuals are?
- How do we define what it means for a classifier to treat two individuals similarly?
It turns out that there is a reasonable answer to the second question above: most binary classifiers actually define a distribution on how "likely" a given point is to be labeled positive. We can say that a classifier treats two individuals similarly if the likelihoods of them being labeled positive are very close.
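One common way to make this precise is a Lipschitz-style condition: the difference between the two individuals' probabilities of being labeled positive should be at most (a constant times) the distance between them. Here is a minimal sketch; the Euclidean distance used below is only a placeholder for the society-provided similarity metric discussed next, and the model and data are made up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def treats_similarly(model, x1, x2, distance, L=1.0):
    """Check the Lipschitz-style condition |P(+1 | x1) - P(+1 | x2)| <= L * d(x1, x2)."""
    p1 = model.predict_proba([x1])[0, 1]    # likelihood of being labeled positive
    p2 = model.predict_proba([x2])[0, 1]
    return abs(p1 - p2) <= L * distance(x1, x2)

def euclidean(a, b):
    # Placeholder metric: in Dwork et al.'s proposal this would be the
    # agreed-upon "true" distance between individuals.
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

# Made-up model trained on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)
model = LogisticRegression().fit(X, y)

print(treats_similarly(model, [0.1, 0.0], [0.2, 0.0], euclidean))
```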
Shortcomings of individual fairness
Dwork et al. state that society will need to decide on what it means for two individuals to be similar and once this notion of similarity is well-defined it can be used to answer the first followup question above.
In my personal opinion, the above is a really strong assumption and is a shortcoming of the proposal of individual fairness. The main reason is that soliciting even much simpler information from humans is hard-- trying to elicit a "true" distance between all pairs of individuals is not realistic.
The notion of individual fairness does get around the shortcomings of group fairness that we talked about-- see Dwork et al. for more details.
Let us recap where we are. First, group fairness is easy to define (and is actually used in the US legal system)-- however, it has some "blind spots": scenarios that it deems fair but that reasonable people would contend are not fair. On the other hand, the notion of individual fairness has very strong fairness guarantees but is not practical because it needs access to a very strong notion of distance between any two individuals.
It is natural to ask: Can we get a notion of fairness that "gets the best of both worlds"?
In between individual and group fairness
Kearns et al. define a notion of fairness that potentially holds against (exponentially or even infinitely) many subgroups. The paper then shows how one can compute the optimal model subject to these fairness constraints. Please see the paper for more details.
Next Up
Next, we will pay closer attention to the interaction of the society with the ML pipeline.