Discrimination, Law and ML
This page gives a quick overview of anti-discrimination law and how it could interact with the ML pipeline.
Sidenotes
- [1] Needless to say, this is not a legal term!
- [2] Well, technically it discriminates against folks without smartphones, but having a smartphone is not one of the protected classes. So we will go with age, which is correlated with smartphone ownership but is also a protected class.
- [3] Here, similarly situated means that other than age (and perhaps smartphone ownership-- hint! hint!) this younger person should have the same (or very similar) characteristics as the older person: e.g. where they live and so on.
Under Construction
This page is still under construction. In particular, nothing here is final while this sign still remains here.
A Request
I know I am biased in favor of references that appear in the computer science literature. If you think I am missing a relevant reference (outside or even within CS), please email it to me.
Anti-discrimination law
In this section, we will review anti-discrimination law as codified in Title VII of the Civil Rights Act of 1964.
Civil Rights Act of 1964
The Civil Rights Act of 1964 is one of the landmark pieces of legislation in US history and was one of the outcomes of the American civil rights movement:
Title VII
In particular, Title VII bars discrimination by employers based on national origin, race, religion and sex (see the discussion on protected groups below).
Other anti-discrimination laws
There are of course other anti-discrimination laws, e.g. the Americans with Disabilities Act of 1990:
We will focus on Title VII in these notes to keep the discussion a bit self-contained (though other anti-discrimination laws are, at least in spirit, similar).
Before we move on, we want to clarify two things: (1) the definition of a protected class, which is what we will use in the rest of the notes, and (2) the fact that not all discrimination is illegal.
Protected groups
The following groups are considered to be protected groups for the purposes of employer discrimination laws:
- Race
- Color
- Religion
- Age (40 and over)
- Sex (including pregnancy, sexual orientation or gender identity)
- National Origin
- Disability status
- Genetic information
For the rest of the notes, when we say protected class/group, we mean one of the groups mentioned above.
Not all discrimination is illegal
This would be a good time to remind the reader that not all forms of discrimination are illegal and there are nuances in some of these issues (some of which are not resolved):
As we will see in these notes, even discrimination involving protected groups can be legal if certain other conditions are satisfied (we will see some examples where this statement makes a lot of sense).
Law is not set in stone
One thing to keep in mind as we make our way through these notes is that law is not an immutable object-- laws change. Perhaps more importantly, the fact that a law was passed a long time ago does not mean that its interpretation is all settled. Here is a recent example related to Title VII:
Quick Overview of Title VII
We clearly do not have the time to go through all the details of Title VII's provisions (let alone subsequent discussions of them), so we will do a quick (and necessarily incomplete) overview of the two main provisions of Title VII:
Disparate treatment vs. disparate impact
Cases brought against an employer under Title VII roughly fall into two categories. The first is where the discrimination is intentional, i.e. the employer treats employees/applicants differently based on whether or not they belong to a protected group-- this is disparate treatment. The second category is where the discrimination is unintentional but it nonetheless leads to a different impact on different groups-- this is disparate impact.
Disparate Treatment
While this is a bit of a simplification, there are two classes of disparate treatment-- one for formal discrimination and another for intentional discrimination.
Two kinds of disparate treatment
- Formal discrimination occurs if membership in one of the protected classes is explicitly used as a factor in making employment decisions.
- To establish intentional discrimination (beyond clear-cut formal discrimination), as explained by Barocas and Selbst, there are two frameworks under which such a case can proceed:
- The first is based on the McDonnell Douglas Corp. v. Green case. This has three steps:
- The plaintiff (i.e. the employee/applicant) has to show that a similarly situated person who is not a member of the protected class would have received a different outcome than the plaintiff did.
- Assuming the above step goes through, the defendant (i.e. the employer) has to offer a legitimate non-discriminatory reason for their decision. (An interesting point here is that the employer does not have to argue that their reason is true-- they just have to produce a reason.)
- Assuming the above step goes through, it is now the burden of the plaintiff to argue that the reason presented by the employer in the previous step is bogus [1].
- The second is based on the Price Waterhouse v. Hopkins case. In this case the plaintiff does not have to show that the employer's rationale is bogus but rather that discrimination was a "motivating" factor. Essentially this means that the plaintiff has to show that the employer's decision would have been different absent the discriminatory motive.
Let us do two exercises to make sure the high level ideas of disparate treatment are clear to you:
Exercises
The first exercise is from Barocas and Selbst. Say your company uses an ML pipeline to make hiring decisions, which explicitly uses race as one of the input variables. You later do a study of the model developed and find that race is the least important input variable for the model's predictions. Applicant A sues your company for disparate treatment. Do they have a case?
Yes, in this case since race was used explicitly as an input variable this would constitute a disparate treatment violation (even if there is no discriminatory effect).
The second exercise is adapted from a lesson by Shawn Grimsley. Suppose your neighborhood mosque has an opening for an Imam to oversee the said mosque. Neighbor B sues the mosque since they only interview candidates who are Muslims: i.e. they are explicitly using membership in a protected class (in this case religion) in their employment decision. Does B have a case that they could win?
No, because in this case the mosque can make the legitimate case that being a Muslim is a requirement for being an Imam.
Disparate Impact
We begin our discussion of disparate impact by going through a video on the Griggs v. Duke Power Co. case, which is generally considered to be the first case on disparate impact:
Next, we will go through a video that gives an overview of disparate impact:
Here is a followup video with more examples:
Three steps in disparate impact case
Below we summarize the three main steps that a case under disparate impact might go through (though, as Barocas and Selbst note, what is "enough" for each step is not always clear; we briefly summarize their discussion of these issues):
- A plaintiff (i.e. employee/applicant) must show that the employment decision process adversely affects a certain group in some protected class: i.e. they must show empirically that the outcome of the employment decision process statistically favors one group in a protected class over another.
- Of course this leaves the question of what statistically constitutes an adverse impact. More precisely, by what fraction should a group be affected (as compared to, say, the "best performing" group)? This has led to the so-called four-fifths rule: if a person from a certain group is less than $80\%$ as likely as a person from another group (in the same protected class) to be accepted by the employment decision process, then an adverse impact on the first group has occurred. For example, in the Griggs v. Duke Power Co. case, $58\%$ of whites passed the test at Duke Power Co. while only $6\%$ of African Americans passed the test. Since the ratio $\frac{6}{58}\approx 10.3\%$ is well below the $80\%$ threshold required by the four-fifths rule, adverse impact is established (see the sketch after this list).
- If disparate impact is shown, then the defendant (i.e. the employer) must show that the challenged decision process is job related and is required due to business reasons.
- There is some confusion about the difference between "job relatedness" and "business necessity". Also, the standards for what constitutes job relatedness/business necessity have changed since the Griggs v. Duke Power Co. case. The current standards were set by the Civil Rights Act of 1991.
- If business necessity/job relatedness is shown in the second step, the burden then shifts to the plaintiff to show that the employer could have used an "alternative employment practice" with less discriminatory results.
- The requirement for an "alternative employment practice" was codified in the Civil Rights Act of 1991, though there are no set standards for what constitutes an "alternative employment practice."
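To make the four-fifths rule concrete, here is a minimal Python sketch of the calculation, using the pass rates from the Griggs v. Duke Power Co. case as the running example (the function name and the way the numbers are packaged are ours, purely for illustration):

```python
def four_fifths_check(rate_group, rate_best, threshold=0.8):
    """Return the selection-rate ratio and whether it falls below the
    four-fifths (80%) threshold, i.e. whether adverse impact is indicated."""
    ratio = rate_group / rate_best
    return ratio, ratio < threshold

# Pass rates from Griggs v. Duke Power Co.: 58% of white applicants passed
# the test, while only 6% of African American applicants did.
ratio, adverse = four_fifths_check(rate_group=0.06, rate_best=0.58)
print(f"ratio = {ratio:.1%}, adverse impact indicated: {adverse}")
# ratio = 10.3%, adverse impact indicated: True
```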
Title VII and ML
Acknowledgments
This section borrows heavily from/builds on a 2016 California Law Review paper by Barocas and Selbst. Some of the material from the previous section was also taken from this paper.
The previous part of these notes talked about Title VII with regards to discrimination in employment decisions (without specific attention to whether an ML model was used in making the decision or not). We will now focus on how the ML pipeline specifically interacts with Title VII.
Why Title VII and the ML pipeline?
Before we move along and discuss how Title VII would interact with the ML pipeline, one natural question is whether ML has been (or is being) used for hiring. The answer is a definite yes:
Since Title VII is one of the main tools in law to counter discrimination in hiring, one obvious place to look to counter potential discrimination in hiring using an ML pipeline would be Title VII. This section attempts to do an initial survey of what this interaction would look like.
We will first look at the issue of masking, which is especially relevant when an ML pipeline is involved in making the decisions. Then we will see how the disparate treatment paradigm of Title VII could help counter discrimination stemming from the use of an ML pipeline. We will then do a similar exercise with disparate impact. Finally, we will conclude with some general thoughts on the interaction between Title VII and discrimination via an ML pipeline.
Assumption for the exercises in this section
To simplify the setup (slightly), we will assume that the party employing the ML pipeline to make an employment decision is the same as the party who developed the ML pipeline. Of course, in real life, employers can use pipelines developed by third parties, but we will not consider the extra complications that can arise in such situations.
Masking
What is masking?
As the name suggests, masking is an attempt to "hide" an employer's "intent" to discriminate by presenting things in a way that is (potentially) not illegal under Title VII.
As one might guess, this is something that can be done even when the employment decision is made without the involvement of an ML pipeline. See e.g. this article that talks about masking age discrimination by using certain phrases.
However, the ML pipeline introduces new avenues for masking in employment decisions. Simply put, one could intentionally introduce bias into the ML pipeline in order to mask one's intention of discriminating against certain protected groups.
As suggested above, an ML pipeline designer could intentionally introduce bias into the ML pipeline. In particular, recall the six kinds of bias:
Six kinds of potential bias in the ML pipeline
Exercise
Click here for suggestions for Instructors
If running this as an in-class exercise, break up the class into a number of groups that divides $6$ and assign an equal number of biases from above to each group to discuss.
Put yourself in the shoes of an employer who wants to discriminate based on a protected class membership but wants to mask it by intentionally introducing bias into the ML pipeline. Consider how you would introduce biases of the kind above (or other kinds not covered in one of the six classes of bias above) into the ML pipeline so that your final model discriminates based on a protected class.
Before we move along, we collect a "catalog of evils" as mentioned by Dwork et al. (2012):
Catalog of evils
Below we summarize six "evils," which are employment practices that intentionally discriminate. These are not necessarily specific to employment decisions made with an ML pipeline, but an ML pipeline can make some of them easier to implement. We only give very brief descriptions (see Dwork et al. (2012) for more detailed discussion):
- Blatant explicit discrimination: using membership in a protected class as an explicit input variable in the ML model.
- Discrimination based on redundant encoding: instead of using membership in a protected class explicitly, include it implicitly by using a (potential class of) input variable(s) that predicts membership in the protected group pretty well (e.g., as we have seen before, zipcode is a pretty good predictor of race in the US; as another example, here is a creepy story about how Target figured out that a teenager was pregnant even before her father knew about it). See the sketch after this list for one way to check for such an encoding.
- Redlining: this is a specific implementation of discrimination based on redundant encoding that we have seen before.
- Cutting off business with a segment of the population with disproportionately high membership in a protected class: this is a generalization of redlining, where instead of the majority of the redlined population belonging to the protected class, the redlined population's fraction of members in the protected class is larger than the fraction in the general public.
- Self-fulfilling policy: purposefully interviewing an unqualified member (or many unqualified members) of the protected class to reject them and to make a "case against" the whole protected group.
- Reverse-tokenism: a "complement" of the self-fulfilling policy, where the employer purposefully rejects a qualified member not in the protected class as a justification to reject all members of the protected class in employment decisions.
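To make redundant encoding a bit more concrete, here is an illustrative Python sketch (the dataset, column names, and model choice are hypothetical assumptions of ours, not anything from Barocas and Selbst or Dwork et al.): it simply checks how well the "non-protected" input variables jointly predict a protected attribute. If they predict it well, dropping the protected attribute from the hiring model buys very little.

```python
# Illustrative sketch: do the remaining input variables form a redundant
# encoding of a protected attribute? (Hypothetical data and column names.)
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("applicants.csv")             # hypothetical dataset
protected = df["race"]                         # protected attribute, never fed to the hiring model
features = pd.get_dummies(df.drop(columns=["race", "hired"]))  # e.g. zipcode, job history, ...

# Train a simple classifier to predict the protected attribute from the other inputs.
acc = cross_val_score(LogisticRegression(max_iter=1000),
                      features, protected, cv=5, scoring="accuracy").mean()
baseline = protected.value_counts(normalize=True).max()  # always guess the majority group

print(f"Predicting the protected attribute from the other inputs: "
      f"accuracy {acc:.2f} vs. majority-class baseline {baseline:.2f}")
# Accuracy well above the baseline means the remaining variables jointly
# encode the protected attribute, even though it is never used explicitly.
```

The same kind of check can of course also be run defensively, i.e. by a developer who wants to know whether their feature set leaks a protected attribute.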
So far we have seen how using an ML pipeline might make masking easier. More generally, if one is so inclined, an ML pipeline can be built with biases baked in, leading to potential masking effects. Given that Title VII is used to guard against human discrimination in employment decisions, it is natural to ask:
Can Title VII be used to guard against discrimination in employment decisions made (in part) by an ML pipeline?
While masking is certainly a potential issue, we will also consider the situation where biases creep into the ML pipeline perhaps unintentionally. Finally, we will consider these questions under the umbrella of disparate treatment and disparate impact.
Disparate treatment and the ML pipeline
Recall that there are two kinds of disparate treatment. The first is formal discrimination, where the ML pipeline uses membership in a protected class as an input variable. As we have seen before, even if that input variable is not predictive, since membership in a protected class is used explicitly, this constitutes disparate treatment (though remember there can be caveats). Let us consider a case we have seen above:
Exercise
Consider the case where a set of input variables together forms a good predictor of membership in a protected class (e.g. race or pregnancy status), but membership in the class itself is not used as an input variable. Can the user of the model be sued under the formal discrimination part of disparate treatment?
No, since the membership in the protected class is not used as an explicit input variable.
For the rest of our discussion on disparate treatment and decisions made by ML pipelines we will focus on the intentional discrimination part of disparate treatment. In particular, we will consider the same situation as in the exercise above but we will consider whether it can be covered under the intentional discrimination aspect of disparate treatment.
Exercise
Let us recall the Street Bump app:
Note that because of the way the data was collected, the app discriminates against older folks [2]. In this exercise, you will figure out if this can be tried under the intentional discrimination part of disparate treatment. (We know that this is not an employment decision, but humor us as we go through this exercise.)
Click here for suggestions for Instructors
If running this as an in-class exercise, break up the class into two groups and have them play the roles of plaintiff and defendant against each other (i.e. both groups argue as the plaintiff in the first round, each then plays the defendant for the other group in the second round, and finally each group responds, as the plaintiff, to the defendant's reason from the previous round).
This is a made-up legal case
Just so that there is no confusion-- this is not a real legal case, nor do we guarantee that the procedure outlined below is exactly how things would proceed even if it were a real-life legal case.
We will use the McDonnell Douglas Corp. v. Green framework to decide whether there is a case for intentional discrimination (recall the steps of this framework that we have seen before):
- Round 1: Make the case as a plaintiff who is an older person without access to a smartphone: i.e. argue why a similarly situated [3] younger person who has a smartphone would have a non-discriminatory outcome.
- Round 2: Make the case as the defendant (in this case Street Bump): i.e. present a legitimate non-discriminatory reason for the decision to use data collected from smartphones.
- Round 3: Try to win the case as the plaintiff: i.e. argue that the reason offered by the defendant is pretextual (or, more colloquially, "bogus").
It is extremely unlikely that the plaintiff is going to win this case under the intentional discrimination part of disparate treatment, since the defendant could make the case that they were merely inattentive to representation bias when building the ML pipeline. While it is possible that there might have been an implicit bias, e.g. the ML developer fell into the "smartphone trap", it will be near impossible to argue that this was even an implicit (let alone intentional) bias against a protected class. (Note that the class of people without smartphones is not a protected class.)
We have seen so far that it seems unlikely that one can prove intentional/formal discrimination under disparate treatment, since the ML pipeline developer can claim to not have intentionally discriminated. One potential way around this would be to try and apply disparate treatment directly to the model. We go through this exercise next:
Exercise
Can we apply disparate treatment directly to the model in case it is discriminating? If so, what are the potential issues? If not, why not?
No, since disparate treatment as it currently stands is only applicable to human decision makers.
In summary, disparate treatment does not seem equipped to handle discrimination due to an ML pipeline. Quoting Barocas and Selbst (where they used data mining as a representative of an ML pipeline):
In sum, aside from rational racism and masking (with some difficulties), disparate treatment doctrine does not appear to do much to regulate discriminatory data mining.
Disparate impact and the ML pipeline
We now consider how the ML pipeline interacts with disparate impact. Recall that there are three steps to a disparate impact case. Before we talk about these steps and how they interact with an ML pipeline in general, we continue with the Street Bump example:
Exercise
Recall that because of the way the data was collected by Street Bump, the app (potentially) discriminates against older folks. In this exercise, you will figure out if this can be tried under disparate impact. (Again, we know that this is not an employment decision but humor us as we go through this exercise.)
Click here for suggestions for Instructors
If running this as an in-class exercise, break up the class into two groups and have them play the roles of plaintiff and defendant against each other (i.e. both groups argue as the plaintiff in the first round, each then plays the defendant for the other group in the second round, and finally each group responds, as the plaintiff, to the defendant's reason from the previous round).
This is a made-up legal case
Just so that there is no confusion-- this is not a real legal case, nor do we guarantee that the procedure outlined below is exactly how things would proceed even if it were a real-life legal case.
We will follow the three steps to a disparate impact case:
- Round 1: Make the case as the plaintiff: i.e. argue why the use of smartphone data collection has an adverse impact on the older population (when compared to, say, the younger population).
- Round 2: Make the case as the defendant (in this case Street Bump): i.e. present a business reason for the decision to use data collected from smartphones.
- Round 3: Try to win the case as the plaintiff: i.e. show that there exists an alternative data collection mechanism that would have worked just as well for Street Bump with less discriminatory results.
Next, we walk through the three steps of a disparate impact case and re-consider them specifically for the case when the (employment) decision is being made by an ML pipeline.
Adverse impact on certain groups in a protected class
Recall that the first step in a disparate impact case is to establish that a certain group in a protected class is adversely affected compared to other groups. A common metric used here is the so-called four-fifths rule, where there is an adverse impact on a group if the fraction of people in that group accepted by the employment procedure is less than $\frac{4}{5}$ths of the fraction accepted in the "best performing" group in the protected class.
This part, at least in principle, is not a hard task to carry out for an ML pipeline, since we can run the pipeline on a (suitably sized) random sample and observe what fraction of people from different groups are accepted by it. Indeed, notions of fairness for ML models have been proposed that are based on the four-fifths rule. At a high level, an outcome is deemed unfair if some measure of accuracy in a certain group is less than $80\%$ of the same measure in some other group in the protected class.
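Here is a sketch of what such a check could look like in Python: run the (already-trained) model on a random sample, compute the acceptance rate for each group in a protected class, and compare each group's rate against the best-performing group's. The model, data, and column names are hypothetical placeholders, and the 80% threshold is just the four-fifths rule from above.

```python
import pandas as pd

def four_fifths_audit(sample, decisions, group_col, threshold=0.8):
    """Compare each group's acceptance rate to the best-performing group's rate.

    `sample` is a random sample of applicants (a DataFrame), `decisions` is the
    model's accept/reject output (1 = accept) on that sample, and `group_col`
    names the protected-class column (e.g. "race" or "age_bracket")."""
    rates = pd.Series(decisions, index=sample.index).groupby(sample[group_col]).mean()
    ratios = rates / rates.max()             # ratio to the best-performing group
    flagged = ratios[ratios < threshold]     # groups falling below four-fifths
    return rates, ratios, flagged

# Hypothetical usage with an already-trained model and a sampled applicant pool:
# sample = pd.read_csv("applicant_sample.csv")
# decisions = model.predict(sample.drop(columns=["age_bracket"]))
# rates, ratios, flagged = four_fifths_audit(sample, decisions, group_col="age_bracket")
# print(flagged)  # any group listed here falls below the four-fifths threshold
```

As the caution below explains, passing or failing such a purely outcome-based check is not the same thing as satisfying or violating the (procedural) disparate impact doctrine.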
The two notions are not the same!
While these notions of fairness were inspired by the four-fifths rule, there is a very important (but perhaps a bit subtle) difference. The notion of fairness mentioned above is purely outcome based, while the notions of disparate impact (as well as disparate treatment), as we have already seen, are inherently procedural. This point was explicitly made in the FAT* 2019 paper by Selbst, boyd, Friedler, Venkatasubramanian and Vertesi. For a bit more detailed discussion of this mismatch between definitions of fairness and the legal principles that inspired them, see the 2019 paper by Xiang and Raji.
Showing job relatedness
Assuming adverse impact of the ML model has been shown, the next step in the disparate impact case is for the defendant (i.e. the employer) to give a job-relatedness reason for using the existing policy. However, what constitutes "job relatedness" is not always cut and dried. In this section, we walk through the various steps in determining job relatedness when an ML pipeline is used to make the employment decision.
The first question one would ask is:
Is the target variable job related?
Since every ML model has a target variable, the first step in determining job relatedness would be to determine whether the target variable is indeed job related. In particular, an employer could be liable under disparate impact for improper selection of the target variable.
For example, if the ML model uses membership in a racial group as a target variable, then the employer would be hard-pressed to argue that this target variable is job related.
The next question to ask is:
Is the model itself predictive of the target variable?
Once it is established that the target variable is indeed job related, one must determine whether the ML model is indeed predictive of the target variable (as otherwise the ML model is not doing what it is supposed to do and hence its outcome might not be job related).
However, it is perhaps safe to assume that the ML model being used is indeed predictive of the stated target variable (otherwise why would a company use it?). For the rest of the section, we will assume that the ML model being used is (strongly) predictive of the target variable.
However, we have already seen that target variables are typically proxies for some underlying trait that one is trying to measure via the ML model. So the next question to ask is:
Does the ML model predict what it is supposed to?
As mentioned above, what this step is trying to determine is whether the ML model is really predicting the trait that is "really" job related. In other words, the question here is whether the ML model has measurement bias (when applied to the target variable).
In traditional disparate impact cases, this manifests, e.g., in cases where the question is whether a general aptitude test used by an employer is really predictive of some inherent trait that is job related. Recall that this was exactly the issue in the Griggs v. Duke Power Co. case:
However, the above step still does not "define" what "job relatedness" of a procedure/test is. There are some established guidelines for this:
Guidelines for Determining validity of job relatedness claims
There are three main forms of validity:
- Criterion-related validity: Here the idea is to show empirically that the test/procedure is indeed predictive of the traits that are important requirements of the job under consideration.
- Content validity: Here the idea is to justify that the test/procedure is designed to measure skills/abilities that can "only" be learned on the job (and not via, say, some training sessions).
- Construct validity: Here the idea is to justify that the test/procedure is specifically designed to measure some inherent trait (e.g. "grit" in student admissions) that is related to job performance.
Exercise
Which of the forms of validity above are well suited to the case where the test/procedure is an ML model, and which ones are not?
So what we have seen so far (or at least what we have tried to show so far) is that, assuming ML pipelines are good predictors of something (i.e. we take predictiveness as a given), the employer is OK as long as the target variable is justifiable as measuring a trait that is required for job performance.
But, wait...
What if the target variable has indeed been chosen properly but the model itself is biased, perhaps on purpose say via masking?
This part is addressed by the last step of a disparate impact case, which we discuss next.
Alternative employment practice
In the third and final step of a disparate impact case (where it is assumed that it has been established that the employer had a valid job-related reason to use its current ML pipeline to make the decision), it is now the plaintiff's responsibility to show that there is an alternative employment practice. More precisely,
Alternative Employment practice
To win the case the plaintiff now has to show two things:
- The plaintiff has to produce an alternative ML model (or some other decision procedure) that is just as predictive of the target variable (which has been established as job related in the second step) but does not have the discriminatory impact against the protected class (a toy sketch of such a comparison appears after this list); and
- (This is the part we did not explicitly mention above) the employer refuses to use the alternative.
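To make the first requirement a bit more concrete, here is a hedged Python sketch of the kind of comparison that is (informally) at stake: the proposed alternative should be roughly as predictive of the established target variable while treating the groups in the protected class less unevenly. The models, data, and column names are all hypothetical placeholders; this is an illustration, not a legal test.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

def compare_practices(current_model, alternative_model, X_test, y_test, groups):
    """Compare the employer's model and a proposed alternative on (i) how well
    they predict the target variable and (ii) their worst group's acceptance-rate
    ratio (the four-fifths style number). `groups` is the protected-class column
    for the same rows as X_test."""
    report = {}
    for name, model in [("current", current_model), ("alternative", alternative_model)]:
        preds = model.predict(X_test)
        accuracy = accuracy_score(y_test, preds)                  # predictiveness of the target
        rates = pd.Series(preds, index=X_test.index).groupby(groups).mean()
        worst_ratio = (rates / rates.max()).min()                 # worst group's ratio to the best group
        report[name] = {"accuracy": accuracy, "worst_group_ratio": worst_ratio}
    return report

# Hypothetical usage:
# report = compare_practices(employer_model, plaintiff_model, X_test, y_test,
#                            groups=df_test["age_bracket"])
# The plaintiff's argument needs report["alternative"]["accuracy"] to be comparable
# to report["current"]["accuracy"] while report["alternative"]["worst_group_ratio"]
# clears the 0.8 bar that the current model misses.
```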
The first requirement above should be reasonably clear, but the second one sounds murkier. And indeed, there is no clear-cut answer to what constitutes "refusal". Hence, in lieu of answering this question, we will consider a couple of scenarios and ask whether they would count as a "refusal" by the employer:
Exercise
Consider the following two situations. In each case, do you think the employer's actions should be considered a "refusal"?
- The employer used third-party data to develop its ML model. Since it does not have access to a more fine-grained version of the original data, it cannot train the model proposed by the plaintiff (which uses additional input variables and is less discriminatory), and so it does not implement the plaintiff's more sophisticated model.
- The employer can access the finer-grained data and in theory learn the more sophisticated model proposed by the plaintiff. However, the cost of gathering the more fine-grained data is so high that doing so would bankrupt the company. Hence, it decides not to implement the plaintiff's more sophisticated model.
Final thoughts on using disparate impact for ML pipeline outcomes
In summary, we conclude with two quotes from Barocas and Selbst. The first is a distillation of disparate impact that is important for understanding its limitations and reach:
Disparate impact doctrine was created to address unintentional discrimination. But it strikes a delicate balance between allowing businesses the leeway to make legitimate business judgments and preventing “artificial, arbitrary, and unnecessary” discrimination.
The second quote is their conclusion that disparate impact is probably also not sufficient to stop discrimination via ML pipelines (where, again, they used data mining as a representative of an ML pipeline):
Successful data mining operations will often both predict future job performance and have some disparate impact. Unless the plaintiff can find an alternative employment practice to realistically point to, a tie goes to the employer.
Concluding thoughts
Barocas and Selbst conclude that the disparate treatment and disparate impact paradigms are not well equipped to handle the new challenges brought in by the use of ML pipelines in decision making. They also go over the challenges in addressing this, but we will not cover them in these notes and instead refer the reader to the paper by Barocas and Selbst.
They also offer the following summary of Title VII, which we quote below:
Title VII does not require an employer to use the least discriminatory means of running a business. Likewise, Title VII does not aim to remedy historical discrimination and current inequality by imposing all the costs of restitution and redistribution on individual employers. It is more appropriately understood as a standard of defensible disparate impact.
They lay down some suggestions on how to make Title VII more effective in dealing with disparate treatment/impact cases when the ML pipeline is involved in the decision making, pretty much all of which are suggestions to make structural changes beyond Title VII (see the paper by Barocas and Selbst for details).
Next Up
Next, we will consider how when an ML pipeline is deployed in society, it can create a (many times not so nice) feedback loop.