From the course: Ethics and Law in Data Analytics
Bias in data processing: Part 2
- In the previous video, we looked at one way that bias can enter the subconscious of machines, if you will, and that was through historically biased data. Now this is true on a second level: not only are machines bound to learn bias from biased historical data, but they may also develop bias by paying attention to irrelevant data. In Module Three, we will name this the problem of irrelevant proxies and examine the issue in more detail. For now, let's think about how an algorithm is created. Well, it necessarily needs input data, and as we saw in the Module One lab, you can call these variables. But of course, the machine algorithm does not use all the variables that ever were or could be collected; it only uses the variables that we decide to feed it, which is a step of the process sometimes called feature selection. And remember, the ones who decide which features to select are human beings, the same as you and me, with all of our human temptations and biases. So if you tell a machine to incorporate the data relating to ZIP Code, it will. But if ZIP Code turns out to be a proxy for race, as it often is, you have just selected a feature that is going to make the machine's decision-making process more biased, and perhaps even racist. It makes no difference whether you intentionally introduced race or not; the disparate impact is the same.

It is important to note that the problem of feature selection and the problem of biased data introduce bias into machines in ways that are often invisible. And when the biases are invisible, they have the potential to be much more harmful, in exactly the same way that the unconscious bias of humans can be. We all agree that conscious bias is bad, but at least it is easier to spot, and it is much easier to fight an enemy that can be seen.

If this makes you disappointed, that's an understandable response, because sometimes AI systems such as predictive algorithms are sold specifically on the promise that they will remove human bias from the decision-making process. A perfect example is the recidivism algorithm that we have been working on in the labs at the end of each module. Inevitably, you will hear someone say something like, see how great this predictive algorithm is, now we don't have to worry about a jury or lawyers or a judge with bias in their hearts, an objective machine will do the job for us. And while it is certainly true that algorithms don't have the same temptations as humans, it is very important to realize that we have not solved the problem of bias. A much better way to think about this is that we have substituted one kind of bias problem for another. And even when, or if, it turns out that machines are less biased, or even when and if it turns out to be true that it is easier to fix machine bias than human bias, the fight against bias must continue. It just has a new level of complexity.
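To make the ZIP Code example concrete, here is a minimal Python sketch, not part of the course materials, using entirely synthetic data and illustrative numbers chosen as assumptions. The score never receives race as an input, yet because ZIP Code is correlated with race in the simulated population, the two groups are flagged at very different rates, which is the disparate impact described above.

```python
# Minimal sketch (synthetic data, illustrative numbers): a feature such as
# ZIP Code can act as a proxy for race, so a model that never sees race can
# still produce a disparate impact across racial groups.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Race is never given to the model, but ZIP Code is correlated with it
# (a stand-in for residential segregation), so ZIP Code is a proxy.
race = rng.choice(["A", "B"], size=n, p=[0.5, 0.5])
zip_code = np.where(
    race == "A",
    rng.choice([0, 1], size=n, p=[0.8, 0.2]),   # group A lives mostly in ZIP 0
    rng.choice([0, 1], size=n, p=[0.2, 0.8]),   # group B lives mostly in ZIP 1
)

# A naive "risk score" built only from the selected features, ZIP Code included.
income = rng.normal(50, 10, size=n) - 5 * zip_code       # ZIP 1 is poorer on average
risk_score = 0.5 * zip_code - 0.02 * income + rng.normal(0, 0.1, size=n)
flagged = risk_score > np.median(risk_score)              # top half flagged "high risk"

# Disparate impact: compare how often each group is flagged, even though
# race was never an input to the score.
rate_a = flagged[race == "A"].mean()
rate_b = flagged[race == "B"].mean()
print(f"flag rate, group A: {rate_a:.2f}")
print(f"flag rate, group B: {rate_b:.2f}")
print(f"disparate impact ratio (A/B): {rate_a / rate_b:.2f}")
```

Dropping the ZIP Code column in this sketch would bring the two flag rates back together, which is exactly why feature selection is where this kind of bias enters, and where it can be caught.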