From the course: Ethics and Law in Data Analytics
Bias in data processing: Part 2
- In the previous video, we looked at one way that bias can enter the subconscious of machines, if you will, and that was through historically biased data. Now this is true on a second level: not only are machines bound to learn bias from biased historical data, but they may also develop bias by paying attention to irrelevant data. In Module Three, we will name this the problem of irrelevant proxies and examine the issue in more detail. For now, let's think about how an algorithm is created. Well, it necessarily needs input data, and as we saw in the Module One lab, you can call these variables. But of course, the machine algorithm does not use all the variables that ever were or could be collected; it only uses the variables that we decide to feed it, which is a step of the process sometimes called feature selection. And remember, the ones who decide which features to select are human beings, the same as you and me, with all of our human temptations and biases. So if you tell a machine to incorporate the data relating to ZIP Code, it will. But if ZIP Code turns out to be a proxy for race, as it often is, you have just selected a feature that is going to make the machine's decision-making process more biased, and perhaps even racist. It makes no difference whether you intentionally introduced race or not; the disparate impact is the same.

It is important to note that the problem of feature selection and the problem of biased data introduce bias into machines in ways that are often invisible. And when the biases are invisible, they have the potential to be much more harmful, in exactly the same way that the unconscious bias of humans can be. We all agree that conscious bias is bad, but at least it is easier to spot, and it is much easier to fight an enemy that can be seen.

If this makes you disappointed, that's an understandable response, because sometimes AI systems such as predictive algorithms are sold specifically on the promise that they will remove human bias from the decision-making process. A perfect example is the recidivism algorithm that we have been working on in the labs at the end of each module. Inevitably, you will hear someone say something like, see how great this predictive algorithm is, now we don't have to worry about a jury or lawyers or a judge with bias in their hearts, an objective machine will do the job for us. And while it is certainly true that algorithms don't have the same temptations as humans, it is very important to realize that we have not solved the problem of bias. A much better way to think about this is that we have substituted one kind of bias problem for another. And even when, or if, it turns out that machines are less biased, or even when and if it turns out to be true that it is easier to fix machine bias than human bias, the fight against bias must continue. It just has a new level of complexity.
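To make the ZIP Code example concrete, here is a minimal Python sketch, not part of the course materials, using entirely synthetic data and illustrative numbers chosen as assumptions. The score never receives race as an input, yet because ZIP Code is correlated with race in the simulated population, the two groups are flagged at very different rates, which is the disparate impact described above.

```python
# Minimal sketch (synthetic data, illustrative numbers): a feature such as
# ZIP Code can act as a proxy for race, so a model that never sees race can
# still produce a disparate impact across racial groups.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Race is never given to the model, but ZIP Code is correlated with it
# (a stand-in for residential segregation), so ZIP Code is a proxy.
race = rng.choice(["A", "B"], size=n, p=[0.5, 0.5])
zip_code = np.where(
    race == "A",
    rng.choice([0, 1], size=n, p=[0.8, 0.2]),   # group A lives mostly in ZIP 0
    rng.choice([0, 1], size=n, p=[0.2, 0.8]),   # group B lives mostly in ZIP 1
)

# A naive "risk score" built only from the selected features, ZIP Code included.
income = rng.normal(50, 10, size=n) - 5 * zip_code       # ZIP 1 is poorer on average
risk_score = 0.5 * zip_code - 0.02 * income + rng.normal(0, 0.1, size=n)
flagged = risk_score > np.median(risk_score)              # top half flagged "high risk"

# Disparate impact: compare how often each group is flagged, even though
# race was never an input to the score.
rate_a = flagged[race == "A"].mean()
rate_b = flagged[race == "B"].mean()
print(f"flag rate, group A: {rate_a:.2f}")
print(f"flag rate, group B: {rate_b:.2f}")
print(f"disparate impact ratio (A/B): {rate_a / rate_b:.2f}")
```

Dropping the ZIP Code column in this sketch would bring the two flag rates back together, which is exactly why feature selection is where this kind of bias enters, and where it can be caught.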