Prediction and Fairness

August 5th, 2014
If you're going to lend me money, you want to know how likely I am to pay you back. If you're competing with a whole bunch of other lenders for my business, then you want to charge me as little as possible so I'll go with you, but still enough that you'll make money on average. You make a prediction about how likely I am to pay you back, and the better that prediction is, the less you need to charge. You can make a much better prediction if you have some information about me, and the more the better: have I paid other people back, how much do I earn, how much other debt do I have, and so on.

This information has a downside, however. Say I buy a lot of expensive things on credit, spending more than I could afford to, and then can't pay it back (default). I declare bankruptcy, it's a huge pain for me, and my lenders are out the money. It's pretty uncontroversial that if I then ask you for a loan you can refuse or charge me more, because my past actions suggest there's a larger chance that I'll default. But say 20 years pass and you still want to charge me more, because your evidence is that even after 20 years people who have gone bankrupt once are substantially more likely to do it again: is that ok?

We tend to describe this situation as people being "hurt by their credit history" or "limited by their credit history". The idea is that your reputation is something negative, and companies are hurting you by taking advantage of damaging information about your past. This is a little one-sided: reputation helps some people and hurts others. Say there's some attribute Q, which could be anything, and on average more Q-people default on loans than not-Q-people. If we prohibit taking Q-status into account in making loans, this is good for the Q-people because they'll get lower rates, but lenders still have to collect enough from the group as a whole to break even, which means raising rates on not-Q-people.

But it's actually worse than this. If you prohibit charging higher rates to Q-people then more of them will take out loans than otherwise would, while some not-Q-people facing the higher blended rate will borrow less, shifting the pool of people taking out loans and increasing the rate lenders need to charge to break even. If the increased default rate that correlates with Q-status is large enough, prohibiting taking Q-status into account can in the extreme push everyone's rate up to nearly where it would be if every borrower were a Q-person.
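To make that concrete, here's a toy calculation with made-up numbers (the default rates, the 50/50 applicant mix, and the demand response below are all assumptions for illustration, not real lending data). It shows the break-even rate under separate pricing, under pooled pricing, and then after the applicant pool shifts:

    # Toy model of break-even lending rates.  Every parameter here (default
    # rates, applicant mix, demand response) is an assumption for illustration.

    def break_even_rate(default_rate):
        """Rate at which expected repayment equals the amount lent,
        assuming a defaulting borrower repays nothing."""
        return default_rate / (1 - default_rate)

    P_Q, P_NOT_Q = 0.10, 0.02  # assumed default rates for Q and not-Q people

    print(f"Q-only rate:     {break_even_rate(P_Q):.1%}")      # ~11.1%
    print(f"not-Q-only rate: {break_even_rate(P_NOT_Q):.1%}")  # ~2.0%

    # Pooled pricing with a 50/50 applicant mix: everyone pays the blended rate.
    pooled_rate = break_even_rate(0.5 * P_Q + 0.5 * P_NOT_Q)
    print(f"pooled rate:     {pooled_rate:.1%}")               # ~6.4%

    # Adverse selection: the pooled rate is a bad deal for not-Q people, so
    # assume (arbitrarily) that the Q-share of applicants rises as the rate
    # climbs above the not-Q break-even rate.  Iterate to a rough equilibrium.
    def q_share(rate):
        return min(0.95, 0.5 + 3 * (rate - break_even_rate(P_NOT_Q)))

    rate = pooled_rate
    for _ in range(20):
        share = q_share(rate)
        rate = break_even_rate(share * P_Q + (1 - share) * P_NOT_Q)
    print(f"after adverse selection: {rate:.1%}")              # ~8%, creeping toward 11.1%

With these particular numbers the rate everyone pays climbs from about 6.4% to about 8%; with a steeper demand response or a bigger gap in default rates, it lands closer to the 11.1% Q-only rate.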

This pushes me towards the simple view that we should let lenders use any information they can collect, but here's an example that pushes me the other way. Say when you break defaults down by race you find that some races on average default more often. There are probably lots of reasons for this, but say even after you take into account the rest of the race-neutral data you can collect (income, age, years at job, profession), race still has lots of explanatory power for predicting whether someone will default. It's unlikely that race is actually causing people to default more or less often, but the actual causal factors might be enough harder to observe that no one has figured out how to measure them efficiently or accurately. Which would leave lenders wanting to take race into account in their predictions and the rates they charge. But charging someone more because of their race doesn't seem right at all. [1]

This is similar to the situation with insurance. There companies want to predict how much you're likely to cost them and they also would like to use as much information about you as possible to make that prediction. For flood insurance, they want to know how likely your house is to flood; for car insurance they want to know how likely you are to cause an accident.

Most of the time, however, the better you make your predictions the more effectively you will discriminate against people who are discriminated against in other ways. People with lower incomes are less likely to pay back a large loan. People in poorer neighborhoods are more likely to have their cars vandalized. People who have already had one heart attack are more likely to have another. This is why Attorney General Eric Holder is worried about risk assessment in sentencing (predicting the likelihood of post-release criminal activity), which could "exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society." Bad outcomes are not randomly distributed but tend to cluster.

When the risk goes the other way, however, we're more ok with people using it in predictions. Men get in more car accidents, so insurers charge them more. If rich people were likely to cost health insurers more, perhaps by requesting specialists and generally working the system to get more thorough care, then we'd probably be ok with insurers using income or wealth to charge them more. If these went the other way around, I expect more people would object.

For an example of this, consider prospective employers requiring drug tests before hiring, as a way of improving their prediction of "will this applicant be a good employee." Is this a good thing? If you think that the effect of this change will be that employers will hire fewer people who really need the work, then you're more likely to be against it. You might even say "employers have no right to this information" or "drug testing goes against basic human dignity." But then let's say you read Wozniak 2014 (pdf, via MR) and you learn that employers are currently making bad predictions about drug use based on apparent race, and once they have drug testing they increase their hiring of black applicants. Does this pull you towards being ok with "improving prediction accuracy" in this situation? Or at least allowing applicants to opt in to drug testing?

It sounds like what we're actually worried about is that more accurate predictions would increase societal inequality by allowing the inequality that already exists to compound. On the other hand, limiting information flow and predictions makes the economy less efficient and in aggregate hurts more than it helps. So instead of limiting prediction, if we could allow unrestricted prediction but also implement inequality-reducing wealth transfers (tax the rich, give to the poor), that should be better overall.

(One problem is that after noting that instead of X it would be better to do not-X plus Y, many people will then go ahead and advocate not-X without ensuring Y. So I'm not saying we should just allow all forms of prediction in lending/insurance/sentencing: without at the same time doing something to reduce inequality this isn't a better outcome.)


[1] If you're not careful in how you restrict taking race into account, you might find lenders using something that's very highly correlated with race instead. This can even happen automatically with 'big data' algorithms.
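As a rough sketch of how that can happen, here's an example on synthetic data (the group sizes, default rates, and the "neighborhood" proxy feature are all invented): a model trained with the protected attribute excluded still ends up predicting different default rates for the two groups, because a correlated feature stands in for it.

    # Synthetic illustration of footnote [1]: the protected attribute is never
    # given to the model, but a correlated proxy carries it in anyway.
    # All numbers here are invented for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 20_000
    group = rng.integers(0, 2, n)                      # protected attribute (excluded from training)
    neighborhood = rng.random(n) < 0.2 + 0.6 * group   # proxy feature, correlated with group
    income = rng.normal(50 - 5 * group, 10, n)         # an ordinary feature
    default = (rng.random(n) < 0.05 + 0.05 * group).astype(int)

    X = np.column_stack([neighborhood, income])        # note: 'group' itself is not a feature
    model = LogisticRegression(max_iter=1000).fit(X, default)

    pred = model.predict_proba(X)[:, 1]
    print("mean predicted default, group 0:", round(pred[group == 0].mean(), 3))
    print("mean predicted default, group 1:", round(pred[group == 1].mean(), 3))
    # The gap between the groups persists: 'neighborhood' stands in for the
    # excluded attribute, so group 1 still gets higher predicted default rates.

Dropping the protected attribute from the inputs doesn't drop it from the predictions; anything sufficiently correlated with it can carry it back in.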
