Originally posted on The Horizons Tracker.
As Daniel Kahneman illustrates in Noise, AI-based systems can be effective in reducing the variability in decision making, but they nonetheless suffer from the biases introduced by the quality of the data used to train the algorithms. For instance, one recent investigation showed that lenders were 80% more likely to reject Black applicants than similar white applicants.
While the intuitive approach may be to simply remove race from the dataset, this can still suffer from “latent discrimination”, which in this instance would emerge because Black people are more likely to live in certain areas, so loan biases would instead attach to those locations as a proxy for race.
Research1 from Yale proposes a better solution that the researchers believe is sensitive to gender, race, and so on. The approach ensures that sensitive data is included when training algorithms, but then masked when actually being used. They believe the approach maintains the accuracy of the system while reducing the discrimination in it.
Ensuring systems are fair is increasingly important as they often help to distribute resources; if people are denied what they are entitled to, it can exacerbate disadvantages they already face.
The approach works in two phases. The first of these uses training data to help the algorithm learn how particular attributes are linked to each outcome. The algorithm is then given information about any new cases and attempts to predict what will happen based on similarities with previous cases.
The researchers explain that because removing sensitive information from the training data can result in latent discrimination, they had to think of a different approach to reduce bias in the system. One approach they considered was to boost the scores of people from disadvantaged groups, but this meant that two people who were identical other than their race or gender would receive different scores, which typically produced a backlash.
Train then mask
The eventual approach decided upon was referred to as “train then mask”. It involves giving the system all of the information about past cases during the training phase, including any sensitive information. This meant the algorithm didn’t give undue importance to otherwise unrelated factors that could serve as proxies for the sensitive features.
They then hid the sensitive features in the second stage, so that all new cases would be given the same value for these features. This would force the system to look beyond both race itself and any proxies for race when it compared individuals.
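The two phases described above can be sketched in code. This is a minimal illustration rather than the researchers’ implementation: it assumes a toy dataset with two non-sensitive features plus one binary sensitive attribute, and uses a plain logistic regression as the underlying model.

```python
# Hypothetical sketch of "train then mask" using scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: two non-sensitive features plus one sensitive
# attribute (a 0/1 group indicator) in the last column.
X_nonsensitive = rng.normal(size=(200, 2))
sensitive = rng.integers(0, 2, size=(200, 1))
X_full = np.hstack([X_nonsensitive, sensitive])
y = (X_nonsensitive[:, 0] + 0.5 * sensitive[:, 0] > 0).astype(int)

# Phase 1 (train): fit on ALL features, sensitive attribute included,
# so the model need not overweight proxies for the sensitive feature.
model = LogisticRegression().fit(X_full, y)

# Phase 2 (mask): at prediction time, every new case gets the SAME
# constant value for the sensitive feature, so two applicants who are
# otherwise identical receive identical scores.
def predict_masked(model, X_new, mask_value=0.0):
    masked_col = np.full((len(X_new), 1), mask_value)
    X_masked = np.hstack([X_new, masked_col])
    return model.predict_proba(X_masked)[:, 1]

X_new = rng.normal(size=(5, 2))
scores = predict_masked(model, X_new)
```

Because the mask is applied only at prediction time, the sensitive attribute still informs how the model weights the other features during training, while the constant value at scoring time guarantees it cannot directly alter any individual’s result.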
“To be clear, train then mask is by no means the only method out there that proposes to deal with the issue of algorithmic bias using ‘awareness’ rather than ‘unawareness’ of sensitive features such as gender or race,” the researchers say. “But unlike most proposed methods, train then mask emphasizes helping disadvantaged groups while enforcing that those who are identical with respect to all other—non-sensitive—features be treated the same.”
The system was tested on three tasks: predicting an individual’s income status, whether a credit applicant would pay their bills on time, and whether a criminal would re-offend. It was trained on real data and its results were compared with those from other algorithms.
The system produced results that were as accurate as an unconstrained algorithm, that is, one that had not been adjusted to try to reduce unfairness. The researchers also believe the approach helps to reduce what they refer to as “double unfairness”, where someone from a minority group performs better than those from the majority group on certain metrics, but the discrimination they face lumps them in with the majority. The “train then mask” approach overcomes this because it doesn’t try to minimize the difference in output between the two groups, so the double unfairness problem doesn’t emerge.
While the team accepts that their approach won’t be right for every task, they do believe it nonetheless avoids latent discrimination while also ensuring that two applicants who differ in terms of race or gender are treated the same if they’re otherwise identical.
“If you want to have these two things at the same time, then I think this is for you,” they conclude.
Article source: How To Remove Biases From Algorithms.
- Ghili, S., Kazemi, E., & Karbasi, A. (2019). Eliminating latent discrimination: Train then mask. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1), 3672–3680.