Originally posted on The Horizons Tracker.
As Daniel Kahneman illustrates in Noise, AI-based systems can be effective in reducing the variability in decision making, but they nonetheless suffer from the biases introduced by the quality of the data used to train the algorithms. For instance, one recent investigation showed that lenders were 80% more likely to reject Black applicants than similar white applicants.
While the intuitive approach may be to simply remove race from the dataset, this can still suffer from “latent discrimination”, which in this instance would emerge because Black people are more likely to live in certain areas, so loan biases would instead attach to those locations as a proxy for race.
Research1 from Yale proposes a better solution that the researchers believe is sensitive to gender, race, and so on. The approach ensures that sensitive data is included when training algorithms, but then masked when actually being used. They believe the approach maintains the accuracy of the system while reducing the discrimination in it.
Ensuring systems are fair is increasingly important as they often help to distribute resources; if people are denied what they are entitled to, it can exacerbate disadvantages they already face.
The approach works in two phases. The first of these uses training data to help the algorithm learn how particular attributes are linked to each outcome. The algorithm is then given information about any new cases and attempts to predict what will happen based on similarities with previous cases.
The researchers explain that because removing sensitive information from the training data can result in latent discrimination, they had to think of a different approach to reduce bias in the system. One approach they considered was to boost the scores of people from disadvantaged groups, but this meant that two people who were identical other than their race or gender would receive different scores, which typically produced a backlash.
Train then mask
The eventual approach decided upon was referred to as “train then mask”. It involves giving the system all of the information about past cases during the training phase, including any sensitive information. This meant the algorithm didn’t give undue importance to otherwise unrelated factors that could serve as proxies for the sensitive features.
They then hid the sensitive features in the second stage, so that all new cases would be given the same value for these features. This would force the system to look beyond both race itself and any proxies for race when it compared individuals.
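The two phases described above can be sketched in code. This is a minimal illustration rather than the researchers’ implementation: it assumes a toy dataset with two non-sensitive features plus one binary sensitive attribute, and uses a plain logistic regression as the underlying model.

```python
# Hypothetical sketch of "train then mask" using scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: two non-sensitive features plus one sensitive
# attribute (a 0/1 group indicator) in the last column.
X_nonsensitive = rng.normal(size=(200, 2))
sensitive = rng.integers(0, 2, size=(200, 1))
X_full = np.hstack([X_nonsensitive, sensitive])
y = (X_nonsensitive[:, 0] + 0.5 * sensitive[:, 0] > 0).astype(int)

# Phase 1 (train): fit on ALL features, sensitive attribute included,
# so the model need not overweight proxies for the sensitive feature.
model = LogisticRegression().fit(X_full, y)

# Phase 2 (mask): at prediction time, every new case gets the SAME
# constant value for the sensitive feature, so two applicants who are
# otherwise identical receive identical scores.
def predict_masked(model, X_new, mask_value=0.0):
    masked_col = np.full((len(X_new), 1), mask_value)
    X_masked = np.hstack([X_new, masked_col])
    return model.predict_proba(X_masked)[:, 1]

X_new = rng.normal(size=(5, 2))
scores = predict_masked(model, X_new)
```

Because the mask is applied only at prediction time, the sensitive attribute still informs how the model weights the other features during training, while the constant value at scoring time guarantees it cannot directly alter any individual’s result.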
“To be clear, train then mask is by no means the only method out there that proposes to deal with the issue of algorithmic bias using ‘awareness’ rather than ‘unawareness’ of sensitive features such as gender or race,” the researchers say. “But unlike most proposed methods, train then mask emphasizes helping disadvantaged groups while enforcing that those who are identical with respect to all other—non-sensitive—features be treated the same.”
The system was tested on three tasks: predicting an individual’s income status, whether a credit applicant would pay their bills on time, and whether a criminal would re-offend. It was trained on real data and its results were compared with those from other algorithms.
The system produced results that were as accurate as an unconstrained algorithm, that is, one that had not been adjusted to try to reduce unfairness. The researchers also believe the approach helps to reduce what they refer to as “double unfairness”, where someone from a minority group performs better than those from the majority group on certain metrics, but the discrimination they face lumps them in with the majority. The “train then mask” approach overcomes this because it doesn’t try to minimize the difference in output between the two groups, so the double unfairness problem doesn’t emerge.
While the team accepts that their approach won’t be right for every task, they do believe it nonetheless avoids latent discrimination while also ensuring that two applicants who differ in terms of race or gender are treated the same if they’re otherwise identical.
“If you want to have these two things at the same time, then I think this is for you,” they conclude.
Article source: How To Remove Biases From Algorithms.
- Ghili, S., Kazemi, E., & Karbasi, A. (2019). Eliminating latent discrimination: Train then mask. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1), 3672–3680.