AI-based credit risk tools can be ruined by noisy data
Originally posted on The Horizons Tracker.
One’s credit score is hugely important: without a healthy rating it is very difficult to secure substantial loans, such as mortgages. It’s increasingly common for financial providers to use AI to produce credit risk scores, but research1 from Stanford University highlights how bad data can cause such systems to go astray.
The study finds that predictive tools can be up to 10% less accurate for minority groups and lower-income families. This isn’t due to any inherent bias in the systems, but rather to the relative paucity of data on these groups, which leaves the models with less signal from which to predict creditworthiness.
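To see how noise alone can produce this kind of gap, consider a minimal sketch of the dynamic the study describes. This is an illustrative toy model, not the researchers’ actual method: two groups share the same underlying default process, but one group’s observed data is noisier, and a single model scores both.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20_000

# The same latent creditworthiness drives default for everyone.
latent = rng.normal(size=n)
default = (latent + rng.normal(scale=0.5, size=n) < -1).astype(int)

# A hypothetical "thin-file" group B gets a noisier observed signal.
group_b = rng.random(n) < 0.3
noise_scale = np.where(group_b, 1.5, 0.3)
observed = latent + rng.normal(size=n) * noise_scale

# One model, trained on everyone, with no knowledge of group membership.
model = LogisticRegression().fit(observed.reshape(-1, 1), default)
pred = model.predict_proba(observed.reshape(-1, 1))[:, 1]

print("AUC, group A:", round(roc_auc_score(default[~group_b], pred[~group_b]), 3))
print("AUC, group B:", round(roc_auc_score(default[group_b], pred[group_b]), 3))
# Group B scores worse purely because its inputs are noisier:
# the disparity comes from the data, not from model bias.
```

Run as-is, group B’s accuracy comes out noticeably lower even though the model never sees group membership, which is the study’s central point.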
Thin credit history
It’s well known that a thin credit history will often result in higher borrowing costs, simply because lenders have less data on which to judge your trustworthiness. It can also mean, however, that it doesn’t take much to send your credit rating spiraling in the wrong direction.
“We’re working with data that’s flawed for all sorts of historical reasons,” the researchers say. “If you have only one credit card and never had a mortgage, there’s much less information to predict whether you’re going to default. If you defaulted one time several years ago, that may not tell much about the future.”
The researchers themselves used AI to analyze vast quantities of consumer data, which allowed them to test various credit-scoring models. They began by analyzing credit data from 50 million people to see if existing methods were equally accurate for all demographic groups.
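In practice, the kind of group-wise accuracy check the researchers describe might look something like the following sketch, where the column names and figures are illustrative placeholders rather than the study’s actual data:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical loan records: an existing credit score, the eventual outcome,
# and a demographic group label (all values invented for illustration).
loans = pd.DataFrame({
    "score":     [700, 640, 580, 710, 555, 620, 690, 540],
    "defaulted": [0,   0,   1,   0,   1,   1,   0,   0],
    "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# Higher score should mean lower default risk, so negate the score for AUC.
for name, g in loans.groupby("group"):
    auc = roc_auc_score(g["defaulted"], -g["score"])
    print(f"group {name}: AUC = {auc:.2f}")
```

The same scoring method can then be compared group by group; a consistently lower AUC for one group signals exactly the kind of accuracy disparity the study documents.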
Risk assessment
The key challenge with risk assessments is that we never directly observe whether people who were rejected for loans would have gone on to default. The researchers worked around this by examining whether people who had been rejected for one loan were able to keep up with payments on other loans.
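A simplified sketch of this workaround, using an assumed, hypothetical schema of applications and other open accounts (not the paper’s actual data layout), might look like this:

```python
import pandas as pd

applications = pd.DataFrame({
    "person_id": [1, 2, 3],
    "score":     [540, 580, 610],
    "approved":  [False, False, True],
})
other_accounts = pd.DataFrame({
    "person_id":  [1, 1, 2, 3],
    "delinquent": [True, False, False, False],
})

# Proxy label: did the person fall behind on any other account they hold?
proxy = (other_accounts.groupby("person_id")["delinquent"]
         .any().rename("proxy_default").reset_index())

# Rejected applicants now carry an observable outcome proxy,
# so the score that rejected them can still be validated.
rejected = applications.loc[~applications["approved"]].merge(proxy, on="person_id")
print(rejected)
```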
The results suggest that credit ratings tended to be much less accurate for minority or low-income borrowers than for other borrowers. The researchers hypothesize that this is because the data underlying these groups’ credit scores is far noisier and more misleading.
To test this, they tried out a number of alternative scoring models that had been built to better serve minority and low-income borrowers. These didn’t seem to help; indeed, the scores were even less accurate. This highlighted that the problem was not the models themselves, but the data they rely on.
Limited information
The real problem is that people with poor credit scores often have a very limited financial history, so it’s harder to assess their creditworthiness. This was particularly so for people who had a couple of blemishes on their record.
When the researchers were able to gather more data to feed the models, they eliminated around half of the disparity in accuracy.
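One intuitive way to picture this is to treat a longer credit history as repeated noisy readings of the same underlying signal; averaging more readings cuts the noise. The toy model below (an illustration, not the paper’s method) shows the score’s accuracy climbing as the number of readings grows.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 20_000
latent = rng.normal(size=n)
default = (latent + rng.normal(scale=0.5, size=n) < -1).astype(int)

for k in (1, 4, 16):                          # readings per person
    readings = latent[:, None] + rng.normal(scale=1.5, size=(n, k))
    score = readings.mean(axis=1)             # longer history -> less noise
    auc = roc_auc_score(default, -score)      # low score means likely default
    print(f"k={k:2d}  AUC={auc:.3f}")
```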
The results clearly illustrate how people from poorer backgrounds may be unfairly rejected for credit. This misallocates credit and can even perpetuate inequality, as poorer people miss out on the chance to build a credit history and thus grow their wealth.
The researchers accept that there is no straightforward solution to this problem, and that it may require financial firms to experiment with extending credit to people with relatively poor credit scores.
“If you’re a bank, you could give loans to people and see who pays,” the authors conclude. “That’s exactly what some fin-tech companies are doing: giving loans and then learning.”
Article source: AI-Based Credit Risk Tools Can Be Ruined By Noisy Data.
Header image source: Credit Report text with magnifying glass by Marco Verch, CC BY 2.0.
Reference:
- Blattner, L., & Nelson, S. (2021). How Costly is Noise? Data and Disparities in Consumer Credit. arXiv preprint arXiv:2105.07554.