
Are chatbots widening the digital language gap?
Originally posted on The Horizons Tracker. This article is part of an ongoing series looking at AI in KM, and KM in AI.
Researchers at Johns Hopkins warn [1] that AI chatbots are widening a digital language gap. Tools such as ChatGPT handle English and other dominant tongues well, but often sideline minority languages. Instead of breaking barriers, they risk creating “information cocoons.”
The team asked a simple question: are large language models truly multilingual? To test this, they wrote sample news stories with both accurate and conflicting details, covering wars and local events. These were written in high-resource languages such as English, German, and Chinese, as well as lower-resource ones like Hindi and Arabic.
They then posed queries in different languages to models from OpenAI, Anthropic and others. The results were clear. Models tended to echo the information available in the same language as the question. If an English article said an Indian politician was corrupt, while a Hindi article praised them, the answer would flip depending on the query’s language.
Defaulting to English
When no source existed in the language of the query, the models defaulted to higher-resource languages, usually English. In practice, a Hindi speaker asking about the India-China border might see an Indian perspective and a Chinese speaker might see China’s, while an Arabic speaker, with no Arabic source to draw on, might receive the American one. Three languages, three versions of “truth.”
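The two patterns described so far, echoing the same-language source and falling back to a higher-resource language, can be expressed as simple tallies. The following is a minimal sketch and not the researchers' actual code or metric: the `Trial` record, both function names, and every data point are hypothetical and invented purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One hypothetical query; the field names are assumptions, not from the study."""
    query_lang: str      # language the question was asked in
    source_langs: tuple  # languages in which conflicting documents were available
    answer_lang: str     # language of the document whose claim the answer matched


def same_language_rate(trials):
    """Among trials where a same-language document existed, the share of answers that echoed it."""
    eligible = [t for t in trials if t.query_lang in t.source_langs]
    if not eligible:
        return 0.0
    hits = sum(1 for t in eligible if t.answer_lang == t.query_lang)
    return hits / len(eligible)


def fallback_counts(trials):
    """Where no same-language document existed, count which language the answer drew on."""
    return Counter(
        t.answer_lang for t in trials if t.query_lang not in t.source_langs
    )


# Invented illustrative data, NOT results from the paper.
trials = [
    Trial("en", ("en", "hi"), "en"),
    Trial("hi", ("en", "hi"), "hi"),
    Trial("hi", ("en", "hi"), "en"),
    Trial("ar", ("en", "hi"), "en"),
    Trial("ar", ("en", "zh"), "en"),
]

rate = same_language_rate(trials)
fallbacks = fallback_counts(trials)
print(f"same-language echo rate: {rate:.2f}")
print(f"fallback languages: {dict(fallbacks)}")
```

A high echo rate would indicate that answers track the language of the question rather than the full set of available sources, and a fallback tally dominated by one language would indicate which language fills the gap when no local source exists.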
The researchers call this linguistic imperialism: the dominance of English crowds out smaller languages, skewing the information people receive. They warn that this can distort debate on sensitive issues such as wars, trade disputes and elections. The information you see shapes the choices you make.
The team’s label for today’s systems is blunt: “faux polyglots.” They mimic fluency across languages but fail to integrate perspectives. The danger is filter bubbles split not only by ideology but also by language.
To fix this, the Hopkins group is building benchmarks to test multilingual fairness. They suggest models should draw on diverse sources, warn users when results are skewed, and promote literacy about how conversational AI works. They also urge more varied training data and tools that explain where information comes from.
The risk is not only technical but political. A handful of firms control the flow of AI-generated information. If their systems amplify some languages and suppress others, they hold disproportionate sway over global debate. “As a society, we need users to get the same information regardless of their language and background,” the researchers argue.
Article source: Are Chatbots Widening The Digital Language Gap?
Header image source: Created by Bruce Boyes with Microsoft Designer Image Creator.
Reference:
[1] Sharma, N., Murray, K., & Xiao, Z. (2025, April). Faux polyglot: A study on information disparity in multilingual large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (pp. 8090-8107).




