
When AI gets science wrong
Originally posted on The Horizons Tracker. This article is part of an ongoing series looking at AI in KM, and KM in AI.
Large language models (LLMs) like ChatGPT and DeepSeek often get science wrong. A new study [1] from Utrecht University found that these AI tools drew inaccurate or overly broad conclusions in up to 73% of cases when summarising scientific research. And when asked to be more accurate, they often did worse.
The researchers tested ten major LLMs, including ChatGPT, Claude, LLaMA, and DeepSeek, over the course of a year. They fed each model thousands of papers from top journals like Nature, Science, and The Lancet, then looked at how the chatbots summarised them. Of the 4,900 summaries collected, most had a common flaw: they overstated the findings.
Subtle exaggeration
The exaggerations were subtle. Chatbots often changed a careful statement like “The treatment was effective in this study” to a sweeping one like “The treatment is effective.” That small shift can give the false impression that the results apply to everyone, not just to the group in the study.
Surprisingly, asking the chatbots to “avoid inaccuracies” made the problem worse. These prompts led to almost twice as many overstatements as simple requests like “summarise this.” The researchers warn that users, whether students, scientists, or policymakers, may wrongly trust a summary more when they think they’ve asked for accuracy.
Newer models didn’t help, either: ChatGPT-4o and DeepSeek were more likely to overstate findings than their older versions. Human writers, by contrast, were five times less likely to make the same mistake.
Some models did better than others: Claude produced the most accurate summaries. The study suggests setting the model’s “temperature” (the parameter that controls how much randomness, and hence creative variation, goes into the output) to a low value, and using prompts that stick to cautious, past-tense phrasing.
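To make that advice concrete, here is a minimal sketch of a summarisation call that applies both suggestions, a low temperature and a cautious, past-tense prompt. It uses the OpenAI Python client; the model name, prompt wording, and helper function are illustrative assumptions, not settings taken from the study.

```python
# Minimal sketch: summarise a paper with a low temperature and a
# cautious, past-tense prompt. Assumes the `openai` package is installed
# and OPENAI_API_KEY is set; the model name and prompt wording are
# illustrative, not taken from the study.
from openai import OpenAI

client = OpenAI()

CAUTIOUS_PROMPT = (
    "Summarise the following paper. Report the findings in the past tense, "
    "limited to the population, setting, and conditions actually studied. "
    "Do not generalise beyond what the authors themselves claim."
)

def summarise(paper_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",   # hypothetical model choice
        temperature=0.2,  # low temperature: less random variation in the output
        messages=[
            {"role": "system", "content": CAUTIOUS_PROMPT},
            {"role": "user", "content": paper_text},
        ],
    )
    return response.choices[0].message.content
```

Even with settings like these, the output is a draft to check against the paper, not a finished summary.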
The takeaway is clear: if we want AI to help people understand science, we need to be more careful about how we use it, and how much we trust it.
Article source: When AI Gets Science Wrong.
Header image source: Matheus Bertelli on Pexels.
Reference:
1. Peters, U., & Chin-Yee, B. (2025). Generalization bias in large language model summarization of scientific research. Royal Society Open Science, 12(4), 241776.