
Responsible AI case study: integrating advanced language models with validation by domain experts
This article is part of an ongoing series looking at AI in KM, and KM in AI.
Articles in RealKM Magazine’s long-running series on artificial intelligence (AI) alert readers to the critical need for checking the accuracy of generative AI outputs before using them, as part of frameworks for ethical and responsible AI in knowledge management (KM). These articles include my presentation to the KMGN KM Trends 2026 event¹, where I put forward two very different scenarios for the future. In one, AI is used ethically and responsibly, leading to a very positive outcome. In the other, it is not, leading to disaster.
A new case study² published in the journal Artificial Intelligence and Law provides further evidence in support of the need for ethical and responsible AI. As the case study is law-related, it can hopefully also help make law students more aware of the dangerous folly of trusting unverified AI content³.
Study authors Lingyi Meng, Maolin Liu, Hao Wang, Yilan Cheng, Qi Yang, and Idlkaid Mohanmmed propose a human-AI collaborative approach for building a multilingual legal terminology database, based on a multi-agent framework. Accurately mapping legal terminology across languages remains a significant challenge, especially for language pairs such as Chinese and Japanese. Meng and colleagues’ approach therefore integrates advanced large language models (LLMs) and legal domain experts throughout the entire process—from raw document preprocessing and article-level alignment, to terminology extraction, mapping, and quality assurance.
Unlike a single automated process, Meng and colleagues’ approach places greater emphasis on how human experts participate in this multi-agent system. As shown in Figure 1, humans and AI agents take on different roles: AI agents handle specific, repetitive tasks, such as OCR, text segmentation, semantic alignment, and initial terminology extraction, while human experts provide crucial oversight, reviewing and supervising the outputs with contextual knowledge and legal judgment.

Figure 1. Human-AI collaboration (source: Meng et al., 2025).
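To make the division of labour concrete, here is a minimal illustrative sketch (not the authors’ code) of a human-in-the-loop pipeline: AI stages propose candidate term mappings with a confidence score, and anything below a threshold is routed to expert review rather than being accepted automatically. All names and the threshold value are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class TermMapping:
    source_term: str
    target_term: str
    confidence: float  # agent's self-reported confidence, 0..1
    approved: bool = False

def extract_candidates(aligned_pairs):
    """Stand-in for the AI agents' extraction and mapping stages."""
    return [TermMapping(src, tgt, conf) for src, tgt, conf in aligned_pairs]

def human_review(mapping: TermMapping) -> bool:
    """Stand-in for expert adjudication; auto-approves for this demo."""
    return True

def run_pipeline(aligned_pairs, review_threshold=0.9):
    """Agents propose mappings; low-confidence items are routed to
    human review instead of being accepted automatically."""
    accepted, needs_review = [], []
    for m in extract_candidates(aligned_pairs):
        if m.confidence >= review_threshold:
            m.approved = True
            accepted.append(m)
        else:
            needs_review.append(m)
    # Human experts adjudicate every flagged item before it enters the database
    for m in needs_review:
        m.approved = human_review(m)
    return accepted, needs_review

# Example: one high-confidence mapping passes; one is flagged for review
pairs = [("合同", "契約", 0.95), ("不可抗力", "不可抗力", 0.6)]
accepted, reviewed = run_pipeline(pairs)
```

The key design point mirrored here is that no mapping enters the terminology database without either high agent confidence or explicit human sign-off.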
Meng and colleagues’ evaluation of their human-AI collaborative approach using a substantial collection of trilingual legal texts found that it led to marked improvements in term coverage, semantic coherence, and contextual accuracy.
In discussing the results of their study, Meng and colleagues provide a very clear answer to the following question (‘NLP’ means ‘natural language processing’):
Why is human-AI collaboration necessary—can’t large language models do everything automatically? Despite the remarkable progress of large language models (LLMs) in legal NLP, our work and direct testing reveal the limits of purely automated pipelines. While LLMs can process large volumes of text and generate plausible legal terminology suggestions, they are not immune to common pitfalls: errors in detecting term boundaries, contextual mismatches, and at times, outright hallucinations. These issues become especially pronounced in complex legal passages or in under-resourced language pairs, where training data is sparse and ambiguity is high. Our case studies repeatedly showed that—even with state-of-the-art models—redundancy and inconsistency can propagate through the extraction pipeline if left unchecked. This is why expert human review remains essential: not only to correct and clarify terminology, but to ensure that the results actually comply with legal and professional standards. In fields as sensitive as law, human-AI partnership is less a luxury than a necessity—crucial both for quality assurance and for meeting regulatory expectations.
As an aside, Meng and colleagues also found that recent open-source large language models can perform at a level comparable to commercial products, suggesting that robust and cost-effective AI solutions are increasingly within reach.
Header image source: Created by Bruce Boyes with Microsoft Designer Image Creator.
References:
- Boyes, B. (2025, December 11). KM Trends 2026: Two very different scenarios for AI in KM. RealKM Magazine.
- Meng, L., Liu, M., Wang, H., Cheng, Y., Yang, Q., & Mohanmmed, I. (2025). Building from scratch: a multi-agent framework with human-in-the-loop for multilingual legal terminology mapping. Artificial Intelligence and Law, 1-40.
- Alimardani, A. (2025). Borderline Disaster: An Empirical Study on Student Usage of GenAI in a Law Assignment. IEEE Transactions on Technology and Society.
