
Using neuroscience to help human knowledge contribute to AI safety

This article is part of an ongoing series looking at AI in KM, and KM in AI.

The recently published second edition of the International AI Safety Report[1] finds that while there have been positive advances in AI safety, serious challenges remain; for example, reliable pre-deployment safety testing of AI models has become harder to conduct.

The new arXiv pre-print paper[2] “NeuroAI for AI Safety” proposes that human knowledge is an attractive model for AI safety. As the only known agents capable of general intelligence, humans perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being.

These properties are a function of the architecture of the brain and the learning algorithms it implements. Paper authors Mineault and colleagues therefore contend that neuroscience may hold important keys to technical AI safety that are currently underexplored and underutilized. In response, they highlight and critically evaluate several paths toward AI safety inspired by neuroscience.

Mineault and colleagues use the technical framework introduced by DeepMind in 2018 to identify three aspects of how studying the brain could positively impact AI safety:

  1. Robustness – specifying how an agent can safely respond to unexpected inputs. This includes performing well or failing gracefully when faced with adversarial and out-of-distribution inputs, and safely exploring in unknown environments. This can also mean learning compositional representations that generalize well out-of-distribution. Robustness further implies knowing what you do not know, by maintaining a representation of uncertainty, to ensure safe and informed decision-making in novel or uncertain scenarios.
  2. Specification – specifying the expected behavior of an AI agent. A pithy way of expressing this is that we want AI systems to “do what we mean, not what we say”. This includes correctly interpreting instructions specified in natural language despite ambiguity; preventing learning shortcuts that generalize poorly; ensuring that agents solve the real task at hand rather than engaging in reward hacking (i.e. Goodhart’s law); and so on.
  3. Assurance (or oversight) – verifying that AI systems are working as intended. This includes opening the black box of AI systems using interpretability methods; scalably overseeing the deployment of AI systems and detecting unusual or unsafe behavior; and detecting and correcting for bias.
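The “knowing what you do not know” idea in the robustness aspect can be made concrete with a toy sketch: an ensemble of simple models that abstains from predicting when its members disagree, rather than guessing confidently on unfamiliar inputs. Everything below (the dataset, the per-member offsets standing in for training randomness, and the agreement threshold) is an illustrative assumption, not code from the paper.

```python
# Toy sketch of uncertainty-aware decision-making: an ensemble of
# one-dimensional threshold classifiers that abstains when its members
# disagree, instead of guessing near or beyond the decision boundary.

def train_member(data, offset):
    """Fit one ensemble member: the class boundary is the mean of the
    training inputs, perturbed by a per-member offset (a stand-in for
    the randomness of real training runs)."""
    boundary = sum(x for x, _ in data) / len(data)
    return boundary + offset

def predict_with_abstention(ensemble, x, agreement=0.8):
    """Vote across members; abstain (return None) when agreement is low."""
    votes = [1 if x > boundary else 0 for boundary in ensemble]
    frac = sum(votes) / len(votes)
    if frac >= agreement:
        return 1              # members agree: confident positive
    if frac <= 1 - agreement:
        return 0              # members agree: confident negative
    return None               # members disagree: defer instead of guessing

# Tiny labelled dataset: inputs below ~2 are class 0, above are class 1.
data = [(0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)]
offsets = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
ensemble = [train_member(data, o) for o in offsets]

print(predict_with_abstention(ensemble, 4.5))  # far from the boundary: 1
print(predict_with_abstention(ensemble, 2.0))  # near the boundary: None
```

The design choice mirrors the robustness goal described above: disagreement among the members is a crude representation of uncertainty, and returning `None` is the “fail gracefully” path for novel or ambiguous inputs.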

Mineault and colleagues’ eight proposals for applying neuroscience to AI safety are listed in Table 1, along with the aspects of AI safety each is proposed to affect.

Table 1: Proposals for how neuroscience can impact AI safety (source: Mineault et al., 2025).

| Proposed method | Summary of proposition | Safety aspect |
| --- | --- | --- |
| Reverse-engineer sensory systems | Build models of sensory systems (“sensory digital twins”) which display robustness, reverse-engineer them through mechanistic interpretability, and implement these systems in AI | Robustness |
| Build embodied digital twins | Build simulations of brains and bodies by training auto-regressive models on brain activity measurements and behavior, and embody them in virtual environments | Simulation |
| Build biophysically detailed models | Build detailed simulations of brains via measurements of connectomes (structure) and neural activity (function) | Simulation |
| Develop better cognitive architectures | Build better cognitive architectures by scaling up existing Bayesian models of cognition through advances in probabilistic programming and foundation models | Simulation, Assurance |
| Use brain data to finetune AI | Finetune AI systems through brain data; align the representational spaces of humans and machines to enable few-shot learning and better out-of-distribution generalization | Specification, Robustness |
| Build an evolutionary curriculum | Build safety guardrails in AI by recapitulating the natural evolutionary curriculum | Specification |
| Infer the brain’s loss functions | Learn the brain’s loss and reward functions through a combination of techniques including task-driven neural networks, inverse reinforcement learning, and phylogenetic approaches | Specification |
| Use neuroscience methods for interpretability | Leverage methods from neuroscience to open black-box AI systems; bring methods from mechanistic interpretability back to neuroscience to enable a virtuous cycle | Assurance |
Article source: Mineault et al., 2025; CC BY 4.0.

Header image source: Gerd Altmann on Pixabay.

References:

  1. Bengio, Y., Clare, S., Prunkl, C., Andriushchenko, M., Bucknall, B., Murray, M., … & Mindermann, S. (2026). International AI Safety Report 2026. UK Government.
  2. Mineault, P., Zanichelli, N., Peng, J. Z., Arkhipov, A., Bingham, E., Jara-Ettinger, J., … & Zador, A. (2024). NeuroAI for AI safety. arXiv preprint arXiv:2411.18526.

Bruce Boyes

Bruce Boyes is editor, lead writer, and a director of RealKM Magazine and winner of the International Knowledge Management Award 2025 (Individual Category). He is an experienced knowledge manager, environmental manager, project manager, communicator, and educator, and holds a Master of Environmental Management with Distinction and a Certificate of Technology (Electronics). His many career highlights include: establishing RealKM Magazine as an award-winning resource with more than 2,500 articles and 5 million reader views, leading the knowledge management (KM) community KM and Sustainable Development Goals (SDGs) initiative, using agile approaches to oversee the on time and under budget implementation of an award-winning $77.4 million recovery program for one of Australia's iconic river systems, leading a knowledge strategy process for Australia’s 56 natural resource management (NRM) regional organisations, pioneering collaborative learning and governance approaches to empower communities to sustainably manage landscapes and catchments in the face of complexity, being one of the first to join a new landmark aviation complexity initiative, initiating and teaching two new knowledge management subjects at Shanxi University in China, and writing numerous notable environmental strategies, reports, and other works.
