
Advances & challenges in foundation agents: Section 1.2.1 – Brain functionalities and AI parallels
This article is Chapter 1, Section 1.2.1 of a series of articles featuring the book Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems.
Designing intelligent agents calls for inspiration from the human brain’s functional architecture. A high-level map linking brain regions – frontal, parietal, occipital, temporal lobes, as well as the cerebellum, brainstem, and key subcortical structures – to cognitive functions can reveal gaps between human capabilities and current AI systems, as shown in Figure 1.1.
However, brain functions are not siloed into rigid anatomical zones: most abilities emerge from networks spanning multiple regions. For instance, memory involves the hippocampus (temporal lobe) interacting with frontal cortex and other areas, and “self-awareness” or consciousness cannot be pinpointed to a single spot. Therefore, it is important to keep in mind that cognition is distributed rather than strictly localized. With that in mind, each major brain region’s core functions are reviewed (drawing on Principles of Neural Science by Kandel et al.1, Neuroscience by Purves et al.2, and other sources) and mapped to AI-relevant cognitive capabilities. A set of functions to feature in the figure is then proposed – emphasizing those most relevant to AI agents (e.g. reasoning, memory, planning, perception, decision-making, motivation, emotion, motor skills) – along with an assessment of how developed these functions are in AI. For a big-picture perspective, the state of research in AI is categorized into three distinct levels:
- Level 1 (L1): well-developed.
- Level 2 (L2): partially developed.
- Level 3 (L3): underexplored.
The goal is to come up with a clear, biologically-grounded illustration and discussion that will engage researchers in AI by highlighting which human cognitive functions are replicated in machines and which remain frontier challenges.

Frontal lobes (executive functions and decision-making)
The frontal lobes – especially the prefrontal cortex – are the seat of the brain’s highest-order cognitive functions known collectively as executive functions3. These include abilities such as planning, decision-making, problem-solving, working memory, and inhibitory control (self-control). The frontal lobe is also involved in voluntary motor control (with the rear portion containing the primary motor cortex) and aspects of language (Broca’s area in the left frontal lobe handles speech production). From a neuroscience perspective, damage to the prefrontal cortex famously impairs one’s judgment, planning, and social behavior (as illustrated by the classic Phineas Gage case4). In the context of AI agents, frontal lobe functions correspond to the core “thinking” and control components of an intelligent system:
- Planning and reasoning: AI has made progress here, for example with automated planners and logical reasoners, and large language models (LLMs) that can follow multi-step reasoning to some extent. These are partially developed (L2) in current AI. However, human-level flexible planning remains only partly solved.
- Decision-making: In humans this involves weighing outcomes, rewards, and risks (frontal cortex often working with basal ganglia5). AI agents have decision modules (e.g. reinforcement learning policies, decision trees, or LLMs), but handling open-ended, goal-conflicting decisions with human-like adaptability is still L2 at best. Simple decisions from well-defined rewards (like games) are mastered by AI, but broad autonomous decision-making in the real world remains challenging.
- Working memory: Frontal networks (especially dorsolateral prefrontal cortex) can hold and manipulate information in mind (e.g. remembering a phone number or interim result). AI analogs include the context windows of neural networks or explicit memory buffers. While current models have limited memory (e.g. a Transformer’s context length or external memory in some architectures), this is an active area, and partial functionality exists. Still, the robust, general working memory humans exhibit (flexibly updating and focusing on relevant info) is not fully realized in AI (some aspects may be underexplored, edging into L3).
- Cognitive flexibility and inhibitory control: Frontal lobes allow us to shift strategies or perspectives and to suppress inappropriate impulses. AI systems are typically brittle in this regard – they follow their programming or learned policy rigidly, and struggle with on-the-fly strategy shifts or inhibiting a pre-potent response unless explicitly trained. This remains underexplored (L3). For instance, an AI might exploit a reward loophole (lack of “inhibitory” self-regulation) unless designers anticipate and constrain it. Future agent architectures may need a mechanism akin to frontal inhibitory control to moderate behaviors.
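As a concrete, deliberately simplified illustration of the explicit memory buffers mentioned above, the sketch below implements a bounded store with relevance-based eviction as a loose stand-in for prefrontal working memory. The class, its interface, and the capacity figure are hypothetical choices for illustration, not an established design:

```python
from collections import OrderedDict

class WorkingMemory:
    """Bounded buffer for intermediate results, loosely analogous to
    prefrontal working memory: limited capacity, relevance-based eviction.
    (Toy sketch; capacity and interface are illustrative assumptions.)"""

    def __init__(self, capacity=7):  # a nod to the classic "seven items" estimate
        self.capacity = capacity
        self.items = OrderedDict()   # key -> (value, relevance)

    def store(self, key, value, relevance=1.0):
        self.items[key] = (value, relevance)
        if len(self.items) > self.capacity:
            # Evict the least relevant item, not simply the oldest.
            weakest = min(self.items, key=lambda k: self.items[k][1])
            del self.items[weakest]

    def recall(self, key):
        return self.items[key][0] if key in self.items else None

wm = WorkingMemory(capacity=3)
wm.store("subgoal", "find charger", relevance=0.9)
wm.store("obstacle", "closed door", relevance=0.5)
wm.store("phone_number", "555-0142", relevance=0.2)
wm.store("reward_seen", True, relevance=0.8)  # evicts the least relevant entry
print(wm.recall("phone_number"))  # -> None (the low-relevance item was evicted)
```

Note how eviction is driven by relevance rather than recency – a crude gesture toward the flexible “updating and focusing on relevant info” that humans do and current context windows do not.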
It’s worth noting that social-emotional functions involving the frontal lobe (like empathy, theory of mind, self-reflection) are still rudimentary in AI. Humans rely on frontal cortex (especially medial and orbitofrontal regions) interacting with limbic structures to navigate social situations and emotions; this is a network-level phenomenon rather than one confined to the frontal lobe alone. AI agents do not yet possess genuine empathy or self-awareness – these functions remain L3 (largely absent). In summary, the frontal lobes contribute the supervisory, goal-directed intelligence that we often associate with “thinking”, and many of these capacities are only partially realized in AI to date.
Parietal lobes (perception integration and attention)
The parietal lobes are key to integrating sensory information from various modalities and constructing a spatial understanding of the world6. The anterior part of the parietal lobe contains the somatosensory cortex, which processes touch, proprioception (body position), and other somatic senses. The posterior parietal areas are crucial for spatial awareness, visuo-spatial processing, and attention – essentially, knowing where things are and how to interact with them. For example, the parietal lobe helps us localize objects in space, understand geometric relationships, and coordinate eye and hand movements by linking vision with motor plans. It also plays a central role in attention control, particularly the dorsal attention network that directs our focus to locations or sensory features of interest.
In AI terms, parietal lobe functions translate to an agent’s ability to perceive and navigate its environment:
- Multisensory integration: Robots and AI systems that use multiple sensors (vision, touch, etc.) attempt to combine those inputs into a coherent model of the environment. This is still partially developed (L2) – e.g., we have AI that can align vision with depth sensors or touch, but human-level integration (where a slight brush on the arm, a sound, and a peripheral visual cue all unify into a single event perception) is far from achieved.
- Spatial representation and mapping: Parietal circuits create internal maps (for instance, of your surroundings or your body in space). AI has made progress in spatial mapping and navigation (SLAM algorithms for robots build 3D maps, and deep reinforcement learning agents can navigate virtual mazes). This capability is moderately developed – certain tasks like autonomous driving or drone flight show that machines can handle spatial reasoning in constrained scenarios (L2). Yet, they lack the general-purpose, flexible spatial understanding humans have (e.g. understanding a cluttered room’s layout at a glance, or mentally rotating objects), so further research is needed.
- Attention mechanisms: The way the brain’s parietal–frontal circuits spotlight task-relevant information7 is loosely echoed by the attention heads in modern transformer networks and various attention mechanisms8. Humans, however, wield attention as an active lens: in a lecture we can simultaneously follow the speaker’s voice (auditory stream), skim the projected slide (visual stream), and monitor the clock in peripheral vision, then “zoom in” on a line of text to decode a formula, or “zoom out” to grasp the talk’s overall structure. Neurophysiology shows that such rapid shifts are driven by top-down signals from prefrontal cortex and thalamic relays that modulate sensory gain on the fly9. By contrast, transformer attention is fixed once its weights are learned; it does not receive real-time executive feedback about goals or context. Hence current AI is labelled as L2: computational attention is powerful but still lacks the adaptive, goal-directed control that characterises biological attention, making this an active frontier of research.
- Sensorimotor coordination: Parietal lobe helps translate between sensory coordinates and motor coordinates – for instance, computing how to reach for a seen object (integrating visual location with arm position). Some AI systems (robotic manipulators with vision) approximate this, using calibration and learned coordinate transforms. Still, human parietal cortex excels at online adjustments and using context (like adjusting reach if an object is moving). AI is catching up in domains like robotic arm manipulation, but general sensorimotor integration remains L2 (demonstrated in specific setups but not as universally robust as in humans).
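To make the contrast drawn above concrete, here is a minimal pure-Python sketch of the scaled dot-product attention used in transformers. Once the key and value vectors are fixed (i.e., the weights are learned), the weighting is entirely determined by query–key similarity, with none of the real-time executive feedback that modulates biological attention:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(query, keys, values):
    """Scaled dot-product attention over a tiny set of key/value vectors."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    # Output is the weight-blended value vector.
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

# A query most similar to the second key draws most of the weight.
q = [1.0, 0.0]
keys = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out, weights = attention(q, keys, values)
print(weights)  # the second key receives the highest weight
```

The biological analogue would require `weights` to be further modulated on the fly by goal signals; nothing in this computation provides such a hook.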
Notably, AI has very limited touch sensing (making that L3), whereas the parietal somatosensory cortex finely discriminates texture, pressure, etc. Overall, parietal lobe functions are critical for an agent to perceive its world and orient within it, areas where AI has some successes (especially in vision) but still lacks the generality and fluidity of human perception.
Occipital lobes (visual processing)
The occipital lobes are the brain’s visual processing center10. The primary visual cortex (V1) in occipital lobe receives input from the eyes (via the thalamus) and extracts low-level features like edges. From there, occipital regions and adjacent visual areas (extending along the occipital-temporal border for the ventral stream, and occipital-parietal for the dorsal stream) hierarchically build up visual perception: detecting shapes, colors, motion, and eventually complex patterns and objects. In summary, the occipital lobe is primarily responsible for processing visual information.
In the context of AI, vision has been one of the most successful domains – thanks largely to deep learning:
- Visual perception (recognition): Machine vision systems (convolutional neural networks and their successors) can now match or exceed human performance in tasks like object recognition, face detection, and image classification11. This corresponds to the L1 level (well-developed) in AI. For example, AI vision models can instantly recognize thousands of object categories in images, a feat once thought to require human-like vision. This maps to what the occipital lobe and ventral visual cortex do – identifying what is in the visual field.
- Scene understanding and visual reasoning: Beyond raw perception, humans readily understand spatial relationships in a scene, contextual clues, and can perform reasoning on visual inputs (e.g. predicting what might happen next in a scene, or solving a visual puzzle). AI is partway (L2) here. Some systems can caption images or answer questions about a scene (vision-language models), indicating a degree of semantic understanding. Yet, these models often lack true grounded understanding – they might label objects but fail on deeper comprehension (for instance, understanding intentionality or causality from an image). Visual reasoning tasks (like answering abstract questions about a picture or performing complex video analysis) remain challenging.
- Visual attention and eye movements: Humans constantly move their eyes and focus on important parts of the visual field (a function involving occipital and parietal circuits). AI vision models don’t literally move eyes, but some incorporate attention mechanisms that mimic focusing on regions of an image. This is related to the earlier discussion on attention (shared with parietal function). It’s moderately well implemented in AI (L2), but this is not explicitly labelled under occipital as it can be considered part of the general attention function under parietal/frontal coordination.
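The low-level feature extraction attributed to V1 earlier in this section can be illustrated with a toy convolution. The image, kernel, and sizes below are illustrative only, but the mechanism – a small oriented filter swept across the image – is exactly what both V1 simple cells and the first layer of a convolutional network compute:

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution: a crude analogue of a V1 edge detector."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A tiny image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]
# Gradient kernel: responds where intensity changes left-to-right.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
response = convolve2d(image, kernel)
print(response)  # -> [[3, 3, 0]]: nonzero where the window straddles the edge
```

Deep vision models stack many learned layers of exactly this operation, which is why the occipital analogy is the tightest brain–AI correspondence in this section.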
AI has heavily developed visual recognition, but capabilities like real-time 3D visual guidance are less mature (though present in robotics and self-driving cars). This distinction highlights another gap between AI and the integrated prowess of the human visual system.
Temporal lobes (memory, language, and audition)
The temporal lobes have diverse but crucial roles in cognition, spanning auditory processing, language, memory, and high-level visual recognition12. The upper part of the temporal lobe (superior temporal gyrus) contains the primary auditory cortex, which processes sound inputs – frequencies, rhythms, etc. Adjacent areas (e.g. Wernicke’s area13 in the left temporal lobe) are essential for language comprehension, linking sounds to meaning. The medial temporal lobe houses the hippocampus and related structures, which are the heart of the brain’s episodic memory system (forming and retrieving autobiographical memories) and also support spatial navigation. The temporal lobe’s ventral visual stream (inferior temporal cortex) specializes in pattern recognition – including recognizing complex stimuli like faces (the fusiform face area) and scenes. In short, the temporal lobe is a multifaceted hub for recognition and memory.
Mapping these to AI capabilities:
- Language comprehension and production: Human language ability relies on temporal lobe (comprehension of words, meanings) in concert with frontal lobe (speech production via Broca’s area, and broader language planning). AI has seen remarkable advances here – LLMs can now parse text and generate fluent responses, indicating a high level of language competence in narrow settings. Machine translation, speech recognition, and speech synthesis are also quite advanced. Thus, for linguistic processing, AI is at L1 in many respects. An AI can “comprehend” and produce text in multiple languages with little human-like effort, though it should be noted that it’s often statistical rather than grounded understanding. Still, relative to other cognitive domains, language is a success story for AI, so the figure should mark functions like “Language Comprehension/Production” as well-developed (L1).
- Auditory perception: The ability to parse sound – speech, music, environmental noises – is another temporal lobe function. AI matches or exceeds humans in low-level auditory tasks like speech-to-text transcription under ideal conditions (think of virtual assistants accurately recognizing spoken commands). This is L1 for narrow cases (e.g. trained speech recognizers). However, true auditory scene analysis (understanding a cacophony of sounds, picking out one conversation in a noisy room – the “cocktail party effect”14) remains very hard for AI. So there are still aspects at L2/L3. But in Figure 1.1, the labelling is “Auditory Processing (L1)” given core progress in speech recognition.
- Episodic memory & learning: The hippocampus enables us to form new episodic memories (remembering experiences in context) and to perform lifelong learning by integrating new memories without wiping old ones. AI’s analogs here are continual learning algorithms and memory-augmented networks. This area is underdeveloped (L3) – most AI systems do not learn continuously in a stable way; they suffer catastrophic forgetting if trained on new data unless special techniques are used. They also lack the rich, context-tagged memory of experiences that humans have. This function is therefore labeled as “Episodic Memory & Lifelong Learning (L3)”.
- Semantic memory and understanding: Beyond specific events, humans build up semantic memory – factual and conceptual knowledge about the world (much of this is linked to temporal lobe association areas as well). AI in some sense has simulated semantic memory: knowledge graphs, vast pretrained models that encode facts (e.g. GPT knows many facts from its training). So semantic understanding is partially there (L2). But AI’s knowledge can be superficial or lacking true comprehension of context. Thus, in Figure 1.1, “Semantic Knowledge/Understanding (L2)” is included as a function.
- Face and object recognition: The temporal lobe’s ventral stream areas identify objects and faces. AI vision is quite good at this (object and face recognition are at L1 with deep learning). This has also been captured under occipital functions already (visual perception).
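As a toy illustration of the memory-augmented networks mentioned under episodic memory, the sketch below stores events tagged with context vectors and recalls the best match to a cue by cosine similarity. It is a didactic stand-in under assumed interfaces, not a model of the hippocampus:

```python
import math

class EpisodicMemory:
    """Append-only store of (context_vector, event) pairs with
    nearest-neighbour recall -- a toy stand-in for the episodic
    stores used in memory-augmented agents."""

    def __init__(self):
        self.episodes = []  # list of (context, event)

    def write(self, context, event):
        self.episodes.append((context, event))

    def recall(self, cue):
        """Return the event whose stored context best matches the cue."""
        def similarity(a, b):  # cosine similarity
            num = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return num / (na * nb)
        best = max(self.episodes, key=lambda ep: similarity(ep[0], cue))
        return best[1]

mem = EpisodicMemory()
mem.write([1.0, 0.0, 0.0], "met Alice in the kitchen")
mem.write([0.0, 1.0, 0.0], "saw the charger by the door")
mem.write([0.0, 0.0, 1.0], "robot arm jammed on shelf 3")
print(mem.recall([0.1, 0.9, 0.0]))  # -> "saw the charger by the door"
```

The hard open problems noted above begin exactly where this sketch ends: deciding what to write, consolidating episodes into semantic knowledge, and generalizing from a single stored event.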
Cerebellum (coordination, skill learning, and timing)
The cerebellum – the “little brain” at the back – is traditionally known for motor coordination and motor learning. It fine-tunes movements, maintains balance and posture, and ensures that movements are smooth and accurate15. When you learn a physical skill (like riding a bicycle or playing piano), the cerebellum is heavily involved in adapting motor commands through practice – essentially performing adaptive error correction based on feedback. Notably, the cerebellum contains more neurons than the rest of the brain combined, arranged in a highly regular circuitry ideal for learning patterns. In recent decades, research has revealed the cerebellum also contributes to certain cognitive and emotional functions, acting as a predictor or timing mechanism even in non-motor tasks. It’s been implicated in language (e.g. helping predict the timing of syllables) and even in aspects of attention and executive function. In essence, the cerebellum builds internal models that allow the brain to make fine-grained predictions and adjustments.
For AI, the cerebellum’s roles translate to capabilities that are still not fully realized in agents:
- Motor coordination and skill learning: In robotics, there are control algorithms and learning methods (like reinforcement learning with feedback) that echo what the cerebellum does. For instance, adaptive controllers can learn to correct a robot arm’s movements (analogous to cerebellar error correction). However, robots remain far clumsier than humans. They often lack the real-time adaptive finesse the cerebellum provides. This area is partially developed (L2) – we have examples of robotic learning for specific skills, but a general “cerebellum-like” module for adaptive motor control is not present in most AI agents. Thus, “Motor Skills & Coordination (L2)” is listed under the cerebellum to highlight this gap.
- Cognitive timing and prediction: The cerebellum is thought to function as an internal clock and predictor for events on the order of tens to hundreds of milliseconds. This is crucial for tasks like predicting sensory outcomes of one’s actions or timing when to initiate a sequence. AI systems typically do have predictive models (e.g. forward models in model-based reinforcement learning), but these are often task-specific. There isn’t a general mechanism like the cerebellum’s that seamlessly handles timing for perception and action across domains. Timing in AI agents (e.g. predicting when something will happen, not just what) is relatively underexplored (L3). One could imagine future AI agents with a dedicated module for temporal prediction and smooth sequencing – analogous to cerebellar function – but currently this is rudimentary.
- Error correction: This overlaps with coordination, but extends to cognitive domains. For example, cerebellar activity has been observed in language processing, possibly helping to predict and correct linguistic sequences16. AI does perform error correction in training (via backpropagation), but online error correction during tasks is limited. Real-time adaptive control (a feedback loop adjusting actions on the fly) is present in some advanced systems (e.g. adaptive cruise control in cars, or self-balancing robots), yet it’s not at human proficiency. “Adaptive error correction (L2)” is marked under cerebellum to reflect that AI can do this in narrow cases but lacks a general, brain-like capability to adapt behaviors fluidly whenever mismatches occur.
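The cerebellar-style, error-driven adaptation described in these bullets can be sketched as a controller that updates an internal model from sensory prediction errors. The “plant gain” setup below is a contrived assumption chosen purely so the loop fits in a few lines:

```python
def adaptive_reach(target, plant_gain=0.5, lr=0.5, steps=20):
    """Learn an internal model so commanded reaches land on target.
    The 'plant' (the arm) scales commands by an unknown gain; the
    controller adapts its own gain from the observed error -- supervised,
    error-driven learning in the spirit of cerebellar adaptation.
    (Toy sketch; the linear plant and learning rule are assumptions.)"""
    model_gain = 1.0  # initial (wrong) internal model
    errors = []
    for _ in range(steps):
        command = target * model_gain
        outcome = command * plant_gain       # what the arm actually does
        error = target - outcome             # sensory prediction error
        errors.append(abs(error))
        model_gain += lr * error / target    # correct the internal model
    return model_gain, errors

gain, errors = adaptive_reach(target=10.0)
print(errors[0], errors[-1])  # the reach error shrinks with practice
```

The salient point is that learning happens online, during behavior, from each mismatch – unlike a deep network’s error-driven learning, which typically stops at deployment.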
The cerebellum is often left out of high-level AI comparisons, but it shouldn’t be – it highlights the embodied, fine-control intelligence humans have. A particularly interesting insight from computational neuroscience is that different brain modules may correspond to different learning paradigms: cerebellum for supervised learning, basal ganglia for reinforcement learning, and cerebral cortex for unsupervised learning. This was proposed by Doya (1999)17 and others, noting how the cerebellum takes error signals (like a supervised loss), basal ganglia use reward feedback, and cortex finds patterns in data. This perspective can inspire AI: e.g., designing an agent with separate components for these learning types, mirroring brain organization. The cerebellum’s function (predictive control via error-driven learning) is included in Figure 1.1 to show that it’s a major gap in current AI agents that mostly rely on reinforcement learning and (in the case of deep networks) a form of error-driven learning during training only, not continuous adaptation like the cerebellum performs in real-time.
Brainstem (basic autonomic and arousal functions)
The brainstem (midbrain, pons, medulla) is the most evolutionarily ancient part of the brain, responsible for fundamental life-sustaining processes and reflexes. It acts as the main communication highway between the brain and body, and houses nuclei that control breathing, heart rate, blood pressure, swallowing, and reflexive actions like blinking18. In addition, the brainstem contains the reticular activating system, a network that regulates sleep-wake cycles and overall arousal level (i.e., how alert or vigilant you are). It also contributes to balance and posture (through vestibular nuclei) and coordinates head/eye movements via reflexes. Essentially, the brainstem keeps the body running and primes the brain’s level of consciousness.
In terms of AI or robotics:
- Survival autopilot: Many of the brainstem’s duties have no direct analogue in non-embodied AI (a chatbot doesn’t need to regulate blood pressure!). However, in robotics, low-level control loops (for locomotion, balance, etc.) play a similar role. For instance, a bipedal robot uses feedback controllers that mimic reflexes to keep upright. These can be considered L1 (well-developed) in very narrow scopes – engineers can design reflexive responses (like a withdrawal reflex if a robot arm meets resistance). “Reflexive Responses (L1)” is also labelled under the brainstem, as simple reflex-like behaviors in robots (or even in software agents, e.g. an immediate reaction to an input) are straightforward and implemented.
- Autonomic regulation: Since AI agents don’t have a body with physiology, they lack an equivalent of the autonomic nervous system. One could argue that some AI systems regulate internal variables (CPU temperature throttling, memory management) automatically, but this is a stretch as a cognitive function. Thus, “Autonomic Regulation (L3)” is labelled to indicate it is under-explored in AI. If we consider future embodied AI (like intelligent androids), they might need something like this to manage power, self-maintenance, etc., but it’s speculative.
- Arousal and global attention state: The brainstem’s influence on arousal has interesting parallels to AI in the sense of adaptive computation. Humans can be drowsy or hyper-alert, which affects how we process information. AI systems currently lack any explicit notion of “being alert” vs “tired” – they run in a fixed mode unless programmed otherwise. There is research into adaptive AI that could, say, slow down to save energy or limit computation when not needed, but it’s not mainstream. “Arousal/Attention States (L3)” is marked as largely unaddressed. However, one might draw a parallel with how some AI models can attend more or less strongly to inputs (controlled by parameters), somewhat akin to gain control in neurons under different arousal. This is a loose analogy; overall, the global modulatory role the brainstem (with neuromodulators like norepinephrine, serotonin) plays – affecting mood and readiness – is missing in AI agents.
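The reflex-priority arrangement discussed above – fast, hard-wired responses overriding slower deliberation – can be sketched in a few lines. The threshold value and action names here are hypothetical:

```python
def reflex_layer(sensor_reading, threshold=5.0):
    """Hard-wired reflex: fires immediately when the reading crosses a
    threshold, bypassing slower deliberative planning -- analogous to a
    brainstem-mediated withdrawal reflex. (Illustrative sketch only.)"""
    if sensor_reading > threshold:
        return "RETRACT"     # reflexive, non-negotiable response
    return None              # defer to higher-level control

def control_step(sensor_reading, planner_action):
    # Reflexes take priority over whatever the deliberative planner wants.
    reflex = reflex_layer(sensor_reading)
    return reflex if reflex is not None else planner_action

print(control_step(2.0, "extend_arm"))  # -> extend_arm (no reflex triggered)
print(control_step(9.0, "extend_arm"))  # -> RETRACT (reflex overrides plan)
```

This layered override is essentially the subsumption-style safety pattern already common in robotics, which is why simple reflexes rate L1 while the richer autonomic and arousal functions do not.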
It should be clarified that the functions discussed here are primarily relevant for embodied AI or robotics. For instance, a self-driving car’s automatic braking when a collision is imminent is reflex-like (and indeed implemented in today’s tech). But an AI algorithm in isolation doesn’t have “body regulation”. So, brainstem functions underscore how biological intelligence is deeply tied to a body, whereas AI often abstracts that away. It’s an important conceptual gap for readers to appreciate: truly human-like AI might need analogue systems for maintaining its “well-being” (self-preservation, energy management) and adjusting its alertness to situations – concepts drawn from the brainstem and related systems.
Subcortical systems (thalamus, basal ganglia, limbic system)
Finally, beyond the six major regions above, subcortical systems deserve representation, as they are crucial to cognition and differ markedly from what current AI implementations include. These structures are embedded below the cortex and often coordinate with multiple cortical regions:
- Thalamus (sensory relay and attention filter): The thalamus is often called the brain’s “gateway” or relay station – almost all sensory signals (except smell) pass through thalamic nuclei before reaching the cortex19. But the thalamus does more than relay: it actively modulates and integrates signals. It plays a key role in attention20 by amplifying relevant signals and suppressing others, under guidance from cortical feedback. It’s also involved in maintaining consciousness (targeted by anesthetics) and coordinating cortical rhythms. In AI, there is no single equivalent of a thalamus. However, one might liken it to routing layers or attention mechanisms that decide which data go where. The concept of a central hub that gates information flow in a network is present in some neural network architectures (e.g. transformer attention decides which inputs “attend” to which others), but the thalamus’s dynamic, task-dependent control is beyond current AI. Functions like “Sensory Integration & Routing” are marked as L2 (partially present conceptually in AI via attention layers), and “Global Workspace” as L3 (largely absent – AI doesn’t have a unified workspace model equivalent to what some cognitive theories assign to thalamocortical circuits).
- Basal ganglia (action selection and reinforcement learning): The basal ganglia are a group of nuclei (caudate, putamen, globus pallidus, substantia nigra, etc.) that are central to selecting and initiating actions, and they implement a biological form of reinforcement learning. They take inputs from the cortex (especially frontal and parietal areas), and through complex loops, they determine which actions are facilitated or inhibited21, often by evaluating expected rewards or outcomes. Dopamine signals from the midbrain (e.g. from the substantia nigra pars compacta) encode reward prediction errors, a concept very much like the reward signals in AI RL algorithms. In fact, neuroscientific evidence suggests the basal ganglia “learn” which actions lead to reward via dopamine-mediated plasticity – a direct parallel to how AI agents update policies from reward feedback. It can be confidently said that “Reward-Based Learning and Habit Formation” are primary functions of the basal ganglia. AI has a whole subfield of reinforcement learning, which has seen successes (games, some robotic tasks), so this is moderately developed (L2) in AI. However, current AI RL is narrow and data-hungry compared to human habit learning. Basal ganglia also contribute to procedural memory (learning habits or skills that become automatic) – something AI doesn’t explicitly differentiate (it learns policies, but the idea of habits vs. goal-directed actions is an emerging concept in AI research).
- Limbic system – amygdala and hippocampus (emotion and memory): These were touched on in the temporal lobe section, but to reiterate: the amygdala is crucial for processing emotional significance of stimuli and fear conditioning22 (learning to avoid harmful situations). It assigns value (good or bad) to experiences, which then influences decision-making and memory (through its connections to hippocampus and frontal cortex). The hippocampus, as mentioned, enables forming new declarative memories and mapping environments (it’s often likened to the brain’s GPS for spatial memory). In AI, there are nascent attempts to model hippocampal function – e.g. neural network “memory” modules for episodic recall, or models of spatial navigation that emulate place cells and grid cells found in the hippocampal formation. These are still L3 overall (exploratory). As for the amygdala’s role, AI currently lacks genuine emotion; at most, we simulate “emotion” as reward functions or use sentiment analysis to detect emotions in text, which is not an internal drive. The labels “Emotion Processing & Learning (L3)” and “Episodic Memory (L3)” highlight capabilities largely missing in AI agents: affective computing (AI that can experience or at least robustly respond to emotions) is very limited, and one-shot contextual learning (storing a new event and generalizing from it) is also an open problem.
- Hypothalamus (drives and homeostasis): The hypothalamus orchestrates the endocrine system and autonomic nervous system to maintain internal balance – it controls things like hunger, thirst, temperature, and release of hormones23. It also generates primitive drives (e.g. hunger drives you to seek food, which the cortex then plans for). AI agents do not have intrinsic survival needs, so they lack any true equivalent of homeostatic drives. We sometimes give AI an objective function (e.g. maximize score), but these are externally defined and do not fluctuate like biological needs. To the extent researchers are exploring intrinsic motivation for AI (like curiosity-based rewards), it is still rudimentary. In the figure, “Motivation & Drives (L3)” is added to acknowledge this gap. It reminds us that a human-inspired AI agent might require internally generated goals (not just tasks imposed by users) to be truly autonomous and robust in varied environments. It should be noted that giving AI intrinsic motivation is also seen as a potentially dangerous direction24 and should be treated with great caution.
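The dopamine reward-prediction-error parallel noted for the basal ganglia maps directly onto the temporal-difference (TD) error at the heart of RL. A minimal sketch of one such update:

```python
def td_update(value, reward, next_value, alpha=0.1, gamma=0.9):
    """One temporal-difference update. The TD error plays the role the
    dopamine reward-prediction-error signal is thought to play in the
    basal ganglia: positive when outcomes beat expectations, negative
    when they disappoint."""
    td_error = reward + gamma * next_value - value
    return value + alpha * td_error, td_error

# A state repeatedly followed by reward 1.0 (terminal, so next_value = 0):
v = 0.0
for _ in range(50):
    v, delta = td_update(v, reward=1.0, next_value=0.0)
print(round(v, 3))  # the value estimate approaches the true reward of 1.0
```

As the estimate converges, `delta` shrinks toward zero – mirroring the classic finding that dopamine responses to a reward fade once that reward becomes fully predicted.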
Bringing these subcortical pieces together, we see that many are poorly represented in today’s AI. Current intelligent systems are heavily cortex-like (perception modules, decision logic, etc.) but lack the rich support system of the subcortical brain: no analog of a thalamus to smartly route information, no hypothalamus to create self-preserving goals, a primitive version of basal ganglia for RL at best, and minimal emotional or episodic memory faculties. The figure’s accompanying explanation should drive home that cognition arises from cortical–subcortical interactions. For example, decision-making is not just frontal (cortical) deliberation, but also involves basal ganglia (habits and dopamine rewards) and amygdala (emotional bias) and hypothalamus (drive states). Emphasizing these interactions will lend a more nuanced and truthful picture than a simplistic lobe-by-lobe map. It also inspires AI researchers to think about architectures that incorporate these principles – such as an agent that, say, has a core RL module (analogous to basal ganglia) for learning from rewards, a memory module (analogous to hippocampus) for episodic recall, and perhaps a “global workspace” (inspired by thalamocortical loops) for attention and context integration.
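The architectural idea just sketched could be caricatured as follows. Every module name and interface here is hypothetical, intended only to show how the pieces might compose, not to propose a working design:

```python
class ModularAgent:
    """Skeleton of a brain-inspired composition (all names hypothetical):
    a reward-driven policy ('basal ganglia'), an episodic log
    ('hippocampus'), and a shared workspace ('thalamocortical loop')
    that broadcasts context to every module."""

    def __init__(self):
        self.q_values = {}   # action-value table (RL module)
        self.episodes = []   # episodic memory module
        self.workspace = {}  # globally broadcast context

    def act(self, state):
        # Broadcast the current state so every module sees the same context.
        self.workspace["state"] = state
        candidates = self.q_values.get(state, {"explore": 0.0})
        return max(candidates, key=candidates.get)

    def learn(self, state, action, reward):
        # RL module: incremental value update from reward feedback.
        q = self.q_values.setdefault(state, {})
        q[action] = q.get(action, 0.0) + 0.1 * (reward - q.get(action, 0.0))
        # Hippocampus-like module: log the full experience for later recall.
        self.episodes.append((state, action, reward))

agent = ModularAgent()
agent.learn("door_closed", "push", reward=0.0)
agent.learn("door_closed", "pull", reward=1.0)
print(agent.act("door_closed"))  # -> pull (the rewarded action wins)
```

The point of the skeleton is the separation of concerns: reward learning, episodic logging, and context broadcast are distinct modules, echoing the cortical–subcortical division of labor rather than a single monolithic network.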
Bridging brain-like functions and building beneficial AI
So far, we have seen the gap between the human brain and machine intelligence. Nevertheless, the objective is not necessarily to replicate every facet of human cognition within artificial intelligence systems. Rather, our overarching aim should be to develop intelligent agents that are useful, ethical, safe, and beneficial to society. Critically comparing human and artificial intelligence highlights existing gaps and illuminates promising directions for innovation. This comparative perspective allows us to selectively integrate beneficial aspects of human cognition, such as energy-efficient processing, lifelong adaptive learning, emotional grounding, and rich creativity, while simultaneously innovating beyond human limitations. Ultimately, this approach aims to foster the creation of more capable, resilient, and responsible AI systems.
Furthermore, it is vital to consider the evolving role of humans within a hybrid Human-AI society. The goal of AI should not be to replace human roles entirely, but rather to augment and empower human abilities, complementing human skills and judgment in areas where AI excels, such as handling vast datasets, performing rapid calculations, and automating repetitive tasks. Human oversight and interpretability are essential to ensure that powerful AI systems remain controllable and aligned with human values and ethical standards. Thus, the core objective must be the development of AI technologies that are transparent, interpretable, and responsive to human guidance.
Human-centered AI design emphasizes collaboration, safety, and social responsibility, ensuring technological advancement proceeds in a controlled, reliable manner. By placing humans at the center of the AI ecosystem, we can harness AI’s potential to enhance human productivity, creativity, and decision-making, facilitating technical and societal progress without compromising human autonomy or dignity. Ultimately, a thoughtful integration of human intelligence and AI capabilities can pave the way for a sustainable, equitable, and prosperous future.
Next part: Section 1.3 – Foundation agents: a modular and brain-inspired AI agent framework.
Article source: Liu, B., Li, X., Zhang, J., Wang, J., He, T., Hong, S., … & Wu, C. (2025). Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems. arXiv preprint arXiv:2504.01990. CC BY-NC-SA 4.0.
Header image: AI is Everywhere by Ariyana Ahmad & The Bigger Picture / Better Images of AI, CC BY 4.0.
References:
- Eric R Kandel, James H Schwartz, Thomas Jessell, Steven A Siegelbaum, and AJ Hudspeth. Principles of Neural Science, 2013. ↩
- Dale Purves, George J Augustine, David Fitzpatrick, William Hall, Anthony-Samuel LaMantia, and Leonard White. Neuroscience. De Boeck Supérieur, 2019. ↩
- Eric R Kandel, James H Schwartz, Thomas Jessell, Steven A Siegelbaum, and AJ Hudspeth. Principles of Neural Science, 2013. ↩
- Wikipedia, CC BY-SA 4.0. ↩
- Noga Larry, Gil Zur, and Mati Joshua. Organization of reward and movement signals in the basal ganglia and cerebellum. Nature Communications, 15(1):2119, 2024. ↩
- Dale Purves, George J Augustine, David Fitzpatrick, William Hall, Anthony-Samuel LaMantia, and Leonard White. Neuroscience. De Boeck Supérieur, 2019. ↩
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, pages 5998–6008. Curran Associates, Inc., 2017. ↩
- Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R Martin, Ming-Ming Cheng, and Shi-Min Hu. Attention mechanisms in computer vision: A survey. Computational Visual Media, 8(3):331–368, 2022. ↩
- Maurizio Corbetta and Gordon L Shulman. Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3):201–215, 2002. ↩
- Eric R Kandel, James H Schwartz, Thomas Jessell, Steven A Siegelbaum, and AJ Hudspeth. Principles of Neural Science, 2013. ↩
- Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015. ↩
- Dale Purves, George J Augustine, David Fitzpatrick, William Hall, Anthony-Samuel LaMantia, and Leonard White. Neuroscience. De Boeck Supérieur, 2019. ↩
- Wikipedia, CC BY-SA 4.0. ↩
- Wikipedia, CC BY-SA 4.0. ↩
- James Knierim. Chapter 5: Cerebellum, 2020. ↩
- Torgeir Moberget and Richard B Ivry. Cerebellar contributions to motor control and language comprehension: searching for common computational principles. Annals of the New York Academy of Sciences, 1369(1):154–171, 2016. ↩
- Kenji Doya. What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? Neural Networks, 12(7-8):961–974, 1999. ↩
- Cleveland Clinic. Brainstem: What It Is, Function, Anatomy & Location, 2024. ↩
- Cleveland Clinic. Thalamus: What It Is, Function & Disorders, 2022. ↩
- Ralf D Wimmer, L Ian Schmitt, Thomas J Davidson, Miho Nakajima, Karl Deisseroth, and Michael M Halassa. Thalamic control of sensory selection in divided attention. Nature, 526(7575):705–709, 2015. ↩
- Noga Larry, Gil Zur, and Mati Joshua. Organization of reward and movement signals in the basal ganglia and cerebellum. Nature Communications, 15(1):2119, 2024. ↩
- Wikipedia, CC BY-SA 4.0. ↩
- Jose G. Sanchez Jimenez and Orlando De Jesus. Hypothalamic Dysfunction. StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2025 Jan–, 2023. ↩
- Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, et al. Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? arXiv preprint arXiv:2502.15657, 2025. ↩